Systematics and Evolutionary Biology

Programs.pl

These are programs that, with a few exceptions, Olaf Bininda-Emonds has written to help in his research. Many of these programs are designed to provide input for PAUP*. All were written in Perl (using either BBEdit or TextWrangler, both from Bare Bones Software) and are available for download as zipped files.

Some quick notes about the programs:

  • All programs by Olaf Bininda-Emonds are open-source and, although not stated formally in most programs, are distributed according to the GNU General Public License; the same cannot be guaranteed for those written by others that are on this page, although all are freely downloadable.
  • All require a Perl interpreter (which is freely available for Windows, Linux, and pre-OS X Mac platforms; it's built into Mac OS X).
  • All are command-line driven (sorry ...). transAlign and seqConverter, however, do have primitive user-interactive interfaces so as to avoid those nasty switches.
  • Basic help can be found in most cases by typing either "perl ‹programName› -h" or "./‹programName› -h".
  • Only the most recent version of each program is present for download; older versions can be had on request.
  • Except for some of the EvoDevo scripts, they all function properly (including the bugs) in the Unix environment implemented in Mac OS X. Your mileage might vary, especially on Windows systems, which use different system calls.
  • If you want to cite a program that doesn't have an explicit citation, please consider the following generic format:
    • Bininda-Emonds, O.R.P. ‹year›. ‹name and version of program›. Program distributed by the author. AG Systematik und Evolutionsbiologie, IBU - Fakultät V, Carl von Ossietzky Universität Oldenburg.
  • Naturally, the usual caveats apply: these programs might (= undoubtedly) contain bugs and you agree to use them at your own "risk" (or that of your results, really). If you think that you've found a bug, please let me know!

    Note: except for the EvoDevo scripts, all programs have been updated to automatically detect which flavour of line breaks (Unix, Mac, or DOS) is being used in the input files. So, there is no longer any need to run lineBreak.pl beforehand (although it is still available below).

Supertrees (and trees in general)

BootStrip.pl

v.1.2.2

Determines the bootstrap frequencies for a given phylogenetic tree based on the results of a bootstrap analysis.

chronoGrapher.pl

v.1.3.3

Adds branch length information to a NEXUS-formatted tree description corresponding to divergence time estimates of the nodes to create an ultrametric "chronograph". Will interpolate any missing dates using the log-log formula of Purvis (1995; see Bininda-Emonds et al., 1999 for the reference) and can correct for negative branch lengths.
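As a rough illustration of the core idea (in Python rather than Perl, and not taken from chronoGrapher.pl itself), each branch length in a chronogram is simply the age of the parent node minus the age of the node below it. The tree, node names, and ages here are hypothetical; the interpolation of missing dates and the program's correction of negative branch lengths are not shown, only the detection of negative values.

```python
def branch_lengths(parent, age):
    """Return {node: branch length}, where length = parent's age - node's age."""
    lengths = {}
    for node, par in parent.items():
        if par is None:  # the root has no subtending branch
            continue
        lengths[node] = age[par] - age[node]
    return lengths


# Hypothetical tree ((A,B)n1,(C,D)n2)root with node ages in Myr; tips have age 0.
parent = {"root": None, "n1": "root", "n2": "root",
          "A": "n1", "B": "n1", "C": "n2", "D": "n2"}
age = {"root": 10.0, "n1": 4.0, "n2": 6.0,
       "A": 0.0, "B": 0.0, "C": 0.0, "D": 0.0}

lengths = branch_lengths(parent, age)
# A negative length means a node was dated older than its own ancestor.
negative = [node for node, length in lengths.items() if length < 0]
```

Because every tip has age 0, the branch lengths along any root-to-tip path sum to the root age, which is what makes the resulting tree ultrametric.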

higherLabels.pl

v.1.1

Labels internal nodes of a NEXUS-formatted tree description with names of higher-level taxa according to a user-input taxonomy. These labels can be viewed in programs such as TreeView.

nodeNumberer.pl

v.1.0

Adds node numbers to a NEXUS-formatted tree description for presentation purposes. These labels can be viewed in programs such as TreeView.

partitionMetric.pl

v.1.2.1

Calculates the number of unique clades / partitions for pairs of trees pruned to their common taxon set. Will optionally calculate a value weighted according to a measure of nodal support (given as a branch length), and can use this to calculate if two topologically identical trees have statistically different support values.
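The unweighted calculation can be sketched as follows. This is an illustrative Python version under simplifying assumptions, not the program's own code: trees are rooted and written as nested tuples of taxon names, and the helper names (leaves, prune, clades, partition_metric) are made up for the example. The support-weighted variant is not shown.

```python
def leaves(tree):
    """Return the set of taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def prune(tree, keep):
    """Restrict a tree to the taxa in keep, collapsing nodes left with one child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def clades(tree):
    """Return all clades as frozensets, excluding single tips and the root clade."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(visit(child) for child in node))
        found.add(members)
        return members

    found.discard(visit(tree))
    return found


def partition_metric(tree1, tree2):
    """Count the clades found in only one of the two trees after pruning."""
    common = leaves(tree1) & leaves(tree2)
    return len(clades(prune(tree1, common)) ^ clades(prune(tree2, common)))


# Hypothetical trees sharing the taxa A, B, C, and D.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), "D", "F")
distance = partition_metric(t1, t2)
```

Here the pruned trees share the clade {A, B, C} but differ in {A, B} versus {A, C}, so two clades are unique.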

QualiTree.pl

v.1.2.1

Calculates the qualitative support for the clades present in a supertree relative to those in the source trees contributing to it. Described in Bininda-Emonds (2003) with a modified version (rQS) described in Price et al. (2005).

  • Prachi Shah and Davide Pisani of Penn State University have kindly ported (an older version of) this program as a DOS executable file. It might only operate using the default parameters, however.

 

relDate.pl

v.2.2

Derives relative branch length formulas for dating a supertree from one or more gene trees according to the "local molecular clock" procedure in Purvis (1995; see Bininda-Emonds et al., 1999 for the reference). Dates can be relative to either ancestral or daughter nodes.

ReverseMRP.pl

v.1.0a

Uses PAUP* to reverse engineer the source trees from a NEXUS-formatted MRP data matrix and store in a single tree file. Requires, naturally, that the character boundaries of the source trees are specified (as nexus-format CHARSETs).

SuperMRP.pl

v.1.2.1

Converts a NEXUS-formatted treefile into a NEXUS-formatted data file ready for analysis. Incorporates both standard and Purvis MRP coding, and allows source trees to be coded as either rooted or unrooted (the latter as described in Bininda-Emonds et al., 2005).
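For readers unfamiliar with MRP, here is a minimal Python sketch of standard coding for rooted source trees; it is an illustration under stated assumptions, not the SuperMRP.pl source. Each clade in a source tree becomes one binary character: members of the clade score 1, other taxa in that tree score 0, and taxa missing from that tree score ?. The nested-tuple tree representation and the function names are assumptions for the example, and Purvis coding, unrooted coding, and NEXUS output are all left out.

```python
def leaves(tree):
    """Return the set of taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def clades(tree):
    """Return all clades as frozensets, excluding single tips and the root clade."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(visit(child) for child in node))
        found.add(members)
        return members

    found.discard(visit(tree))
    return found


def mrp_matrix(source_trees):
    """Return {taxon: character string} using standard MRP coding."""
    all_taxa = sorted(set().union(*(leaves(tree) for tree in source_trees)))
    rows = {taxon: [] for taxon in all_taxa}
    for tree in source_trees:
        present = leaves(tree)
        # Sort clades by their member lists so the column order is reproducible.
        for clade in sorted(clades(tree), key=sorted):
            for taxon in all_taxa:
                if taxon not in present:
                    rows[taxon].append("?")
                elif taxon in clade:
                    rows[taxon].append("1")
                else:
                    rows[taxon].append("0")
    return {taxon: "".join(states) for taxon, states in rows.items()}


# Two hypothetical source trees with partially overlapping taxon sets.
trees = [(("A", "B"), ("C", "D")), (("A", "C"), "E")]
matrix = mrp_matrix(trees)
```

The first tree contributes two characters (clades A+B and C+D) and the second contributes one (clade A+C), so each taxon ends up with three character states.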

synonoTree.pl

v.2.1

Standardizes the taxon names in a set of source trees according to a user-input reference taxonomy and synonymy list. Mismatches are flagged for the user to correct. Note that it cannot account for branch-length or support information in the trees. Described in Bininda-Emonds et al. (2004).

  • (Note: versions 1.0.x had serious bugs and should not be used!)

 
taxonoTree.pl

v.1.0

Constructs the tree associated with a hierarchical taxonomy presented in a tab-delimited text file.

treeConverter.pl

v.1.0

Converts a NEXUS-formatted treefile into a PHYLIP-formatted treefile or standalone NEXUS-formatted data file.

treePruner.pl

v.1.0

Prunes the trees in a NEXUS-formatted treefile to their common taxon set, of specific user-input taxa, or both. Can optionally retain support values in the trees.

Sequences (and data mining)

autoMT.pl

v1.0

Allows for batch testing of the optimal model of evolution for a series of sequence files. Model testing can be performed using either ModelTEST with PAUP* or MrAIC.pl with PHYML. The applicability of the molecular clock can also be tested using the ModelTEST / PAUP* combination.

batchPHYML.pl

v1.0

Provides a wrapper around PHYML to easily perform sequential analyses on a set of data matrices specified by the user in a tab-delimited text file.

batchRAXML.pl

v1.1.1

Provides a wrapper around RAxML to easily analyze a set of data files according to a common set of search criteria. Also organizes the RAxML output into a set of subdirectories. Compatible with RAxML-VI-HPC v2.2.3.

fourX.pl

v1.0

Implements the four-times rule of Birky et al. (2005, 2010) (also known as the K/θ rule) to identify species via molecular data based on population genetics theory. Requires autoMT.pl (see above), PAUP*, PhyML, and RAxML. FigTree is also required to view one of the output trees.

GenBankStrip.pl

v2.0

Mines all gene sequences from a GenBank output file according to annotations provided in each accession. As such, it is limited by the accuracy of the information given in the accession and uses a restricted library of gene synonyms. However, it can often mine more evolutionarily divergent sequences and better account for paralogs than can a BLAST-based search.

GeneCount.pl

v1.0a

Crude program to count number of sequences for a given gene in a GenBank download. Does not correct for differences in spelling, etc.

moleRat.pl

v1.0

Calculates rates of evolution along the branches of one or more (gene) trees with respect to a dated reference tree, both for each tree individually and across the set of trees as a whole. Also identifies branches and clades that are evolving significantly differently from the overall average or that have changed their rate significantly with respect to an ancestral reference point (as determined using a paired Student's t-test and a paired Fisher's sign test). Described in Bininda-Emonds (2007).

popDiversity.pl

v1.0

Simple program to quantify any possible bar-coding gap between two species. Requires PAUP*.

seqCat.pl

v1.0

Creates an interleaved NEXUS-formatted supermatrix of individual data matrices (in any of fasta, NEXUS, PHYLIP, or Se-Al formats).

seqCleaner.pl

v1.0.2

Processes an aligned DNA sequence data set to retain only 1) those sequences with a minimum level of pairwise overlap and 2) the five most diverse (and longest) sequences for taxa with greater than five sequences. Data can be input in any of fasta, NEXUS, PHYLIP, or Se-Al formats.

seqConverter.pl

v1.2

Converts between some commonly used file formats (fasta, NEXUS, PHYLIP, or Se-Al) and performs simple data transformations (e.g., modify gaps, translate to amino acids, convert to haplotype data). Can also batch convert all files of a specified file type in the working directory. A program that recognizes more file formats is sreformat, part of the HMMER package.

transAlign.pl

Benchmark data sets

v1.2

Facilitates the multiple alignment of protein-coding DNA sequences by aligning the amino acid sequences they specify. Data can be input in and output to any of fasta, NEXUS, PHYLIP, or Se-Al formats. Requires a local copy of ClustalW. Described in Bininda-Emonds (2005).
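The step that maps a protein alignment back onto the DNA can be sketched in a few lines of Python. This is only an illustration of the general technique, not transAlign.pl's own code: the function name thread_codons is hypothetical, the sequences are assumed to be in frame and a whole number of codons long, and the translation and ClustalW steps are omitted.

```python
def thread_codons(dna, aligned_protein, gap="-"):
    """Rebuild a DNA alignment row from an aligned amino acid sequence.

    Each residue consumes the next codon of the unaligned DNA; each gap in
    the protein alignment becomes three gap characters in the DNA alignment.
    """
    if len(dna) % 3 != 0:
        raise ValueError("DNA length is not a multiple of three")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    residues = sum(1 for symbol in aligned_protein if symbol != gap)
    if residues != len(codons):
        raise ValueError("number of residues and codons differ")

    out = []
    next_codon = 0
    for symbol in aligned_protein:
        if symbol == gap:
            out.append(gap * 3)
        else:
            out.append(codons[next_codon])
            next_codon += 1
    return "".join(out)


# Hypothetical example: two sequences whose protein alignment is M-AK / M--K.
row1 = thread_codons("ATGGCCAAA", "M-AK")
row2 = thread_codons("ATGAAA", "M--K")
```

Because gaps are only ever inserted in multiples of three, the reading frame of every sequence is preserved in the resulting DNA alignment.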

EvoDevo

The Parsimov package, as described in Jeffery et al. (2005). All programs and example files were written by Jonathan Jeffery.

Parsimv7g.pl

v.1.0.7g

Implements event-pair "parsimony cracking" as described in Jeffery et al. (2005).

ReplacerParsimv.pl

v.1.0.3b

Takes a Parsimv7g.pl output file and replaces the PAUP* character numbers with more readable character names according to a user-specified text list (e.g., CellType.txt, based on cell-line characters in spiralians). In this list, each line contains the character number (in ascending order) and character name, separated by a tab.

A lot of replacing can be automated using a batch file (e.g., ReplaceBat.txt).  The batch file contains the name of the output file to have its character numbers replaced and the name of the text-list to use, separated by a tab.

Describe.pl

n/a

Creates a PAUP* command file to describe each tree in memory under ACCTRAN and DELTRAN optimizations (saving each as separate log files) plus a Parsimv7g.pl batch file (e.g., ParsBatch.txt) to crack each of the PAUP* log files produced.  Handy for big jobs.

ParsBatch.txt

n/a

An example of a batch file -- if you have several log files to work through (e.g., ACCTRAN or DELTRAN optimizations, different topologies, etc), you can use this to get Parsimv7g.pl to run through each in turn.  The contents are, for each log-file you want to analyze: the name of a log file, the path where you want its output files written, whether to use all [a] or unambiguous [u] changes, whether to use a thorough search if feasible [y/n] and whether to clean-up the "working" files as it goes along [y/n] (useful for big data sets where temp files can reach 100s of MB).  This data must be tab-separated on a new line for each log file.
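To make the layout concrete, here is a small Python sketch that reads and checks a batch file with the five tab-separated fields described above. It is not part of the Parsimov package. The field names, the file names in the sample text, and the assumption that the options are given as the single letters a/u and y/n are all mine, inferred from the description; check ParsBatch.txt itself for the exact conventions.

```python
import csv
import io
from typing import NamedTuple


class BatchEntry(NamedTuple):
    log_file: str      # name of the PAUP* log file to crack
    output_path: str   # where the output files should be written
    changes: str       # "a" = all changes, "u" = unambiguous changes only
    thorough: str      # "y"/"n": use a thorough search if feasible
    cleanup: str       # "y"/"n": delete the "working" files along the way


def parse_batch(text):
    """Parse tab-separated batch lines into BatchEntry records."""
    entries = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:  # skip blank lines
            continue
        if len(row) != 5:
            raise ValueError(f"expected 5 tab-separated fields, got {len(row)}")
        entry = BatchEntry(*row)
        if entry.changes not in ("a", "u"):
            raise ValueError(f"bad changes option: {entry.changes!r}")
        if entry.thorough not in ("y", "n") or entry.cleanup not in ("y", "n"):
            raise ValueError("thorough and cleanup options must be y or n")
        entries.append(entry)
    return entries


# Hypothetical batch file contents: one line per log file.
sample = "acctran.log\tresults/acctran\tu\ty\ty\ndeltran.log\tresults/deltran\ta\tn\ty\n"
entries = parse_batch(sample)
```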

Other EvoDevo programs (written by Olaf Bininda-Emonds). Note: except for GapSim.pl, these programs are “somewhat” dated and run in MacPerl only.

EPcracking.pl

v.1.0.2

Implements event-pair cracking as described in Jeffery et al. (2002).

BreakPoint.pl
EventPair.pl
JuncCode.pl

v.1.0

These three programs will encode developmental timing data according to three different coding schemes: breakpoint distances, event-pairing, and junction coding. Each program will output a NEXUS-formatted data file ready for analysis.
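As an illustration of the event-pairing scheme only (breakpoint distances and junction coding are not shown), the Python sketch below scores every pair of events in one species by their relative timing. The state assignment used here (0 = first event earlier, 1 = simultaneous, 2 = first event later) is an assumption for the example and might not match the states EventPair.pl writes out; the event names and ranks are hypothetical.

```python
from itertools import combinations


def event_pairs(ranks):
    """Code each pair of events by relative timing.

    ranks maps an event name to its position in the developmental sequence
    (lower = earlier; equal ranks = simultaneous).
    """
    coded = {}
    for first, second in combinations(sorted(ranks), 2):
        if ranks[first] < ranks[second]:
            state = 0  # first event occurs before the second
        elif ranks[first] == ranks[second]:
            state = 1  # events are simultaneous
        else:
            state = 2  # first event occurs after the second
        coded[(first, second)] = state
    return coded


# Hypothetical developmental sequence for a single species.
ranks = {"heart": 1, "eye": 2, "limb": 2}
characters = event_pairs(ranks)
```

Each pair becomes one character, so n events yield n(n-1)/2 characters per species.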

AncSeq.pl

v.1.0

Infers the developmental sequence of a hypothetical ancestor on a cladogram, following the procedure described in Jeffery et al. (2002).

GapSim.pl

v.1.0a

Calculates whether the events in a given developmental sequence are evenly distributed or not. Described in Bininda-Emonds et al. (2003).

Miscellaneous

PerlEQ.pl

v.1.0b12

A program, written by Jonathan Jeffery, that performs Safe Taxonomic Reduction (developed by Mark Wilkinson) to identify taxa that can be safely removed from a phylogenetic analysis because they are essentially "redundant" with other taxa. I have produced a modified version that allows all input options to be specified from the command line, including a new option to suppress output of the data matrix and character diagnostics to the html file (to keep the size of this file down somewhat).
  • Note: the program is no longer being actively maintained.

 

perlRat.pl

v.1.0.9a

Creates a PAUP* batch file with the necessary instructions to perform a parsimony ratchet analysis.

reverseSTR.pl

v.1.0

Re-includes taxa, where this is unequivocally possible, to a NEXUS-formatted tree description derived from an analysis using Safe Taxonomic Reduction. Requires the output of STRindexer.pl.

STRindexer.pl

v.1.1

Parses the html output file of PerlEQ to identify taxa that can be safely removed from an analysis and then potentially unequivocally re-included (using reverseSTR.pl). Essentially, these are taxa conforming to the PerlEQ category C*.

STRedundancy.pl

v.1.0

Identifies characters in a data matrix that will become redundant (i.e., duplicate others or become uninformative) upon deletion of a specified set of taxa. It is geared largely towards STR analyses of MRP matrices.

lineBreak.pl

v.1.0

Converts Mac and DOS-style line breaks to Unix-style ones. This is a holdover from when my scripts required Unix-style line breaks. Here is an even better program (for Mac OS X) with drag-n-drop and the works.
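The conversion itself is simple enough to sketch in Python (an illustration, not the lineBreak.pl source; the function name is hypothetical). The only subtlety is the order of the replacements: DOS line breaks (CR+LF) must be handled before classic Mac ones (CR alone), otherwise each DOS break would turn into two Unix breaks.

```python
def to_unix_linebreaks(data: bytes) -> bytes:
    """Convert DOS (CR+LF) and classic Mac (CR) line breaks to Unix (LF)."""
    # Replace CR+LF first so that its CR is not converted a second time.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


dos_text = b"#NEXUS\r\nbegin trees;\r\nend;\r\n"
mac_text = b"#NEXUS\rbegin trees;\rend;\r"
unix_text = b"#NEXUS\nbegin trees;\nend;\n"

converted_dos = to_unix_linebreaks(dos_text)
converted_mac = to_unix_linebreaks(mac_text)
```

Working on bytes rather than decoded text avoids any automatic newline translation by the file layer.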

An extensive list of phylogenetic programs is maintained by Joe Felsenstein. Many of these programs, and ones that we use frequently, were written by members of the Evolutionary Biology Group at the University of Oxford.

 