[Graph logo] Algorithms for the reconstruction of phenograms and phylograms

Phylogenetic
Methods

Phylogenetics
Main Page

Input-output
Data

Data Conversion

Algorithms

Software Use

An Alternative
Approach

Bibliography

 



This page contains materials in relation to the project on phylogenetic methods for applied evolutionary economics.
 
Given appropriate industrial characteristics data and their conversion to an appropriate format, MEGA2 can be used for tree reconstruction. The algorithms of relevance for the reconstruction of bifurcating industrial trees are the cluster analysis method, the minimum evolution method, the neighbour-joining method, and the maximum parsimony method.

  • The cluster analysis method (UPGMA) uses a distance matrix, but it may also start by constructing the distance matrix from a characteristics matrix. Instead of exploring all possible bifurcating trees it very quickly makes a bottom-up construction of a tree that will often be relatively close to the tree with a minimal number of mutations. The algorithm is designed so that it returns a rooted tree. Given that the distance matrix has already been calculated, this method uses the following algorithm:
    • Find the two nodes with the smallest distance and connect them with an internal node. The distance from each of the old nodes to the new node is defined as half of the total distance between the old nodes.
    • Calculate the new distance matrix that includes the new node k but excludes the two nodes (i and j) that it joins. In this new matrix rows i and j are removed. Instead a new row k is inserted, where each entry is the mean of the same entries for i and j in the original matrix.
    • If the distance matrix consists of a single element, then stop. Otherwise, go to 1.
    This algorithm explains the acronym UPGMA: the Unweighted Pair-Group Method using Arithmetic averages. After the algorithm has finished, the program continues to reconstruct the bifurcating tree by means of saved results from the individual steps. First it draws the root node: this is simply the node defined by the singleton distance matrix. Then it draws the two descendants of the root, which were found in the second last round. It proceeds to draw internal nodes and leaf nodes until it finally draws the two leaf nodes that were joined in the very first round.
  • The closely connected minimum evolution (ME) and neighbour-joining (NJ) methods are built to remedy some weaknesses of the primitive cluster analysis (UPGMA) method. In the minimum evolution criterion is the sum of all branch length estimates. This measure is computed for all plausible topologies, and the topology that has the smallest value is chosen as the best tree. This criterion does not require the rate of evolutionary change is the same in all branches of the tree as (assumed by the UPGMA method). Therefore the result is an unrooted tree. The restriction to plausible trees is made to avoid computational explosion. Thus the algorithm begins with a given initial bifurcating tree that is e.g. provided by the neighbour-joining (NJ) method. This method starts from (or constructs) a distance matrix and make a stepwise approach to find the tree that is designed to give better picture of the relative lengths of the individual branches of the tree than the UPGMA method. The main weaknesses of the NJ method vis-a-vis the UPGMA method are that it is more complicated and that it returns an unrooted tree rather than a rooted tree. However, the placing of the root by means of the UPGMA method is not very reliable, so this is a minor weakness. Furthermore, there are other methods for determining the location of the root.
  • The maximum parsimony method (MP) uses the characteristics matrix to perform an, in principle, full analysis of all possible bifurcating trees. The MP method returns the unrooted tree(s) that imply(s) the smallest number of mutations. When calculating the length of a tree, the MP algorithm starts by assigning optimal characteristics to all internal nodes. Then the length is simply the sum of the distances between all the nodes of the tree. If more than one MP tree is found, then it is possible to give weights to the individual branching points of one of the trees according to the fraction of the MP trees in which they occur. The algorithm may be constructed so that it examines the most likely solutions first, and it may be forced to stop after a certain number of trees have been examined (say, 10,000,000).
The application of several methods of tree reconstruction serves as a means to explore the degree to which different trees reflect the underlying data. It is, for instance, important to know whether a tree found by cluster analysis (UMGMA) og minimum evolution (ME or NJ) have many competitors that just as well fits the data. This question can to some extent be checked by the maximum parsimony (MP).

Distances. Another task is to obtain a realistic understanding of the distances between the different parts of the bifurcating tree. The ME/NJ and MP distances tend to be close to each other, while the UPGMA tree often is quite different. As already mentioned the latter distances are not as good as the former ones. However, if we make the (unrealistic) assumption that the process of change started from the root and had exactly the same speed along all branches, then the UPGMA tree would show the correct distances. Thus we in this case has a clear-cut (although nearly always incorrect) theory. The same is not the case for the other methods.

Bootstrapping. There are also statistical methods for analysing the reliability of the obtained tree. The analyses of the present paper are performed by means of the statistical bootstrapping method. Bootstrapping is made to test the sensitivity of the tree to minor changes in the underlying data set (presuming that it does not exactly represent the true characteristics of the industries). Thus a bootstrap matrix is e.g. created a 1000 times, each time by randomly selecting the required number of columns from the given characteristics matrix. For each bootstrap matrix a new tree is created. All this information is used to indicate in the branching points of the basic tree the fraction of the bootstrapped trees that have the same branching point.

Some further information on the use of MEGA2 is found together with the description of data conversion.


Maintained by Esben Sloth Andersen, email: esa@business.aau.dk.
Revision: 09 August 2004, 13:36.