Preparing the data and using the software

This page contains materials related to the project on phylogenetic methods for applied evolutionary economics.
 
It is not enough to understand the principles of the tree-reconstruction algorithms and to have appropriate data on industrial characteristics. Using, for example, MEGA2 for tree reconstruction requires that the data be available in correctly coded form. In principle, preparing the data is simple, but it involves quite a number of steps, some of which require additional software. Let us therefore briefly go through an example of the full sequence of data-manipulation steps. We shall also give some hints on the use of MEGA2.
  • Open an OECD input-output table in SAS or Microsoft Excel.
  • Make a new worksheet whose first column gives the OECD industry names in modified form. In particular, the original names must be changed so that they contain no spaces.
  • Calculate the input characteristics and place them in columns 2, ..., n+1, where n is the number of industries. Remember that the input characteristics matrix is transposed relative to the original OECD table.
  • Calculate the output characteristics and place them immediately after the input characteristics, in columns n+2, ..., 2n+1.
  • Save the worksheet as a simple tab-delimited file.
  • Convert the tab-delimited file into MEGA format, as in this example. This can be done in several ways. One convenient possibility is:
    • Open the text file in MacClade, a phylogenetics software package oriented toward data manipulation. When opening it, indicate that the file is formatted as a simple table.
    • In MacClade, change the coding system to DNA. This recodes 0 as A and 1 as C. You may also explore tree construction by means of the maximum parsimony method (in Tree View), but this is not necessary.
    • MacClade can also be used to create files with multiple copies of the input characteristics. To create the CIIO matrix, add n columns in the MacClade Data View, copy the input characteristics, and paste the copy into the new block of characteristics.
    • Export the data from MacClade to a file in MEGA1 format.
  • Open the converted file in MEGA2 as the data type 'Nucleotide Sequences', and answer 'No' to the question about protein coding.
    • You may now use MEGA2 to calculate distance matrices and/or to apply its tree-reconstruction algorithms directly. Before doing so, set the distance model option to 'Nucleotide: Number of Differences'.
    • By default, MEGA2 uses all available industries for tree reconstruction, but a subset of industries may be chosen. To do so, open the Data Sequence Explorer and deselect the industries that should be temporarily excluded. Any algorithm invoked afterwards uses only the selected industries.
    • By default, MEGA2 uses all available characteristics for tree reconstruction, but a subset of characteristics may be chosen. To do so, open the Data Sequence Explorer, select 'Setup/Select Genes & Domains', remove the group Data, add 'genes' called, e.g., Input and Output, and add a domain to each of these 'genes' (e.g. 1-35 and 36-70). You can then include or exclude the input characteristics or the output characteristics.
    • Trees are exported from MEGA2 as Enhanced Windows Metafiles (emf files), which have to be converted into Encapsulated PostScript files (eps files) with a program such as CorelDraw before they can be inserted into ordinary documents.
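
As a rough illustration, the spreadsheet steps above (modifying the names, computing the input and output characteristics, and saving a tab-delimited file) can be sketched in Python. This is a minimal sketch under stated assumptions, not the procedure used in the project: the input-output table is taken to be an n by n matrix of flows, flows[i][j], from supplying industry i to using industry j, and a characteristic is coded 1 when the corresponding share exceeds a hypothetical threshold (0.05 here). The actual coding rule for the characteristics is not given on this page, and the function names are illustrative only.

```python
import csv
import io


def clean_name(name):
    """Replace runs of whitespace by '_' so the industry name has no spaces."""
    return "_".join(name.split())


def binary_characteristics(flows, threshold=0.05):
    """Return (inputs, outputs) as n x n lists of 0/1 values.

    flows[i][j] is the delivery from industry i (row) to industry j (column).
    HYPOTHETICAL coding rule: 1 when the share exceeds `threshold`.
    """
    n = len(flows)
    col_totals = [sum(flows[i][j] for i in range(n)) for j in range(n)]
    row_totals = [sum(flows[i]) for i in range(n)]
    # Input characteristics of industry j come from column j of the table,
    # so this matrix is the transpose of the original OECD layout.
    inputs = [[1 if col_totals[j] and flows[i][j] / col_totals[j] > threshold else 0
               for i in range(n)] for j in range(n)]
    # Output characteristics of industry i come from row i of the table.
    outputs = [[1 if row_totals[i] and flows[i][j] / row_totals[i] > threshold else 0
                for j in range(n)] for i in range(n)]
    return inputs, outputs


def write_table(names, flows, out, threshold=0.05):
    """Write one tab-delimited row per industry to the text stream `out`.

    Column 1: name; columns 2..n+1: inputs; columns n+2..2n+1: outputs.
    """
    inputs, outputs = binary_characteristics(flows, threshold)
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for k, name in enumerate(names):
        writer.writerow([clean_name(name)] + inputs[k] + outputs[k])


if __name__ == "__main__":
    buf = io.StringIO()
    write_table(["Agriculture", "Food products", "Services"],
                [[10, 50, 0], [5, 20, 30], [1, 2, 60]], buf)
    print(buf.getvalue(), end="")
```

To write a real file instead, open it with open("characteristics.txt", "w", newline="") and pass the file object as out; the file name is, of course, only an example.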

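The MacClade route recodes each 0/1 characteristic as a nucleotide (0 as A, 1 as C) and exports a MEGA file. The same recoding can be sketched in Python, but only under loud assumptions: the layout written here (a #mega line, a !Title ...; line, then a #name line followed by each sequence) reflects our understanding of MEGA's text format and should be checked against a file exported from MacClade. The functions parse_table and to_mega and the parameter input_copies are hypothetical, and placing the repeated input block directly before the outputs is only a guess at the CIIO column order.

```python
RECODE = {"0": "A", "1": "C"}  # MacClade's DNA coding: 0 -> A, 1 -> C


def to_sequence(bits):
    """Recode a list of 0/1 characteristics as an A/C string."""
    return "".join(RECODE[str(b)] for b in bits)


def parse_table(text):
    """Parse the tab-delimited table: industry name, then 0/1 values."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, *bits = line.split("\t")
        rows.append((name, [int(b) for b in bits]))
    return rows


def to_mega(rows, n, title="IO characteristics", input_copies=1):
    """Return MEGA-style text (ASSUMED layout) for the parsed rows.

    n is the number of industries: the first n values are input
    characteristics, the next n are output characteristics.
    input_copies > 1 repeats the input block (HYPOTHETICAL CIIO layout).
    """
    lines = ["#mega", f"!Title {title};", ""]
    for name, bits in rows:
        inputs, outputs = bits[:n], bits[n:2 * n]
        lines.append(f"#{name}")
        lines.append(to_sequence(inputs * input_copies + outputs))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    table = "Agriculture\t1\t1\t1\t1\t1\t0\nServices\t0\t1\t1\t0\t0\t1\n"
    print(to_mega(parse_table(table), 3), end="")
```

With this table, Agriculture becomes CCCCCA and Services becomes ACCAAC, which MEGA2 would then be asked to read as nucleotide sequences.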

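MEGA2's 'Nucleotide: Number of Differences' distance counts, for each pair of sequences, the sites at which they differ. Because the characteristics are coded as A and C, this equals the number of binary characteristics on which two industries disagree. The sketch below illustrates that count, together with restricting it to selected industries (as when deselecting industries in the Data Sequence Explorer) and to a 1-based domain such as 1-35. The helper names are hypothetical, and the sketch does not reproduce MEGA2's internals, for instance its handling of missing data.

```python
from itertools import combinations


def number_of_differences(seq_a, seq_b, sites=None):
    """Count the sites (positions) at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    positions = range(len(seq_a)) if sites is None else sites
    return sum(1 for k in positions if seq_a[k] != seq_b[k])


def domain_sites(first, last):
    """Convert a 1-based inclusive domain such as 1-35 into 0-based indices."""
    return range(first - 1, last)


def distance_matrix(seqs, selected=None, sites=None):
    """Pairwise differences among the selected industries (all if None)."""
    names = [name for name in seqs if selected is None or name in selected]
    dist = {(name, name): 0 for name in names}
    for a, b in combinations(names, 2):
        d = number_of_differences(seqs[a], seqs[b], sites)
        dist[(a, b)] = dist[(b, a)] = d
    return names, dist


if __name__ == "__main__":
    seqs = {"Agriculture": "CCCCCA", "Food_products": "CCACCC",
            "Services": "ACCAAC"}
    names, all_sites = distance_matrix(seqs)
    _, inputs_only = distance_matrix(seqs, sites=domain_sites(1, 3))
    for a, b in combinations(names, 2):
        print(a, b, all_sites[(a, b)], inputs_only[(a, b)])
```

For a table with 35 industries, domain_sites(1, 35) and domain_sites(36, 70) would correspond to the Input and Output domains mentioned above.
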
Maintained by Esben Sloth Andersen, email: esa@business.aau.dk.
Revision: 09 August 2004, 13:36.