Comparative Evolutionary Analysis of Mitochondrial Genomes Using Maximum Likelihood

             

Principal Investigator

Lars Jermiin

John Curtin School of Medical Research

Recent advances in DNA sequencing techniques have provided a wealth of data from genome projects, and it has become apparent that current analytical tools are inadequate. The aim of this research project is to develop and use computer programs for the inference of phylogenies by maximum likelihood (ML) using alignments of nucleotide or amino acid sequences obtained from completely sequenced mitochondrial genomes of 50-80 metazoan species. The project aims to provide a comprehensive understanding of mitochondrial evolution from yeast to man, to allow for statistically sound comparisons of genomic and morphological traits, and to support inference of the temporal changes in ancestral traits.

Co-Investigators

     

Margaret Kahn

ANUSF

Simon Easteal

John Curtin School of Medical Research

     
         

Projects

w02 - VPP, PC

What are the results to date and the future of this work?

We have developed a new computer program, TrExML (Tree-Space Exploration by ML), that allows for exhaustive and subexhaustive searches of tree-space. The program differs from other available ML programs in allowing the searches to be conducted through a much wider search window. Depending on the hardware, the program is guaranteed to find the ML tree for samples of up to 11 sequences. For larger data sets, the program is very likely to find the ML tree and, if requested, will also recover a large proportion of the trees that are in the vicinity of the ML tree. This has the advantage that a consensus can be generated from those trees using another computer program, TreeCons, which has been developed by some of us. TrExML contains a number of improved features that make it as fast as or faster than fastDNAml, which searches only a very small part of tree-space.

Three studies have so far made use of TrExML on the PC. A large preliminary analysis of 40 completely sequenced mitochondrial genomes was done at an early stage in the project. This analysis will be repeated, however, because more taxa have been included in the alignment and a better search strategy is now in place. A data set comprising 60 mitochondrial D-loop sequences from indigenous Australians and closely related primates has also been analysed, and a data set comprising archaebacterial nuclear 16S rRNA sequences has received attention. Recently, an analysis of ~45 million trees was completed easily using TrExML on the PC, and a subsequent analysis of ~364 million trees was almost complete when the calculations were stopped for a planned maintenance break; to our knowledge, analyses of this size have never been accomplished previously.

TrExML and TreeCons are written in C, and the former has been tested extensively on the PC.
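To give a sense of scale for the exhaustive searches mentioned above, the short Python sketch below counts tree topologies. It is an illustration only, not part of TrExML, and it assumes that the trees being enumerated are unrooted and strictly binary (the report does not say so explicitly). Under that assumption the number of distinct topologies on n labelled sequences is the double factorial (2n-5)!!. The function name is hypothetical.

```python
def num_unrooted_binary_trees(n: int) -> int:
    """Count unrooted binary tree topologies on n labelled leaves.

    Uses the standard result (2n-5)!! = 3 * 5 * ... * (2n-5) for n >= 3.
    Assumption: trees are unrooted and strictly bifurcating.
    """
    if n < 3:
        return 1  # zero, one or two leaves admit a single (trivial) topology
    total = 1
    for factor in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n-5
        total *= factor
    return total


if __name__ == "__main__":
    for n in (4, 5, 10, 11, 12):
        print(n, num_unrooted_binary_trees(n))
```

Under this assumption, 11 sequences correspond to 34,459,425 topologies and 12 sequences to 654,729,075, which suggests why an exhaustive guarantee becomes hardware-dependent at around this size.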

             
Appendix A -

             
     

We are currently optimising the code with the aim of taking advantage of parallel processing and reducing memory requirements. As for the VPP version, we still need to determine whether it is worth changing the code to take advantage of the Fortran compiler.

What computational techniques are used?

The search procedure that is used in TrExML is based on the following principle. In order to find the ML tree, and as many other trees in the vicinity of the ML tree as possible, we first generate and compare all binary trees on X sequences. From these trees, we retain the Y most likely trees. The (X+1)th sequence is then added to each branch in each of the Y trees, the resulting trees are compared, and the Y most likely trees are saved. This procedure continues until all sequences have been included in the phylogeny. Using this new heuristic approach, whereby additional sequences are added to all edges in a large number of trees, analyses of very large data sets can now be conducted easily and relatively efficiently on the PC.
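The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the TrExML implementation (which is written in C): trees are assumed to be unrooted and are stored as tuples of edges, all function names are hypothetical, and the `score` argument is a placeholder for a real likelihood calculation. The toy score used in the example simply counts how many preferred sister pairs a tree contains.

```python
from typing import Callable, List, Tuple

Edge = Tuple[int, int]
Tree = Tuple[Edge, ...]  # unrooted tree; leaves are ids >= 0, internal nodes are ids < 0


def three_leaf_tree() -> Tree:
    """The only unrooted binary tree on sequences 0, 1 and 2."""
    return ((-1, 0), (-1, 1), (-1, 2))


def add_leaf_everywhere(tree: Tree, leaf: int) -> List[Tree]:
    """Return one new tree per edge of `tree`, with `leaf` attached to that edge."""
    new_node = min(min(u, v) for u, v in tree) - 1  # unused internal-node id
    grown = []
    for i, (u, v) in enumerate(tree):
        rest = tree[:i] + tree[i + 1:]
        grown.append(rest + ((u, new_node), (new_node, v), (new_node, leaf)))
    return grown


def stepwise_search(n_seqs: int, x: int, y: int,
                    score: Callable[[Tree], float]) -> List[Tree]:
    """Keep the y best trees on x sequences, then add the rest one at a time."""
    if not 3 <= x <= n_seqs:
        raise ValueError("need 3 <= x <= n_seqs")
    # Exhaustive phase: every tree on the first x sequences.
    trees = [three_leaf_tree()]
    for leaf in range(3, x):
        trees = [t for parent in trees for t in add_leaf_everywhere(parent, leaf)]
    trees = sorted(trees, key=score, reverse=True)[:y]
    # Heuristic phase: add each remaining sequence to every edge of every kept tree.
    for leaf in range(x, n_seqs):
        candidates = [t for parent in trees for t in add_leaf_everywhere(parent, leaf)]
        trees = sorted(candidates, key=score, reverse=True)[:y]
    return trees


def cherries(tree: Tree) -> set:
    """Pairs of leaves attached to the same internal node (used by the toy score)."""
    leaves_at = {}
    for u, v in tree:
        internal, other = (u, v) if u < 0 else (v, u)
        if other >= 0:
            leaves_at.setdefault(internal, []).append(other)
    return {frozenset(pair) for pair in leaves_at.values() if len(pair) == 2}


if __name__ == "__main__":
    # Toy stand-in for a likelihood: reward trees pairing (0, 1) and (3, 4).
    wanted = {frozenset({0, 1}), frozenset({3, 4})}
    best = stepwise_search(5, x=4, y=3, score=lambda t: len(cherries(t) & wanted))
    print(best[0])
```

In this sketch a larger Y widens the set of trees carried forward at each step, at the cost of scoring roughly Y times the number of edges per added sequence; setting X equal to the number of sequences and Y large enough to keep every tree reduces it to an exhaustive search.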

Publications

L. S. Jermiin, G. J. Olsen, K. L. Mengersen, S. Easteal, Majority-rule consensus of phylogenetic trees obtained by maximum-likelihood analysis, Molecular Biology and Evolution, 14, 1296-1302 (1997).

M. Wolf, L. S. Jermiin, S. Easteal, M. Kahn, B. McKay, TrExML - a maximum likelihood program for exhaustive tree-space exploration, Bioinformatics, in press (pending final acceptance).

     