Force Fields for Protein Fold Recognition

Principal Investigator

Andrew Torda

Research School of Chemistry

Co-Investigators

Anthony Russell

Daniel Ayers

Thomas Huber

Research School of Chemistry

Projects

v04 - PC

Predicting the structure of a protein from its amino acid sequence is one of the fundamental challenges of computational chemistry. Gene sequencing has produced a plethora of sequences, but far fewer structures are known. The theoretical prediction of protein structures would therefore be a very useful tool in drug design and in other applications where structures from X-ray and NMR experiments are unavailable or difficult to obtain.
Our current approach to protein structure prediction relies on building functions that take an amino acid sequence and calculate how well it would fit each known structure in a database of structures. The matches are ranked using score functions which are mathematically similar to molecular mechanics force fields. These functions have, however, been constructed by numerical optimisation without any explicit physical basis, and are more closely related to methods from pattern matching. Despite the lack of a complete physical model, our methods are often successful in recognising correct protein structures and predicting the structure of unknown proteins.
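
The ranking step described above can be illustrated with a minimal sketch. It is not the sausage score function: the contact-list representation of a template, the pair-energy table and the function names are all hypothetical, and the sequence is placed on each template without gaps, which a real alignment would not assume.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: a template structure is reduced to a list of
# (i, j) position pairs that are in contact in the 3D structure.
Contacts = List[Tuple[int, int]]


def score(sequence: str, contacts: Contacts,
          pair_energy: Dict[FrozenSet[str], float]) -> float:
    """Sum a pair term over every contact in the template (lower is better)."""
    total = 0.0
    for i, j in contacts:
        # Skip contacts that fall outside the (gapless) placed sequence.
        if i < len(sequence) and j < len(sequence):
            key = frozenset((sequence[i], sequence[j]))
            total += pair_energy.get(key, 0.0)
    return total


def rank_templates(sequence: str, library: Dict[str, Contacts],
                   pair_energy: Dict[FrozenSet[str], float]):
    """Return (template name, score) pairs sorted from best to worst."""
    scored = [(name, score(sequence, contacts, pair_energy))
              for name, contacts in library.items()]
    return sorted(scored, key=lambda item: item[1])
```

The pair table here plays the role of the optimised parameters; in this toy form it is simply a dictionary supplied by the caller.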

What are the results to date and the future of the work?

One avenue of continual development is the set of force fields or score functions at the core of the calculations. These were originally built by optimising for some statistical force field properties. Most recently, we have tried to quantify some of the desirable, but elusive, properties of a good force field. For example, how often does it find the correct protein in a sea of wrong proteins? This type of idea has been coerced into functions which can then be optimised, producing ever more intriguing force fields. The prediction program, "sausage", can swap between force fields at will, even during a single calculation.
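
As an illustration of how "finding the correct protein in a sea of wrong proteins" might be turned into a number, the sketch below computes three common measures: the z-score of the correct structure against a set of wrong ones, its rank, and the fraction of test cases in which it ranks first. These are assumed examples of such quality measures, not necessarily the quantities optimised in this project, and all names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def z_score(native_score: float, decoy_scores: Sequence[float]) -> float:
    """Distance of the native score from the decoy mean, in decoy standard
    deviations. With lower-is-better scores, a more negative value is better."""
    mean = statistics.mean(decoy_scores)
    spread = statistics.pstdev(decoy_scores)
    return (native_score - mean) / spread


def rank_of_native(native_score: float, decoy_scores: Sequence[float]) -> int:
    """1 means the native structure scores better than every decoy."""
    return 1 + sum(1 for s in decoy_scores if s < native_score)


def recognition_rate(cases: List[Tuple[float, Sequence[float]]]) -> float:
    """Fraction of (native score, decoy scores) cases with the native ranked first."""
    hits = sum(1 for native, decoys in cases
               if rank_of_native(native, decoys) == 1)
    return hits / len(cases)
```

Any of these could serve as the target of a numerical optimisation over the score function parameters, although the rank-based measures are not smooth functions of those parameters.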

We have also been using ideas from bioinformatics to improve the quality of the alignments or positioning of amino acids onto each template structure. For any natural amino acid sequence, there are usually similar sequences also found in nature. These will possess the same or a very similar 3D structure. By adding this extra sequence information, there is an increased chance of predicting a correct structure. Using a related approach, sequences can be matched to multiple structures, further improving our protein structure prediction ability. Testing of the multiple sequence-structure methods is planned for 1999.
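
One simple way to add information from similar sequences is a profile: the residue frequencies in each column of a set of pre-aligned homologues. The sketch below assumes the homologues are already aligned, uses "-" as the gap character, and averages a hypothetical per-residue site score over the profile. It shows the general idea only and is not the method used in sausage.

```python
from collections import Counter
from typing import Dict, List


def build_profile(aligned_sequences: List[str]) -> List[Dict[str, float]]:
    """Per-column residue frequencies for equal-length aligned sequences."""
    length = len(aligned_sequences[0])
    if any(len(s) != length for s in aligned_sequences):
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for column in zip(*aligned_sequences):
        residues = [r for r in column if r != "-"]  # ignore gap characters
        counts = Counter(residues)
        total = len(residues)
        profile.append({r: n / total for r, n in counts.items()} if total else {})
    return profile


def profile_site_score(column_freqs: Dict[str, float],
                       site_energy: Dict[str, float]) -> float:
    """Frequency-weighted average of a per-residue score at one template site."""
    return sum(freq * site_energy.get(residue, 0.0)
               for residue, freq in column_freqs.items())
```

Scoring a column of the profile rather than a single residue lets the related sequences vote on how well each template site is occupied.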

Whether we get the right or wrong answer, we can now get it much faster than a year ago. Our sequence to structure alignment calculations have used a reliable O(N^3) algorithm, but we are currently testing an efficient O(N^2) algorithm and measuring whether the speed comes at any cost in correctness. We also have a very portable, parallel version of the code which runs at breakneck speed on the cluster of Linux machines at ANUSF.
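
For readers unfamiliar with quadratic-time alignment, the sketch below shows a standard global dynamic programming alignment with a constant penalty per gap position. The text does not say which O(N^3) or O(N^2) algorithms are being compared, so this is only a generic example. It assumes a precomputed matrix in which score[i][j] is the cost of placing residue i of the sequence on site j of the template; the function name and the gap value are hypothetical.

```python
from typing import List


def align_score(score: List[List[float]], gap: float = 1.0) -> float:
    """Best global alignment score (lower is better) for an n x m score matrix.

    Runs in O(n * m) time: each cell is filled once from three neighbours.
    """
    n = len(score)
    m = len(score[0]) if n else 0
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Leading gaps: unmatched residues or unmatched template sites.
    for i in range(1, n + 1):
        best[i][0] = i * gap
    for j in range(1, m + 1):
        best[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = min(
                best[i - 1][j - 1] + score[i - 1][j - 1],  # residue i on site j
                best[i - 1][j] + gap,                      # residue i unmatched
                best[i][j - 1] + gap,                      # site j unmatched
            )
    return best[n][m]
```

The quadratic cost depends on each cell needing only its three neighbours. A gap penalty that is an arbitrary function of gap length requires looking back along a whole row and column for every cell, which is one way a cubic cost can arise.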

What computational techniques are used?

We use home-built codes for the numerical optimisation which produces the score functions. Our main application code is another home-built piece of more than 15000 lines of C, linked to Tcl so as to provide an interpreter (ftp://ftp.rsc.anu.edu.au/pub/torda/sausage/README). Documentation is available (http://www.rsc.anu.edu.au/~torda/sausage.html). At the heart of this package lie the functions for scoring protein sequences against candidate structures, together with methods such as a dynamic programming algorithm and Monte Carlo/simulated annealing for tackling the combinatorial problems. For parallel calculations on the ANUSF Linux cluster, sausage has been taught to communicate over sockets using SSTP (simple sausage transfer protocol).
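
The Monte Carlo/simulated annealing idea can be sketched generically as follows. This is not the sausage implementation: the geometric cooling schedule, the parameter values and the caller-supplied energy and propose functions are all assumptions made for illustration.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

State = TypeVar("State")


def simulated_annealing(initial: State,
                        energy: Callable[[State], float],
                        propose: Callable[[State, random.Random], State],
                        steps: int = 2000,
                        t_start: float = 2.0,
                        t_end: float = 0.01,
                        seed: int = 0) -> Tuple[State, float]:
    """Metropolis Monte Carlo with a geometrically decreasing temperature.

    Returns the lowest-energy state visited and its energy.
    """
    rng = random.Random(seed)
    current, e_current = initial, energy(initial)
    best, e_best = current, e_current
    for step in range(steps):
        # Temperature falls smoothly from t_start to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate = propose(current, rng)
        e_candidate = energy(candidate)
        delta = e_candidate - e_current
        # Always accept improvements; accept uphill moves with
        # Boltzmann probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best
```

In an alignment setting, the state could be a set of gap positions and propose could shift one gap, with energy given by the score function. Those choices are left to the caller here.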

Publications

T. Huber and A.E. Torda, Protein Sequence Threading, The Alignment Problem And A Two Step Strategy, The Journal of Computational Chemistry, submitted, 1999.

D.J. Ayers, T. Huber and A.E. Torda, Protein Fold Recognition Force Fields: Unusual Construction Strategies, Proteins, submitted, 1999.

T. Huber, D. Ayers, A.E. Torda and A.J. Russell, Sausage: Sequence-structure Alignment Using a Statistical Approach Guided by Experiment, http://www.rsc.anu.edu.au/~torda/sausage.html and ftp://ftp.rsc.anu.edu.au/pub/torda/sausage/README
