Blending Protein Secondary Structure Information and Knowledge-Based Force Fields

               

Principal Investigator

Andrew Torda

Research School of Chemistry

There are many groups in the world trying to predict a protein's structure from its amino acid sequence alone. At the same time, many experimental groups are determining protein structures directly. In between, there has been a flood of data (usually from nuclear magnetic resonance) which is too incomplete to determine a structure on its own.
Our aim is to take our often unreliable structure prediction methods and add the incomplete data generated during structure determination by nuclear magnetic resonance (NMR). This could be seen as improving structure predictions by using experimental data, or perhaps as a means of doing something useful with experimental data which alone would not be of great use.

The methodology should be of use to experimental groups working on proteins that are too big for complete structure determination by NMR, but that do yield some information on their secondary structure.

   

Co-Investigators

     

Dan Ayers

Research School of Chemistry

     

Projects

w51 - PC

     
         
               

     
               

What are the results to date and the future of the work?

Work has been entirely devoted to algorithm development and implementation. We already have a program which takes the sequence of a protein of interest, searches through a library of known structures and uses special score functions to find the most likely match for the sequence. This limited form of structure prediction is probably useful just under half the time.
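The library search described above can be sketched in a few lines. This is only an illustration, not the authors' program: the function names, the buried/exposed template encoding and the toy hydrophobicity score are all assumptions standing in for the real knowledge-based score functions, which are not given here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical residue set, used only to give the toy score something to do.
HYDROPHOBIC = set("AVLIMFWC")


def toy_score(sequence: str, template: str) -> float:
    """Stand-in score: reward hydrophobic residues at buried ('b') sites
    and polar residues at exposed ('e') sites, penalise the opposite."""
    score = 0.0
    for residue, environment in zip(sequence, template):
        hydrophobic = residue in HYDROPHOBIC
        buried = environment == "b"
        score += 1.0 if hydrophobic == buried else -1.0
    return score


def rank_templates(
    sequence: str,
    library: Dict[str, str],
    score_fn: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score the sequence against every library entry, best match first."""
    scored = [(name, score_fn(sequence, tmpl)) for name, tmpl in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    library = {"t1": "bbee", "t2": "eebb"}  # made-up templates
    for name, score in rank_templates("LLKK", library, toy_score):
        print(name, score)
```

The top-ranked entry plays the role of the "most likely match"; in this sketch no alignment is performed, so the sequence and template are simply compared position by position.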

The work has meant taking protein secondary structure information (as might come from nuclear magnetic resonance measurements) and creating an additional scoring function. This measures how well an atomic model based on the candidate structure would agree with the experiment. In practice, the problem is complicated by the fact that our prediction code attempts to wander the combinatorial wasteland that one finds when an amino acid sequence is allowed to align to candidate structures with gaps and insertions.
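To make the idea of an additional scoring term inside a gapped alignment concrete, here is a minimal sketch. It assumes a textbook global alignment (Needleman-Wunsch) with a linear gap penalty, which may differ from what the prediction code actually does; the names `align_score` and `combined_pair_score`, the `?` marker for residues without experimental information, and the weights are all hypothetical.

```python
from typing import Callable


def align_score(
    n_seq: int,
    n_tmpl: int,
    pair_score: Callable[[int, int], float],
    gap: float = -1.0,
) -> float:
    """Best global alignment score, allowing gaps in sequence or template."""
    # dp[i][j]: best score for the first i residues against the first j sites
    dp = [[0.0] * (n_tmpl + 1) for _ in range(n_seq + 1)]
    for i in range(1, n_seq + 1):
        dp[i][0] = i * gap
    for j in range(1, n_tmpl + 1):
        dp[0][j] = j * gap
    for i in range(1, n_seq + 1):
        for j in range(1, n_tmpl + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + pair_score(i - 1, j - 1),  # align i with j
                dp[i - 1][j] + gap,  # residue i left unaligned
                dp[i][j - 1] + gap,  # template site j left unaligned
            )
    return dp[n_seq][n_tmpl]


def combined_pair_score(
    base: Callable[[int, int], float],
    observed_ss: str,
    template_ss: str,
    weight: float = 1.0,
) -> Callable[[int, int], float]:
    """Add a secondary-structure agreement term to an existing pair score."""

    def score(i: int, j: int) -> float:
        s = base(i, j)
        if observed_ss[i] != "?":  # '?' means no experimental information
            s += weight if observed_ss[i] == template_ss[j] else -weight
        return s

    return score


if __name__ == "__main__":
    no_base = lambda i, j: 0.0  # ignore the sequence-structure term here
    pair = combined_pair_score(no_base, "HH?E", "HHE")
    print(align_score(4, 3, pair))
```

Because the experimental term is attached to each aligned pair, it is evaluated for every placement the alignment considers, which is where the combinatorial cost mentioned above comes from.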

Normally, one talks of protein secondary structure in terms of a few commonly observed types. Our score functions based on this discrete view have been taken to the retirement home of unloved code. A recent arrival from the maternity ward of penalty functions is a more continuous function which appears much more resistant to the inaccuracies of real experimental data. Early testing suggests that the latest ideas move the method from an academic curiosity to a useful tool for experimental laboratories. In more concrete terms, we expect that if a protein structure has been published which is similar to the correct answer for a protein sequence, there is a 60-80% chance that our method will find it given realistic data.
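The difference between a discrete and a continuous penalty can be illustrated with a toy example. The form of the actual penalty function is not described here, so everything below is an assumption: a single backbone angle (psi, in degrees) stands in for the experimental information, the class boundaries are invented, and the flat-bottomed quadratic is just one plausible smooth penalty.

```python
def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def classify(psi: float) -> str:
    """Hypothetical two-state assignment: 'H' (helix-like) or 'E' (strand-like)."""
    return "H" if -120.0 <= psi < 50.0 else "E"


def discrete_penalty(model_psi: float, observed_psi: float) -> float:
    """All-or-nothing: 1 if the assigned classes differ, otherwise 0."""
    return 0.0 if classify(model_psi) == classify(observed_psi) else 1.0


def continuous_penalty(
    model_psi: float, observed_psi: float, tolerance: float = 30.0
) -> float:
    """Zero within the tolerance, then growing smoothly with the deviation."""
    excess = max(0.0, angle_diff(model_psi, observed_psi) - tolerance)
    return (excess / 180.0) ** 2


if __name__ == "__main__":
    # A 10 degree error that happens to straddle the invented class boundary
    print(discrete_penalty(45.0, 55.0), continuous_penalty(45.0, 55.0))
    # A genuine disagreement between helix-like and strand-like values
    print(discrete_penalty(-45.0, 135.0), continuous_penalty(-45.0, 135.0))
```

In this sketch a small measurement error near a class boundary costs the full discrete penalty but nothing under the continuous one, which is the kind of robustness to noisy data the paragraph above refers to.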

The next step is more rigorous, large-scale testing with synthetic data, followed by applications to real experimental data.

What computational techniques are used?

We use home-built code which is always available (ftp://ftp.rsc.anu.edu.au/pub/torda/align/README) and documented (http://www.rsc.anu.edu.au/~torda/align.html). The package is written in C, but driven via a Tcl interpreter providing a non-hostile (if not user-friendly) interface. It includes the non-physical score functions used for general structure prediction, as well as the ability to read experimental data in some formats commonly used by experimental spectroscopists.

       
Appendix A -