Blending Protein Secondary Structure Information and Knowledge-Based Force Fields


Principal Investigator

Andrew Torda

Research School of Chemistry


Dan Ayers

Anthony Russell

Research School of Chemistry


w51 - PC


Much of our group's work has been geared toward protein fold recognition, a primitive form of structure prediction. Given a protein's amino acid sequence, we do not try to predict its structure from first principles. Instead, we scan through a library of known protein folds, searching for the one most likely to be closest to the unknown structure. The aim is to find the overall shape of a protein (its fold), rather than detailed, accurate coordinates. The obvious limitation is that one can only be correct when the unknown structure is similar to something previously seen. This is not a severe problem, since totally new structures are discovered relatively infrequently. More fundamentally, the methods are not yet reliable enough to give to a chemist as a predictive tool. This is substantially due to weaknesses in the score functions (force fields) used to calculate the fitness of a structure for a sequence.
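The library scan described above can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the names rank_folds and toy_score, the one-letter burial strings ('b' buried, 'e' exposed) and the list of hydrophobic residues. It also assumes an energy-like convention in which a lower score means a better fit. It is not the group's score function.

```python
from typing import Callable, Dict, List, Tuple

# Toy assumption: these residue types prefer buried sites.
HYDROPHOBIC = set("AVLIFMW")


def toy_score(sequence: str, template: str) -> float:
    """Count residues placed in the wrong environment ('b' buried, 'e' exposed)."""
    if len(sequence) != len(template):
        raise ValueError("sequence and template must have the same length")
    misplaced = 0
    for residue, site in zip(sequence, template):
        wants_buried = residue in HYDROPHOBIC
        if wants_buried != (site == "b"):
            misplaced += 1
    return float(misplaced)


def rank_folds(
    sequence: str,
    library: Dict[str, str],
    score: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score the sequence against every fold in the library, best (lowest) first."""
    scored = [(name, score(sequence, template)) for name, template in library.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical two-member fold library.
library = {"fold_a": "eebbb", "fold_b": "bbbee"}
ranking = rank_folds("AVLKE", library, toy_score)
```

The first entry of ranking is the candidate fold. A real score function would also have to handle sequences and templates of different lengths, which is where the alignment step comes in.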

Sometimes, complementary experimental information is available. It is not sufficient to determine a protein's structure, but it does contain useful information. The aim of this work has been to combine the speculative with the experimental and use secondary structure information from nuclear magnetic resonance (NMR) measurements to guide the calculations. The work is particularly interesting for the class of proteins which are too large for a complete structure determination by NMR, but do have spectra which can be assigned.



What are the results to date and the future of the work?

We now have a working implementation of these ideas in our program, known as "sausage". Given our target audience (protein spectroscopists), it reads data in formats commonly used in experimental labs. Within the code, these data control a "cost function" which is added to the rest of the knowledge-based force field. For fun, the code has also been persuaded to read protein secondary structure speculation as produced by other workers' programs.
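One way to picture a cost function added to a force field is sketched below. This is an illustrative assumption, not the functional form used in sausage: the names restraint_penalty and blended_score, the weight parameter, the H/E/C secondary structure letters and the '-' marker for unassigned residues are all invented here. The sketch also assumes the sequence has already been aligned to the template, so the two strings have equal length.

```python
def restraint_penalty(observed: str, template: str, mismatch_cost: float = 1.0) -> float:
    """Penalise each residue whose observed secondary structure disagrees
    with the template. '-' marks residues with no experimental assignment;
    these contribute nothing, so sparse data are handled naturally."""
    if len(observed) != len(template):
        raise ValueError("observed and template strings must have the same length")
    penalty = 0.0
    for obs, tmpl in zip(observed, template):
        if obs != "-" and obs != tmpl:
            penalty += mismatch_cost
    return penalty


def blended_score(force_field_score: float, observed: str, template: str,
                  weight: float = 1.0) -> float:
    """Add the weighted experimental penalty to the knowledge-based score.
    Lower is better in this sketch (an assumed, energy-like convention)."""
    return force_field_score + weight * restraint_penalty(observed, template)


# Hypothetical example: one of three assigned residues disagrees.
example = blended_score(-5.0, "HH-E", "HHCC", weight=2.0)
```

Setting weight to zero recovers the bare force field, which makes it easy to compare results with and without the experimental term.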

Appendix A



The use of even sparse experimental data is an outstanding success. The bare force fields are useful in a predictive sense about one third of the time. This is computationally impressive, but practically disappointing. With realistic (poor) experimental data, the success rate rises to about three quarters.

The use of secondary structure information from other sources is also helpful. This is a surprise since it shows that some terms in our current force fields are not as successful as believed. If they were perfect, there would be no benefit in including results from other programs.

As well as increasing the ability to recognise protein folds, there is an added benefit in the accuracy of the structures produced. The code (sausage) produces structural models based on alignments to known structures. These models improve along with the statistical success of fold recognition.

This work has served to highlight deficiencies in our current force fields. We are now using the analysis tools developed to identify where most computational effort should be spent.

What computational techniques are used?

We use home-built code which is always available and documented. The package is written in C, but driven via a Tcl interpreter, providing a very flexible interface.

Algorithmically, it uses score functions (force fields) mainly due to Thomas Huber at ANUSF and optionally blends these with experimental data. Sequence-to-structure alignments are calculated using a conventional dynamic programming algorithm. Most recently, the code has been persuaded to do parallel sequence-to-structure alignments and take advantage of the Beowulf cluster at ANUSF.
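For readers unfamiliar with dynamic programming alignment, a minimal sketch of a global (Needleman-Wunsch style) alignment follows. It is written in Python for brevity rather than the C of the actual package, and it is not the sausage implementation: the function names, the linear gap cost, the toy burial-based pair cost and the lower-is-better convention are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

HYDROPHOBIC = set("AVLIFMW")  # toy assumption, for illustration only


def toy_pair_cost(residue: str, site: str) -> float:
    """Cost 0 if the residue suits the site ('b' buried, 'e' exposed), else 1."""
    return 0.0 if (residue in HYDROPHOBIC) == (site == "b") else 1.0


def align(seq: str, sites: str,
          pair_cost: Callable[[str, str], float],
          gap: float = 1.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Global alignment minimising total cost.
    Returns the best cost and the aligned (sequence index, site index) pairs."""
    n, m = len(seq), len(sites)
    # best[i][j] is the lowest cost of aligning seq[:i] with sites[:j].
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = i * gap
    for j in range(1, m + 1):
        best[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = min(
                best[i - 1][j - 1] + pair_cost(seq[i - 1], sites[j - 1]),  # match
                best[i - 1][j] + gap,   # residue left unaligned
                best[i][j - 1] + gap,   # site left unaligned
            )
    # Trace back from the corner to recover one optimal set of pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j - 1] + pair_cost(seq[i - 1], sites[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best[n][m], pairs


cost, pairs = align("AVLK", "bbe", toy_pair_cost)
```

Because each sequence-to-template alignment is independent of the others, a library scan of this kind divides naturally across the nodes of a cluster, with each node aligning the sequence against its own share of the templates.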


D.J. Ayers, P.R. Gooley, A. Widmer-Cooper, A.E. Torda, Enhanced Protein Fold Recognition Using Secondary Structure Information From NMR, Protein Science, in press, 1999.
