Force Fields For Protein Fold Recognition


Principal Investigator

Andrew Torda

Research School of Chemistry

The composition of a protein (its amino acid sequence) is relatively easy to obtain experimentally. One would, however, like to know the shape of a protein if it is going to be used for drug design or industrial applications. One of the holy grails of computational chemistry is to be able to predict a protein's structure given only its sequence.
Surprisingly, more than two-thirds of new protein structures appear to be very similar to ones that are already known. Our structure prediction approach is based on building functions which take an amino acid sequence and score its match to every currently known protein structure. The functions are mathematically like force fields from molecular mechanics, but they have no obvious physical basis and are philosophically closer to methods from pattern matching.
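The idea of scoring one sequence against every known structure and ranking the results can be sketched as follows. This is only an illustration, not the project's code: the score table, the two-class site description ("buried"/"exposed"), the fold names and all numbers are invented, and the placement is assumed to be gapless with equal lengths.

```python
from typing import Dict, List, Tuple

# Hypothetical score table: (residue type, site environment) -> score.
# Lower is better, as with a force-field energy. All values are invented.
SCORE: Dict[Tuple[str, str], float] = {
    ("L", "buried"): -1.0, ("L", "exposed"): 0.5,
    ("K", "buried"): 1.0, ("K", "exposed"): -0.5,
    ("G", "buried"): 0.0, ("G", "exposed"): 0.0,
}


def score_match(sequence: str, structure: List[str]) -> float:
    """Sum per-site scores for a gapless, equal-length placement."""
    if len(sequence) != len(structure):
        raise ValueError("this sketch only handles equal lengths")
    return sum(SCORE[(res, env)] for res, env in zip(sequence, structure))


def rank_library(sequence: str,
                 library: Dict[str, List[str]]) -> List[Tuple[float, str]]:
    """Score the sequence on every structure and sort, best (lowest) first."""
    scored = [(score_match(sequence, sites), name)
              for name, sites in library.items()]
    return sorted(scored)


# A toy "library of known structures" (hypothetical).
library = {
    "fold_a": ["buried", "exposed", "buried", "exposed"],
    "fold_b": ["exposed", "buried", "exposed", "buried"],
    "fold_c": ["exposed", "exposed", "exposed", "exposed"],
}
ranking = rank_library("LKLK", library)
for score, name in ranking:
    print(f"{name}: {score:+.1f}")
```

In this toy case the sequence LKLK ranks fold_a first, because its leucines land on buried sites and its lysines on exposed ones.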



Thomas Huber

Research School of Chemistry



v04 - PC



What are the results to date and the future of the work?

Computationally, there are two aspects to be developed. Firstly, one needs functions to score protein sequences with structures. Secondly, there are combinatorial problems in placing the protein building blocks (amino acids) on candidate protein structures.

The first set of force fields/score functions has been built and shows an astounding ability to recognise correct protein structures. Most of our recent effort has gone into breaking these functions and finding their weaknesses.

The combinatorial problems (aligning amino acids on candidate protein structures) are NP-complete, so we have built special scoring functions with approximations to make the problems tractable. While the numerical gymnastics here are appealing, this appears to be the greatest weakness of our current approach.
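The report does not say which approximations are used. One way such a problem can become tractable, assumed here purely for illustration, is to score each site independently of its neighbours, so that the best alignment can be found exactly by dynamic programming. The function names, the toy score and the linear gap penalty below are all hypothetical.

```python
from typing import Callable, List


def align_score(sequence: str, sites: List[str],
                site_score: Callable[[str, str], float],
                gap: float = 1.0) -> float:
    """Lowest global alignment score, assuming independent per-site terms."""
    n, m = len(sequence), len(sites)
    # best[i][j]: lowest score for aligning sequence[:i] with sites[:j]
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = i * gap
    for j in range(1, m + 1):
        best[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = min(
                # residue i placed on site j
                best[i - 1][j - 1] + site_score(sequence[i - 1], sites[j - 1]),
                best[i - 1][j] + gap,  # residue i left unplaced
                best[i][j - 1] + gap,  # site j left empty
            )
    return best[n][m]


def toy_score(res: str, env: str) -> float:
    """Invented score: reward hydrophobic-on-buried and polar-on-exposed."""
    hydrophobic = res in "AILMFVW"
    return -1.0 if hydrophobic == (env == "buried") else 1.0


result = align_score("LKL", ["buried", "exposed", "exposed", "buried"], toy_score)
print(result)
```

The table has one cell per (residue, site) pair, so the cost grows with the product of the two lengths. That guarantee is lost as soon as the score of one placement depends on which residues occupy other sites.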

Our plan now is to stop tinkering with the score functions used to rank protein folds and instead to construct completely new methodology for optimising the score functions used in the combinatorial problems.

What computational techniques are used?

We use home-built codes for the numerical optimisation which produces the score functions. Our main application code is another home-built piece of more than 13000 lines of C, linked to Tcl so as to provide an interpreter ( Documentation is at In the heart of this package lie the functions for scoring protein sequences with candidate structures and methods such as a dynamic programming algorithm and Monte Carlo/simulated annealing for tackling the combinatorial problems.
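As a rough illustration of how Monte Carlo/simulated annealing can be applied to the placement problem, the sketch below moves residues between neighbouring sites and accepts or rejects each move with the Metropolis criterion. It is written in Python for brevity and is not the package's C code: the move set, the cooling schedule, the toy energy with its single contact pair, and every name are assumptions.

```python
import math
import random
from typing import Callable, List, Tuple


def anneal_placement(n_res: int, n_sites: int,
                     energy: Callable[[List[int]], float],
                     steps: int = 2000, t_start: float = 2.0,
                     t_end: float = 0.05, seed: int = 0
                     ) -> Tuple[List[int], float]:
    """Place residues 0..n_res-1 on increasing site indices; return best found."""
    if not 1 <= n_res <= n_sites:
        raise ValueError("need at least one residue and enough sites")
    rng = random.Random(seed)
    place = list(range(n_res))  # start with residues on the first sites
    current = energy(place)
    best, best_e = place[:], current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        i = rng.randrange(n_res)
        new_site = place[i] + rng.choice((-1, 1))
        lo = place[i - 1] + 1 if i > 0 else 0
        hi = place[i + 1] - 1 if i < n_res - 1 else n_sites - 1
        if not lo <= new_site <= hi:
            continue  # move would break the sequence order
        trial = place[:]
        trial[i] = new_site
        trial_e = energy(trial)
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if trial_e <= current or rng.random() < math.exp((current - trial_e) / temp):
            place, current = trial, trial_e
            if current < best_e:
                best, best_e = place[:], current
    return best, best_e


SEQ = "LKLK"
SITES = ["exposed", "buried", "exposed", "buried", "exposed", "exposed"]
CONTACTS = {(1, 3)}  # hypothetical pairs of sites that touch in 3D


def toy_energy(place: List[int]) -> float:
    """Invented energy: per-site terms plus a bonus for an L-L contact."""
    e = 0.0
    for res, site in zip(SEQ, place):
        e += -1.0 if (res == "L") == (SITES[site] == "buried") else 1.0
    occupant = {site: res for res, site in zip(SEQ, place)}
    for a, b in CONTACTS:
        if occupant.get(a) == "L" and occupant.get(b) == "L":
            e -= 1.0
    return e


best_place, best_energy = anneal_placement(len(SEQ), len(SITES), toy_energy)
print(best_place, best_energy)
```

The contact term makes the score of one placement depend on another, which is the kind of coupling that a stochastic search can handle without any guarantee of finding the optimum.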


T. Huber and A. E. Torda, Protein Fold Recognition without Boltzmann Statistics or Explicit Physical Basis, Protein Science, 7, 1998, 142-149.

Appendix A-