Force Fields For Protein Fold Recognition

Principal Investigator

Andrew Torda

Research School of Chemistry

This project addresses a subset of the protein structure prediction problem, which asks: given the primary sequence of a protein, can one predict its three-dimensional structure?

When a new protein structure is determined experimentally, it usually turns out to adopt the same fold as a previously known structure. This is the case more than 90% of the time, even when there is no statistically meaningful sequence homology. It would therefore be remarkably powerful simply to recognise the most appropriate known fold for a given sequence.

There are two aspects to this work. The first is the development of a scoring or pseudo-energy function for ranking candidate structures. The second is a combinatorial problem: given a sequence and a candidate protein fold (template), how can one find the optimal placement of the sequence of interest on that template?
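To make the idea of a pseudo-energy for ranking candidate structures concrete, the sketch below scores a sequence threaded onto a template by summing pairwise residue contact terms. Everything here is a hypothetical assumption for illustration: one representative point per residue, an 8 Angstrom contact cutoff, and a small table of pair energies. It is not the project's actual functional form.

```python
import math

# Hypothetical contact cutoff (Angstrom) between representative residue points.
CUTOFF = 8.0

def contact_score(seq, coords, params, cutoff=CUTOFF):
    """Sum pairwise pseudo-energies for residues closer than `cutoff`.

    seq    : one-letter residue codes, one per template position
    coords : list of (x, y, z) tuples, one per template position
    params : dict mapping a sorted residue-pair tuple to an energy
    Lower scores indicate a better sequence/structure fit.
    """
    if len(seq) != len(coords):
        raise ValueError("sequence and template must have the same length")
    total = 0.0
    for i in range(len(seq)):
        # start at i + 2 to skip chain neighbours, which are always in contact
        for j in range(i + 2, len(seq)):
            if math.dist(coords[i], coords[j]) < cutoff:
                pair = tuple(sorted((seq[i], seq[j])))
                total += params.get(pair, 0.0)  # unknown pairs contribute nothing
    return total

coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.8, 0.0)]
params = {("L", "L"): -1.0, ("K", "L"): 0.5}
print(contact_score("LKLL", coords, params))
```

Scoring the same sequence on several templates and sorting by this value gives a ranking of candidate folds, with the lowest score ranked first.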

Co-Investigators

Tian-Xiong Lu

T. Huber

Research School of Chemistry

Projects

v04 - PC

What are the results to date and the future of the work?

Most progress has been made on the pseudo-energy or scoring function. This involved defining a set of functional forms for interactions between particles and a set of parameters for them. The goal of the force field (recognising the correct fold for a protein sequence) has been cast as a target function. Originally, this function was optimised by attaching fictitious masses to the parameters and performing quasi-Newtonian dynamics in parameter space. More recently, we have moved to simpler force field forms which appear to be immune to the problem of multiple minima and allow the optimal force field to be found with simple minimisation methods. While this is conceptually simple, problems have arisen from the sheer size of the calculation. The target function used to measure the quality of the force field depends on nearly 400 protein structures and more than 10 million misfolded structures, each containing between a few hundred and a few thousand interacting particles.
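The sketch below illustrates, under stated assumptions, how fold recognition can be cast as a target function over force field parameters and then minimised directly. It assumes (hypothetically) a score that is linear in the parameters, score = w · f, where f is a feature vector computed from a structure, and a smooth softplus penalty whenever a misfold scores close to or below its native structure. With a small quadratic penalty on w the target is convex, so plain gradient descent reaches its single minimum. This shows one way a simpler functional form can avoid multiple minima; it is not the authors' target function, nor their quasi-Newtonian dynamics scheme.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softplus(z):
    # log(1 + e^z), written to avoid overflow for large |z|
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def sigmoid(z):
    # derivative of softplus, computed in a numerically stable way
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def target_and_grad(w, natives, decoys, lam=0.01):
    """Hypothetical convex target: penalise misfolds that score near their native.

    natives[k] : feature vector of native structure k
    decoys[k]  : list of feature vectors for misfolds of protein k
    """
    loss = lam * dot(w, w)
    grad = [2.0 * lam * x for x in w]
    for nat, decs in zip(natives, decoys):
        e_nat = dot(w, nat)
        for dec in decs:
            z = e_nat - dot(w, dec)  # z > 0 means the misfold beats the native
            loss += softplus(z)
            s = sigmoid(z)
            for i in range(len(w)):
                grad[i] += s * (nat[i] - dec[i])
    return loss, grad

def minimise(w, natives, decoys, step=0.1, n_iter=500):
    """Fixed-step gradient descent on the convex target."""
    w = list(w)
    for _ in range(n_iter):
        _, g = target_and_grad(w, natives, decoys)
        w = [x - step * gx for x, gx in zip(w, g)]
    return w

natives = [[1.0, 0.0]]
decoys = [[[0.0, 1.0], [0.5, 0.5]]]
w = minimise([0.0, 0.0], natives, decoys)
```

The double loop over natives and their misfolds is exactly where the cost described above arises: with millions of misfolded structures, every evaluation of the target and its gradient touches every one of them.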

Future work will proceed in two directions. Firstly, although the force field is the best possible within our current framework, there is evidence that the framework itself needs to be improved. Secondly, we continue to work on methods for finding the best alignment of a sequence to a candidate structure. We have begun using force field approximations which allow a classic dynamic programming algorithm to find a very good alignment. This alignment can then be refined by methods such as Monte Carlo/simulated annealing.
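Exact pairwise pseudo-energies couple the placement of each residue to the placement of every other, which is why they cannot be fed directly to dynamic programming. Assuming an approximation that reduces the score to an independent cost `cost[i][j]` for placing residue i on template site j, plus a linear gap cost, the classic Needleman-Wunsch recurrence finds the optimal alignment exactly. The cost table and the gap model below are hypothetical; the project's actual approximations are not described here.

```python
def align(cost, gap):
    """Global (Needleman-Wunsch style) alignment minimising total cost.

    cost[i][j] : hypothetical cost of placing sequence residue i on template site j
    gap        : cost of leaving one residue or one template site unaligned
    Returns (total_cost, list of aligned (residue, site) pairs).
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # d[i][j] = best cost of aligning the first i residues with the first j sites
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                d[i][j] = min(d[i][j], d[i - 1][j - 1] + cost[i - 1][j - 1])
            if i > 0:
                d[i][j] = min(d[i][j], d[i - 1][j] + gap)  # residue unaligned
            if j > 0:
                d[i][j] = min(d[i][j], d[i][j - 1] + gap)  # site left empty
    # Trace back from the bottom-right corner to recover the alignment
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + cost[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return d[n][m], pairs[::-1]

total, pairs = align([[0, 5, 5], [5, 0, 5]], gap=1)
print(total, pairs)
```

Here two residues are placed on the first two of three template sites, and the unused third site costs one gap.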

What computational techniques are used?

From an algorithmic point of view, the project is rooted in statistical mechanics and simple optimisation.

From the implementation point of view, we have used home-built code for force field development and testing.

The code is built as a Tcl extension, so the computationally intensive work is done in C while retaining the flexibility of a high-level interpreter.

The package performs force field scoring and sequence-to-structure alignment, using either the Needleman-Wunsch algorithm or Monte Carlo/simulated annealing.
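A minimal sketch of Monte Carlo/simulated annealing applied to alignment refinement follows. It assumes (hypothetically) that every residue occupies a template site, that the alignment is a strictly increasing list of site indices, that a move shifts one residue by one site, and that the temperature is cooled geometrically. The `energy` callback may include pairwise terms that dynamic programming cannot handle. The Metropolis acceptance rule is standard; the move set, schedule and parameter values are illustrative assumptions rather than the package's actual choices.

```python
import math
import random

def anneal(energy, sites, n_sites, t_start=2.0, t_end=0.01, n_steps=20000, seed=0):
    """Refine an alignment by Metropolis Monte Carlo with geometric cooling.

    energy  : function(list_of_sites) -> pseudo-energy (lower is better)
    sites   : starting alignment, strictly increasing template sites, one per residue
    n_sites : number of template sites
    Returns the lowest-energy alignment visited and its energy.
    """
    rng = random.Random(seed)
    cur = list(sites)
    e_cur = energy(cur)
    best, e_best = list(cur), e_cur
    cool = (t_end / t_start) ** (1.0 / max(n_steps - 1, 1))
    t = t_start
    for _ in range(n_steps):
        k = rng.randrange(len(cur))
        new_site = cur[k] + rng.choice((-1, 1))
        # bounds keep residues in sequence order and on the template
        lo = cur[k - 1] + 1 if k > 0 else 0
        hi = cur[k + 1] - 1 if k < len(cur) - 1 else n_sites - 1
        if lo <= new_site <= hi:
            trial = cur[:k] + [new_site] + cur[k + 1:]
            e_trial = energy(trial)
            delta = e_trial - e_cur
            # Metropolis criterion: always accept downhill, sometimes uphill
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cur, e_cur = trial, e_trial
                if e_cur < e_best:
                    best, e_best = list(cur), e_cur
        t *= cool
    return best, e_best

ideal = [2, 5, 6]  # toy energy: distance from a preferred placement
toy_energy = lambda s: sum(abs(x - g) for x, g in zip(s, ideal))
best, e_best = anneal(toy_energy, [0, 1, 2], n_sites=8)
```

In practice the starting alignment would come from a fast method such as dynamic programming, with annealing used only to improve it under the full force field.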

Publications

Ulrich, P., Scott, W., van Gunsteren, W.F. and Torda, A.E., Protein structure prediction force fields: Parametrization with quasi-Newtonian dynamics. Proteins, in press.

Huber, T. and Torda, A.E., A protein fold recognition force field without Boltzmann statistics or explicit physical basis. J. Mol. Biol., submitted.
