NMR Spectral Assignment Using Structural Information


The composition (sequence) of a protein is easy to find experimentally, but it is not as exciting as the protein's shape, that is, the arrangement of its atoms in space. This structural information is more difficult and expensive to find experimentally, but potentially more useful for designing drugs or understanding how proteins work at the atomic level. We are active participants in the sport of trying to guess protein structure from sequence information. This alone would lead to a dreary existence, unless one finds interesting applications for the techniques developed along the way.


Principal Investigator

Andrew Torda
Chemistry
RSC
ANU

Project

x08, v04, w51, d97

Facilities Used

PC, SC

Co-Investigators

Mark Abraham
Zsuzsanna Dosztanyi
Oliver Martin
James Procter
Regula Walser
Chemistry
RSC
ANU

RFCD Codes

250503, 250699, 230204, 239901


Significant Achievements, Anticipated Outcomes and Future Work

Some people think force fields are uninteresting sets of formulae and parameters which tell you about the energy of a system. In fact, force fields are fun to build, take apart and use in unusual ways. In previous years, we have spent much time automatically building score functions (like force fields) which tell us how much a protein sequence likes to sit on a particular structure. This is not fanciful. It lets us numerically test the compatibility of a new sequence with a known structure. While we continue the search for better score functions, we are now particularly interested in unexpected uses of these creations.
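To make the idea of scoring a sequence on a structure concrete, here is a minimal sketch of one common kind of score function, a pairwise contact potential. It is an illustration only: the functional form, the 8 Å cutoff, and the names `contact_score` and `pair_energy` are assumptions and are not taken from the project's own code.

```python
import math
from itertools import combinations


def contact_score(sequence, coords, pair_energy, cutoff=8.0):
    """Sum pair energies over all residue pairs that are in contact.

    sequence    -- string of one-letter amino acid codes
    coords      -- one (x, y, z) tuple per residue, taken from the structure
    pair_energy -- dict mapping an alphabetically sorted residue pair,
                   e.g. ("K", "L"), to an energy (hypothetical parameters)
    cutoff      -- contact distance in the same units as coords (assumed)
    """
    if len(sequence) != len(coords):
        raise ValueError("need exactly one coordinate per residue")
    total = 0.0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i < 2:
            continue  # chain neighbours are always close, so skip them
        if math.dist(coords[i], coords[j]) <= cutoff:
            key = tuple(sorted((sequence[i], sequence[j])))
            total += pair_energy.get(key, 0.0)  # unknown pairs score zero
    return total
```

With such a function, a lower (more favourable) total for one structure than for another suggests that the sequence is more compatible with the first. Placing a new sequence on the coordinates of a known structure and evaluating the score is the numerical compatibility test described above.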

Firstly, score functions tell us about the interactions between different kinds of units within proteins. Some of these units (amino acids) behave in similar ways, while others are quite different. For example, some like to interact with water molecules, while others are happiest when hidden from the surface of a protein. These similarities tell us how compatible different residues are with each other, in other words, how likely one amino acid is to fit in the place of another during the course of evolution. This means we can build tables of compatibility based on our score functions. To a biologist, this would sound like the definition of an amino acid substitution matrix, a vital table used in all protein sequence comparisons. From our work, we can generate these tables without relying on an evolutionary model. Aside from the elegance of the method, our home-built substitution matrices have turned out to be useful for analysing the important properties of force fields.
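One simple way to turn score function parameters into a compatibility table is to describe each amino acid by a vector of its parameters and to compare those vectors. The sketch below uses the Pearson correlation as the similarity measure. This is an assumed, illustrative choice and not necessarily the procedure used in the published work; `params` and `similarity_matrix` are hypothetical names.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a constant vector carries no similarity information
    return cov / (norm_u * norm_v)


def similarity_matrix(params):
    """Build a residue-by-residue similarity table.

    params -- dict mapping a residue code to its parameter vector, for
              example its interaction energies with every residue type
              (hypothetical input, not real force field values)
    """
    residues = sorted(params)
    return {(a, b): pearson(params[a], params[b])
            for a in residues for b in residues}
```

Residues whose parameters rise and fall together score near +1 and would be treated as easily interchangeable, while residues with opposite preferences score near -1. No evolutionary model enters the calculation.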

Recognising protein structures also requires some classification of known structures. This is a nightmarishly difficult task since even similar proteins will differ in size and detail. There is no known deterministic, fail-proof approach to comparing protein structures, but we have been developing a graph-theory-based method. This works, but is somewhat slow. New approaches will use compatibility as calculated from our score functions.
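A textbook graph-theoretic way to compare two structures of different sizes is to build a correspondence graph and look for its largest clique. The sketch below shows that general idea under stated assumptions: each residue is reduced to a single point, two pairings are compatible when the matching inter-residue distances agree within a tolerance, and the clique search is plain Bron-Kerbosch without pivoting. The function names and the tolerance are hypothetical, and this is not a description of the group's own method.

```python
import math
from itertools import product


def correspondence_graph(coords_a, coords_b, tol=1.0):
    """Nodes are pairings (i, j) of residue i in A with residue j in B.

    Two pairings are joined when the distance between the two residues
    in A matches the distance between their partners in B within tol.
    Distances alone cannot distinguish a structure from its mirror image.
    """
    nodes = list(product(range(len(coords_a)), range(len(coords_b))))
    adj = {node: set() for node in nodes}
    for (i, j), (k, l) in product(nodes, repeat=2):
        if i == k or j == l:
            continue  # a residue may be paired only once
        dist_a = math.dist(coords_a[i], coords_a[k])
        dist_b = math.dist(coords_b[j], coords_b[l])
        if abs(dist_a - dist_b) <= tol:
            adj[(i, j)].add((k, l))
    return adj


def max_clique(adj):
    """Largest clique found by Bron-Kerbosch (exponential worst case)."""
    best = []

    def expand(clique, candidates, excluded):
        nonlocal best
        if not candidates and not excluded:
            if len(clique) > len(best):
                best = list(clique)
            return
        for v in list(candidates):
            expand(clique + [v], candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand([], set(adj), set())
    return best
```

The largest clique is a set of residue pairings that are all mutually consistent, that is, a common substructure. The exponential worst case of clique searches is one reason such methods tend to be slow on real proteins.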

Lastly, our score functions are a direct numerical way to fit protein sequences to protein structures. We are now working on approaches to classifying and predicting protein properties which will merge information from protein sequence comparisons, structure comparisons and sequence-to-structure comparisons using our score functions.


Computational Techniques Used

All code was home-built and lovingly crafted, except for the pieces which were recklessly slapped together. Across the different parts of the work, a smorgasbord of algorithms was used. This included various numerical optimisation methods for optimising force fields and parameters and a fast dynamic programming method for placing pieces of amino acid sequences on candidate structures. A selection of exotic clique-detection algorithms was used in the protein structure comparison work.
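The dynamic programming step for placing a sequence on a candidate structure can be illustrated with the standard global alignment recurrence. This is a minimal sketch, assuming a linear gap penalty and a caller-supplied scoring function; the project's own fast method is not described here and surely differs in detail. The names `thread`, `site_score` and `toy_site_score` are hypothetical.

```python
def thread(sequence, sites, site_score, gap=-1.0):
    """Align a sequence to structural sites by global dynamic programming.

    sequence   -- string of residue codes
    sites      -- sequence of site descriptors taken from the structure
    site_score -- function (residue, site) -> float, higher is better
    gap        -- score added for each unpaired residue or site (assumed linear)
    Returns (best score, list of (sequence index or None, site index or None)).
    """
    n, m = len(sequence), len(sites)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = best[i - 1][0] + gap
    for j in range(1, m + 1):
        best[0][j] = best[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + site_score(sequence[i - 1], sites[j - 1]),
                best[i - 1][j] + gap,   # residue i-1 left unplaced
                best[i][j - 1] + gap,   # site j-1 left empty
            )
    # Trace back from the corner to recover one optimal placement.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and best[i][j] == (
                best[i - 1][j - 1] + site_score(sequence[i - 1], sites[j - 1])):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or best[i][j] == best[i - 1][j] + gap):
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return best[n][m], pairs


def toy_site_score(residue, site):
    """Toy score: +1 if a hydrophobic residue sits on a buried site ('H')
    or a polar residue sits on an exposed site ('P'), otherwise -1."""
    hydrophobic = residue in "AVLIMFW"
    return 1.0 if hydrophobic == (site == "H") else -1.0
```

For example, `thread("LL", "HPH", toy_site_score, gap=-0.5)` places the two leucines on the two buried sites and leaves the exposed site empty. The table has one cell per residue and site pair, so the cost grows with the product of the sequence length and the number of sites.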


Publications, Awards and External Funding

Zs. Dosztányi, A.E. Torda, Amino acid similarity matrices based on force fields, Bioinformatics, 17, 2001, 686-699.

J.B. Procter, A.J. Perry, A.E. Torda, Comparing objects of different sizes: treating proteins as strings, Aust. J. Chem., 54, 2001, 367-373.

D. Reith, T. Huber, F. Müller-Plathe, A.E. Torda, Free energy approximations in simple lattice proteins, J. Chem. Phys., 114, 2001, 4998-5005.

A.J. Russell, A.E. Torda, Protein sequence threading - averaging over structures, Proteins, 2002, in press.