Machine Learning System for Protein Folding Dynamics and Structure Determination

Summary

 Quick description:  A machine learning system is developed to induce the folding dynamics of proteins (macromolecules) of known sequences and structures (training set). These dynamics are then applied to other proteins whose structures are unknown (test set). Thus, the problem of identifying the folding dynamics is treated as a function approximation problem, where the function to be learned is the update equation of a dynamical system.
 Posted by:  University of Guelph
 Published:  23 July 2009
 Patent:  Pending
 Project Type:  Out-Licensing Opportunity
 Primary sector:  Health and Life Sciences
 Seeking / Offering:  Collaboration or Partnership, Exclusive Licensing
 Areas of interest:  bio-pharmaceuticals, bioinformatics, genomics, health & life sciences, lifesciences, information technology, r&d discovery, science & technology


Background

A machine learning approach to protein structure prediction represents a middle ground between ab initio and comparative modeling methods. Rather than explicitly searching for a matching structure to predict an unknown protein's configuration, it interpolates and extrapolates between the proteins of the training set to infer the appropriate dynamics, without an explicit physics model.

Description

Macromolecule folding dynamics is treated as a function approximation problem: the goal is to learn a function that best describes the discrete pathways taken by macromolecules from their unfolded state (input) to their folded state (output). The “training” process makes the output converge towards a target value computed for the same spatial description, using the same group of atoms, from known structures. Target values are then used to alter the system parameter values, in discrete steps, in the direction favouring the native conformation. This process is repeated for each macromolecule chain and for all chains of the protein. The learned system parameters are stored, and the parameter values are then applied to an unseen sequence starting from a randomly generated unfolded state.

The results can be visualized as an animated movie sequence that takes the molecule from a random configuration to its folded configuration.
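
The training and application steps described above can be illustrated with a toy sketch. It is not the inventors' system: it assumes one scalar coordinate per residue (a stand-in for, say, a backbone torsion angle), a hypothetical four-letter residue alphabet, a linear update function, and a simple delta-rule parameter update. The names NATIVE_COORD, STEP, features, train and fold are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N_TYPES = 4   # hypothetical residue alphabet (real proteins use 20)
STEP = 0.2    # assumed fraction of the remaining distance covered per step
# Toy stand-in for known structures: each residue type prefers one native
# coordinate. The learner only ever sees targets derived from this table.
NATIVE_COORD = np.array([-1.0, -0.5, 0.5, 1.0])

def features(seq, state):
    """Per-residue input: one-hot residue type plus the current coordinate."""
    onehot = np.eye(N_TYPES)[seq]
    return np.hstack([onehot, state[:, None]])

def train(chains, n_epochs=50, n_steps=10, lr=0.1):
    """Fit a linear update function, delta = features @ theta, by the delta rule."""
    theta = np.zeros(N_TYPES + 1)
    for _ in range(n_epochs):
        for seq in chains:
            native = NATIVE_COORD[seq]
            state = rng.uniform(-3.0, 3.0, size=len(seq))  # random unfolded start
            for _ in range(n_steps):
                x = features(seq, state)
                target = STEP * (native - state)      # target step toward native
                error = target - x @ theta
                theta += lr * x.T @ error / len(seq)  # nudge parameters toward target
                state = state + target                # advance along the known pathway
    return theta

def fold(theta, seq, n_steps=60):
    """Iterate the learned update equation from a random unfolded state."""
    state = rng.uniform(-3.0, 3.0, size=len(seq))
    trajectory = [state.copy()]
    for _ in range(n_steps):
        state = state + features(seq, state) @ theta
        trajectory.append(state.copy())
    return np.array(trajectory)  # shape: (n_steps + 1, chain length)

# Training set: chains of known sequence and structure.
train_chains = [rng.integers(0, N_TYPES, size=30) for _ in range(20)]
theta = train(train_chains)

# Test set: an unseen sequence, folded with the stored parameters.
unseen = rng.integers(0, N_TYPES, size=40)
trajectory = fold(theta, unseen)
print("max deviation from toy native state:",
      np.max(np.abs(trajectory[-1] - NATIVE_COORD[unseen])))
```

Each row of the returned trajectory is one discrete folding step, so the rows could serve as the frames of the animation. In this toy the target update is exactly linear in the features, which is why a linear model suffices; a realistic system would need a richer structural description and a more expressive learner.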

Advantages

  • Treats protein structure as a result of the dynamic folding process
  • Does not require explicit structure matches
  • Can handle proteins that may not adopt minimal energy configurations
  • Ability to interpolate and extrapolate between sequence structures
  • Applicable to macromolecules other than proteins

Potential Applications

  • Drug discovery research
  • Bioinformatics

Limitations

While current progress is limited to smaller protein molecules, application to larger protein molecules may be achievable with improvements in training methodologies.

State of Development

The inventors intend to participate in the CASP (Critical Assessment of Techniques for Protein Structure Prediction) competition.

Improvements in training methodology are being pursued through different input codings, learning systems and training regimes.

Opportunity

The number of proteins for which sequences are known is about a million and a half, whereas the number of protein structures deposited in public databases is less than twenty thousand. The technology addresses the need to decrease this sequence-structure gap.

Additional Information

Corporate sponsorship by an "end-user champion" for participation in the CASP competition is desired.

 

 

General Enquiries
University of Guelph
Guelph, Canada


Manager
Haridoss Sarma
Business Development Office
Guelph, Canada


Copyright © 2010
University of Guelph