Macromolecule folding dynamics is treated as a function-approximation problem. The goal is a function that best describes the discrete pathway a macromolecule takes from its unfolded state (the input) to its folded state (the output). During "training", the output is driven to converge towards a target value. That target is computed with the same spatial description, over the same group of atoms, from proteins of known structure. The target values are then used to adjust the system parameter values in discrete steps, in the direction that favours the native conformation. This process is repeated chain by chain until all chains of the protein have been processed, and the learned parameter values are stored. The stored parameters are then applied to an unseen sequence, starting from a randomly generated unfolded state.
The results can be visualized as an animated sequence that takes the chain from a random configuration to the correctly folded configuration.
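The train, store, and apply loop described above can be sketched in toy form. This is a minimal illustration under loud assumptions, not the inventors' implementation. Each residue is reduced to a single turning angle, and the learned "system parameters" are one preferred angle per residue type. The functions `train`, `fold`, and `to_coordinates`, the H/P alphabet, and the training data are all hypothetical.

```python
import math
import random

# Toy model (hypothetical): every residue is summarised by one turning
# angle in radians, and the learned "system parameters" are one preferred
# angle per residue type.


def train(known_structures, epochs=200, lr=0.1):
    """Fit one angle parameter per residue type from known structures.

    known_structures: list of (sequence, native_angles) pairs, where
    native_angles[i] is the target descriptor computed for residue i
    of a protein whose structure is known.
    """
    params = {}
    for _ in range(epochs):
        for seq, native in known_structures:  # one chain at a time
            for res, target in zip(seq, native):
                w = params.setdefault(res, 0.0)
                # Discrete step towards the native value: a gradient step
                # on the squared error 0.5 * (w - target) ** 2.
                params[res] = w - lr * (w - target)
    return params  # the stored parameter values


def fold(seq, params, steps=50, rate=0.2, seed=0):
    """Relax an unseen sequence from a random unfolded state.

    Returns the trajectory, one list of angles per discrete step;
    unknown residue types fall back to an angle of 0.0.
    """
    rng = random.Random(seed)
    angles = [rng.uniform(-math.pi, math.pi) for _ in seq]  # random start
    trajectory = [list(angles)]
    for _ in range(steps):
        angles = [a + rate * (params.get(res, 0.0) - a)
                  for a, res in zip(angles, seq)]
        trajectory.append(list(angles))
    return trajectory


def to_coordinates(angles):
    """Trace a 2D chain with unit bond length from its turning angles."""
    x, y, heading = 0.0, 0.0, 0.0
    coords = [(x, y)]
    for a in angles:
        heading += a
        x += math.cos(heading)
        y += math.sin(heading)
        coords.append((x, y))
    return coords


# Usage with made-up training data over a two-letter alphabet (H, P).
known = [("HHPH", [0.5, 0.5, -1.0, 0.5]),
         ("PHHP", [-1.0, 0.5, 0.5, -1.0])]
params = train(known)
trajectory = fold("HPPHH", params)
frames = [to_coordinates(step) for step in trajectory]  # one frame per step
```

Each entry of `frames` is a list of 2D points that could be drawn as one frame of the animated sequence. A real system would use a far richer spatial description and learner. The per-residue-type table here only shows the shape of the loop: fit parameters to known structures, store them, then relax a random unfolded chain towards the learned values in discrete steps.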
While current progress is limited to smaller proteins, the approach could be extended to larger proteins through improvements in "training methodologies".
The inventors intend to participate in the CASP (Critical Assessment of techniques for Protein Structure Prediction) competition.
Improvements in training methodology are being pursued through different input encodings, learning systems, and training regimes.
Corporate sponsorship from an "end-user champion" is sought to support participation in the CASP competition.
The approach applies machine learning to protein structure prediction and represents a middle ground between ab initio and comparative modeling methods. It does not explicitly search for a matching structure to predict an unknown protein's configuration. Instead, it interpolates and extrapolates between the proteins of the training set to infer the appropriate folding dynamics, without an explicit physics model.
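The simplest possible function approximator can illustrate the difference between interpolation and extrapolation. This sketch is an assumption and is not the technology's actual learner. It fits a straight line by least squares to a few made-up (feature, descriptor) pairs, and the names `fit_line` and `predict` and the data are hypothetical. Querying inside the range of the training inputs interpolates, while querying beyond it extrapolates. Neither step uses an explicit physics model or searches for a matching template.

```python
# Hypothetical illustration: a fitted function, not a physics model,
# maps an input feature to a structural descriptor.


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Evaluate the fitted line at x."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training pairs: (feature value, descriptor value).
train_x = [0.0, 1.0, 2.0]
train_y = [1.0, 3.0, 5.0]
model = fit_line(train_x, train_y)  # slope 2.0, intercept 1.0

inside = predict(model, 1.5)   # interpolation: within the training range
outside = predict(model, 3.0)  # extrapolation: beyond the training range
```

Extrapolated values are generally less trustworthy than interpolated ones, because nothing in the training data constrains the function outside its range.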
The number of proteins with known sequences is about one and a half million, whereas fewer than twenty thousand protein structures have been deposited in public databases. The technology addresses the need to narrow this sequence-structure gap.
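Taking the two stated counts at face value, the gap can be expressed as a simple ratio. The structure count is below twenty thousand, so the true figure is somewhat larger than this:

```latex
\frac{1.5 \times 10^{6}\ \text{sequences}}{2 \times 10^{4}\ \text{structures}} = 75
```

That is, there are more than 75 known sequences for every deposited structure.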