Predicting the local structure of proteins with Hidden Markov models
Abstract
De novo methods for protein structure prediction aim to provide predictions for hard cases. These methods usually require local structure prediction as a first step. The local structure of a protein can be described in terms of secondary structures such as helices, sheets and coils. We propose a method based on hidden Markov models to perform local structure prediction. Hidden Markov models offer a strong theoretical background and allow the data to be modeled explicitly.
We first develop a new method for protein secondary structure assignment. We then investigate the use of hidden Markov models with only three hidden states and various memory schemes for protein secondary structure prediction. Next, we increase the number of hidden states using two strategies: a model design based on prior knowledge about secondary structure, or the selection of an optimal model using statistical and accuracy criteria. As secondary structure prediction provides no clue about the structure of coil regions, we include geometrical descriptors of the coil (dihedral angle zones) in our models. Finally, we investigate several ways to include homologous sequence information to improve prediction by hidden Markov models.
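To make the three-state idea concrete, here is a minimal sketch of decoding a secondary structure path with a hidden Markov model. It is not the model developed in the thesis. It assumes a plain first-order chain (no extended memory scheme), Viterbi decoding, a reduced six-letter residue alphabet, and hand-picked probabilities chosen only for illustration. The names `START`, `TRANS`, `EMIT` and `viterbi` are hypothetical.

```python
import math

# Three hidden states: H = helix, E = strand (sheet), C = coil.
STATES = ("H", "E", "C")

# All probabilities below are illustrative assumptions, not values
# estimated from any data set or taken from the thesis.
START = {"H": 0.3, "E": 0.2, "C": 0.5}
TRANS = {
    "H": {"H": 0.90, "E": 0.01, "C": 0.09},
    "E": {"H": 0.01, "E": 0.85, "C": 0.14},
    "C": {"H": 0.10, "E": 0.10, "C": 0.80},
}
# Emissions over a reduced, hypothetical alphabet of six residue types.
EMIT = {
    "H": {"A": 0.35, "L": 0.35, "V": 0.10, "I": 0.10, "G": 0.05, "P": 0.05},
    "E": {"A": 0.10, "L": 0.10, "V": 0.35, "I": 0.35, "G": 0.05, "P": 0.05},
    "C": {"A": 0.10, "L": 0.10, "V": 0.05, "I": 0.05, "G": 0.35, "P": 0.35},
}


def viterbi(sequence):
    """Return the most probable state path for a residue sequence."""
    if not sequence:
        return ""
    # Log-probability of the best path ending in each state at position 0.
    score = {
        s: math.log(START[s]) + math.log(EMIT[s][sequence[0]]) for s in STATES
    }
    back = []  # one dict of back-pointers per position after the first
    for residue in sequence[1:]:
        prev = score
        score, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for ending in s at this position.
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = (
                prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][residue])
            )
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(STATES, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


if __name__ == "__main__":
    print(viterbi("AALLAGPGPVIVIV"))
```

Working in log space avoids numerical underflow on long sequences. A real predictor would estimate the parameters from annotated structures and use the full 20-letter amino acid alphabet.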
Manuscript
The thesis manuscript (in French) can be downloaded here (8.5 MB).