
Active machine learning for transmembrane helix prediction

BACKGROUND: About 30% of genes code for membrane proteins, which are involved in a wide variety of crucial biological functions. Despite their importance, experimentally determined structures correspond to only about 1.7% of protein structures deposited in the Protein Data Bank due to the difficulty...


Bibliographic Details
Main authors: Osmanbeyoglu, Hatice U, Wehner, Jessica A, Carbonell, Jaime G, Ganapathiraju, Madhavi K
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3009531/
https://www.ncbi.nlm.nih.gov/pubmed/20122233
http://dx.doi.org/10.1186/1471-2105-11-S1-S58
author Osmanbeyoglu, Hatice U
Wehner, Jessica A
Carbonell, Jaime G
Ganapathiraju, Madhavi K
collection PubMed
description BACKGROUND: About 30% of genes code for membrane proteins, which are involved in a wide variety of crucial biological functions. Despite their importance, experimentally determined membrane protein structures account for only about 1.7% of the protein structures deposited in the Protein Data Bank, owing to the difficulty of crystallizing membrane proteins. Algorithms that can identify proteins whose high-resolution structures would aid in predicting the structures of many previously unresolved proteins are therefore potentially of high value. Active machine learning is a supervised machine learning approach suited to this domain, in which there are a large number of sequences but very few with known structures. In essence, active learning seeks to identify proteins whose structure, if revealed experimentally, is maximally predictive of others. RESULTS: An active learning approach is presented for selecting a minimal set of proteins whose structures can aid in determining the transmembrane (TM) helices of the remaining proteins. TMpro, an algorithm for high-accuracy TM helix prediction that we developed previously, is coupled with active learning. We show that with a well-designed selection procedure, high accuracy can be achieved with only a few proteins. TMpro, trained with a single protein, achieved an F-score of 94% on the benchmark evaluation and 91% on the MPtopo dataset. These values correspond to the state-of-the-art accuracies in TM helix prediction, which are usually achieved by training with over 100 proteins. CONCLUSION: Active learning is suitable for bioinformatics applications in which manually characterized data are not a comprehensive representation of all possible data and may in fact be a very sparse subset thereof. It aids in selecting data instances that, when characterized experimentally, can improve the accuracy of computational characterization of the remaining raw data. The results presented here also demonstrate that the feature extraction method of TMpro is well designed, achieving a very good separation between TM and non-TM segments.
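As an illustration of the idea summarized in the abstract, the following is a minimal sketch of pool-based active learning on toy data. It is not the selection procedure of the paper and it does not implement TMpro: the 1-D threshold classifier, the uncertainty-sampling rule (query the unlabeled item closest to the current decision boundary), the toy pool, and all function names are assumptions made purely for illustration. The F-score is computed here as the harmonic mean of precision and recall.

```python
# Toy sketch of pool-based active learning (assumed setup, not the paper's method).

def fit_threshold(labeled):
    """Toy 1-D classifier: threshold halfway between the two class means."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict(threshold, x):
    """Predict class 1 when the feature exceeds the threshold."""
    return 1 if x > threshold else 0


def f1(true, pred):
    """F-score as the harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(true, pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def active_learning(pool, oracle, seed_indices, budget):
    """Label up to `budget` extra items, always querying the most uncertain one."""
    # The seed must contain at least one example of each class.
    labeled = {i: oracle(i) for i in seed_indices}
    for _ in range(budget):
        threshold = fit_threshold([(pool[i], y) for i, y in labeled.items()])
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        # Uncertainty sampling: the item nearest the decision boundary.
        query = min(unlabeled, key=lambda i: abs(pool[i] - threshold))
        labeled[query] = oracle(query)  # stands in for an experiment
    final = fit_threshold([(pool[i], y) for i, y in labeled.items()])
    return final, labeled


# Hypothetical pool: one feature per item, true class is 1 when x > 0.5.
pool = [0.1, 0.2, 0.3, 0.4, 0.45, 0.55, 0.6, 0.7, 0.8, 0.9]
truth = [1 if x > 0.5 else 0 for x in pool]

threshold, labeled = active_learning(
    pool, oracle=lambda i: truth[i], seed_indices=[0, 9], budget=2
)
score = f1(truth, [predict(threshold, x) for x in pool])
print(f"labeled {len(labeled)} of {len(pool)} items, F-score = {score:.2f}")
```

In this sketch the oracle call plays the role of an experimental structure determination: each query is expensive, so the loop spends its small budget on the items expected to be most informative for the current model.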
format Text
id pubmed-3009531
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3009531 2010-12-23 Active machine learning for transmembrane helix prediction Osmanbeyoglu, Hatice U Wehner, Jessica A Carbonell, Jaime G Ganapathiraju, Madhavi K BMC Bioinformatics Research BioMed Central 2010-01-18 /pmc/articles/PMC3009531/ /pubmed/20122233 http://dx.doi.org/10.1186/1471-2105-11-S1-S58 Text en Copyright ©2010 Osmanbeyoglu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Active machine learning for transmembrane helix prediction
topic Research