
MASS: predict the global qualities of individual protein models using random forests and novel statistical potentials

BACKGROUND: Protein model quality assessment (QA) is an essential procedure in protein structure prediction. QA methods can predict the qualities of protein models and identify good models from decoys. Clustering-based methods need a certain number of models as input. However, if a pool of models ar...

Bibliographic Details
Main Authors: Liu, Tong; Wang, Zheng
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7336608/
https://www.ncbi.nlm.nih.gov/pubmed/32631256
http://dx.doi.org/10.1186/s12859-020-3383-3
_version_ 1783554350560313344
author Liu, Tong
Wang, Zheng
collection PubMed
description BACKGROUND: Protein model quality assessment (QA) is an essential procedure in protein structure prediction. QA methods can predict the qualities of protein models and identify good models from decoys. Clustering-based methods need a certain number of models as input. However, if a pool of models is not available, methods that only need a single model as input are indispensable. RESULTS: We developed MASS, a QA method to predict the global qualities of individual protein models using random forests and various novel energy functions. We designed six novel energy functions or statistical potentials that can capture the structural characteristics of a protein model, which can also be used in other protein-related bioinformatics research. MASS potentials demonstrated higher importance than the energy functions of RWplus, GOAP, DFIRE and Rosetta when the scores they generated are used as machine learning features. MASS outperforms almost all of the four CASP11 top-performing single-model methods for global quality assessment in terms of all of the four evaluation criteria officially used by CASP, which measure the abilities to assign relative and absolute scores, identify the best model from decoys, and distinguish between good and bad models. MASS has also achieved performance comparable to that of the leading QA methods in CASP12 and CASP13. CONCLUSIONS: MASS and the source code for all MASS potentials are publicly available at http://dna.cs.miami.edu/MASS/.
format Online
Article
Text
id pubmed-7336608
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7336608 2020-07-08 MASS: predict the global qualities of individual protein models using random forests and novel statistical potentials Liu, Tong Wang, Zheng BMC Bioinformatics Research BioMed Central 2020-07-06 /pmc/articles/PMC7336608/ /pubmed/32631256 http://dx.doi.org/10.1186/s12859-020-3383-3 Text en © The Author(s) 2020. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title MASS: predict the global qualities of individual protein models using random forests and novel statistical potentials
topic Research