MASS: predict the global qualities of individual protein models using random forests and novel statistical potentials
BACKGROUND: Protein model quality assessment (QA) is an essential procedure in protein structure prediction. QA methods can predict the qualities of protein models and identify good models from decoys. Clustering-based methods need a certain number of models as input. However, if a pool of models ar...
Main Authors: | Liu, Tong; Wang, Zheng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7336608/ https://www.ncbi.nlm.nih.gov/pubmed/32631256 http://dx.doi.org/10.1186/s12859-020-3383-3 |
_version_ | 1783554350560313344 |
---|---|
author | Liu, Tong Wang, Zheng |
author_facet | Liu, Tong Wang, Zheng |
author_sort | Liu, Tong |
collection | PubMed |
description | BACKGROUND: Protein model quality assessment (QA) is an essential procedure in protein structure prediction. QA methods can predict the qualities of protein models and identify good models from decoys. Clustering-based methods need a certain number of models as input. However, if a pool of models are not available, methods that only need a single model as input are indispensable. RESULTS: We developed MASS, a QA method to predict the global qualities of individual protein models using random forests and various novel energy functions. We designed six novel energy functions or statistical potentials that can capture the structural characteristics of a protein model, which can also be used in other protein-related bioinformatics research. MASS potentials demonstrated higher importance than the energy functions of RWplus, GOAP, DFIRE and Rosetta when the scores they generated are used as machine learning features. MASS outperforms almost all of the four CASP11 top-performing single-model methods for global quality assessment in terms of all of the four evaluation criteria officially used by CASP, which measure the abilities to assign relative and absolute scores, identify the best model from decoys, and distinguish between good and bad models. MASS has also achieved comparable performances with the leading QA methods in CASP12 and CASP13. CONCLUSIONS: MASS and the source code for all MASS potentials are publicly available at http://dna.cs.miami.edu/MASS/. |
format | Online Article Text |
id | pubmed-7336608 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7336608 2020-07-08 MASS: predict the global qualities of individual protein models using random forests and novel statistical potentials Liu, Tong Wang, Zheng BMC Bioinformatics Research BACKGROUND: Protein model quality assessment (QA) is an essential procedure in protein structure prediction. QA methods can predict the qualities of protein models and identify good models from decoys. Clustering-based methods need a certain number of models as input. However, if a pool of models are not available, methods that only need a single model as input are indispensable. RESULTS: We developed MASS, a QA method to predict the global qualities of individual protein models using random forests and various novel energy functions. We designed six novel energy functions or statistical potentials that can capture the structural characteristics of a protein model, which can also be used in other protein-related bioinformatics research. MASS potentials demonstrated higher importance than the energy functions of RWplus, GOAP, DFIRE and Rosetta when the scores they generated are used as machine learning features. MASS outperforms almost all of the four CASP11 top-performing single-model methods for global quality assessment in terms of all of the four evaluation criteria officially used by CASP, which measure the abilities to assign relative and absolute scores, identify the best model from decoys, and distinguish between good and bad models. MASS has also achieved comparable performances with the leading QA methods in CASP12 and CASP13. CONCLUSIONS: MASS and the source code for all MASS potentials are publicly available at http://dna.cs.miami.edu/MASS/. BioMed Central 2020-07-06 /pmc/articles/PMC7336608/ /pubmed/32631256 http://dx.doi.org/10.1186/s12859-020-3383-3 Text en © The Author(s). 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Liu, Tong Wang, Zheng MASS: predict the global qualities of individual protein models using random forests and novel statistical potentials |
title | MASS: predict the global qualities of individual protein models using random forests and novel statistical potentials |
title_sort | mass: predict the global qualities of individual protein models using random forests and novel statistical potentials |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7336608/ https://www.ncbi.nlm.nih.gov/pubmed/32631256 http://dx.doi.org/10.1186/s12859-020-3383-3 |