Boosting the accuracy of protein secondary structure prediction through nearest neighbor search and method hybridization
MOTIVATION: Protein secondary structure prediction is a fundamental precursor to many bioinformatics tasks. Nearly all state-of-the-art tools when computing their secondary structure prediction do not explicitly leverage the vast number of proteins whose structure is known. Leveraging this additiona...
Main authors: | Krieger, Spencer; Kececioglu, John |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Macromolecular Sequence, Structure, and Function |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355242/ https://www.ncbi.nlm.nih.gov/pubmed/32657384 http://dx.doi.org/10.1093/bioinformatics/btaa336 |
_version_ | 1783558235035271168 |
---|---|
author | Krieger, Spencer; Kececioglu, John |
collection | PubMed |
description | MOTIVATION: Protein secondary structure prediction is a fundamental precursor to many bioinformatics tasks. Nearly all state-of-the-art tools when computing their secondary structure prediction do not explicitly leverage the vast number of proteins whose structure is known. Leveraging this additional information in a so-called template-based method has the potential to significantly boost prediction accuracy. METHOD: We present a new hybrid approach to secondary structure prediction that gains the advantages of both template- and non-template-based methods. Our core template-based method is an algorithmic approach that uses metric-space nearest neighbor search over a template database of fixed-length amino acid words to determine estimated class-membership probabilities for each residue in the protein. These probabilities are then input to a dynamic programming algorithm that finds a physically valid maximum-likelihood prediction for the entire protein. Our hybrid approach exploits a novel accuracy estimator for our core method, which estimates the unknown true accuracy of its prediction, to discern when to switch between template- and non-template-based methods. RESULTS: On challenging CASP benchmarks, the resulting hybrid approach boosts the state-of-the-art Q(8) accuracy by more than 2–10%, and Q(3) accuracy by more than 1–3%, yielding the most accurate method currently available for both 3- and 8-state secondary structure prediction. AVAILABILITY AND IMPLEMENTATION: A preliminary implementation in a new tool we call Nnessy is available free for non-commercial use at http://nnessy.cs.arizona.edu. |
format | Online Article Text |
id | pubmed-7355242 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-73552422020-07-16 Boosting the accuracy of protein secondary structure prediction through nearest neighbor search and method hybridization Krieger, Spencer Kececioglu, John Bioinformatics Macromolecular Sequence, Structure, and Function MOTIVATION: Protein secondary structure prediction is a fundamental precursor to many bioinformatics tasks. Nearly all state-of-the-art tools when computing their secondary structure prediction do not explicitly leverage the vast number of proteins whose structure is known. Leveraging this additional information in a so-called template-based method has the potential to significantly boost prediction accuracy. METHOD: We present a new hybrid approach to secondary structure prediction that gains the advantages of both template- and non-template-based methods. Our core template-based method is an algorithmic approach that uses metric-space nearest neighbor search over a template database of fixed-length amino acid words to determine estimated class-membership probabilities for each residue in the protein. These probabilities are then input to a dynamic programming algorithm that finds a physically valid maximum-likelihood prediction for the entire protein. Our hybrid approach exploits a novel accuracy estimator for our core method, which estimates the unknown true accuracy of its prediction, to discern when to switch between template- and non-template-based methods. RESULTS: On challenging CASP benchmarks, the resulting hybrid approach boosts the state-of-the-art Q(8) accuracy by more than 2–10%, and Q(3) accuracy by more than 1–3%, yielding the most accurate method currently available for both 3- and 8-state secondary structure prediction. AVAILABILITY AND IMPLEMENTATION: A preliminary implementation in a new tool we call Nnessy is available free for non-commercial use at http://nnessy.cs.arizona.edu. 
Oxford University Press 2020-07 2020-07-13 /pmc/articles/PMC7355242/ /pubmed/32657384 http://dx.doi.org/10.1093/bioinformatics/btaa336 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Boosting the accuracy of protein secondary structure prediction through nearest neighbor search and method hybridization |
topic | Macromolecular Sequence, Structure, and Function |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355242/ https://www.ncbi.nlm.nih.gov/pubmed/32657384 http://dx.doi.org/10.1093/bioinformatics/btaa336 |
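
The abstract in this record describes feeding per-residue class-membership probabilities into a dynamic programming algorithm that finds a physically valid maximum-likelihood prediction. The following is a minimal Python sketch of that general idea, not the Nnessy implementation. It assumes three states (H, E, C) and treats "physically valid" as a minimum run length per state; the `MIN_RUN` values, the function names, and the input format are all hypothetical, since the record does not specify them. A helper for Q-style accuracy (the fraction of residues labeled correctly) is included as well.

```python
import math

# Hypothetical minimum run lengths per state (H = helix, E = strand, C = coil).
# The abstract does not state which constraints define a valid prediction.
MIN_RUN = {"H": 3, "E": 2, "C": 1}


def max_likelihood_labeling(probs, min_run=MIN_RUN):
    """Return the most likely state string whose runs meet the minimum lengths.

    probs: one dict per residue, mapping each state to its estimated probability.
    """
    states = list(min_run)
    n = len(probs)
    if n == 0:
        return ""

    # prefix[s][i] = sum of log-probabilities of state s over residues 0..i-1
    prefix = {s: [0.0] * (n + 1) for s in states}
    for s in states:
        for i, p in enumerate(probs):
            prefix[s][i + 1] = prefix[s][i] + math.log(max(p[s], 1e-12))

    neg_inf = float("-inf")
    # best[i][s]: best log-likelihood for residues 0..i-1 when the final run is s
    best = [{s: neg_inf for s in states} for _ in range(n + 1)]
    back = [{s: None for s in states} for _ in range(n + 1)]
    for i in range(1, n + 1):
        for s in states:
            # The final run covers residues start..i-1 and must be long enough.
            for start in range(0, i - min_run[s] + 1):
                run_score = prefix[s][i] - prefix[s][start]
                if start == 0:
                    prev, cand = None, run_score
                else:
                    # The preceding run must use a different state.
                    prev = max((t for t in states if t != s),
                               key=lambda t: best[start][t])
                    cand = best[start][prev] + run_score
                if cand > best[i][s]:
                    best[i][s] = cand
                    back[i][s] = (start, prev)

    last = max(states, key=lambda t: best[n][t])
    if best[n][last] == neg_inf:
        raise ValueError("no labeling satisfies the run-length constraints")

    # Trace the chosen runs back from the end of the sequence.
    labels = [""] * n
    i, s = n, last
    while i > 0:
        start, prev = back[i][s]
        labels[start:i] = [s] * (i - start)
        i, s = start, prev
    return "".join(labels)


def q_accuracy(predicted, actual):
    """Fraction of residues whose predicted state matches the true state."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

With these assumed constraints, a single residue that favors helix but is surrounded by strongly coil-like neighbors is labeled coil, because an isolated one-residue helix is disallowed and extending the helix over its neighbors costs more likelihood than it gains. This version runs in time quadratic in the sequence length, which is enough for illustration.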