Boosting the accuracy of protein secondary structure prediction through nearest neighbor search and method hybridization

Bibliographic Details
Main authors: Krieger, Spencer; Kececioglu, John
Format: Online Article, Text
Language: English
Published: Oxford University Press, 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355242/
https://www.ncbi.nlm.nih.gov/pubmed/32657384
http://dx.doi.org/10.1093/bioinformatics/btaa336
Collection: PubMed
Abstract:
MOTIVATION: Protein secondary structure prediction is a fundamental precursor to many bioinformatics tasks. Nearly all state-of-the-art tools when computing their secondary structure prediction do not explicitly leverage the vast number of proteins whose structure is known. Leveraging this additional information in a so-called template-based method has the potential to significantly boost prediction accuracy.

METHOD: We present a new hybrid approach to secondary structure prediction that gains the advantages of both template- and non-template-based methods. Our core template-based method is an algorithmic approach that uses metric-space nearest neighbor search over a template database of fixed-length amino acid words to determine estimated class-membership probabilities for each residue in the protein. These probabilities are then input to a dynamic programming algorithm that finds a physically valid maximum-likelihood prediction for the entire protein. Our hybrid approach exploits a novel accuracy estimator for our core method, which estimates the unknown true accuracy of its prediction, to discern when to switch between template- and non-template-based methods.

RESULTS: On challenging CASP benchmarks, the resulting hybrid approach boosts the state-of-the-art Q8 accuracy by more than 2–10%, and Q3 accuracy by more than 1–3%, yielding the most accurate method currently available for both 3- and 8-state secondary structure prediction.

AVAILABILITY AND IMPLEMENTATION: A preliminary implementation in a new tool we call Nnessy is available free for non-commercial use at http://nnessy.cs.arizona.edu.
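The pipeline summarized in the abstract can be illustrated with a short Python sketch. It is not Nnessy's implementation, and every name in it is hypothetical. It assumes a 3-state alphabet (H, E, C), uses Hamming distance with a brute-force scan in place of the paper's metric-space index, models "physically valid" as a minimum run length per state (the `min_len` values are illustrative), and replaces the paper's accuracy estimator with a simple average-probability confidence in `choose_prediction`.

```python
import math
from collections import Counter

# Assumption: 3-state alphabet (helix, strand, coil); Q8 would use 8 labels.
STATES = "HEC"


def hamming(a, b):
    """Hamming distance between equal-length words; it is a metric."""
    return sum(x != y for x, y in zip(a, b))


def residue_probabilities(seq, templates, k=3, pseudo=1.0):
    """Estimate per-residue class probabilities from the k nearest template words.

    templates: hypothetical list of (word, structure) pairs of a common length w.
    Each query word covering a residue lets its k nearest templates vote for
    the label they carry at the same offset; pseudocounts avoid zero probabilities.
    """
    w = len(templates[0][0])
    counts = [Counter() for _ in seq]
    for start in range(len(seq) - w + 1):
        word = seq[start:start + w]
        # Brute-force scan for clarity; a metric-space index would avoid it.
        nearest = sorted(templates, key=lambda t: hamming(word, t[0]))[:k]
        for _, struct in nearest:
            for offset, label in enumerate(struct):
                counts[start + offset][label] += 1
    probs = []
    for c in counts:
        total = sum(c.values()) + pseudo * len(STATES)
        probs.append({s: (c[s] + pseudo) / total for s in STATES})
    return probs


def max_likelihood_labels(probs, min_len):
    """Most likely labeling in which every maximal run of state s has length >= min_len[s].

    Assumption: validity is modeled as a minimum run length per state (each >= 1).
    best[i][s] scores residues 0..i-1 when the last run has state s and ends at i-1.
    """
    n = len(probs)
    neg = float("-inf")
    # prefix[s][i] = sum of log P(s) over residues 0..i-1, for O(1) run scores.
    prefix = {s: [0.0] * (n + 1) for s in STATES}
    for i, p in enumerate(probs):
        for s in STATES:
            prefix[s][i + 1] = prefix[s][i] + math.log(p[s])
    best = [{s: neg for s in STATES} for _ in range(n + 1)]
    back = [{s: None for s in STATES} for _ in range(n + 1)]
    for i in range(1, n + 1):
        for s in STATES:
            for start in range(i - min_len[s] + 1):  # run start..i-1 is long enough
                if start == 0:
                    prev_state, prev_score = None, 0.0
                else:
                    # The previous run must use a different state, so runs stay maximal.
                    prev_state = max((t for t in STATES if t != s), key=lambda t: best[start][t])
                    prev_score = best[start][prev_state]
                score = prev_score + prefix[s][i] - prefix[s][start]
                if score > best[i][s]:
                    best[i][s], back[i][s] = score, (start, prev_state)
    state = max(STATES, key=lambda s: best[n][s])
    if best[n][state] == neg:
        raise ValueError("no labeling satisfies the minimum run lengths")
    runs, i = [], n
    while i > 0:  # trace the chosen runs back from the end of the protein
        start, prev_state = back[i][state]
        runs.append(state * (i - start))
        i, state = start, prev_state
    return "".join(reversed(runs))


def q_accuracy(pred, true):
    """Q3 (or Q8) accuracy: fraction of residues whose predicted state is correct."""
    return sum(p == t for p, t in zip(pred, true)) / len(true)


def choose_prediction(template_pred, probs, fallback_pred, threshold=0.7):
    """Hybrid switch. Hypothetical confidence: mean probability of the chosen labels."""
    confidence = sum(p[s] for p, s in zip(probs, template_pred)) / len(template_pred)
    return template_pred if confidence >= threshold else fallback_pred
```

For example, `max_likelihood_labels(residue_probabilities("ACDA", [("ACD", "HHH"), ("WWW", "EEE")], k=1), {"H": 2, "E": 2, "C": 1})` returns one label per residue, with no helix or strand run shorter than two. The quadratic segment recurrence is kept for readability; expanding each state into a chain of run-length counter states gives an equivalent Viterbi recurrence that is linear in the protein length.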
Institution: National Center for Biotechnology Information
Dates: 2020-07 (issue), 2020-07-13 (published online)
Rights: © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
Subjects: Macromolecular Sequence, Structure, and Function