Boosting the accuracy of protein secondary structure prediction through nearest neighbor search and method hybridization

MOTIVATION: Protein secondary structure prediction is a fundamental precursor to many bioinformatics tasks. Nearly all state-of-the-art tools when computing their secondary structure prediction do not explicitly leverage the vast number of proteins whose structure is known. Leveraging this additiona...

Bibliographic Details
Main authors:	Krieger, Spencer; Kececioglu, John
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Macromolecular Sequence, Structure, and Function
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355242/ https://www.ncbi.nlm.nih.gov/pubmed/32657384 http://dx.doi.org/10.1093/bioinformatics/btaa336

_version_	1783558235035271168
author	Krieger, Spencer; Kececioglu, John
collection	PubMed
description	MOTIVATION: Protein secondary structure prediction is a fundamental precursor to many bioinformatics tasks. Nearly all state-of-the-art tools when computing their secondary structure prediction do not explicitly leverage the vast number of proteins whose structure is known. Leveraging this additional information in a so-called template-based method has the potential to significantly boost prediction accuracy. METHOD: We present a new hybrid approach to secondary structure prediction that gains the advantages of both template- and non-template-based methods. Our core template-based method is an algorithmic approach that uses metric-space nearest neighbor search over a template database of fixed-length amino acid words to determine estimated class-membership probabilities for each residue in the protein. These probabilities are then input to a dynamic programming algorithm that finds a physically valid maximum-likelihood prediction for the entire protein. Our hybrid approach exploits a novel accuracy estimator for our core method, which estimates the unknown true accuracy of its prediction, to discern when to switch between template- and non-template-based methods. RESULTS: On challenging CASP benchmarks, the resulting hybrid approach boosts the state-of-the-art Q8 accuracy by more than 2–10%, and Q3 accuracy by more than 1–3%, yielding the most accurate method currently available for both 3- and 8-state secondary structure prediction. AVAILABILITY AND IMPLEMENTATION: A preliminary implementation in a new tool we call Nnessy is available free for non-commercial use at http://nnessy.cs.arizona.edu.
format	Online Article Text
id	pubmed-7355242
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-7355242 2020-07-16 Bioinformatics. Oxford University Press 2020-07 2020-07-13 /pmc/articles/PMC7355242/ /pubmed/32657384 http://dx.doi.org/10.1093/bioinformatics/btaa336 Text en © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Boosting the accuracy of protein secondary structure prediction through nearest neighbor search and method hybridization
topic	Macromolecular Sequence, Structure, and Function
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355242/ https://www.ncbi.nlm.nih.gov/pubmed/32657384 http://dx.doi.org/10.1093/bioinformatics/btaa336
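
The METHOD portion of the abstract describes three steps: nearest neighbor search over fixed-length amino acid words to estimate per-residue class probabilities, a dynamic program that finds a physically valid maximum-likelihood labeling, and an accuracy estimate that decides when to switch methods. The following is a minimal Python sketch of that general pipeline, not the Nnessy implementation. Everything specific in it is an assumption made for illustration: the function names, the Hamming distance and brute-force search standing in for the metric-space search, the pseudocount smoothing, minimum run lengths as the notion of "physically valid", and a simple threshold as the switching rule.

```python
import math
from collections import Counter

STATES = "HEC"  # assumed 3-state alphabet: helix, strand, coil


def hamming(a, b):
    """Number of mismatched positions; a simple metric on equal-length words."""
    return sum(x != y for x, y in zip(a, b))


def class_probabilities(word, templates, k=3, pseudocount=0.5):
    """Estimate class probabilities for the residue at the center of `word`.

    `templates` is a list of (word, label) pairs; brute-force search stands in
    for a real metric-space index. Pseudocounts keep every probability nonzero.
    """
    nearest = sorted(templates, key=lambda t: hamming(word, t[0]))[:k]
    counts = Counter(label for _, label in nearest)
    total = len(nearest) + pseudocount * len(STATES)
    return {s: (counts[s] + pseudocount) / total for s in STATES}


def predict_valid_labeling(probs, min_len):
    """Maximum-likelihood labeling in which every run of state s has length
    at least min_len[s] (an assumed stand-in for physical validity)."""
    n, states = len(probs), list(min_len)
    # prefix[s][i] = sum of log P(s) over residues 0..i-1
    prefix = {s: [0.0] for s in states}
    for p in probs:
        for s in states:
            prefix[s].append(prefix[s][-1] + math.log(p[s]))
    neg = float("-inf")
    # best[i][s] = (score, start of final run, state of the run before it)
    best = [{s: (neg, None, None) for s in states} for _ in range(n + 1)]
    for i in range(1, n + 1):
        for s in states:
            for j in range(0, i - min_len[s] + 1):  # final run covers j..i-1
                run = prefix[s][i] - prefix[s][j]
                if j == 0:
                    cand = (run, 0, None)
                else:
                    t = max((u for u in states if u != s),
                            key=lambda u: best[j][u][0])
                    cand = (best[j][t][0] + run, j, t)
                if cand[0] > best[i][s][0]:
                    best[i][s] = cand
    s = max(states, key=lambda u: best[n][u][0])
    if best[n][s][0] == neg:
        raise ValueError("no valid labeling for this length")
    runs, i = [], n
    while i > 0:  # trace back run by run
        _, j, t = best[i][s]
        runs.append(s * (i - j))
        i, s = j, t
    return "".join(reversed(runs))


def choose_prediction(core, fallback, estimated_accuracy, threshold):
    """Assumed switching rule: keep the template-based prediction only when
    its estimated accuracy reaches the threshold."""
    return core if estimated_accuracy >= threshold else fallback
```

With a minimum helix length of 3 in this sketch, a single residue whose most likely class is H, flanked by confidently coil residues, is labeled coil, because the dynamic program scores whole runs rather than taking each residue's most probable class independently.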
