Effect of the sequence data deluge on the performance of methods for detecting protein functional residues

BACKGROUND: The exponential accumulation of new sequences in public databases is expected to improve the performance of all the approaches for predicting protein structural and functional features. Nevertheless, this was never assessed or quantified for some widely used methodologies, such as those...

Bibliographic Details
Main authors:	Garrido-Martín, Diego; Pazos, Florencio
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5827975/ https://www.ncbi.nlm.nih.gov/pubmed/29482506 http://dx.doi.org/10.1186/s12859-018-2084-7

author	Garrido-Martín, Diego; Pazos, Florencio
collection	PubMed
description	BACKGROUND: The exponential accumulation of new sequences in public databases is expected to improve the performance of all the approaches for predicting protein structural and functional features. Nevertheless, this was never assessed or quantified for some widely used methodologies, such as those aimed at detecting functional sites and functional subfamilies in protein multiple sequence alignments. Using raw protein sequences as only input, these approaches can detect fully conserved positions, as well as those with a family-dependent conservation pattern. Both types of residues are routinely used as predictors of functional sites and, consequently, understanding how the sequence content of the databases affects them is relevant and timely. RESULTS: In this work we evaluate how the growth and change with time in the content of sequence databases affect five sequence-based approaches for detecting functional sites and subfamilies. We do that by recreating historical versions of the multiple sequence alignments that would have been obtained in the past based on the database contents at different time points, covering a period of 20 years. Applying the methods to these historical alignments allows quantifying the temporal variation in their performance. Our results show that the number of families to which these methods can be applied sharply increases with time, while their ability to detect potentially functional residues remains almost constant. CONCLUSIONS: These results are informative for the methods’ developers and final users, and may have implications in the design of new sequencing initiatives.
format	Online Article Text
id	pubmed-5827975
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5827975 2018-02-28 BMC Bioinformatics Research Article BioMed Central 2018-02-27 /pmc/articles/PMC5827975/ /pubmed/29482506 http://dx.doi.org/10.1186/s12859-018-2084-7 Text en © The Author(s). 2018 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Effect of the sequence data deluge on the performance of methods for detecting protein functional residues
topic	Research Article
