
Estimating statistical significance of local protein profile-profile alignments

BACKGROUND: Alignment of sequence families described by profiles provides a sensitive means for establishing homology between proteins and is important in protein evolutionary, structural, and functional studies. In the context of a steadily growing amount of sequence data, estimating the statistica...


Bibliographic Details
Main Author: Margelevičius, Mindaugas
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6693267/
https://www.ncbi.nlm.nih.gov/pubmed/31409275
http://dx.doi.org/10.1186/s12859-019-2913-3
author Margelevičius, Mindaugas
collection PubMed
description BACKGROUND: Alignment of sequence families described by profiles provides a sensitive means for establishing homology between proteins and is important in protein evolutionary, structural, and functional studies. In the context of a steadily growing amount of sequence data, estimating the statistical significance of alignments, including profile-profile alignments, plays a key role in alignment-based homology search algorithms. Still, it is an open question as to what and whether one type of distribution governs profile-profile alignment score, especially when profile-profile substitution scores involve such terms as secondary structure predictions. RESULTS: This study presents a methodology for estimating the statistical significance of this type of alignments. The methodology rests on a new algorithm developed for generating random profiles such that their alignment scores are distributed similarly to those obtained for real unrelated profiles. We show that improvements in statistical accuracy and sensitivity and high-quality alignment rate result from statistically characterizing alignments by establishing the dependence of statistical parameters on various measures associated with both individual and pairwise profile characteristics. Implemented in the COMER software, the proposed methodology yielded an increase of up to 34.2% in the number of true positives and up to 61.8% in the number of high-quality alignments with respect to the previous version of the COMER method. CONCLUSIONS: The more accurate estimation of statistical significance is implemented in the COMER method, which is now more sensitive and provides an increased rate of high-quality profile-profile alignments. The results of the present study also suggest directions for future research. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2913-3) contains supplementary material, which is available to authorized users.
format Online Article Text
id pubmed-6693267
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6693267 2019-08-19 Estimating statistical significance of local protein profile-profile alignments Margelevičius, Mindaugas BMC Bioinformatics Methodology Article BioMed Central 2019-08-13 /pmc/articles/PMC6693267/ /pubmed/31409275 http://dx.doi.org/10.1186/s12859-019-2913-3 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Estimating statistical significance of local protein profile-profile alignments
topic Methodology Article