DivA: detection of non-homologous and very divergent regions in protein sequence alignments

BACKGROUND: Sequence alignments are used to find evidence of homology but sometimes contain regions that are difficult to align which can interfere with the quality of the subsequent analyses. Although it is possible to remove problematic regions manually, this is non-practical in large genome scale...

Bibliographic Details
Main authors: Zepeda Mendoza, Marie Lisandra; Nygaard, Sanne; da Fonseca, Rute R
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4240845/
https://www.ncbi.nlm.nih.gov/pubmed/25403086
http://dx.doi.org/10.1186/1756-0500-7-806
author Zepeda Mendoza, Marie Lisandra
Nygaard, Sanne
da Fonseca, Rute R
collection PubMed
description BACKGROUND: Sequence alignments are used to find evidence of homology but sometimes contain regions that are difficult to align which can interfere with the quality of the subsequent analyses. Although it is possible to remove problematic regions manually, this is non-practical in large genome scale studies, and the results suffer from irreproducibility arising from subjectivity. Some automated alignment trimming methods have been developed to remove problematic regions in alignments but these mostly act by removing complete columns or complete sequences from the MSA, discarding a lot of informative sites.
FINDINGS: Here we present a tool that identifies Divergent windows in protein sequence Alignments (DivA). DivA makes no assumptions on evolutionary models, and it is ideal for detecting incorrectly annotated segments within individual gene sequences. DivA works with a sliding-window approach to estimate four divergence-based parameters and their outlier values. It then classifies a window of a sequence of an alignment as very divergent (potentially non-homologous) if it presents a combination of outlier values for the four parameters it calculates. The windows classified as very divergent can optionally be masked in the alignment.
CONCLUSIONS: DivA automatically identifies very divergent and incorrectly annotated genic regions in MSAs avoiding the subjective and time-consuming problem of manual annotation. The output is clear to interpret and allows the user to take more informed decisions for reducing the amount of sequence discarded but still finding the potentially erroneous and non-homologous regions.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1756-0500-7-806) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4240845
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4240845 2014-11-23 DivA: detection of non-homologous and very divergent regions in protein sequence alignments Zepeda Mendoza, Marie Lisandra; Nygaard, Sanne; da Fonseca, Rute R BMC Res Notes Technical Note BioMed Central 2014-11-18 /pmc/articles/PMC4240845/ /pubmed/25403086 http://dx.doi.org/10.1186/1756-0500-7-806 Text en © Zepeda Mendoza et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title DivA: detection of non-homologous and very divergent regions in protein sequence alignments
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4240845/
https://www.ncbi.nlm.nih.gov/pubmed/25403086
http://dx.doi.org/10.1186/1756-0500-7-806