
Automatic detection of anchor points for multiple sequence alignment

BACKGROUND: Determining beforehand specific positions to align (anchor points) has proved valuable for the accuracy of automated multiple sequence alignment (MSA) software. This feature can be used manually to include biological expertise, or automatically, usually by pairwise similarity searches. M...


Bibliographic Details
Main authors: Pitschi, Florian, Devauchelle, Claudine, Corel, Eduardo
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2942857/
https://www.ncbi.nlm.nih.gov/pubmed/20813050
http://dx.doi.org/10.1186/1471-2105-11-445
_version_ 1782186973894541312
author Pitschi, Florian
Devauchelle, Claudine
Corel, Eduardo
author_facet Pitschi, Florian
Devauchelle, Claudine
Corel, Eduardo
author_sort Pitschi, Florian
collection PubMed
description BACKGROUND: Determining beforehand specific positions to align (anchor points) has proved valuable for the accuracy of automated multiple sequence alignment (MSA) software. This feature can be used manually to include biological expertise, or automatically, usually by pairwise similarity searches. Multiple local similarities are expected to be more adequate, as they are more biologically relevant. However, even good multiple local similarities can prove incompatible with the ordering of an alignment. RESULTS: We use a recently developed algorithm to detect multiple local similarities, which returns subsets of positions in the sequences sharing similar contexts of appearance. In this paper, we first describe how to get, with the help of this method, subsets of positions that could form partial columns in an alignment. We next introduce a graph-theoretic algorithm to detect (and remove) positions in the partial columns that are inconsistent with a multiple alignment. Partial columns can be used, for the time being, as a guide by only a few MSA programs: ClustalW 2.0, DIALIGN 2 and T-Coffee. We perform tests on the effect of introducing these columns on the popular benchmark BAliBASE 3. CONCLUSIONS: We show that the inclusion of our partial alignment columns, as anchor points, improves, on the whole, the accuracy of the aligner ClustalW on the benchmark BAliBASE 3.
format Text
id pubmed-2942857
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2942857 2010-10-01 Automatic detection of anchor points for multiple sequence alignment Pitschi, Florian Devauchelle, Claudine Corel, Eduardo BMC Bioinformatics Research Article BACKGROUND: Determining beforehand specific positions to align (anchor points) has proved valuable for the accuracy of automated multiple sequence alignment (MSA) software. This feature can be used manually to include biological expertise, or automatically, usually by pairwise similarity searches. Multiple local similarities are expected to be more adequate, as they are more biologically relevant. However, even good multiple local similarities can prove incompatible with the ordering of an alignment. RESULTS: We use a recently developed algorithm to detect multiple local similarities, which returns subsets of positions in the sequences sharing similar contexts of appearance. In this paper, we first describe how to get, with the help of this method, subsets of positions that could form partial columns in an alignment. We next introduce a graph-theoretic algorithm to detect (and remove) positions in the partial columns that are inconsistent with a multiple alignment. Partial columns can be used, for the time being, as a guide by only a few MSA programs: ClustalW 2.0, DIALIGN 2 and T-Coffee. We perform tests on the effect of introducing these columns on the popular benchmark BAliBASE 3. CONCLUSIONS: We show that the inclusion of our partial alignment columns, as anchor points, improves, on the whole, the accuracy of the aligner ClustalW on the benchmark BAliBASE 3. BioMed Central 2010-09-02 /pmc/articles/PMC2942857/ /pubmed/20813050 http://dx.doi.org/10.1186/1471-2105-11-445 Text en Copyright ©2010 Pitschi et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Pitschi, Florian
Devauchelle, Claudine
Corel, Eduardo
Automatic detection of anchor points for multiple sequence alignment
title Automatic detection of anchor points for multiple sequence alignment
title_full Automatic detection of anchor points for multiple sequence alignment
title_fullStr Automatic detection of anchor points for multiple sequence alignment
title_full_unstemmed Automatic detection of anchor points for multiple sequence alignment
title_short Automatic detection of anchor points for multiple sequence alignment
title_sort automatic detection of anchor points for multiple sequence alignment
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2942857/
https://www.ncbi.nlm.nih.gov/pubmed/20813050
http://dx.doi.org/10.1186/1471-2105-11-445
work_keys_str_mv AT pitschiflorian automaticdetectionofanchorpointsformultiplesequencealignment
AT devauchelleclaudine automaticdetectionofanchorpointsformultiplesequencealignment
AT coreleduardo automaticdetectionofanchorpointsformultiplesequencealignment