
Splign: algorithms for computing spliced alignments with identification of paralogs


Bibliographic Details
Main Authors:	Kapustin, Yuri; Souvorov, Alexander; Tatusova, Tatiana; Lipman, David
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2440734/ https://www.ncbi.nlm.nih.gov/pubmed/18495041 http://dx.doi.org/10.1186/1745-6150-3-20

author	Kapustin, Yuri; Souvorov, Alexander; Tatusova, Tatiana; Lipman, David
collection	PubMed
description	BACKGROUND: The computation of accurate alignments of cDNA sequences against a genome is at the foundation of modern genome annotation pipelines. Several factors, such as the presence of paralogs, small exons, non-consensus splice signals, sequencing errors and polymorphic sites, pose recognized difficulties for existing spliced alignment algorithms. RESULTS: We describe a set of algorithms behind a tool called Splign for computing cDNA-to-genome alignments. The algorithms include a high-performance preliminary alignment, a compartment identification based on a formally defined model of adjacent duplicated regions, and a refined sequence alignment. In a series of tests, Splign has produced more accurate results than other tools commonly used to compute spliced alignments, in a reasonable amount of time. CONCLUSION: Splign's ability to deal with various issues complicating the spliced alignment problem makes it a helpful tool in eukaryotic genome annotation processes and alternative splicing studies. Its performance is sufficient to align the largest currently available pools of cDNA data, such as the human EST set, on a moderate-sized computing cluster in a matter of hours. The duplication identification (compartmentization) algorithm can be used independently in other areas such as the study of pseudogenes. REVIEWERS: This article was reviewed by Steven Salzberg, Arcady Mushegian and Andrey Mironov (nominated by Mikhail Gelfand).
format	Text
id	pubmed-2440734
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2440734 2008-06-27 Splign: algorithms for computing spliced alignments with identification of paralogs Kapustin, Yuri; Souvorov, Alexander; Tatusova, Tatiana; Lipman, David Biol Direct Research BioMed Central 2008-05-21 /pmc/articles/PMC2440734/ /pubmed/18495041 http://dx.doi.org/10.1186/1745-6150-3-20 Text en Copyright © 2008 Kapustin et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Splign: algorithms for computing spliced alignments with identification of paralogs
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2440734/ https://www.ncbi.nlm.nih.gov/pubmed/18495041 http://dx.doi.org/10.1186/1745-6150-3-20
