Splign: algorithms for computing spliced alignments with identification of paralogs

BACKGROUND: The computation of accurate alignments of cDNA sequences against a genome is at the foundation of modern genome annotation pipelines. Several factors such as presence of paralogs, small exons, non-consensus splice signals, sequencing errors and polymorphic sites pose recognized difficult...

Bibliographic Details
Main authors: Kapustin, Yuri; Souvorov, Alexander; Tatusova, Tatiana; Lipman, David
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2440734/
https://www.ncbi.nlm.nih.gov/pubmed/18495041
http://dx.doi.org/10.1186/1745-6150-3-20
author Kapustin, Yuri
Souvorov, Alexander
Tatusova, Tatiana
Lipman, David
author_facet Kapustin, Yuri
Souvorov, Alexander
Tatusova, Tatiana
Lipman, David
author_sort Kapustin, Yuri
collection PubMed
description BACKGROUND: The computation of accurate alignments of cDNA sequences against a genome is at the foundation of modern genome annotation pipelines. Several factors such as presence of paralogs, small exons, non-consensus splice signals, sequencing errors and polymorphic sites pose recognized difficulties to existing spliced alignment algorithms. RESULTS: We describe a set of algorithms behind a tool called Splign for computing cDNA-to-Genome alignments. The algorithms include a high-performance preliminary alignment, a compartment identification based on a formally defined model of adjacent duplicated regions, and a refined sequence alignment. In a series of tests, Splign has produced more accurate results than other tools commonly used to compute spliced alignments, in a reasonable amount of time. CONCLUSION: Splign's ability to deal with various issues complicating the spliced alignment problem makes it a helpful tool in eukaryotic genome annotation processes and alternative splicing studies. Its performance is enough to align the largest currently available pools of cDNA data such as the human EST set on a moderate-sized computing cluster in a matter of hours. The duplications identification (compartmentization) algorithm can be used independently in other areas such as the study of pseudogenes. REVIEWERS: This article was reviewed by: Steven Salzberg, Arcady Mushegian and Andrey Mironov (nominated by Mikhail Gelfand).
format Text
id pubmed-2440734
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-24407342008-06-27 Splign: algorithms for computing spliced alignments with identification of paralogs Kapustin, Yuri Souvorov, Alexander Tatusova, Tatiana Lipman, David Biol Direct Research BACKGROUND: The computation of accurate alignments of cDNA sequences against a genome is at the foundation of modern genome annotation pipelines. Several factors such as presence of paralogs, small exons, non-consensus splice signals, sequencing errors and polymorphic sites pose recognized difficulties to existing spliced alignment algorithms. RESULTS: We describe a set of algorithms behind a tool called Splign for computing cDNA-to-Genome alignments. The algorithms include a high-performance preliminary alignment, a compartment identification based on a formally defined model of adjacent duplicated regions, and a refined sequence alignment. In a series of tests, Splign has produced more accurate results than other tools commonly used to compute spliced alignments, in a reasonable amount of time. CONCLUSION: Splign's ability to deal with various issues complicating the spliced alignment problem makes it a helpful tool in eukaryotic genome annotation processes and alternative splicing studies. Its performance is enough to align the largest currently available pools of cDNA data such as the human EST set on a moderate-sized computing cluster in a matter of hours. The duplications identification (compartmentization) algorithm can be used independently in other areas such as the study of pseudogenes. REVIEWERS: This article was reviewed by: Steven Salzberg, Arcady Mushegian and Andrey Mironov (nominated by Mikhail Gelfand). BioMed Central 2008-05-21 /pmc/articles/PMC2440734/ /pubmed/18495041 http://dx.doi.org/10.1186/1745-6150-3-20 Text en Copyright © 2008 Kapustin et al; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Kapustin, Yuri
Souvorov, Alexander
Tatusova, Tatiana
Lipman, David
Splign: algorithms for computing spliced alignments with identification of paralogs
title Splign: algorithms for computing spliced alignments with identification of paralogs
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2440734/
https://www.ncbi.nlm.nih.gov/pubmed/18495041
http://dx.doi.org/10.1186/1745-6150-3-20