
Sigma: multiple alignment of weakly-conserved non-coding DNA sequence

BACKGROUND: Existing tools for multiple-sequence alignment focus on aligning protein sequence or protein-coding DNA sequence, and are often based on extensions to Needleman-Wunsch-like pairwise alignment methods. We introduce a new tool, Sigma, with a new algorithm and scoring scheme designed specif...


Bibliographic Details
Main author:	Siddharthan, Rahul
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1468434/ https://www.ncbi.nlm.nih.gov/pubmed/16542424 http://dx.doi.org/10.1186/1471-2105-7-143

author	Siddharthan, Rahul
collection	PubMed
description	BACKGROUND: Existing tools for multiple-sequence alignment focus on aligning protein sequence or protein-coding DNA sequence, and are often based on extensions to Needleman-Wunsch-like pairwise alignment methods. We introduce a new tool, Sigma, with a new algorithm and scoring scheme designed specifically for non-coding DNA sequence. This problem acquires importance with the increasing number of published sequences of closely-related species. In particular, studies of gene regulation seek to take advantage of comparative genomics, and recent algorithms for finding regulatory sites in phylogenetically-related intergenic sequence require alignment as a preprocessing step. Much can also be learned about evolution from intergenic DNA, which tends to evolve faster than coding DNA. Sigma uses a strategy of seeking the best possible gapless local alignments (a strategy earlier used by DiAlign), at each step making the best possible alignment consistent with existing alignments, and scores the significance of the alignment based on the lengths of the aligned fragments and a background model which may be supplied or estimated from an auxiliary file of intergenic DNA. RESULTS: Comparative tests of Sigma with five earlier algorithms on synthetic data generated to mimic real data show excellent performance, with Sigma balancing high "sensitivity" (more bases aligned) with effective filtering of "incorrect" alignments. With real data, while "correctness" cannot be directly quantified for the alignment, running the PhyloGibbs motif finder on pre-aligned sequence suggests that Sigma's alignments are superior. CONCLUSION: By taking into account the peculiarities of non-coding DNA, Sigma fills a gap in the toolbox of bioinformatics.
format	Text
id	pubmed-1468434
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1468434 2006-06-07 Sigma: multiple alignment of weakly-conserved non-coding DNA sequence Siddharthan, Rahul BMC Bioinformatics Software BioMed Central 2006-03-16 /pmc/articles/PMC1468434/ /pubmed/16542424 http://dx.doi.org/10.1186/1471-2105-7-143 Text en Copyright © 2006 Siddharthan; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Sigma: multiple alignment of weakly-conserved non-coding DNA sequence
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1468434/ https://www.ncbi.nlm.nih.gov/pubmed/16542424 http://dx.doi.org/10.1186/1471-2105-7-143
