Sigma-2: Multiple sequence alignment of non-coding DNA via an evolutionary model

BACKGROUND: While most multiple sequence alignment programs expect that all or most of their input is known to be homologous, and penalise insertions and deletions, this is not a reasonable assumption for non-coding DNA, which is much less strongly conserved than protein-coding genes. Arguing that t...

Bibliographic Details
Main Authors:	Jayaraman, Gayathri; Siddharthan, Rahul
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2949893/ https://www.ncbi.nlm.nih.gov/pubmed/20846408 http://dx.doi.org/10.1186/1471-2105-11-464

author	Jayaraman, Gayathri; Siddharthan, Rahul
collection	PubMed
description	BACKGROUND: While most multiple sequence alignment programs expect that all or most of their input is known to be homologous, and penalise insertions and deletions, this is not a reasonable assumption for non-coding DNA, which is much less strongly conserved than protein-coding genes. Arguing that the goal of sequence alignment should be the detection of homology and not similarity, we incorporate an evolutionary model into a previously published multiple sequence alignment program for non-coding DNA, Sigma, as a sensitive likelihood-based way to assess the significance of alignments. Version 1 of Sigma was successful in eliminating spurious alignments but exhibited relatively poor sensitivity on synthetic data. Sigma 1 used a p-value (the probability under the "null hypothesis" of non-homology) to assess the significance of alignments, and, optionally, a background model that captured short-range genomic correlations. Sigma version 2, described here, retains these features, but calculates the p-value using a sophisticated evolutionary model that we describe here, and also allows for a transition matrix for different substitution rates from and to different nucleotides. Our evolutionary model takes separate account of mutation and fixation, and can be extended to allow for locally differing functional constraints on sequence. RESULTS: We demonstrate that, on real and synthetic data, Sigma-2 significantly outperforms other programs in specificity to genuine homology (that is, it minimises alignment of spuriously similar regions that do not have a common ancestry) while it is now as sensitive as the best current programs. CONCLUSIONS: Comparing these results with an extrapolation of the best results from other available programs, we suggest that conservation rates in intergenic DNA are often significantly over-estimated. It is increasingly important to align non-coding DNA correctly, in regulatory genomics and in the context of whole-genome alignment, and Sigma-2 is an important step in that direction.
format	Text
id	pubmed-2949893
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2949893 2010-11-03 Sigma-2: Multiple sequence alignment of non-coding DNA via an evolutionary model Jayaraman, Gayathri; Siddharthan, Rahul BMC Bioinformatics Research Article BioMed Central 2010-09-16 /pmc/articles/PMC2949893/ /pubmed/20846408 http://dx.doi.org/10.1186/1471-2105-11-464 Text en Copyright ©2010 Jayaraman and Siddharthan; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Sigma-2: Multiple sequence alignment of non-coding DNA via an evolutionary model
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2949893/ https://www.ncbi.nlm.nih.gov/pubmed/20846408 http://dx.doi.org/10.1186/1471-2105-11-464

Similar Items