
Sigma-2: Multiple sequence alignment of non-coding DNA via an evolutionary model

BACKGROUND: While most multiple sequence alignment programs expect that all or most of their input is known to be homologous, and penalise insertions and deletions, this is not a reasonable assumption for non-coding DNA, which is much less strongly conserved than protein-coding genes. Arguing that t...


Bibliographic Details
Main authors: Jayaraman, Gayathri; Siddharthan, Rahul
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2949893/
https://www.ncbi.nlm.nih.gov/pubmed/20846408
http://dx.doi.org/10.1186/1471-2105-11-464
author Jayaraman, Gayathri
Siddharthan, Rahul
collection PubMed
description BACKGROUND: While most multiple sequence alignment programs expect that all or most of their input is known to be homologous, and penalise insertions and deletions, this is not a reasonable assumption for non-coding DNA, which is much less strongly conserved than protein-coding genes. Arguing that the goal of sequence alignment should be the detection of homology and not similarity, we incorporate an evolutionary model into a previously published multiple sequence alignment program for non-coding DNA, Sigma, as a sensitive likelihood-based way to assess the significance of alignments. Version 1 of Sigma was successful in eliminating spurious alignments but exhibited relatively poor sensitivity on synthetic data. Sigma 1 used a p-value (the probability under the "null hypothesis" of non-homology) to assess the significance of alignments, and, optionally, a background model that captured short-range genomic correlations. Sigma version 2, described here, retains these features, but calculates the p-value using a sophisticated evolutionary model that we describe here, and also allows for a transition matrix for different substitution rates from and to different nucleotides. Our evolutionary model takes separate account of mutation and fixation, and can be extended to allow for locally differing functional constraints on sequence.
RESULTS: We demonstrate that, on real and synthetic data, Sigma-2 significantly outperforms other programs in specificity to genuine homology (that is, it minimises alignment of spuriously similar regions that do not have a common ancestry) while it is now as sensitive as the best current programs.
CONCLUSIONS: Comparing these results with an extrapolation of the best results from other available programs, we suggest that conservation rates in intergenic DNA are often significantly over-estimated. It is increasingly important to align non-coding DNA correctly, in regulatory genomics and in the context of whole-genome alignment, and Sigma-2 is an important step in that direction.
format Text
id pubmed-2949893
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2949893 2010-11-03 Sigma-2: Multiple sequence alignment of non-coding DNA via an evolutionary model Jayaraman, Gayathri Siddharthan, Rahul BMC Bioinformatics Research Article BioMed Central 2010-09-16 /pmc/articles/PMC2949893/ /pubmed/20846408 http://dx.doi.org/10.1186/1471-2105-11-464 Text en Copyright ©2010 Jayaraman and Siddharthan; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Sigma-2: Multiple sequence alignment of non-coding DNA via an evolutionary model
topic Research Article