Sigma-2: Multiple sequence alignment of non-coding DNA via an evolutionary model
BACKGROUND: While most multiple sequence alignment programs expect that all or most of their input is known to be homologous, and penalise insertions and deletions, this is not a reasonable assumption for non-coding DNA, which is much less strongly conserved than protein-coding genes. Arguing that t...
Main authors: | Jayaraman, Gayathri; Siddharthan, Rahul |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2949893/ https://www.ncbi.nlm.nih.gov/pubmed/20846408 http://dx.doi.org/10.1186/1471-2105-11-464 |
_version_ | 1782187605635366912 |
---|---|
author | Jayaraman, Gayathri Siddharthan, Rahul |
author_facet | Jayaraman, Gayathri Siddharthan, Rahul |
author_sort | Jayaraman, Gayathri |
collection | PubMed |
description | BACKGROUND: While most multiple sequence alignment programs expect that all or most of their input is known to be homologous, and penalise insertions and deletions, this is not a reasonable assumption for non-coding DNA, which is much less strongly conserved than protein-coding genes. Arguing that the goal of sequence alignment should be the detection of homology and not similarity, we incorporate an evolutionary model into a previously published multiple sequence alignment program for non-coding DNA, Sigma, as a sensitive likelihood-based way to assess the significance of alignments. Version 1 of Sigma was successful in eliminating spurious alignments but exhibited relatively poor sensitivity on synthetic data. Sigma 1 used a p-value (the probability under the "null hypothesis" of non-homology) to assess the significance of alignments, and, optionally, a background model that captured short-range genomic correlations. Sigma version 2, described here, retains these features, but calculates the p-value using a sophisticated evolutionary model that we describe here, and also allows for a transition matrix for different substitution rates from and to different nucleotides. Our evolutionary model takes separate account of mutation and fixation, and can be extended to allow for locally differing functional constraints on sequence. RESULTS: We demonstrate that, on real and synthetic data, Sigma-2 significantly outperforms other programs in specificity to genuine homology (that is, it minimises alignment of spuriously similar regions that do not have a common ancestry) while it is now as sensitive as the best current programs. CONCLUSIONS: Comparing these results with an extrapolation of the best results from other available programs, we suggest that conservation rates in intergenic DNA are often significantly over-estimated. 
It is increasingly important to align non-coding DNA correctly, in regulatory genomics and in the context of whole-genome alignment, and Sigma-2 is an important step in that direction. |
format | Text |
id | pubmed-2949893 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-29498932010-11-03 Sigma-2: Multiple sequence alignment of non-coding DNA via an evolutionary model Jayaraman, Gayathri Siddharthan, Rahul BMC Bioinformatics Research Article BACKGROUND: While most multiple sequence alignment programs expect that all or most of their input is known to be homologous, and penalise insertions and deletions, this is not a reasonable assumption for non-coding DNA, which is much less strongly conserved than protein-coding genes. Arguing that the goal of sequence alignment should be the detection of homology and not similarity, we incorporate an evolutionary model into a previously published multiple sequence alignment program for non-coding DNA, Sigma, as a sensitive likelihood-based way to assess the significance of alignments. Version 1 of Sigma was successful in eliminating spurious alignments but exhibited relatively poor sensitivity on synthetic data. Sigma 1 used a p-value (the probability under the "null hypothesis" of non-homology) to assess the significance of alignments, and, optionally, a background model that captured short-range genomic correlations. Sigma version 2, described here, retains these features, but calculates the p-value using a sophisticated evolutionary model that we describe here, and also allows for a transition matrix for different substitution rates from and to different nucleotides. Our evolutionary model takes separate account of mutation and fixation, and can be extended to allow for locally differing functional constraints on sequence. RESULTS: We demonstrate that, on real and synthetic data, Sigma-2 significantly outperforms other programs in specificity to genuine homology (that is, it minimises alignment of spuriously similar regions that do not have a common ancestry) while it is now as sensitive as the best current programs. 
CONCLUSIONS: Comparing these results with an extrapolation of the best results from other available programs, we suggest that conservation rates in intergenic DNA are often significantly over-estimated. It is increasingly important to align non-coding DNA correctly, in regulatory genomics and in the context of whole-genome alignment, and Sigma-2 is an important step in that direction. BioMed Central 2010-09-16 /pmc/articles/PMC2949893/ /pubmed/20846408 http://dx.doi.org/10.1186/1471-2105-11-464 Text en Copyright ©2010 Jayaraman and Siddharthan; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Jayaraman, Gayathri Siddharthan, Rahul Sigma-2: Multiple sequence alignment of non-coding DNA via an evolutionary model |
title | Sigma-2: Multiple sequence alignment of non-coding DNA via an evolutionary model |
title_full | Sigma-2: Multiple sequence alignment of non-coding DNA via an evolutionary model |
title_fullStr | Sigma-2: Multiple sequence alignment of non-coding DNA via an evolutionary model |
title_full_unstemmed | Sigma-2: Multiple sequence alignment of non-coding DNA via an evolutionary model |
title_short | Sigma-2: Multiple sequence alignment of non-coding DNA via an evolutionary model |
title_sort | sigma-2: multiple sequence alignment of non-coding dna via an evolutionary model |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2949893/ https://www.ncbi.nlm.nih.gov/pubmed/20846408 http://dx.doi.org/10.1186/1471-2105-11-464 |
work_keys_str_mv | AT jayaramangayathri sigma2multiplesequencealignmentofnoncodingdnaviaanevolutionarymodel AT siddharthanrahul sigma2multiplesequencealignmentofnoncodingdnaviaanevolutionarymodel |