Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments

BACKGROUND: Molecular evolutionary studies of noncoding sequences rely on multiple alignments. Yet how multiple alignment accuracy varies across sequence types, tree topologies, divergences and tools, and further how this variation impacts specific inferences, remains unclear. RESULTS: Here we devel...

Bibliographic Details
Main authors: Pollard, Daniel A; Moses, Alan M; Iyer, Venky N; Eisen, Michael B
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1613255/
https://www.ncbi.nlm.nih.gov/pubmed/16904011
http://dx.doi.org/10.1186/1471-2105-7-376
_version_ 1782130481779703808
author Pollard, Daniel A
Moses, Alan M
Iyer, Venky N
Eisen, Michael B
collection PubMed
description BACKGROUND: Molecular evolutionary studies of noncoding sequences rely on multiple alignments. Yet how multiple alignment accuracy varies across sequence types, tree topologies, divergences and tools, and further how this variation impacts specific inferences, remains unclear. RESULTS: Here we develop a molecular evolution simulation platform, CisEvolver, with models of background noncoding and transcription factor binding site evolution, and use simulated alignments to systematically examine multiple alignment accuracy and its impact on two key molecular evolutionary inferences: transcription factor binding site conservation and divergence estimation. We find that the accuracy of multiple alignments is determined almost exclusively by the pairwise divergence distance of the two most diverged species and that additional species have a negligible influence on alignment accuracy. Conserved transcription factor binding sites align better than surrounding noncoding DNA yet are often found to be misaligned at relatively short divergence distances, such that studies of binding site gain and loss could easily be confounded by alignment error. Divergence estimates from multiple alignments tend to be overestimated at short divergence distances but reach a tool-specific divergence at which they cease to increase, leading to underestimation at long divergences. Our most striking finding was that overall alignment accuracy, binding site alignment accuracy and divergence estimation accuracy vary greatly across branches in a tree and are most accurate for terminal branches connecting sister taxa and least accurate for internal branches connecting sub-alignments. CONCLUSION: Our results suggest that variation in alignment accuracy can lead to errors in molecular evolutionary inferences that could be construed as biological variation. These findings have implications for which species to choose for analyses, what kind of errors would be expected for a given set of species and how multiple alignment tools and phylogenetic inference methods might be improved to minimize or control for alignment errors.
format Text
id pubmed-1613255
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1613255 2006-10-18 Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments Pollard, Daniel A Moses, Alan M Iyer, Venky N Eisen, Michael B BMC Bioinformatics Research Article BACKGROUND: Molecular evolutionary studies of noncoding sequences rely on multiple alignments. Yet how multiple alignment accuracy varies across sequence types, tree topologies, divergences and tools, and further how this variation impacts specific inferences, remains unclear. RESULTS: Here we develop a molecular evolution simulation platform, CisEvolver, with models of background noncoding and transcription factor binding site evolution, and use simulated alignments to systematically examine multiple alignment accuracy and its impact on two key molecular evolutionary inferences: transcription factor binding site conservation and divergence estimation. We find that the accuracy of multiple alignments is determined almost exclusively by the pairwise divergence distance of the two most diverged species and that additional species have a negligible influence on alignment accuracy. Conserved transcription factor binding sites align better than surrounding noncoding DNA yet are often found to be misaligned at relatively short divergence distances, such that studies of binding site gain and loss could easily be confounded by alignment error. Divergence estimates from multiple alignments tend to be overestimated at short divergence distances but reach a tool-specific divergence at which they cease to increase, leading to underestimation at long divergences. Our most striking finding was that overall alignment accuracy, binding site alignment accuracy and divergence estimation accuracy vary greatly across branches in a tree and are most accurate for terminal branches connecting sister taxa and least accurate for internal branches connecting sub-alignments. CONCLUSION: Our results suggest that variation in alignment accuracy can lead to errors in molecular evolutionary inferences that could be construed as biological variation. These findings have implications for which species to choose for analyses, what kind of errors would be expected for a given set of species and how multiple alignment tools and phylogenetic inference methods might be improved to minimize or control for alignment errors. BioMed Central 2006-08-14 /pmc/articles/PMC1613255/ /pubmed/16904011 http://dx.doi.org/10.1186/1471-2105-7-376 Text en Copyright © 2006 Pollard et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1613255/
https://www.ncbi.nlm.nih.gov/pubmed/16904011
http://dx.doi.org/10.1186/1471-2105-7-376