Alignment-Free Population Genomics: An Efficient Estimator of Sequence Diversity
Comparative sequencing contributes critically to the functional annotation of genomes. One prerequisite for successful analysis of the increasingly abundant comparative sequencing data is the availability of efficient computational tools. We present here a strategy for comparing unaligned genomes ba...
Main authors: | Haubold, Bernhard, Pfaffelhuber, Peter |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Genetics Society of America 2012 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3411244/ https://www.ncbi.nlm.nih.gov/pubmed/22908037 http://dx.doi.org/10.1534/g3.112.002527 |
_version_ | 1782239801067438080 |
---|---|
author | Haubold, Bernhard; Pfaffelhuber, Peter |
author_facet | Haubold, Bernhard; Pfaffelhuber, Peter |
author_sort | Haubold, Bernhard |
collection | PubMed |
description | Comparative sequencing contributes critically to the functional annotation of genomes. One prerequisite for successful analysis of the increasingly abundant comparative sequencing data is the availability of efficient computational tools. We present here a strategy for comparing unaligned genomes based on a coalescent approach combined with advanced algorithms for indexing sequences. These algorithms are particularly efficient when analyzing large genomes, as their run time ideally grows only linearly with sequence length. Using this approach, we have derived and implemented a maximum-likelihood estimator of the average number of mismatches per site between two closely related sequences, π. By allowing for fluctuating coalescent times, we are able to improve a previously published alignment-free estimator of π. We show through simulation that our new estimator is fast and accurate even with moderate recombination (ρ ≤ π). To demonstrate its applicability to real data, we compare the unaligned genomes of Drosophila persimilis and D. pseudoobscura. In agreement with previous studies, our sliding window analysis locates the global divergence minimum between these two genomes to the pericentromeric region of chromosome 3. |
format | Online Article Text |
id | pubmed-3411244 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Genetics Society of America |
record_format | MEDLINE/PubMed |
spelling | pubmed-34112442012-08-20 Alignment-Free Population Genomics: An Efficient Estimator of Sequence Diversity Haubold, Bernhard Pfaffelhuber, Peter G3 (Bethesda) Investigations Comparative sequencing contributes critically to the functional annotation of genomes. One prerequisite for successful analysis of the increasingly abundant comparative sequencing data is the availability of efficient computational tools. We present here a strategy for comparing unaligned genomes based on a coalescent approach combined with advanced algorithms for indexing sequences. These algorithms are particularly efficient when analyzing large genomes, as their run time ideally grows only linearly with sequence length. Using this approach, we have derived and implemented a maximum-likelihood estimator of the average number of mismatches per site between two closely related sequences, π. By allowing for fluctuating coalescent times, we are able to improve a previously published alignment-free estimator of π. We show through simulation that our new estimator is fast and accurate even with moderate recombination (ρ ≤ π). To demonstrate its applicability to real data, we compare the unaligned genomes of Drosophila persimilis and D. pseudoobscura. In agreement with previous studies, our sliding window analysis locates the global divergence minimum between these two genomes to the pericentromeric region of chromosome 3. Genetics Society of America 2012-08-01 /pmc/articles/PMC3411244/ /pubmed/22908037 http://dx.doi.org/10.1534/g3.112.002527 Text en Copyright © 2012 Haubold, Pfaffelhuber http://creativecommons.org/licenses/by/3.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Investigations Haubold, Bernhard Pfaffelhuber, Peter Alignment-Free Population Genomics: An Efficient Estimator of Sequence Diversity |
title | Alignment-Free Population Genomics: An Efficient Estimator of Sequence Diversity |
title_full | Alignment-Free Population Genomics: An Efficient Estimator of Sequence Diversity |
title_fullStr | Alignment-Free Population Genomics: An Efficient Estimator of Sequence Diversity |
title_full_unstemmed | Alignment-Free Population Genomics: An Efficient Estimator of Sequence Diversity |
title_short | Alignment-Free Population Genomics: An Efficient Estimator of Sequence Diversity |
title_sort | alignment-free population genomics: an efficient estimator of sequence diversity |
topic | Investigations |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3411244/ https://www.ncbi.nlm.nih.gov/pubmed/22908037 http://dx.doi.org/10.1534/g3.112.002527 |
work_keys_str_mv | AT hauboldbernhard alignmentfreepopulationgenomicsanefficientestimatorofsequencediversity AT pfaffelhuberpeter alignmentfreepopulationgenomicsanefficientestimatorofsequencediversity |