Alignment-Free Population Genomics: An Efficient Estimator of Sequence Diversity

Bibliographic Details
Main authors: Haubold, Bernhard; Pfaffelhuber, Peter
Format: Online Article Text
Language: English
Published: Genetics Society of America 2012
Subjects: Investigations
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3411244/
https://www.ncbi.nlm.nih.gov/pubmed/22908037
http://dx.doi.org/10.1534/g3.112.002527
Collection: PubMed

Abstract: Comparative sequencing contributes critically to the functional annotation of genomes. One prerequisite for successful analysis of the increasingly abundant comparative sequencing data is the availability of efficient computational tools. We present here a strategy for comparing unaligned genomes based on a coalescent approach combined with advanced algorithms for indexing sequences. These algorithms are particularly efficient when analyzing large genomes, as their run time ideally grows only linearly with sequence length. Using this approach, we have derived and implemented a maximum-likelihood estimator of the average number of mismatches per site between two closely related sequences, π. By allowing for fluctuating coalescent times, we are able to improve a previously published alignment-free estimator of π. We show through simulation that our new estimator is fast and accurate even with moderate recombination (ρ ≤ π). To demonstrate its applicability to real data, we compare the unaligned genomes of Drosophila persimilis and D. pseudoobscura. In agreement with previous studies, our sliding window analysis locates the global divergence minimum between these two genomes to the pericentromeric region of chromosome 3.
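
The record describes the method only at the level of the abstract, so what follows is a toy Python sketch under stated assumptions, not the authors' maximum-likelihood estimator. `pi_per_window` (a hypothetical helper) computes the alignment-based definition of π, the fraction of mismatching sites, in sliding windows over two sequences assumed to be already aligned; something like it could serve as a reference value when checking an alignment-free estimate in simulation. `match_lengths` (also hypothetical) assumes that the sequence index is used to look up exact-match lengths, and computes them with a naive substring search in place of the linear-time indexing structures (for example suffix trees or suffix arrays) that we assume the abstract refers to.

```python
# Toy sketch only: NOT the maximum-likelihood estimator of Haubold & Pfaffelhuber.
# Both function names are hypothetical, chosen for illustration.


def pi_per_window(a: str, b: str, window: int, step: int) -> list[float]:
    """Alignment-based pi (mismatches per site) in sliding windows.

    Assumes a and b are already aligned to the same length (gaps not handled).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    values = []
    for start in range(0, len(a) - window + 1, step):
        wa, wb = a[start:start + window], b[start:start + window]
        mismatches = sum(x != y for x, y in zip(wa, wb))
        values.append(mismatches / window)
    return values


def match_lengths(query: str, subject: str) -> list[int]:
    """For each position i, the length of the longest prefix of query[i:]
    that occurs anywhere in subject (assumed input to an alignment-free estimator).

    Naive stand-in for a suffix-tree or suffix-array lookup: each test is a
    plain substring search, so this is far from linear time.
    """
    lengths = []
    for i in range(len(query)):
        # Binary search on the match length is valid because occurrence is
        # monotone: if a string occurs in subject, so does each of its prefixes.
        lo, hi = 0, len(query) - i
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if query[i:i + mid] in subject:
                lo = mid
            else:
                hi = mid - 1
        lengths.append(lo)
    return lengths


if __name__ == "__main__":
    windows = pi_per_window("AAAATTTT", "AAAATTGG", window=4, step=4)
    print(windows)  # [0.0, 0.5]
    # Index of the window with the lowest divergence, as in a sliding-window scan.
    print(windows.index(min(windows)))  # 0
    print(match_lengths("ACGT", "ACGA"))  # [3, 2, 1, 0]
```

Locating a divergence minimum in a sliding-window scan then reduces to taking the argmin over the window values, as in the demo. The cost of `match_lengths` grows much faster than sequence length, which is the cost that genome-scale indexing is meant to avoid.
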
ID: pubmed-3411244
Institution: National Center for Biotechnology Information
Record: pubmed-3411244 (2012-08-20)
Journal: G3 (Bethesda), section Investigations
Published: Genetics Society of America, 2012-08-01
Copyright © 2012 Haubold, Pfaffelhuber
License: This is an open-access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.