Inferring Demography from Runs of Homozygosity in Whole-Genome Sequence, with Correction for Sequence Errors

Whole-genome sequence is potentially the richest source of genetic data for inferring ancestral demography. However, full sequence also presents significant challenges to fully utilize such large data sets and to ensure that sequencing errors do not introduce bias into the inferred demography. Using...

Bibliographic Details
Main Authors: MacLeod, Iona M., Larkin, Denis M., Lewin, Harris A., Hayes, Ben J., Goddard, Mike E.
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3748359/
https://www.ncbi.nlm.nih.gov/pubmed/23842528
http://dx.doi.org/10.1093/molbev/mst125
author MacLeod, Iona M.
Larkin, Denis M.
Lewin, Harris A.
Hayes, Ben J.
Goddard, Mike E.
collection PubMed
description Whole-genome sequence is potentially the richest source of genetic data for inferring ancestral demography. However, full sequence also presents significant challenges to fully utilize such large data sets and to ensure that sequencing errors do not introduce bias into the inferred demography. Using whole-genome sequence data from two Holstein cattle, we demonstrate a new method to correct for bias caused by hidden errors and then infer stepwise changes in ancestral demography up to present. There was a strong upward bias in estimates of recent effective population size (N(e)) if the correction method was not applied to the data, both for our method and the Li and Durbin (Inference of human population history from individual whole-genome sequences. Nature 475:493–496) pairwise sequentially Markovian coalescent method. To infer demography, we use an analytical predictor of multiloci linkage disequilibrium (LD) based on a simple coalescent model that allows for changes in N(e). The LD statistic summarizes the distribution of runs of homozygosity for any given demography. We infer a best fit demography as one that predicts a match with the observed distribution of runs of homozygosity in the corrected sequence data. We use multiloci LD because it potentially holds more information about ancestral demography than pairwise LD. The inferred demography indicates a strong reduction in the N(e) around 170,000 years ago, possibly related to the divergence of African and European Bos taurus cattle. This is followed by a further reduction coinciding with the period of cattle domestication, with N(e) of between 3,500 and 6,000. The most recent reduction of N(e) to approximately 100 in the Holstein breed agrees well with estimates from pedigrees. Our approach can be applied to whole-genome sequence from any diploid species and can be scaled up to use sequence from multiple individuals.
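
The description above relies on the observed distribution of runs of homozygosity. As a rough illustration only, the following Python sketch measures the stretches between consecutive heterozygous sites in one diploid individual and counts them by length bin. It is not the authors' implementation: the function names, the 1-based coordinates, the bin edges, and the definition of a run as every base pair between two heterozygous sites are all assumptions made for this example.

```python
from typing import Iterable, List, Sequence


def homozygous_run_lengths(het_positions: Iterable[int], chrom_length: int) -> List[int]:
    """Return lengths (in bp) of stretches that contain no heterozygous site.

    het_positions: 1-based coordinates of heterozygous sites (hypothetical input).
    chrom_length: total length of the chromosome in bp.
    """
    runs: List[int] = []
    previous = 0  # coordinate just before the first base
    for pos in sorted(set(het_positions)):
        if not 1 <= pos <= chrom_length:
            raise ValueError(f"position {pos} is outside 1..{chrom_length}")
        gap = pos - previous - 1  # homozygous bases strictly between two sites
        if gap > 0:
            runs.append(gap)
        previous = pos
    tail = chrom_length - previous  # homozygous bases after the last site
    if tail > 0:
        runs.append(tail)
    return runs


def bin_run_lengths(runs: Iterable[int], bin_edges: Sequence[int]) -> List[int]:
    """Count runs per length bin; bin i covers [bin_edges[i], bin_edges[i + 1])."""
    counts = [0] * (len(bin_edges) - 1)
    for run in runs:
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= run < bin_edges[i + 1]:
                counts[i] += 1
                break
    return counts


if __name__ == "__main__":
    # Toy data: three heterozygous sites on a 30 bp chromosome.
    runs = homozygous_run_lengths([5, 6, 20], chrom_length=30)
    print(runs)
    print(bin_run_lengths(runs, bin_edges=[1, 5, 11, 100]))
```

In this sketch a single false heterozygous call inside a long run splits it into two shorter runs, which gives some intuition for why uncorrected sequencing errors can distort the observed run-length distribution.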
format Online
Article
Text
id pubmed-3748359
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3748359 2013-08-21 Mol Biol Evol Methods Oxford University Press 2013-09 2013-07-10 /pmc/articles/PMC3748359/ /pubmed/23842528 http://dx.doi.org/10.1093/molbev/mst125 Text en © The Author 2013. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Inferring Demography from Runs of Homozygosity in Whole-Genome Sequence, with Correction for Sequence Errors
topic Methods
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3748359/
https://www.ncbi.nlm.nih.gov/pubmed/23842528
http://dx.doi.org/10.1093/molbev/mst125