
Inferring whole-genome histories in large population datasets

Bibliographic Details
Main authors: Kelleher, Jerome; Wong, Yan; Wohns, Anthony W.; Fadil, Chaimaa; Albers, Patrick K.; McVean, Gil
Format: Online Article Text
Language: English
Published: 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6726478/
https://www.ncbi.nlm.nih.gov/pubmed/31477934
http://dx.doi.org/10.1038/s41588-019-0483-y
Description
Summary: Inferring the full genealogical history of a set of DNA sequences is a core problem in evolutionary biology as it encodes information about the events and forces that have influenced a species. However, current methods are limited, with the most accurate able to process no more than a hundred samples. With data sets consisting of millions of genomes being collected, there is a need for scalable and efficient inference methods to fully utilise these resources. We introduce an algorithm to infer whole-genome histories with comparable accuracy to the state-of-the-art but able to process four orders of magnitude more sequences. The approach also provides an “evolutionary encoding” of the data, enabling efficient calculation of relevant statistics. We apply the method to human data from the 1000 Genomes Project, Simons Genome Diversity Project and UK Biobank, showing that the inferred genealogies are rich in biological signal and efficient to process.