
Ancestry inference using reference labeled clusters of haplotypes



Bibliographic details
Main authors:	Wang, Yong, Song, Shiya, Schraiber, Joshua G., Sedghifar, Alisa, Byrnes, Jake K., Turissini, David A., Hong, Eurie L., Ball, Catherine A., Noto, Keith
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8466715/ https://www.ncbi.nlm.nih.gov/pubmed/34563119 http://dx.doi.org/10.1186/s12859-021-04350-x

Description
Summary:	BACKGROUND: We present ARCHes, a fast and accurate haplotype-based approach for inferring an individual’s ancestry composition. Our approach works by modeling haplotype diversity from a large, admixed cohort of hundreds of thousands, then annotating those models with population information from reference panels of known ancestry. RESULTS: The running time of ARCHes does not depend on the size of a reference panel because training and testing are separate processes, and the inferred population-annotated haplotype models can be written to disk and reused to label large test sets in parallel (in our experiments, assigning ancestry from 32 populations takes less than one minute on average using 10 CPUs). We test ARCHes on public data from the 1000 Genomes Project and the Human Genome Diversity Project (HGDP) as well as simulated examples of known admixture. CONCLUSIONS: Our results demonstrate that ARCHes outperforms RFMix at correctly assigning both global and local ancestry at finer population scales regardless of the amount of population admixture. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04350-x.
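The summary's key design point is that training (building haplotype cluster models), annotation (labeling clusters with reference populations), and testing (labeling new haplotypes) are separate steps, so test-time cost does not grow with the reference panel. The toy sketch below illustrates only that separation; it is not the ARCHes algorithm. It assumes, hypothetically, that each genomic window already has a few cluster centroids (standing in for models learned from the large cohort), assigns haplotypes to the nearest centroid by Hamming distance, and averages per-window population frequencies into a global estimate. All function names and data are illustrative.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of differing alleles between two equal-length haplotype segments."""
    return sum(x != y for x, y in zip(a, b))


def nearest_cluster(segment, centroids):
    """Index of the centroid closest to `segment` (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda k: hamming(segment, centroids[k]))


def annotate_clusters(centroids_per_window, reference):
    """Hypothetical annotation step: label each cluster with reference-population frequencies.

    reference: list of (population, haplotype), where a haplotype is a list of
    per-window allele tuples. Returns, per window, {cluster_index: {population: fraction}}.
    """
    annotations = []
    for w, centroids in enumerate(centroids_per_window):
        counts = defaultdict(Counter)
        for pop, hap in reference:
            counts[nearest_cluster(hap[w], centroids)][pop] += 1
        annotations.append({
            k: {p: n / sum(c.values()) for p, n in c.items()}
            for k, c in counts.items()
        })
    return annotations


def infer_ancestry(haplotype, centroids_per_window, annotations):
    """Hypothetical test step: per-window (local) ancestry, then the mean as global ancestry.

    Note that the reference panel is not an input here; only the saved annotations are.
    """
    local = []
    for w, centroids in enumerate(centroids_per_window):
        k = nearest_cluster(haplotype[w], centroids)
        local.append(annotations[w].get(k, {}))
    global_ancestry = Counter()
    for dist in local:
        for p, f in dist.items():
            global_ancestry[p] += f / len(local)
    return local, dict(global_ancestry)


# Toy data: two windows of three biallelic sites, two clusters per window.
centroids = [[(0, 0, 0), (1, 1, 1)], [(0, 1, 0), (1, 0, 1)]]
reference = [
    ("A", [(0, 0, 0), (0, 1, 0)]),
    ("A", [(0, 0, 1), (0, 1, 1)]),
    ("B", [(1, 1, 1), (1, 0, 1)]),
    ("B", [(1, 1, 0), (1, 0, 0)]),
]
annotations = annotate_clusters(centroids, reference)
local, global_ancestry = infer_ancestry([(0, 0, 0), (1, 0, 1)], centroids, annotations)
print(global_ancestry)  # {'A': 0.5, 'B': 0.5}
```

Because `infer_ancestry` reads only the precomputed `annotations`, those could be serialized once and reused to label many test haplotypes in parallel, which is the property the summary attributes to ARCHes' population-annotated models.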

