
Comparing local ancestry inference models in populations of two- and three-way admixture

Local ancestry estimation infers the regional ancestral origin of chromosomal segments in admixed populations using reference populations and a variety of statistical models. Integrating local ancestry into complex trait genetics has the potential to increase detection of genetic associations and im...


Bibliographic Details
Main authors: Schubert, Ryan; Andaleon, Angela; Wheeler, Heather E.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2020
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7537619/
https://www.ncbi.nlm.nih.gov/pubmed/33072440
http://dx.doi.org/10.7717/peerj.10090
author Schubert, Ryan
Andaleon, Angela
Wheeler, Heather E.
collection PubMed
description Local ancestry estimation infers the regional ancestral origin of chromosomal segments in admixed populations using reference populations and a variety of statistical models. Integrating local ancestry into complex trait genetics has the potential to increase detection of genetic associations and improve genetic prediction models in understudied admixed populations, including African Americans and Hispanics. Five methods for local ancestry estimation that have been used in human complex trait genetics are LAMP-LD (2012), RFMix (2013), ELAI (2014), Loter (2018), and MOSAIC (2019). As users rather than developers, we sought to directly compare the accuracy, runtime, memory usage, and usability of these software tools to determine which is best for incorporation into association study pipelines. We find that in the majority of cases RFMix has the highest median accuracy, with the ranking of the remaining software dependent on the ancestral architecture of the population tested. Additionally, we estimate the computational complexity (big-O) of both memory and runtime for each software tool and find that, for most tools, both time and memory increase linearly with sample size. The only exception is RFMix, whose runtime increases quadratically while its memory usage increases linearly. Effective local ancestry estimation tools are necessary to increase diversity and prevent population disparities in human genetics studies. RFMix performs the best across methods; however, depending on the application, other methods perform just as well with the benefit of shorter runtimes. Scripts used to format data, run software, and estimate accuracy can be found at https://github.com/WheelerLab/LAI_benchmarking.
format Online
Article
Text
id pubmed-7537619
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7537619 2020-10-15 Comparing local ancestry inference models in populations of two- and three-way admixture. Schubert, Ryan; Andaleon, Angela; Wheeler, Heather E. PeerJ. Bioinformatics. PeerJ Inc. 2020-10-02 /pmc/articles/PMC7537619/ /pubmed/33072440 http://dx.doi.org/10.7717/peerj.10090 Text en ©2020 Schubert et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Comparing local ancestry inference models in populations of two- and three-way admixture
topic Bioinformatics