Rapid detection of identity-by-descent tracts for mega-scale datasets

Bibliographic Details
Main Authors: Shemirani, Ruhollah; Belbin, Gillian M.; Avery, Christy L.; Kenny, Eimear E.; Gignoux, Christopher R.; Ambite, José Luis
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8192555/
https://www.ncbi.nlm.nih.gov/pubmed/34112768
http://dx.doi.org/10.1038/s41467-021-22910-w
author Shemirani, Ruhollah
Belbin, Gillian M.
Avery, Christy L.
Kenny, Eimear E.
Gignoux, Christopher R.
Ambite, José Luis
collection PubMed
description The ability to identify segments of genomes identical-by-descent (IBD) is a part of standard workflows in both statistical and population genetics. However, traditional methods for finding local IBD across all pairs of individuals scale poorly leading to a lack of adoption in very large-scale datasets. Here, we present iLASH, an algorithm based on similarity detection techniques that shows equal or improved accuracy in simulations compared to current leading methods and speeds up analysis by several orders of magnitude on genomic datasets, making IBD estimation tractable for millions of individuals. We apply iLASH to the PAGE dataset of ~52,000 multi-ethnic participants, including several founder populations with elevated IBD sharing, identifying IBD segments in ~3 minutes per chromosome compared to over 6 days for a state-of-the-art algorithm. iLASH enables efficient analysis of very large-scale datasets, as we demonstrate by computing IBD across the UK Biobank (~500,000 individuals), detecting 12.9 billion pairwise connections.
format Online
Article
Text
id pubmed-8192555
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8192555 2021-06-17 Rapid detection of identity-by-descent tracts for mega-scale datasets. Shemirani, Ruhollah; Belbin, Gillian M.; Avery, Christy L.; Kenny, Eimear E.; Gignoux, Christopher R.; Ambite, José Luis. Nat Commun. Article. Nature Publishing Group UK 2021-06-10. /pmc/articles/PMC8192555/ /pubmed/34112768 http://dx.doi.org/10.1038/s41467-021-22910-w Text en. © The Author(s) 2021. https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Rapid detection of identity-by-descent tracts for mega-scale datasets
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8192555/
https://www.ncbi.nlm.nih.gov/pubmed/34112768
http://dx.doi.org/10.1038/s41467-021-22910-w