Rapid detection of identity-by-descent tracts for mega-scale datasets
The ability to identify segments of genomes identical-by-descent (IBD) is a part of standard workflows in both statistical and population genetics. However, traditional methods for finding local IBD across all pairs of individuals scale poorly leading to a lack of adoption in very large-scale datase...
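Main authors: | Shemirani, Ruhollah; Belbin, Gillian M.; Avery, Christy L.; Kenny, Eimear E.; Gignoux, Christopher R.; Ambite, José Luis |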
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8192555/ https://www.ncbi.nlm.nih.gov/pubmed/34112768 http://dx.doi.org/10.1038/s41467-021-22910-w |
_version_ | 1783706072468422656 |
---|---|
author | Shemirani, Ruhollah; Belbin, Gillian M.; Avery, Christy L.; Kenny, Eimear E.; Gignoux, Christopher R.; Ambite, José Luis |
author_facet | Shemirani, Ruhollah; Belbin, Gillian M.; Avery, Christy L.; Kenny, Eimear E.; Gignoux, Christopher R.; Ambite, José Luis |
author_sort | Shemirani, Ruhollah |
collection | PubMed |
description | The ability to identify segments of genomes identical-by-descent (IBD) is a part of standard workflows in both statistical and population genetics. However, traditional methods for finding local IBD across all pairs of individuals scale poorly leading to a lack of adoption in very large-scale datasets. Here, we present iLASH, an algorithm based on similarity detection techniques that shows equal or improved accuracy in simulations compared to current leading methods and speeds up analysis by several orders of magnitude on genomic datasets, making IBD estimation tractable for millions of individuals. We apply iLASH to the PAGE dataset of ~52,000 multi-ethnic participants, including several founder populations with elevated IBD sharing, identifying IBD segments in ~3 minutes per chromosome compared to over 6 days for a state-of-the-art algorithm. iLASH enables efficient analysis of very large-scale datasets, as we demonstrate by computing IBD across the UK Biobank (~500,000 individuals), detecting 12.9 billion pairwise connections. |
format | Online Article Text |
id | pubmed-8192555 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8192555 2021-06-17 Rapid detection of identity-by-descent tracts for mega-scale datasets Shemirani, Ruhollah Belbin, Gillian M. Avery, Christy L. Kenny, Eimear E. Gignoux, Christopher R. Ambite, José Luis Nat Commun Article The ability to identify segments of genomes identical-by-descent (IBD) is a part of standard workflows in both statistical and population genetics. However, traditional methods for finding local IBD across all pairs of individuals scale poorly leading to a lack of adoption in very large-scale datasets. Here, we present iLASH, an algorithm based on similarity detection techniques that shows equal or improved accuracy in simulations compared to current leading methods and speeds up analysis by several orders of magnitude on genomic datasets, making IBD estimation tractable for millions of individuals. We apply iLASH to the PAGE dataset of ~52,000 multi-ethnic participants, including several founder populations with elevated IBD sharing, identifying IBD segments in ~3 minutes per chromosome compared to over 6 days for a state-of-the-art algorithm. iLASH enables efficient analysis of very large-scale datasets, as we demonstrate by computing IBD across the UK Biobank (~500,000 individuals), detecting 12.9 billion pairwise connections. Nature Publishing Group UK 2021-06-10 /pmc/articles/PMC8192555/ /pubmed/34112768 http://dx.doi.org/10.1038/s41467-021-22910-w Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Shemirani, Ruhollah Belbin, Gillian M. Avery, Christy L. Kenny, Eimear E. Gignoux, Christopher R. Ambite, José Luis Rapid detection of identity-by-descent tracts for mega-scale datasets |
title | Rapid detection of identity-by-descent tracts for mega-scale datasets |
title_full | Rapid detection of identity-by-descent tracts for mega-scale datasets |
title_fullStr | Rapid detection of identity-by-descent tracts for mega-scale datasets |
title_full_unstemmed | Rapid detection of identity-by-descent tracts for mega-scale datasets |
title_short | Rapid detection of identity-by-descent tracts for mega-scale datasets |
title_sort | rapid detection of identity-by-descent tracts for mega-scale datasets |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8192555/ https://www.ncbi.nlm.nih.gov/pubmed/34112768 http://dx.doi.org/10.1038/s41467-021-22910-w |
work_keys_str_mv | AT shemiraniruhollah rapiddetectionofidentitybydescenttractsformegascaledatasets AT belbingillianm rapiddetectionofidentitybydescenttractsformegascaledatasets AT averychristyl rapiddetectionofidentitybydescenttractsformegascaledatasets AT kennyeimeare rapiddetectionofidentitybydescenttractsformegascaledatasets AT gignouxchristopherr rapiddetectionofidentitybydescenttractsformegascaledatasets AT ambitejoseluis rapiddetectionofidentitybydescenttractsformegascaledatasets |