
Novel approach for parallelizing pairwise comparison problems as applied to detecting segments identical by decent in whole-genome data

MOTIVATION: Pairwise comparison problems arise in many areas of science. In genomics, datasets are already large and getting larger, and so operations that require pairwise comparisons—either on pairs of SNPs or pairs of individuals—are extremely computationally challenging. We propose a generic alg...

Bibliographic Details
Main Authors: Sapin, Emmanuel; Keller, Matthew C
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8352502/
https://www.ncbi.nlm.nih.gov/pubmed/33705528
http://dx.doi.org/10.1093/bioinformatics/btab084
_version_ 1783736193449459712
author Sapin, Emmanuel
Keller, Matthew C
author_facet Sapin, Emmanuel
Keller, Matthew C
author_sort Sapin, Emmanuel
collection PubMed
description MOTIVATION: Pairwise comparison problems arise in many areas of science. In genomics, datasets are already large and getting larger, and so operations that require pairwise comparisons, either on pairs of SNPs or pairs of individuals, are extremely computationally challenging. We propose a generic algorithm for addressing pairwise comparison problems that breaks a large problem (of order n² comparisons) into multiple smaller ones (each of order n comparisons), allowing for massive parallelization. RESULTS: We demonstrated that this approach is very efficient for calling identical by descent (IBD) segments between all pairs of individuals in the UK Biobank dataset, with a 250-fold savings in time and 750-fold savings in memory over the standard approach to detecting such segments across the full dataset. This efficiency should extend to other methods of IBD calling and, more generally, to other pairwise comparison tasks in genomics or other areas of science. AVAILABILITY AND IMPLEMENTATION: A GitHub page is available at https://github.com/emmanuelsapin with the code to generate data needed for the implementation.
format Online
Article
Text
id pubmed-8352502
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8352502 2021-08-10 Novel approach for parallelizing pairwise comparison problems as applied to detecting segments identical by decent in whole-genome data Sapin, Emmanuel Keller, Matthew C Bioinformatics Original Papers MOTIVATION: Pairwise comparison problems arise in many areas of science. In genomics, datasets are already large and getting larger, and so operations that require pairwise comparisons, either on pairs of SNPs or pairs of individuals, are extremely computationally challenging. We propose a generic algorithm for addressing pairwise comparison problems that breaks a large problem (of order n² comparisons) into multiple smaller ones (each of order n comparisons), allowing for massive parallelization. RESULTS: We demonstrated that this approach is very efficient for calling identical by descent (IBD) segments between all pairs of individuals in the UK Biobank dataset, with a 250-fold savings in time and 750-fold savings in memory over the standard approach to detecting such segments across the full dataset. This efficiency should extend to other methods of IBD calling and, more generally, to other pairwise comparison tasks in genomics or other areas of science. AVAILABILITY AND IMPLEMENTATION: A GitHub page is available at https://github.com/emmanuelsapin with the code to generate data needed for the implementation. Oxford University Press 2021-03-11 /pmc/articles/PMC8352502/ /pubmed/33705528 http://dx.doi.org/10.1093/bioinformatics/btab084 Text en © The Author(s) 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Novel approach for parallelizing pairwise comparison problems as applied to detecting segments identical by decent in whole-genome data
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8352502/
https://www.ncbi.nlm.nih.gov/pubmed/33705528
http://dx.doi.org/10.1093/bioinformatics/btab084