The parallelism motifs of genomic data analysis

Bibliographic Details
Main authors: Yelick, Katherine, Buluç, Aydın, Awan, Muaaz, Azad, Ariful, Brock, Benjamin, Egan, Rob, Ekanayake, Saliya, Ellis, Marquita, Georganas, Evangelos, Guidi, Giulia, Hofmeyr, Steven, Selvitopi, Oguz, Teodoropol, Cristina, Oliker, Leonid
Format: Online Article Text
Language: English
Published: The Royal Society Publishing, 2020
Subjects: Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7015300/
https://www.ncbi.nlm.nih.gov/pubmed/31955674
http://dx.doi.org/10.1098/rsta.2019.0394
collection PubMed
description Genomic datasets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share these data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high-end parallel systems today and place different requirements on programming support, software libraries and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or ‘motifs’ that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
id pubmed-7015300
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Philos Trans A Math Phys Eng Sci (Articles), The Royal Society Publishing. Dates: 2020-03-06, 2020-01-20 (record pubmed-7015300, 2020-02-18).
© 2020 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.
topic Articles