The parallelism motifs of genomic data analysis
Genomic datasets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share these data with the research community, but some of these genomic data analysis problems require large-scale computatio...
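Main Authors: | Yelick, Katherine; Buluç, Aydın; Awan, Muaaz; Azad, Ariful; Brock, Benjamin; Egan, Rob; Ekanayake, Saliya; Ellis, Marquita; Georganas, Evangelos; Guidi, Giulia; Hofmeyr, Steven; Selvitopi, Oguz; Teodoropol, Cristina; Oliker, Leonid |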
---|---|
Format: | Online Article Text |
Language: | English |
Published: | The Royal Society Publishing 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7015300/ https://www.ncbi.nlm.nih.gov/pubmed/31955674 http://dx.doi.org/10.1098/rsta.2019.0394 |
_version_ | 1783496782026637312 |
---|---|
author | Yelick, Katherine Buluç, Aydın Awan, Muaaz Azad, Ariful Brock, Benjamin Egan, Rob Ekanayake, Saliya Ellis, Marquita Georganas, Evangelos Guidi, Giulia Hofmeyr, Steven Selvitopi, Oguz Teodoropol, Cristina Oliker, Leonid |
author_facet | Yelick, Katherine Buluç, Aydın Awan, Muaaz Azad, Ariful Brock, Benjamin Egan, Rob Ekanayake, Saliya Ellis, Marquita Georganas, Evangelos Guidi, Giulia Hofmeyr, Steven Selvitopi, Oguz Teodoropol, Cristina Oliker, Leonid |
author_sort | Yelick, Katherine |
collection | PubMed |
description | Genomic datasets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share these data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high-end parallel systems today and place different requirements on programming support, software libraries and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or ‘motifs’ that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’. |
format | Online Article Text |
id | pubmed-7015300 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | The Royal Society Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-70153002020-02-18 The parallelism motifs of genomic data analysis Yelick, Katherine Buluç, Aydın Awan, Muaaz Azad, Ariful Brock, Benjamin Egan, Rob Ekanayake, Saliya Ellis, Marquita Georganas, Evangelos Guidi, Giulia Hofmeyr, Steven Selvitopi, Oguz Teodoropol, Cristina Oliker, Leonid Philos Trans A Math Phys Eng Sci Articles Genomic datasets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share these data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high-end parallel systems today and place different requirements on programming support, software libraries and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or ‘motifs’ that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’. The Royal Society Publishing 2020-03-06 2020-01-20 /pmc/articles/PMC7015300/ /pubmed/31955674 http://dx.doi.org/10.1098/rsta.2019.0394 Text en © 2020 The Authors. 
http://creativecommons.org/licenses/by/4.0/ Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited. |
spellingShingle | Articles Yelick, Katherine Buluç, Aydın Awan, Muaaz Azad, Ariful Brock, Benjamin Egan, Rob Ekanayake, Saliya Ellis, Marquita Georganas, Evangelos Guidi, Giulia Hofmeyr, Steven Selvitopi, Oguz Teodoropol, Cristina Oliker, Leonid The parallelism motifs of genomic data analysis |
title | The parallelism motifs of genomic data analysis |
title_full | The parallelism motifs of genomic data analysis |
title_fullStr | The parallelism motifs of genomic data analysis |
title_full_unstemmed | The parallelism motifs of genomic data analysis |
title_short | The parallelism motifs of genomic data analysis |
title_sort | parallelism motifs of genomic data analysis |
topic | Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7015300/ https://www.ncbi.nlm.nih.gov/pubmed/31955674 http://dx.doi.org/10.1098/rsta.2019.0394 |