NCBI’s Virus Discovery Hackathon: Engaging Research Communities to Identify Cloud Infrastructure Requirements
A wealth of viral data sits untapped in publicly available metagenomic data sets when it might be extracted to create a usable index for the virological research community. We hypothesized that work of this complexity and scale could be done in a hackathon setting. Ten teams comprised of over 40 par...
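Main authors: | Connor, Ryan; Brister, Rodney; Buchmann, Jan P.; Deboutte, Ward; Edwards, Rob; Martí-Carreras, Joan; Tisza, Mike; Zalunin, Vadim; Andrade-Martínez, Juan; Cantu, Adrian; D’Amour, Michael; Efremov, Alexandre; Fleischmann, Lydia; Forero-Junco, Laura; Garmaeva, Sanzhima; Giluso, Melissa; Glickman, Cody; Henderson, Margaret; Kellman, Benjamin; Kristensen, David; Leubsdorf, Carl; Levi, Kyle; Levi, Shane; Pakala, Suman; Peddu, Vikas; Ponsero, Alise; Ribeiro, Eldred; Roy, Farrah; Rutter, Lindsay; Saha, Surya; Shakya, Migun; Shean, Ryan; Miller, Matthew; Tully, Benjamin; Turkington, Christopher; Youens-Clark, Ken; Vanmechelen, Bert; Busby, Ben |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6771016/ https://www.ncbi.nlm.nih.gov/pubmed/31527408 http://dx.doi.org/10.3390/genes10090714 |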
_version_ | 1783455617134886912 |
---|---|
author | Connor, Ryan Brister, Rodney Buchmann, Jan P. Deboutte, Ward Edwards, Rob Martí-Carreras, Joan Tisza, Mike Zalunin, Vadim Andrade-Martínez, Juan Cantu, Adrian D’Amour, Michael Efremov, Alexandre Fleischmann, Lydia Forero-Junco, Laura Garmaeva, Sanzhima Giluso, Melissa Glickman, Cody Henderson, Margaret Kellman, Benjamin Kristensen, David Leubsdorf, Carl Levi, Kyle Levi, Shane Pakala, Suman Peddu, Vikas Ponsero, Alise Ribeiro, Eldred Roy, Farrah Rutter, Lindsay Saha, Surya Shakya, Migun Shean, Ryan Miller, Matthew Tully, Benjamin Turkington, Christopher Youens-Clark, Ken Vanmechelen, Bert Busby, Ben |
author_facet | Connor, Ryan Brister, Rodney Buchmann, Jan P. Deboutte, Ward Edwards, Rob Martí-Carreras, Joan Tisza, Mike Zalunin, Vadim Andrade-Martínez, Juan Cantu, Adrian D’Amour, Michael Efremov, Alexandre Fleischmann, Lydia Forero-Junco, Laura Garmaeva, Sanzhima Giluso, Melissa Glickman, Cody Henderson, Margaret Kellman, Benjamin Kristensen, David Leubsdorf, Carl Levi, Kyle Levi, Shane Pakala, Suman Peddu, Vikas Ponsero, Alise Ribeiro, Eldred Roy, Farrah Rutter, Lindsay Saha, Surya Shakya, Migun Shean, Ryan Miller, Matthew Tully, Benjamin Turkington, Christopher Youens-Clark, Ken Vanmechelen, Bert Busby, Ben |
author_sort | Connor, Ryan |
collection | PubMed |
description | A wealth of viral data sits untapped in publicly available metagenomic data sets when it might be extracted to create a usable index for the virological research community. We hypothesized that work of this complexity and scale could be done in a hackathon setting. Ten teams comprised of over 40 participants from six countries assembled to create a crowd-sourced set of analysis and processing pipelines for a complex biological data set in a three-day event on the San Diego State University campus starting 9 January 2019. Prior to the hackathon, 141,676 metagenomic data sets from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) were pre-assembled into contiguous assemblies (contigs) by NCBI staff. During the hackathon, a subset consisting of 2953 SRA data sets (approximately 55 million contigs) was selected, which were further filtered for a minimal length of 1 kb. This resulted in 4.2 million (Mio) contigs, which were aligned using BLAST against all known virus genomes, phylogenetically clustered and assigned metadata. Out of the 4.2 Mio contigs, 360,000 contigs were labeled with domains and an additional subset containing 4400 contigs was screened for virus or virus-like genes. The work yielded valuable insights into both SRA data and the cloud infrastructure required to support such efforts, revealing analysis bottlenecks and possible workarounds thereof. Mainly: (i) Conservative assemblies of SRA data improve initial analysis steps; (ii) existing bioinformatic software with weak multithreading/multicore support can be elevated by wrapper scripts to use all cores within a computing node; (iii) redesigning existing bioinformatic algorithms for a cloud infrastructure to facilitate its use for a wider audience; and (iv) a cloud infrastructure allows a diverse group of researchers to collaborate effectively. The scientific findings will be extended during a follow-up event. Here, we present the applied workflows, initial results, and lessons learned from the hackathon. |
format | Online Article Text |
id | pubmed-6771016 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-67710162019-10-30 NCBI’s Virus Discovery Hackathon: Engaging Research Communities to Identify Cloud Infrastructure Requirements Connor, Ryan Brister, Rodney Buchmann, Jan P. Deboutte, Ward Edwards, Rob Martí-Carreras, Joan Tisza, Mike Zalunin, Vadim Andrade-Martínez, Juan Cantu, Adrian D’Amour, Michael Efremov, Alexandre Fleischmann, Lydia Forero-Junco, Laura Garmaeva, Sanzhima Giluso, Melissa Glickman, Cody Henderson, Margaret Kellman, Benjamin Kristensen, David Leubsdorf, Carl Levi, Kyle Levi, Shane Pakala, Suman Peddu, Vikas Ponsero, Alise Ribeiro, Eldred Roy, Farrah Rutter, Lindsay Saha, Surya Shakya, Migun Shean, Ryan Miller, Matthew Tully, Benjamin Turkington, Christopher Youens-Clark, Ken Vanmechelen, Bert Busby, Ben Genes (Basel) Article A wealth of viral data sits untapped in publicly available metagenomic data sets when it might be extracted to create a usable index for the virological research community. We hypothesized that work of this complexity and scale could be done in a hackathon setting. Ten teams comprised of over 40 participants from six countries, assembled to create a crowd-sourced set of analysis and processing pipelines for a complex biological data set in a three-day event on the San Diego State University campus starting 9 January 2019. Prior to the hackathon, 141,676 metagenomic data sets from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) were pre-assembled into contiguous assemblies (contigs) by NCBI staff. During the hackathon, a subset consisting of 2953 SRA data sets (approximately 55 million contigs) was selected, which were further filtered for a minimal length of 1 kb. This resulted in 4.2 million (Mio) contigs, which were aligned using BLAST against all known virus genomes, phylogenetically clustered and assigned metadata. 
Out of the 4.2 Mio contigs, 360,000 contigs were labeled with domains and an additional subset containing 4400 contigs was screened for virus or virus-like genes. The work yielded valuable insights into both SRA data and the cloud infrastructure required to support such efforts, revealing analysis bottlenecks and possible workarounds thereof. Mainly: (i) Conservative assemblies of SRA data improves initial analysis steps; (ii) existing bioinformatic software with weak multithreading/multicore support can be elevated by wrapper scripts to use all cores within a computing node; (iii) redesigning existing bioinformatic algorithms for a cloud infrastructure to facilitate its use for a wider audience; and (iv) a cloud infrastructure allows a diverse group of researchers to collaborate effectively. The scientific findings will be extended during a follow-up event. Here, we present the applied workflows, initial results, and lessons learned from the hackathon. MDPI 2019-09-16 /pmc/articles/PMC6771016/ /pubmed/31527408 http://dx.doi.org/10.3390/genes10090714 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Connor, Ryan Brister, Rodney Buchmann, Jan P. Deboutte, Ward Edwards, Rob Martí-Carreras, Joan Tisza, Mike Zalunin, Vadim Andrade-Martínez, Juan Cantu, Adrian D’Amour, Michael Efremov, Alexandre Fleischmann, Lydia Forero-Junco, Laura Garmaeva, Sanzhima Giluso, Melissa Glickman, Cody Henderson, Margaret Kellman, Benjamin Kristensen, David Leubsdorf, Carl Levi, Kyle Levi, Shane Pakala, Suman Peddu, Vikas Ponsero, Alise Ribeiro, Eldred Roy, Farrah Rutter, Lindsay Saha, Surya Shakya, Migun Shean, Ryan Miller, Matthew Tully, Benjamin Turkington, Christopher Youens-Clark, Ken Vanmechelen, Bert Busby, Ben NCBI’s Virus Discovery Hackathon: Engaging Research Communities to Identify Cloud Infrastructure Requirements |
title | NCBI’s Virus Discovery Hackathon: Engaging Research Communities to Identify Cloud Infrastructure Requirements |
title_full | NCBI’s Virus Discovery Hackathon: Engaging Research Communities to Identify Cloud Infrastructure Requirements |
title_fullStr | NCBI’s Virus Discovery Hackathon: Engaging Research Communities to Identify Cloud Infrastructure Requirements |
title_full_unstemmed | NCBI’s Virus Discovery Hackathon: Engaging Research Communities to Identify Cloud Infrastructure Requirements |
title_short | NCBI’s Virus Discovery Hackathon: Engaging Research Communities to Identify Cloud Infrastructure Requirements |
title_sort | ncbi’s virus discovery hackathon: engaging research communities to identify cloud infrastructure requirements |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6771016/ https://www.ncbi.nlm.nih.gov/pubmed/31527408 http://dx.doi.org/10.3390/genes10090714 |