
Moving Just Enough Deep Sequencing Data to Get the Job Done

MOTIVATION: As the size of high-throughput DNA sequence datasets continues to grow, the cost of transferring and storing the datasets may prevent their processing in all but the largest data centers or commercial cloud providers. To lower this cost, it should be possible to process only a subset of the original data while still preserving the biological information of interest. RESULTS: Using 4 high-throughput DNA sequence datasets of differing sequencing depth from 2 species as use cases, we demonstrate the effect of processing partial datasets on the number of detected RNA transcripts using an RNA-Seq workflow. We used transcript detection to decide on a cutoff point. We then physically transferred the minimal partial dataset and compared it with the transfer of the full dataset, which showed a reduction of approximately 25% in the total transfer time. These results suggest that as sequencing datasets get larger, one way to speed up analysis is to simply transfer the minimal amount of data that still sufficiently detects biological signal. AVAILABILITY: All results were generated using public datasets from NCBI and publicly available open source software.
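The workflow described in the abstract (quantify transcripts from increasingly large subsets of the reads, pick the smallest subset whose transcript detection is essentially saturated, and transfer only that subset) can be illustrated with a short sketch. The sketch below is illustrative, not the authors' implementation: the fraction grid, the transcript counts, and the 1% tolerance are hypothetical placeholders, since the paper only states that transcript detection was used to decide on a cutoff point; the partial datasets themselves could be produced with a read subsampler such as seqtk sample.

```python
"""Illustrative sketch (not the authors' code): choose the smallest fraction of a
sequencing dataset whose RNA-Seq transcript detection is close enough to that of
the full dataset. All numbers and the tolerance are hypothetical."""


def minimal_fraction(detected_by_fraction, tolerance=0.01):
    """Return the smallest fraction whose detected-transcript count is within
    `tolerance` (relative) of the count obtained from the full dataset (1.0)."""
    full_count = detected_by_fraction[1.0]
    for fraction in sorted(detected_by_fraction):
        if detected_by_fraction[fraction] >= (1.0 - tolerance) * full_count:
            return fraction
    return 1.0


if __name__ == "__main__":
    # Hypothetical example: transcripts detected after quantifying 10%, 25%,
    # 50%, 75%, and 100% of the reads of one dataset.
    detected = {0.10: 18_250, 0.25: 21_900, 0.50: 23_400, 0.75: 23_650, 1.00: 23_700}
    cutoff = minimal_fraction(detected, tolerance=0.01)
    print(f"Transfer only {cutoff:.0%} of the reads "
          f"({detected[cutoff]} of {detected[1.0]} transcripts detected).")
```

With these placeholder numbers the sketch selects the 75% subset, the smallest fraction that detects at least 99% of the transcripts found in the full dataset.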

Bibliographic Details
Main Authors: Mills, Nicholas; Bensman, Ethan M; Poehlman, William L; Ligon, Walter B; Feltus, F Alex
Format: Online Article Text
Language: English
Published: SAGE Publications, 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6572328/
https://www.ncbi.nlm.nih.gov/pubmed/31236009
http://dx.doi.org/10.1177/1177932219856359
Collection: PubMed (id: pubmed-6572328)
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: Bioinform Biol Insights (Original Research)
Published Online: 2019-06-14, SAGE Publications
License: © The Author(s) 2019. This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction, and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).