Moving Just Enough Deep Sequencing Data to Get the Job Done
Format: | Online Article Text |
---|---|
Language: | English |
Published: | SAGE Publications, 2019 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6572328/ https://www.ncbi.nlm.nih.gov/pubmed/31236009 http://dx.doi.org/10.1177/1177932219856359 |
author | Mills, Nicholas; Bensman, Ethan M; Poehlman, William L; Ligon, Walter B; Feltus, F Alex |
collection | PubMed |
description | MOTIVATION: As the size of high-throughput DNA sequence datasets continues to grow, the cost of transferring and storing the datasets may prevent their processing in all but the largest data centers or commercial cloud providers. To lower this cost, it should be possible to process only a subset of the original data while still preserving the biological information of interest. RESULTS: Using 4 high-throughput DNA sequence datasets of differing sequencing depth from 2 species as use cases, we demonstrate the effect of processing partial datasets on the number of detected RNA transcripts using an RNA-Seq workflow. We used transcript detection to decide on a cutoff point. We then physically transferred the minimal partial dataset and compared with the transfer of the full dataset, which showed a reduction of approximately 25% in the total transfer time. These results suggest that as sequencing datasets get larger, one way to speed up analysis is to simply transfer the minimal amount of data that still sufficiently detects biological signal. AVAILABILITY: All results were generated using public datasets from NCBI and publicly available open source software. |
id | pubmed-6572328 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Bioinform Biol Insights |
date | 2019-06-14 |
rights | © The Author(s) 2019. This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage). |
topic | Original Research |
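The description outlines the approach: process growing subsets of a dataset, count the RNA transcripts each subset detects, pick a cutoff, and transfer only that subset. It does not state the exact cutoff rule or how the subset was drawn, so the following Python is a minimal sketch under two assumptions, not the authors' workflow: the partial dataset is a prefix of whole FASTQ records (so a transfer can simply stop early), and the cutoff is the smallest fraction whose detected-transcript count reaches a chosen share (a hypothetical 95%) of the count from the full dataset. The names `fastq_prefix` and `choose_cutoff` are illustrative, and the curve in the usage example is made up.

```python
from typing import Iterable, Iterator, Optional, Sequence, Tuple


def fastq_prefix(lines: Iterable[str], n_reads: int) -> Iterator[str]:
    """Yield the lines of the first `n_reads` FASTQ records (hypothetical helper).

    Each FASTQ record spans exactly four lines (header, sequence, '+',
    qualities), so a prefix of whole records is itself a valid FASTQ file.
    """
    it = iter(lines)
    for _ in range(n_reads):
        record = [next(it, None) for _ in range(4)]
        if record[0] is None:
            return  # input ended cleanly before n_reads records
        if any(line is None for line in record):
            raise ValueError("truncated FASTQ record")
        yield from record


def choose_cutoff(
    curve: Sequence[Tuple[float, int]], target: float = 0.95
) -> Optional[float]:
    """Return the smallest subset fraction whose detected-transcript count
    reaches `target` times the count detected with the largest fraction.

    Assumption: the largest fraction in `curve` is the full dataset (1.0).
    Returns None for an empty curve or when no point reaches the target.
    """
    if not curve:
        return None
    points = sorted(curve)      # order by subset fraction
    full_count = points[-1][1]  # transcripts detected with all the data
    for fraction, detected in points:
        if detected >= target * full_count:
            return fraction
    return None


if __name__ == "__main__":
    # Made-up detection curve for illustration only: (fraction, transcripts).
    curve = [(0.1, 12000), (0.25, 15500), (0.5, 17200), (0.75, 17800), (1.0, 18000)]
    print(choose_cutoff(curve))  # prints 0.5
```

Lowering `target` selects a smaller subset and therefore a shorter transfer, at the cost of missing more transcripts, typically the low-abundance ones that need deeper sequencing to detect; the appropriate trade-off depends on the biological question.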