ARA: a flexible pipeline for automated exploration of NCBI SRA datasets
BACKGROUND: One of the most effective and useful methods to explore the content of biological databases is searching with nucleotide or protein sequences as a query. However, especially in the case of nucleic acids, due to the large volume of data generated by the next-generation sequencing (NGS) te...
Main authors: | Maurya, Anand; Szymanski, Maciej; Karlowski, Wojciech M |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10433097/ https://www.ncbi.nlm.nih.gov/pubmed/37589306 http://dx.doi.org/10.1093/gigascience/giad067 |
_version_ | 1785091575582294016 |
---|---|
author | Maurya, Anand Szymanski, Maciej Karlowski, Wojciech M |
author_facet | Maurya, Anand Szymanski, Maciej Karlowski, Wojciech M |
author_sort | Maurya, Anand |
collection | PubMed |
description | BACKGROUND: One of the most effective and useful methods to explore the content of biological databases is searching with nucleotide or protein sequences as a query. However, especially in the case of nucleic acids, due to the large volume of data generated by the next-generation sequencing (NGS) technologies, this approach is often not available. The hierarchical organization of the NGS records is primarily designed for browsing or text-based searches of the information provided in metadata-related keywords, limiting the efficiency of database exploration. FINDINGS: We developed an automated pipeline that incorporates the well-established NGS data-processing tools and procedures to allow easy and effective sampling of the NCBI SRA database records. Given a file with query nucleotide sequences, our tool estimates the matching content of SRA accessions by probing only a user-defined fraction of a record's sequences. Based on the selected parameters, it allows performing a full mapping experiment with records that meet the required criteria. The pipeline is designed to be easy to operate—it offers a fully automatic setup procedure and is fixed on tested supporting tools. The modular design and implemented usage modes allow a user to scale up the analyses into complex computational infrastructure. CONCLUSIONS: We present an easy-to-operate and automated tool that expands the way a user can access and explore the information contained within the records deposited in the NCBI SRA database. |
format | Online Article Text |
id | pubmed-10433097 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-104330972023-08-18 ARA: a flexible pipeline for automated exploration of NCBI SRA datasets Maurya, Anand Szymanski, Maciej Karlowski, Wojciech M Gigascience Technical Note BACKGROUND: One of the most effective and useful methods to explore the content of biological databases is searching with nucleotide or protein sequences as a query. However, especially in the case of nucleic acids, due to the large volume of data generated by the next-generation sequencing (NGS) technologies, this approach is often not available. The hierarchical organization of the NGS records is primarily designed for browsing or text-based searches of the information provided in metadata-related keywords, limiting the efficiency of database exploration. FINDINGS: We developed an automated pipeline that incorporates the well-established NGS data-processing tools and procedures to allow easy and effective sampling of the NCBI SRA database records. Given a file with query nucleotide sequences, our tool estimates the matching content of SRA accessions by probing only a user-defined fraction of a record's sequences. Based on the selected parameters, it allows performing a full mapping experiment with records that meet the required criteria. The pipeline is designed to be easy to operate—it offers a fully automatic setup procedure and is fixed on tested supporting tools. The modular design and implemented usage modes allow a user to scale up the analyses into complex computational infrastructure. CONCLUSIONS: We present an easy-to-operate and automated tool that expands the way a user can access and explore the information contained within the records deposited in the NCBI SRA database. Oxford University Press 2023-08-17 /pmc/articles/PMC10433097/ /pubmed/37589306 http://dx.doi.org/10.1093/gigascience/giad067 Text en © The Author(s) 2023. Published by Oxford University Press GigaScience. 
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Technical Note Maurya, Anand Szymanski, Maciej Karlowski, Wojciech M ARA: a flexible pipeline for automated exploration of NCBI SRA datasets |
title | ARA: a flexible pipeline for automated exploration of NCBI SRA datasets |
title_full | ARA: a flexible pipeline for automated exploration of NCBI SRA datasets |
title_fullStr | ARA: a flexible pipeline for automated exploration of NCBI SRA datasets |
title_full_unstemmed | ARA: a flexible pipeline for automated exploration of NCBI SRA datasets |
title_short | ARA: a flexible pipeline for automated exploration of NCBI SRA datasets |
title_sort | ara: a flexible pipeline for automated exploration of ncbi sra datasets |
topic | Technical Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10433097/ https://www.ncbi.nlm.nih.gov/pubmed/37589306 http://dx.doi.org/10.1093/gigascience/giad067 |
work_keys_str_mv | AT mauryaanand araaflexiblepipelineforautomatedexplorationofncbisradatasets AT szymanskimaciej araaflexiblepipelineforautomatedexplorationofncbisradatasets AT karlowskiwojciechm araaflexiblepipelineforautomatedexplorationofncbisradatasets |