KARAJ: An Efficient Adaptive Multi-Processor Tool to Streamline Genomic and Transcriptomic Sequence Data Acquisition

Here we developed KARAJ, a fast and flexible Linux command-line tool to automate the end-to-end process of querying and downloading a wide range of genomic and transcriptomic sequence data types. The input to KARAJ is a list of PMCIDs or publication URLs or various types of accession numbers to auto...
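
The mixed input list described above (PMCIDs, publication URLs, accession numbers) implies that each entry must first be recognized by type. A minimal sketch of that step follows. It is not KARAJ's own code: the function names, the labels, and the regular expressions are hypothetical, and the accession prefixes covered (GEO series `GSE`, SRA/ENA/DDBJ runs `SRR`/`ERR`/`DRR`, BioProject `PRJNA`/`PRJEB`/`PRJDB`) are only a small illustrative subset of the accession types in use.

```python
import re

# Hypothetical patterns for illustration only; not taken from KARAJ.
# Order matters: the first matching pattern decides the label.
PATTERNS = [
    ("url", re.compile(r"^https?://\S+$", re.IGNORECASE)),
    ("pmcid", re.compile(r"^PMC\d+$", re.IGNORECASE)),
    ("geo_series", re.compile(r"^GSE\d+$", re.IGNORECASE)),
    ("sra_run", re.compile(r"^[SED]RR\d+$", re.IGNORECASE)),
    ("bioproject", re.compile(r"^PRJ(NA|EB|DB)\d+$", re.IGNORECASE)),
]


def classify_identifier(raw):
    """Return a label for one input entry, or "unknown" if nothing matches."""
    token = raw.strip()
    for label, pattern in PATTERNS:
        if pattern.match(token):
            return label
    return "unknown"


def classify_all(lines):
    """Classify every non-empty entry and return (entry, label) pairs."""
    return [
        (line.strip(), classify_identifier(line))
        for line in lines
        if line.strip()
    ]


if __name__ == "__main__":
    entries = [
        "PMC9694301",
        "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9694301/",
        "GSE12345",
        "not-an-identifier",
    ]
    for entry, label in classify_all(entries):
        print(label, entry, sep="\t")
```

Entries labeled "unknown" would need manual review; a real tool would likely support more accession families than this sketch does.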

Bibliographic Details
Main authors: Labani, Mahdieh; Beheshti, Amin; Lovell, Nigel H.; Alinejad-Rokny, Hamid; Afrasiabi, Ali
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9694301/
https://www.ncbi.nlm.nih.gov/pubmed/36430895
http://dx.doi.org/10.3390/ijms232214418
_version_ 1784837765869862912
author Labani, Mahdieh
Beheshti, Amin
Lovell, Nigel H.
Alinejad-Rokny, Hamid
Afrasiabi, Ali
author_facet Labani, Mahdieh
Beheshti, Amin
Lovell, Nigel H.
Alinejad-Rokny, Hamid
Afrasiabi, Ali
author_sort Labani, Mahdieh
collection PubMed
description Here we developed KARAJ, a fast and flexible Linux command-line tool that automates the end-to-end process of querying and downloading a wide range of genomic and transcriptomic sequence data types. The input to KARAJ is a list of PMCIDs, publication URLs, or various types of accession numbers, from which it automates four tasks. First, it provides a summary list of accessible datasets generated by or used in these scientific articles, enabling users to select appropriate datasets. Second, it calculates the size of the files that users want to download and confirms that adequate space is available on the local disk. Third, it generates a metadata table containing sample information and the experimental design of the corresponding study. Finally, it enables users to download supplementary data tables attached to publications. In addition, KARAJ provides a parallel downloading framework powered by Aspera Connect, which reduces downloading time significantly.
format Online
Article
Text
id pubmed-9694301
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-96943012022-11-26 KARAJ: An Efficient Adaptive Multi-Processor Tool to Streamline Genomic and Transcriptomic Sequence Data Acquisition Labani, Mahdieh Beheshti, Amin Lovell, Nigel H. Alinejad-Rokny, Hamid Afrasiabi, Ali Int J Mol Sci Article Here we developed KARAJ, a fast and flexible Linux command-line tool to automate the end-to-end process of querying and downloading a wide range of genomic and transcriptomic sequence data types. The input to KARAJ is a list of PMCIDs or publication URLs or various types of accession numbers to automate four tasks as follows; firstly, it provides a summary list of accessible datasets generated by or used in these scientific articles, enabling users to select appropriate datasets; secondly, KARAJ calculates the size of files that users want to download and confirms the availability of adequate space on the local disk; thirdly, it generates a metadata table containing sample information and the experimental design of the corresponding study; and lastly, it enables users to download supplementary data tables attached to publications. Further, KARAJ provides a parallel downloading framework powered by Aspera connect which reduces the downloading time significantly. MDPI 2022-11-20 /pmc/articles/PMC9694301/ /pubmed/36430895 http://dx.doi.org/10.3390/ijms232214418 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Labani, Mahdieh
Beheshti, Amin
Lovell, Nigel H.
Alinejad-Rokny, Hamid
Afrasiabi, Ali
KARAJ: An Efficient Adaptive Multi-Processor Tool to Streamline Genomic and Transcriptomic Sequence Data Acquisition
title KARAJ: An Efficient Adaptive Multi-Processor Tool to Streamline Genomic and Transcriptomic Sequence Data Acquisition
title_full KARAJ: An Efficient Adaptive Multi-Processor Tool to Streamline Genomic and Transcriptomic Sequence Data Acquisition
title_fullStr KARAJ: An Efficient Adaptive Multi-Processor Tool to Streamline Genomic and Transcriptomic Sequence Data Acquisition
title_full_unstemmed KARAJ: An Efficient Adaptive Multi-Processor Tool to Streamline Genomic and Transcriptomic Sequence Data Acquisition
title_short KARAJ: An Efficient Adaptive Multi-Processor Tool to Streamline Genomic and Transcriptomic Sequence Data Acquisition
title_sort karaj: an efficient adaptive multi-processor tool to streamline genomic and transcriptomic sequence data acquisition
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9694301/
https://www.ncbi.nlm.nih.gov/pubmed/36430895
http://dx.doi.org/10.3390/ijms232214418