Optimizing performance of GATK workflows using Apache Arrow In-Memory data framework
BACKGROUND: Immense improvements in sequencing technologies enable the production of large amounts of high-throughput, cost-effective next-generation sequencing (NGS) data. This data needs to be processed efficiently for further downstream analyses. Computing systems need these large amounts of data close...
Main Authors: | Ahmad, Tanveer; Ahmed, Nauman; Al-Ars, Zaid; Hofstee, H. Peter |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7677819/ https://www.ncbi.nlm.nih.gov/pubmed/33208101 http://dx.doi.org/10.1186/s12864-020-07013-y |
Similar Items
- Recommendations for performance optimizations when using GATK3.8 and GATK4
  by: Heldenbrand, Jacob R, et al.
  Published: (2019)
- SparkRA: Enabling Big Data Scalability for the GATK RNA-seq Pipeline with Apache Spark
  by: Al-Ars, Zaid, et al.
  Published: (2020)
- GPU accelerated sequence alignment with traceback for GATK HaplotypeCaller
  by: Ren, Shanshan, et al.
  Published: (2019)
- SparkGA2: Production-quality memory-efficient Apache Spark based genome analysis framework
  by: Mushtaq, Hamid, et al.
  Published: (2019)
- OVarFlow: a resource optimized GATK 4 based Open source Variant calling workFlow
  by: Bathke, Jochen, et al.
  Published: (2021)