Software pipelines for RNA-Seq, ChIP-Seq and germline variant calling analyses in common workflow language (CWL)

Background: Automating data analysis pipelines is a key requirement to ensure reproducibility of results, especially when dealing with large volumes of data. Here we assembled automated pipelines for the analysis of High-throughput Sequencing (HTS) data originating from RNA-Seq, ChIP-Seq and Germlin...

Bibliographic Details
Main authors: Kyritsis, Konstantinos A., Pechlivanis, Nikolaos, Psomopoulos, Fotis
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10662043/
https://www.ncbi.nlm.nih.gov/pubmed/38025398
http://dx.doi.org/10.3389/fbinf.2023.1275593
author Kyritsis, Konstantinos A.
Pechlivanis, Nikolaos
Psomopoulos, Fotis
author_facet Kyritsis, Konstantinos A.
Pechlivanis, Nikolaos
Psomopoulos, Fotis
author_sort Kyritsis, Konstantinos A.
collection PubMed
description Background: Automating data analysis pipelines is a key requirement to ensure reproducibility of results, especially when dealing with large volumes of data. Here we assembled automated pipelines for the analysis of High-throughput Sequencing (HTS) data originating from RNA-Seq, ChIP-Seq and Germline variant calling experiments. We implemented these workflows in Common Workflow Language (CWL) and evaluated their performance by: i) reproducing the results of two previously published studies on Chronic Lymphocytic Leukemia (CLL), and ii) analyzing whole genome sequencing data from four Genome in a Bottle Consortium (GIAB) samples, comparing the detected variants against their respective gold-standard truth sets. Findings: We demonstrated that CWL-implemented workflows achieved high accuracy in reproducing previously published results, discovering significant biomarkers and detecting germline SNP and small INDEL variants. Conclusion: CWL pipelines are characterized by reproducibility and reusability; combined with containerization, they provide the ability to overcome issues of software incompatibility and laborious configuration requirements. In addition, they are flexible and can be used immediately or adapted to the specific needs of an experiment or study. The CWL-based workflows developed in this study, along with version information for all software tools, are publicly available on GitHub (https://github.com/BiodataAnalysisGroup/CWL_HTS_pipelines) under the MIT License. They are suitable for the analysis of short-read (such as Illumina-based) data and constitute an open resource that can facilitate automation, reproducibility and cross-platform compatibility for standard bioinformatic analyses.
format Online
Article
Text
id pubmed-10662043
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10662043 2023-11-07 Software pipelines for RNA-Seq, ChIP-Seq and germline variant calling analyses in common workflow language (CWL) Kyritsis, Konstantinos A. Pechlivanis, Nikolaos Psomopoulos, Fotis Front Bioinform Bioinformatics Background: Automating data analysis pipelines is a key requirement to ensure reproducibility of results, especially when dealing with large volumes of data. Here we assembled automated pipelines for the analysis of High-throughput Sequencing (HTS) data originating from RNA-Seq, ChIP-Seq and Germline variant calling experiments. We implemented these workflows in Common Workflow Language (CWL) and evaluated their performance by: i) reproducing the results of two previously published studies on Chronic Lymphocytic Leukemia (CLL), and ii) analyzing whole genome sequencing data from four Genome in a Bottle Consortium (GIAB) samples, comparing the detected variants against their respective gold-standard truth sets. Findings: We demonstrated that CWL-implemented workflows achieved high accuracy in reproducing previously published results, discovering significant biomarkers and detecting germline SNP and small INDEL variants. Conclusion: CWL pipelines are characterized by reproducibility and reusability; combined with containerization, they provide the ability to overcome issues of software incompatibility and laborious configuration requirements. In addition, they are flexible and can be used immediately or adapted to the specific needs of an experiment or study. The CWL-based workflows developed in this study, along with version information for all software tools, are publicly available on GitHub (https://github.com/BiodataAnalysisGroup/CWL_HTS_pipelines) under the MIT License. They are suitable for the analysis of short-read (such as Illumina-based) data and constitute an open resource that can facilitate automation, reproducibility and cross-platform compatibility for standard bioinformatic analyses.
Frontiers Media S.A. 2023-11-07 /pmc/articles/PMC10662043/ /pubmed/38025398 http://dx.doi.org/10.3389/fbinf.2023.1275593 Text en Copyright © 2023 Kyritsis, Pechlivanis and Psomopoulos. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Bioinformatics
Kyritsis, Konstantinos A.
Pechlivanis, Nikolaos
Psomopoulos, Fotis
Software pipelines for RNA-Seq, ChIP-Seq and germline variant calling analyses in common workflow language (CWL)
title Software pipelines for RNA-Seq, ChIP-Seq and germline variant calling analyses in common workflow language (CWL)
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10662043/
https://www.ncbi.nlm.nih.gov/pubmed/38025398
http://dx.doi.org/10.3389/fbinf.2023.1275593