Software pipelines for RNA-Seq, ChIP-Seq and germline variant calling analyses in common workflow language (CWL)
Background: Automating data analysis pipelines is a key requirement to ensure reproducibility of results, especially when dealing with large volumes of data. Here we assembled automated pipelines for the analysis of High-throughput Sequencing (HTS) data originating from RNA-Seq, ChIP-Seq and Germlin...
Main authors: | Kyritsis, Konstantinos A., Pechlivanis, Nikolaos, Psomopoulos, Fotis |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2023 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10662043/ https://www.ncbi.nlm.nih.gov/pubmed/38025398 http://dx.doi.org/10.3389/fbinf.2023.1275593 |
_version_ | 1785138118793363456 |
---|---|
author | Kyritsis, Konstantinos A. Pechlivanis, Nikolaos Psomopoulos, Fotis |
author_facet | Kyritsis, Konstantinos A. Pechlivanis, Nikolaos Psomopoulos, Fotis |
author_sort | Kyritsis, Konstantinos A. |
collection | PubMed |
description | Background: Automating data analysis pipelines is a key requirement to ensure reproducibility of results, especially when dealing with large volumes of data. Here we assembled automated pipelines for the analysis of High-throughput Sequencing (HTS) data originating from RNA-Seq, ChIP-Seq and Germline variant calling experiments. We implemented these workflows in Common workflow language (CWL) and evaluated their performance by: i) reproducing the results of two previously published studies on Chronic Lymphocytic Leukemia (CLL), and ii) analyzing whole genome sequencing data from four Genome in a Bottle Consortium (GIAB) samples, comparing the detected variants against their respective golden standard truth sets. Findings: We demonstrated that CWL-implemented workflows clearly achieved high accuracy in reproducing previously published results, discovering significant biomarkers and detecting germline SNP and small INDEL variants. Conclusion: CWL pipelines are characterized by reproducibility and reusability; combined with containerization, they provide the ability to overcome issues of software incompatibility and laborious configuration requirements. In addition, they are flexible and can be used immediately or adapted to the specific needs of an experiment or study. The CWL-based workflows developed in this study, along with version information for all software tools, are publicly available on GitHub (https://github.com/BiodataAnalysisGroup/CWL_HTS_pipelines) under the MIT License. They are suitable for the analysis of short-read (such as Illumina-based) data and constitute an open resource that can facilitate automation, reproducibility and cross-platform compatibility for standard bioinformatic analyses. |
format | Online Article Text |
id | pubmed-10662043 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-106620432023-11-07 Software pipelines for RNA-Seq, ChIP-Seq and germline variant calling analyses in common workflow language (CWL) Kyritsis, Konstantinos A. Pechlivanis, Nikolaos Psomopoulos, Fotis Front Bioinform Bioinformatics Background: Automating data analysis pipelines is a key requirement to ensure reproducibility of results, especially when dealing with large volumes of data. Here we assembled automated pipelines for the analysis of High-throughput Sequencing (HTS) data originating from RNA-Seq, ChIP-Seq and Germline variant calling experiments. We implemented these workflows in Common workflow language (CWL) and evaluated their performance by: i) reproducing the results of two previously published studies on Chronic Lymphocytic Leukemia (CLL), and ii) analyzing whole genome sequencing data from four Genome in a Bottle Consortium (GIAB) samples, comparing the detected variants against their respective golden standard truth sets. Findings: We demonstrated that CWL-implemented workflows clearly achieved high accuracy in reproducing previously published results, discovering significant biomarkers and detecting germline SNP and small INDEL variants. Conclusion: CWL pipelines are characterized by reproducibility and reusability; combined with containerization, they provide the ability to overcome issues of software incompatibility and laborious configuration requirements. In addition, they are flexible and can be used immediately or adapted to the specific needs of an experiment or study. The CWL-based workflows developed in this study, along with version information for all software tools, are publicly available on GitHub (https://github.com/BiodataAnalysisGroup/CWL_HTS_pipelines) under the MIT License. They are suitable for the analysis of short-read (such as Illumina-based) data and constitute an open resource that can facilitate automation, reproducibility and cross-platform compatibility for standard bioinformatic analyses. 
Frontiers Media S.A. 2023-11-07 /pmc/articles/PMC10662043/ /pubmed/38025398 http://dx.doi.org/10.3389/fbinf.2023.1275593 Text en Copyright © 2023 Kyritsis, Pechlivanis and Psomopoulos. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Bioinformatics Kyritsis, Konstantinos A. Pechlivanis, Nikolaos Psomopoulos, Fotis Software pipelines for RNA-Seq, ChIP-Seq and germline variant calling analyses in common workflow language (CWL) |
title | Software pipelines for RNA-Seq, ChIP-Seq and germline variant calling analyses in common workflow language (CWL) |
title_full | Software pipelines for RNA-Seq, ChIP-Seq and germline variant calling analyses in common workflow language (CWL) |
title_fullStr | Software pipelines for RNA-Seq, ChIP-Seq and germline variant calling analyses in common workflow language (CWL) |
title_full_unstemmed | Software pipelines for RNA-Seq, ChIP-Seq and germline variant calling analyses in common workflow language (CWL) |
title_short | Software pipelines for RNA-Seq, ChIP-Seq and germline variant calling analyses in common workflow language (CWL) |
title_sort | software pipelines for rna-seq, chip-seq and germline variant calling analyses in common workflow language (cwl) |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10662043/ https://www.ncbi.nlm.nih.gov/pubmed/38025398 http://dx.doi.org/10.3389/fbinf.2023.1275593 |
work_keys_str_mv | AT kyritsiskonstantinosa softwarepipelinesforrnaseqchipseqandgermlinevariantcallinganalysesincommonworkflowlanguagecwl AT pechlivanisnikolaos softwarepipelinesforrnaseqchipseqandgermlinevariantcallinganalysesincommonworkflowlanguagecwl AT psomopoulosfotis softwarepipelinesforrnaseqchipseqandgermlinevariantcallinganalysesincommonworkflowlanguagecwl |