Democratizing data-independent acquisition proteomics analysis on public cloud infrastructures via the Galaxy framework

BACKGROUND: Data-independent acquisition (DIA) has become an important approach in global, mass spectrometric proteomic studies because it provides in-depth insights into the molecular variety of biological systems. However, DIA data analysis remains challenging owing to the high complexity and larg...

Bibliographic Details
Main authors: Fahrner, Matthias, Föll, Melanie Christine, Grüning, Björn Andreas, Bernt, Matthias, Röst, Hannes, Schilling, Oliver
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8848309/
https://www.ncbi.nlm.nih.gov/pubmed/35166338
http://dx.doi.org/10.1093/gigascience/giac005
_version_ 1784652221857660928
author Fahrner, Matthias
Föll, Melanie Christine
Grüning, Björn Andreas
Bernt, Matthias
Röst, Hannes
Schilling, Oliver
author_facet Fahrner, Matthias
Föll, Melanie Christine
Grüning, Björn Andreas
Bernt, Matthias
Röst, Hannes
Schilling, Oliver
author_sort Fahrner, Matthias
collection PubMed
description BACKGROUND: Data-independent acquisition (DIA) has become an important approach in global, mass spectrometric proteomic studies because it provides in-depth insights into the molecular variety of biological systems. However, DIA data analysis remains challenging owing to the high complexity and large data and sample size, which require specialized software and vast computing infrastructures. Most available open-source DIA software necessitates basic programming skills and covers only a fraction of a complete DIA data analysis. In consequence, DIA data analysis often requires usage of multiple software tools and compatibility thereof, severely limiting the usability and reproducibility. FINDINGS: To overcome this hurdle, we have integrated a suite of open-source DIA tools in the Galaxy framework for reproducible and version-controlled data processing. The DIA suite includes OpenSwath, PyProphet, diapysef, and swath2stats. We have compiled functional Galaxy pipelines for DIA processing, which provide a web-based graphical user interface to these pre-installed and pre-configured tools for their use on freely accessible, powerful computational resources of the Galaxy framework. This approach also enables seamless sharing of workflows with full configuration in addition to sharing raw data and results. We demonstrate the usability of an all-in-one DIA pipeline in Galaxy by the analysis of a spike-in case study dataset. Additionally, extensive training material is provided to further increase access for the proteomics community. CONCLUSION: The integration of an open-source DIA analysis suite in the web-based and user-friendly Galaxy framework in combination with extensive training material empowers a broad community of researchers to perform reproducible and transparent DIA data analysis.
format Online
Article
Text
id pubmed-8848309
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-88483092022-02-17 Democratizing data-independent acquisition proteomics analysis on public cloud infrastructures via the Galaxy framework Fahrner, Matthias Föll, Melanie Christine Grüning, Björn Andreas Bernt, Matthias Röst, Hannes Schilling, Oliver Gigascience Technical Note BACKGROUND: Data-independent acquisition (DIA) has become an important approach in global, mass spectrometric proteomic studies because it provides in-depth insights into the molecular variety of biological systems. However, DIA data analysis remains challenging owing to the high complexity and large data and sample size, which require specialized software and vast computing infrastructures. Most available open-source DIA software necessitates basic programming skills and covers only a fraction of a complete DIA data analysis. In consequence, DIA data analysis often requires usage of multiple software tools and compatibility thereof, severely limiting the usability and reproducibility. FINDINGS: To overcome this hurdle, we have integrated a suite of open-source DIA tools in the Galaxy framework for reproducible and version-controlled data processing. The DIA suite includes OpenSwath, PyProphet, diapysef, and swath2stats. We have compiled functional Galaxy pipelines for DIA processing, which provide a web-based graphical user interface to these pre-installed and pre-configured tools for their use on freely accessible, powerful computational resources of the Galaxy framework. This approach also enables seamless sharing of workflows with full configuration in addition to sharing raw data and results. We demonstrate the usability of an all-in-one DIA pipeline in Galaxy by the analysis of a spike-in case study dataset. Additionally, extensive training material is provided to further increase access for the proteomics community.
CONCLUSION: The integration of an open-source DIA analysis suite in the web-based and user-friendly Galaxy framework in combination with extensive training material empowers a broad community of researchers to perform reproducible and transparent DIA data analysis. Oxford University Press 2022-02-15 /pmc/articles/PMC8848309/ /pubmed/35166338 http://dx.doi.org/10.1093/gigascience/giac005 Text en © The Author(s) 2022. Published by Oxford University Press GigaScience. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Technical Note
Fahrner, Matthias
Föll, Melanie Christine
Grüning, Björn Andreas
Bernt, Matthias
Röst, Hannes
Schilling, Oliver
Democratizing data-independent acquisition proteomics analysis on public cloud infrastructures via the Galaxy framework
title Democratizing data-independent acquisition proteomics analysis on public cloud infrastructures via the Galaxy framework
title_full Democratizing data-independent acquisition proteomics analysis on public cloud infrastructures via the Galaxy framework
title_fullStr Democratizing data-independent acquisition proteomics analysis on public cloud infrastructures via the Galaxy framework
title_full_unstemmed Democratizing data-independent acquisition proteomics analysis on public cloud infrastructures via the Galaxy framework
title_short Democratizing data-independent acquisition proteomics analysis on public cloud infrastructures via the Galaxy framework
title_sort democratizing data-independent acquisition proteomics analysis on public cloud infrastructures via the galaxy framework
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8848309/
https://www.ncbi.nlm.nih.gov/pubmed/35166338
http://dx.doi.org/10.1093/gigascience/giac005
work_keys_str_mv AT fahrnermatthias democratizingdataindependentacquisitionproteomicsanalysisonpubliccloudinfrastructuresviathegalaxyframework
AT follmelaniechristine democratizingdataindependentacquisitionproteomicsanalysisonpubliccloudinfrastructuresviathegalaxyframework
AT gruningbjornandreas democratizingdataindependentacquisitionproteomicsanalysisonpubliccloudinfrastructuresviathegalaxyframework
AT berntmatthias democratizingdataindependentacquisitionproteomicsanalysisonpubliccloudinfrastructuresviathegalaxyframework
AT rosthannes democratizingdataindependentacquisitionproteomicsanalysisonpubliccloudinfrastructuresviathegalaxyframework
AT schillingoliver democratizingdataindependentacquisitionproteomicsanalysisonpubliccloudinfrastructuresviathegalaxyframework