
Diagnostics and correction of batch effects in large‐scale proteomic studies: a tutorial

Advancements in mass spectrometry‐based proteomics have enabled experiments encompassing hundreds of samples. While these large sample sets deliver much‐needed statistical power, handling them introduces technical variability known as batch effects. Here, we present a step‐by‐step protocol for the a...


Bibliographic Details
Main authors: Čuklina, Jelena; Lee, Chloe H; Williams, Evan G; Sajic, Tatjana; Collins, Ben C; Rodríguez Martínez, María; Sharma, Varun S; Wendt, Fabian; Goetze, Sandra; Keele, Gregory R; Wollscheid, Bernd; Aebersold, Ruedi; Pedrioli, Patrick G A
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8447595/
https://www.ncbi.nlm.nih.gov/pubmed/34432947
http://dx.doi.org/10.15252/msb.202110240
author Čuklina, Jelena
Lee, Chloe H
Williams, Evan G
Sajic, Tatjana
Collins, Ben C
Rodríguez Martínez, María
Sharma, Varun S
Wendt, Fabian
Goetze, Sandra
Keele, Gregory R
Wollscheid, Bernd
Aebersold, Ruedi
Pedrioli, Patrick G A
author_facet Čuklina, Jelena
Lee, Chloe H
Williams, Evan G
Sajic, Tatjana
Collins, Ben C
Rodríguez Martínez, María
Sharma, Varun S
Wendt, Fabian
Goetze, Sandra
Keele, Gregory R
Wollscheid, Bernd
Aebersold, Ruedi
Pedrioli, Patrick G A
author_sort Čuklina, Jelena
collection PubMed
description Advancements in mass spectrometry‐based proteomics have enabled experiments encompassing hundreds of samples. While these large sample sets deliver much‐needed statistical power, handling them introduces technical variability known as batch effects. Here, we present a step‐by‐step protocol for the assessment, normalization, and batch correction of proteomic data. We review established methodologies from related fields and describe solutions specific to proteomic challenges, such as ion intensity drift and missing values in quantitative feature matrices. Finally, we compile a set of techniques that enable control of batch effect adjustment quality. We provide an R package, "proBatch", containing functions required for each step of the protocol. We demonstrate the utility of this methodology on five proteomic datasets each encompassing hundreds of samples and consisting of multiple experimental designs. In conclusion, we provide guidelines and tools to make the extraction of true biological signal from large proteomic studies more robust and transparent, ultimately facilitating reliable and reproducible research in clinical proteomics and systems biology.
format Online
Article
Text
id pubmed-8447595
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-84475952021-10-06 Diagnostics and correction of batch effects in large‐scale proteomic studies: a tutorial Čuklina, Jelena Lee, Chloe H Williams, Evan G Sajic, Tatjana Collins, Ben C Rodríguez Martínez, María Sharma, Varun S Wendt, Fabian Goetze, Sandra Keele, Gregory R Wollscheid, Bernd Aebersold, Ruedi Pedrioli, Patrick G A Mol Syst Biol Reviews Advancements in mass spectrometry‐based proteomics have enabled experiments encompassing hundreds of samples. While these large sample sets deliver much‐needed statistical power, handling them introduces technical variability known as batch effects. Here, we present a step‐by‐step protocol for the assessment, normalization, and batch correction of proteomic data. We review established methodologies from related fields and describe solutions specific to proteomic challenges, such as ion intensity drift and missing values in quantitative feature matrices. Finally, we compile a set of techniques that enable control of batch effect adjustment quality. We provide an R package, "proBatch", containing functions required for each step of the protocol. We demonstrate the utility of this methodology on five proteomic datasets each encompassing hundreds of samples and consisting of multiple experimental designs. In conclusion, we provide guidelines and tools to make the extraction of true biological signal from large proteomic studies more robust and transparent, ultimately facilitating reliable and reproducible research in clinical proteomics and systems biology. John Wiley and Sons Inc. 2021-08-25 /pmc/articles/PMC8447595/ /pubmed/34432947 http://dx.doi.org/10.15252/msb.202110240 Text en © 2021 The Authors. 
Published under the terms of the CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/). This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
spellingShingle Reviews
Čuklina, Jelena
Lee, Chloe H
Williams, Evan G
Sajic, Tatjana
Collins, Ben C
Rodríguez Martínez, María
Sharma, Varun S
Wendt, Fabian
Goetze, Sandra
Keele, Gregory R
Wollscheid, Bernd
Aebersold, Ruedi
Pedrioli, Patrick G A
Diagnostics and correction of batch effects in large‐scale proteomic studies: a tutorial
title Diagnostics and correction of batch effects in large‐scale proteomic studies: a tutorial
title_full Diagnostics and correction of batch effects in large‐scale proteomic studies: a tutorial
title_fullStr Diagnostics and correction of batch effects in large‐scale proteomic studies: a tutorial
title_full_unstemmed Diagnostics and correction of batch effects in large‐scale proteomic studies: a tutorial
title_short Diagnostics and correction of batch effects in large‐scale proteomic studies: a tutorial
title_sort diagnostics and correction of batch effects in large‐scale proteomic studies: a tutorial
topic Reviews
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8447595/
https://www.ncbi.nlm.nih.gov/pubmed/34432947
http://dx.doi.org/10.15252/msb.202110240