PGCA: An algorithm to link protein groups created from MS/MS data

The quantitation of proteins using shotgun proteomics has gained popularity in the last decades, simplifying sample handling procedures, removing extensive protein separation steps and achieving a relatively high throughput readout. The process starts with the digestion of the protein mixture into p...

Bibliographic Details
Main Authors: Kepplinger, David; Takhar, Mandeep; Sasaki, Mayu; Hollander, Zsuzsanna; Smith, Derek; McManus, Bruce; McMaster, W. Robert; Ng, Raymond T.; Cohen Freue, Gabriela V.
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5451011/
https://www.ncbi.nlm.nih.gov/pubmed/28562641
http://dx.doi.org/10.1371/journal.pone.0177569
_version_ 1783240096769638400
author Kepplinger, David
Takhar, Mandeep
Sasaki, Mayu
Hollander, Zsuzsanna
Smith, Derek
McManus, Bruce
McMaster, W. Robert
Ng, Raymond T.
Cohen Freue, Gabriela V.
author_sort Kepplinger, David
collection PubMed
description The quantitation of proteins using shotgun proteomics has gained popularity in the last decades, simplifying sample handling procedures, removing extensive protein separation steps and achieving a relatively high throughput readout. The process starts with the digestion of the protein mixture into peptides, which are then separated by liquid chromatography and sequenced by tandem mass spectrometry (MS/MS). At the end of the workflow, recovering the identity of the proteins originally present in the sample is often a difficult and ambiguous process, because more than one protein identifier may match a set of peptides identified from the MS/MS spectra. To address this identification problem, many MS/MS data processing software tools combine all plausible protein identifiers matching a common set of peptides into a protein group. However, this solution introduces new challenges in studies with multiple experimental runs, which can be characterized by three main factors: i) protein groups’ identifiers are local, i.e., they vary run to run, ii) the composition of each group may change across runs, and iii) the supporting evidence of proteins within each group may also change across runs. Since in general there is no conclusive evidence about the absence of proteins in the groups, protein groups need to be linked across different runs in subsequent statistical analyses. We propose an algorithm, called Protein Group Code Algorithm (PGCA), to link groups from multiple experimental runs by forming global protein groups from connected local groups. The algorithm is computationally inexpensive and enables the connection and analysis of lists of protein groups across runs needed in biomarkers studies. We illustrate the identification problem and the stability of the PGCA mapping using 65 iTRAQ experimental runs. Further, we use two biomarker studies to show how PGCA enables the discovery of relevant candidate protein group markers with similar but non-identical compositions in different runs.
format Online
Article
Text
id pubmed-5451011
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5451011 2017-06-12 PLoS One Research Article Public Library of Science 2017-05-31 /pmc/articles/PMC5451011/ /pubmed/28562641 http://dx.doi.org/10.1371/journal.pone.0177569 Text en © 2017 Kepplinger et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title PGCA: An algorithm to link protein groups created from MS/MS data
topic Research Article
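
The description says that PGCA forms global protein groups from connected local groups. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes that two local groups are connected when they share at least one protein identifier, which the abstract does not spell out. The function name `link_protein_groups`, the input layout, and the identifiers `P1` to `P5` are hypothetical.

```python
def link_protein_groups(runs):
    """Assign a global group code to every local protein group.

    runs: dict mapping run name -> dict mapping local group id -> set of
          protein identifiers (hypothetical layout, chosen for illustration).
    Returns a dict mapping (run, local group id) -> global code (int).
    Assumption: local groups sharing at least one protein are connected.
    """
    parent = {}  # union-find forest over (run, local group id) nodes

    def find(node):
        # Follow parent links to the root, halving the path on the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # protein -> first local group that contained it
    for run, groups in runs.items():
        for local_id, proteins in groups.items():
            node = (run, local_id)
            parent[node] = node
            for protein in proteins:
                if protein in first_seen:
                    # Shared protein: merge this group with the earlier one.
                    union(first_seen[protein], node)
                else:
                    first_seen[protein] = node

    # Number the connected components in order of first appearance.
    codes = {}
    mapping = {}
    for node in parent:
        root = find(node)
        if root not in codes:
            codes[root] = len(codes) + 1
        mapping[node] = codes[root]
    return mapping


# Toy input: local group ids differ between runs and compositions overlap.
example_runs = {
    "run1": {1: {"P1", "P2"}, 2: {"P3"}},
    "run2": {1: {"P2", "P4"}, 2: {"P5"}},
    "run3": {7: {"P3", "P4"}},
}
global_codes = link_protein_groups(example_runs)
```

In this toy input, group 7 of `run3` shares `P4` with group 1 of `run2` and `P3` with group 2 of `run1`, so all of these end up under one global code, while the group containing only `P5` keeps its own. A union-find structure keeps the linking close to linear in the total number of protein entries, in line with the description's remark that the algorithm is computationally inexpensive.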