The C-Score: A Bayesian Framework to Sharply Improve Proteoform Scoring in High-Throughput Top Down Proteomics

The automated processing of data generated by top down proteomics would benefit from improved scoring for protein identification and characterization of highly related protein forms (proteoforms). Here we propose the “C-score” (short for Characterization Score), a Bayesian approach...

Bibliographic Details
Main Authors: LeDuc, Richard D., Fellers, Ryan T., Early, Bryan P., Greer, Joseph B., Thomas, Paul M., Kelleher, Neil L.
Format: Online Article Text
Language: English
Published: American Chemical Society 2014
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4084843/
https://www.ncbi.nlm.nih.gov/pubmed/24922115
http://dx.doi.org/10.1021/pr401277r
author LeDuc, Richard D.
Fellers, Ryan T.
Early, Bryan P.
Greer, Joseph B.
Thomas, Paul M.
Kelleher, Neil L.
collection PubMed
description The automated processing of data generated by top down proteomics would benefit from improved scoring for protein identification and characterization of highly related protein forms (proteoforms). Here we propose the “C-score” (short for Characterization Score), a Bayesian approach to the proteoform identification and characterization problem, implemented within a framework to allow the infusion of expert knowledge into generative models that take advantage of known properties of proteins and top down analytical systems (e.g., fragmentation propensities, “off-by-1 Da” discontinuous errors, and intelligent weighting for site-specific modifications). The performance of the scoring system based on the initial generative models was compared to the current probability-based scoring system used within both ProSightPC and ProSightPTM on a manually curated set of 295 human proteoforms. The current implementation of the C-score framework generated a marked improvement over the existing scoring system as measured by the area under the curve on the resulting ROC chart (AUC of 0.99 versus 0.78).
format Online
Article
Text
id pubmed-4084843
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-4084843 2015-06-12 J Proteome Res American Chemical Society 2014-06-12 2014-07-03 /pmc/articles/PMC4084843/ /pubmed/24922115 http://dx.doi.org/10.1021/pr401277r Text en Copyright © 2014 American Chemical Society Terms of Use (http://pubs.acs.org/page/policy/authorchoice_termsofuse.html)
title The C-Score: A Bayesian Framework to Sharply Improve Proteoform Scoring in High-Throughput Top Down Proteomics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4084843/
https://www.ncbi.nlm.nih.gov/pubmed/24922115
http://dx.doi.org/10.1021/pr401277r