
Flexible Data Analysis Pipeline for High-Confidence Proteogenomics

Proteogenomics leverages information derived from proteomic data to improve genome annotations. Of particular interest are “novel” peptides that provide direct evidence of protein expression for genomic regions not previously annotated as protein-coding. We present a modular, autom...


Bibliographic Details
Main authors: Weisser, Hendrik, Wright, James C., Mudge, Jonathan M., Gutenbrunner, Petra, Choudhary, Jyoti S.
Format: Online Article Text
Language: English
Published: American Chemical Society 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5703597/
https://www.ncbi.nlm.nih.gov/pubmed/27786492
http://dx.doi.org/10.1021/acs.jproteome.6b00765
author Weisser, Hendrik
Wright, James C.
Mudge, Jonathan M.
Gutenbrunner, Petra
Choudhary, Jyoti S.
collection PubMed
description Proteogenomics leverages information derived from proteomic data to improve genome annotations. Of particular interest are “novel” peptides that provide direct evidence of protein expression for genomic regions not previously annotated as protein-coding. We present a modular, automated data analysis pipeline aimed at detecting such “novel” peptides in proteomic data sets. This pipeline implements criteria developed by proteomics and genome annotation experts for high-stringency peptide identification and filtering. Our pipeline is based on the OpenMS computational framework; it incorporates multiple database search engines for peptide identification and applies a machine-learning approach (Percolator) to post-process search results. We describe several new and improved software tools that we developed to facilitate proteogenomic analyses that enhance the wealth of tools provided by OpenMS. We demonstrate the application of our pipeline to a human testis tissue data set previously acquired for the Chromosome-Centric Human Proteome Project, which led to the addition of five new gene annotations on the human reference genome.
format Online
Article
Text
id pubmed-5703597
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-5703597 2017-11-29
Flexible Data Analysis Pipeline for High-Confidence Proteogenomics
Weisser, Hendrik
Wright, James C.
Mudge, Jonathan M.
Gutenbrunner, Petra
Choudhary, Jyoti S.
J Proteome Res
American Chemical Society
2016-10-27
2016-12-02
/pmc/articles/PMC5703597/
/pubmed/27786492
http://dx.doi.org/10.1021/acs.jproteome.6b00765
Text
en
Copyright © 2016 American Chemical Society. This is an open access article published under a Creative Commons Attribution (CC-BY) License (http://pubs.acs.org/page/policy/authorchoice_ccby_termsofuse.html), which permits unrestricted use, distribution and reproduction in any medium, provided the author and source are cited.
title Flexible Data Analysis Pipeline for High-Confidence Proteogenomics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5703597/
https://www.ncbi.nlm.nih.gov/pubmed/27786492
http://dx.doi.org/10.1021/acs.jproteome.6b00765