Flexible Data Analysis Pipeline for High-Confidence Proteogenomics
Proteogenomics leverages information derived from proteomic data to improve genome annotations. Of particular interest are “novel” peptides that provide direct evidence of protein expression for genomic regions not previously annotated as protein-coding. We present a modular, autom...
Main authors: | Weisser, Hendrik; Wright, James C.; Mudge, Jonathan M.; Gutenbrunner, Petra; Choudhary, Jyoti S. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2016 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5703597/ https://www.ncbi.nlm.nih.gov/pubmed/27786492 http://dx.doi.org/10.1021/acs.jproteome.6b00765 |
_version_ | 1783281714234130432 |
---|---|
author | Weisser, Hendrik Wright, James C. Mudge, Jonathan M. Gutenbrunner, Petra Choudhary, Jyoti S. |
author_facet | Weisser, Hendrik Wright, James C. Mudge, Jonathan M. Gutenbrunner, Petra Choudhary, Jyoti S. |
author_sort | Weisser, Hendrik |
collection | PubMed |
description | Proteogenomics leverages information derived from proteomic data to improve genome annotations. Of particular interest are “novel” peptides that provide direct evidence of protein expression for genomic regions not previously annotated as protein-coding. We present a modular, automated data analysis pipeline aimed at detecting such “novel” peptides in proteomic data sets. This pipeline implements criteria developed by proteomics and genome annotation experts for high-stringency peptide identification and filtering. Our pipeline is based on the OpenMS computational framework; it incorporates multiple database search engines for peptide identification and applies a machine-learning approach (Percolator) to post-process search results. We describe several new and improved software tools that we developed to facilitate proteogenomic analyses that enhance the wealth of tools provided by OpenMS. We demonstrate the application of our pipeline to a human testis tissue data set previously acquired for the Chromosome-Centric Human Proteome Project, which led to the addition of five new gene annotations on the human reference genome. |
format | Online Article Text |
id | pubmed-5703597 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-5703597 2017-11-29 Flexible Data Analysis Pipeline for High-Confidence Proteogenomics Weisser, Hendrik Wright, James C. Mudge, Jonathan M. Gutenbrunner, Petra Choudhary, Jyoti S. J Proteome Res Proteogenomics leverages information derived from proteomic data to improve genome annotations. Of particular interest are “novel” peptides that provide direct evidence of protein expression for genomic regions not previously annotated as protein-coding. We present a modular, automated data analysis pipeline aimed at detecting such “novel” peptides in proteomic data sets. This pipeline implements criteria developed by proteomics and genome annotation experts for high-stringency peptide identification and filtering. Our pipeline is based on the OpenMS computational framework; it incorporates multiple database search engines for peptide identification and applies a machine-learning approach (Percolator) to post-process search results. We describe several new and improved software tools that we developed to facilitate proteogenomic analyses that enhance the wealth of tools provided by OpenMS. We demonstrate the application of our pipeline to a human testis tissue data set previously acquired for the Chromosome-Centric Human Proteome Project, which led to the addition of five new gene annotations on the human reference genome. American Chemical Society 2016-10-27 2016-12-02 /pmc/articles/PMC5703597/ /pubmed/27786492 http://dx.doi.org/10.1021/acs.jproteome.6b00765 Text en Copyright © 2016 American Chemical Society This is an open access article published under a Creative Commons Attribution (CC-BY) License (http://pubs.acs.org/page/policy/authorchoice_ccby_termsofuse.html), which permits unrestricted use, distribution and reproduction in any medium, provided the author and source are cited. |
spellingShingle | Weisser, Hendrik Wright, James C. Mudge, Jonathan M. Gutenbrunner, Petra Choudhary, Jyoti S. Flexible Data Analysis Pipeline for High-Confidence Proteogenomics |
title | Flexible Data Analysis Pipeline for High-Confidence Proteogenomics |
title_full | Flexible Data Analysis Pipeline for High-Confidence Proteogenomics |
title_fullStr | Flexible Data Analysis Pipeline for High-Confidence Proteogenomics |
title_full_unstemmed | Flexible Data Analysis Pipeline for High-Confidence Proteogenomics |
title_short | Flexible Data Analysis Pipeline for High-Confidence Proteogenomics |
title_sort | flexible data analysis pipeline for high-confidence proteogenomics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5703597/ https://www.ncbi.nlm.nih.gov/pubmed/27786492 http://dx.doi.org/10.1021/acs.jproteome.6b00765 |
work_keys_str_mv | AT weisserhendrik flexibledataanalysispipelineforhighconfidenceproteogenomics AT wrightjamesc flexibledataanalysispipelineforhighconfidenceproteogenomics AT mudgejonathanm flexibledataanalysispipelineforhighconfidenceproteogenomics AT gutenbrunnerpetra flexibledataanalysispipelineforhighconfidenceproteogenomics AT choudharyjyotis flexibledataanalysispipelineforhighconfidenceproteogenomics |