Using Galaxy-P to leverage RNA-Seq for the discovery of novel protein variations
BACKGROUND: Current practice in mass spectrometry (MS)-based proteomics is to identify peptides by comparison of experimental mass spectra with theoretical mass spectra derived from a reference protein database; however, this strategy necessarily fails to detect peptide and protein sequences that ar...
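Main authors: | Sheynkman, Gloria M; Johnson, James E; Jagtap, Pratik D; Shortreed, Michael R; Onsongo, Getiria; Frey, Brian L; Griffin, Timothy J; Smith, Lloyd M |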
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4158061/ https://www.ncbi.nlm.nih.gov/pubmed/25149441 http://dx.doi.org/10.1186/1471-2164-15-703 |
_version_ | 1782333973050949632 |
---|---|
author | Sheynkman, Gloria M Johnson, James E Jagtap, Pratik D Shortreed, Michael R Onsongo, Getiria Frey, Brian L Griffin, Timothy J Smith, Lloyd M |
author_facet | Sheynkman, Gloria M Johnson, James E Jagtap, Pratik D Shortreed, Michael R Onsongo, Getiria Frey, Brian L Griffin, Timothy J Smith, Lloyd M |
author_sort | Sheynkman, Gloria M |
collection | PubMed |
description | BACKGROUND: Current practice in mass spectrometry (MS)-based proteomics is to identify peptides by comparison of experimental mass spectra with theoretical mass spectra derived from a reference protein database; however, this strategy necessarily fails to detect peptide and protein sequences that are absent from the database. We and others have recently shown that customized proteomic databases derived from RNA-Seq data can be employed for MS-searching to both improve MS analysis and identify novel peptides. While this general strategy constitutes a significant advance for the discovery of novel protein variations, it has not been readily transferable to other laboratories due to the need for many specialized software tools. To address this problem, we have implemented readily accessible, modifiable, and extensible workflows within Galaxy-P, short for Galaxy for Proteomics, a web-based bioinformatic extension of the Galaxy framework for the analysis of multi-omics (e.g. genomics, transcriptomics, proteomics) data. RESULTS: We present three bioinformatic workflows that allow the user to upload raw RNA sequencing reads and convert the data into high-quality customized proteomic databases suitable for MS searching. We show the utility of these workflows on human and mouse samples, identifying 544 peptides containing single amino acid polymorphisms (SAPs) and 187 peptides corresponding to unannotated splice junction peptides, correlating protein and transcript expression levels, and providing the option to incorporate transcript abundance measures within the MS database search process (reduced databases, incorporation of transcript abundance for protein identification score calculations, etc.). CONCLUSIONS: Using RNA-Seq data to enhance MS analysis is a promising strategy to discover novel peptides specific to a sample and, more generally, to improve proteomics results. The main bottleneck for widespread adoption of this strategy has been the lack of easily used and modifiable computational tools. We provide a solution to this problem by introducing a set of workflows within the Galaxy-P framework that converts raw RNA-Seq data into customized proteomic databases. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-703) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4158061 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4158061 2014-09-19 Using Galaxy-P to leverage RNA-Seq for the discovery of novel protein variations Sheynkman, Gloria M Johnson, James E Jagtap, Pratik D Shortreed, Michael R Onsongo, Getiria Frey, Brian L Griffin, Timothy J Smith, Lloyd M BMC Genomics Methodology Article BACKGROUND: Current practice in mass spectrometry (MS)-based proteomics is to identify peptides by comparison of experimental mass spectra with theoretical mass spectra derived from a reference protein database; however, this strategy necessarily fails to detect peptide and protein sequences that are absent from the database. We and others have recently shown that customized proteomic databases derived from RNA-Seq data can be employed for MS-searching to both improve MS analysis and identify novel peptides. While this general strategy constitutes a significant advance for the discovery of novel protein variations, it has not been readily transferable to other laboratories due to the need for many specialized software tools. To address this problem, we have implemented readily accessible, modifiable, and extensible workflows within Galaxy-P, short for Galaxy for Proteomics, a web-based bioinformatic extension of the Galaxy framework for the analysis of multi-omics (e.g. genomics, transcriptomics, proteomics) data. RESULTS: We present three bioinformatic workflows that allow the user to upload raw RNA sequencing reads and convert the data into high-quality customized proteomic databases suitable for MS searching. We show the utility of these workflows on human and mouse samples, identifying 544 peptides containing single amino acid polymorphisms (SAPs) and 187 peptides corresponding to unannotated splice junction peptides, correlating protein and transcript expression levels, and providing the option to incorporate transcript abundance measures within the MS database search process (reduced databases, incorporation of transcript abundance for protein identification score calculations, etc.). CONCLUSIONS: Using RNA-Seq data to enhance MS analysis is a promising strategy to discover novel peptides specific to a sample and, more generally, to improve proteomics results. The main bottleneck for widespread adoption of this strategy has been the lack of easily used and modifiable computational tools. We provide a solution to this problem by introducing a set of workflows within the Galaxy-P framework that converts raw RNA-Seq data into customized proteomic databases. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-703) contains supplementary material, which is available to authorized users. BioMed Central 2014-08-22 /pmc/articles/PMC4158061/ /pubmed/25149441 http://dx.doi.org/10.1186/1471-2164-15-703 Text en © Sheynkman et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Sheynkman, Gloria M Johnson, James E Jagtap, Pratik D Shortreed, Michael R Onsongo, Getiria Frey, Brian L Griffin, Timothy J Smith, Lloyd M Using Galaxy-P to leverage RNA-Seq for the discovery of novel protein variations |
title | Using Galaxy-P to leverage RNA-Seq for the discovery of novel protein variations |
title_full | Using Galaxy-P to leverage RNA-Seq for the discovery of novel protein variations |
title_fullStr | Using Galaxy-P to leverage RNA-Seq for the discovery of novel protein variations |
title_full_unstemmed | Using Galaxy-P to leverage RNA-Seq for the discovery of novel protein variations |
title_short | Using Galaxy-P to leverage RNA-Seq for the discovery of novel protein variations |
title_sort | using galaxy-p to leverage rna-seq for the discovery of novel protein variations |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4158061/ https://www.ncbi.nlm.nih.gov/pubmed/25149441 http://dx.doi.org/10.1186/1471-2164-15-703 |
work_keys_str_mv | AT sheynkmangloriam usinggalaxyptoleveragernaseqforthediscoveryofnovelproteinvariations AT johnsonjamese usinggalaxyptoleveragernaseqforthediscoveryofnovelproteinvariations AT jagtappratikd usinggalaxyptoleveragernaseqforthediscoveryofnovelproteinvariations AT shortreedmichaelr usinggalaxyptoleveragernaseqforthediscoveryofnovelproteinvariations AT onsongogetiria usinggalaxyptoleveragernaseqforthediscoveryofnovelproteinvariations AT freybrianl usinggalaxyptoleveragernaseqforthediscoveryofnovelproteinvariations AT griffintimothyj usinggalaxyptoleveragernaseqforthediscoveryofnovelproteinvariations AT smithlloydm usinggalaxyptoleveragernaseqforthediscoveryofnovelproteinvariations |