
Proteomics in non-human primates: utilizing RNA-Seq data to improve protein identification by mass spectrometry in vervet monkeys

Bibliographic Details
Main authors: Proffitt, J. Michael; Glenn, Jeremy; Cesnik, Anthony J.; Jadhav, Avinash; Shortreed, Michael R.; Smith, Lloyd M.; Kavanagh, Kylie; Cox, Laura A.; Olivier, Michael
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5683380/
https://www.ncbi.nlm.nih.gov/pubmed/29132314
http://dx.doi.org/10.1186/s12864-017-4279-0
Collection: PubMed
Description: BACKGROUND: Shotgun proteomics uses a database search strategy to compare detected mass spectra to a library of theoretical spectra derived from reference genome information. The robustness of proteomics results therefore depends on the completeness and accuracy of the gene annotation in the reference genome. For animal models of disease with incomplete genomic annotation, such as non-human primates, proteogenomic methods can improve protein detection by incorporating transcriptional data from RNA-Seq into the search databases used for peptide spectral matching. Customized search databases derived from RNA-Seq data can identify unannotated genetic and splice variants while restricting comparisons to only those transcripts actively expressed in the tissue.

RESULTS: We collected RNA-Seq and proteomic data from 10 vervet monkey liver samples and used the RNA-Seq data to curate sample-specific search databases, which were analyzed in the program Morpheus. We compared these results against those from a search database generated from the reference vervet genome. The RNA-Seq data predicted a total of 284 previously unannotated splice junctions, 92 of which were confirmed by peptide spectral matches. More than half (53/92) of these unannotated splice variants had orthologs in other non-human primates, suggesting that the failure to match these peptides in the reference analyses likely arose from incomplete gene model information. The sample-specific databases also identified 101 unique peptides containing single amino acid substitutions that were missed by the reference database. Because the sample-specific searches were restricted to actively expressed transcripts, the search databases were smaller and more computationally efficient, and they identified more peptides at the empirically derived 1% false discovery rate.

CONCLUSION: Proteogenomic approaches are ideally suited to facilitate the discovery and annotation of proteins in less widely studied animal models such as non-human primates. We expect that these approaches will help to improve existing genome annotations of non-human primate species such as vervet.

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi: 10.1186/s12864-017-4279-0) contains supplementary material, which is available to authorized users.
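The abstract reports identifications at an "empirically derived 1% false discovery rate" but does not describe how Morpheus computes it. A widely used way to derive such a rate in database searching is target-decoy competition: spectra are also searched against decoy (for example, reversed) sequences, and the FDR at a score cutoff is estimated as the ratio of decoy to target matches at or above that cutoff. The Python sketch below illustrates only that general technique, under that assumption; the `PSM` class, the `filter_at_fdr` function, and the toy scores are hypothetical and are not taken from the paper or from Morpheus.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PSM:
    """A peptide-spectrum match (hypothetical structure for illustration)."""
    peptide: str
    score: float
    is_decoy: bool  # True if the match is to a decoy (e.g., reversed) sequence


def filter_at_fdr(psms: List[PSM], max_fdr: float = 0.01) -> List[PSM]:
    """Return target PSMs whose q-value is at or below max_fdr.

    Assumes target-decoy competition: at each score cutoff, the FDR is
    estimated as (#decoys at or above the cutoff) / (#targets at or above it).
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)

    # Estimated FDR when the cutoff is placed at each ranked PSM.
    targets = decoys = 0
    fdrs = []
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / targets if targets else 1.0)

    # q-value: the lowest FDR at which a PSM would still be accepted,
    # i.e. the running minimum of the FDR taken from the worst score upward.
    qvalues = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min

    # Keep only target matches that pass the FDR threshold.
    return [p for p, q in zip(ranked, qvalues) if not p.is_decoy and q <= max_fdr]


if __name__ == "__main__":
    toy = [
        PSM("PEPA", 50.0, False),
        PSM("PEPB", 45.0, False),
        PSM("PEPC", 40.0, False),
        PSM("PEPD", 35.0, False),
        PSM("DECOY1", 30.0, True),
        PSM("PEPE", 25.0, False),
        PSM("DECOY2", 20.0, True),
        PSM("PEPF", 15.0, False),
    ]
    accepted = filter_at_fdr(toy)  # default threshold: 1% FDR
    print([p.peptide for p in accepted])  # ['PEPA', 'PEPB', 'PEPC', 'PEPD']
```

This sketch uses the simple decoys/targets estimator and does not treat tied scores specially; some tools instead use (decoys + 1)/targets for a more conservative estimate.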
Record ID: pubmed-5683380
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Genomics (Research Article), published 2017-11-13
License: © The Author(s). 2017. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.