PGA: an R/Bioconductor package for identification of novel peptides using a customized database derived from RNA-Seq

BACKGROUND: Peptide identification based on mass spectrometry (MS) is generally achieved by comparing experimental mass spectra with theoretically digested peptides derived from a reference protein database. This strategy cannot identify peptide and protein sequences that...

Bibliographic Details
Main Authors: Wen, Bo, Xu, Shaohang, Zhou, Ruo, Zhang, Bing, Wang, Xiaojing, Liu, Xin, Xu, Xun, Liu, Siqi
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4912784/
https://www.ncbi.nlm.nih.gov/pubmed/27316337
http://dx.doi.org/10.1186/s12859-016-1133-3
_version_ 1782438325796208640
author Wen, Bo
Xu, Shaohang
Zhou, Ruo
Zhang, Bing
Wang, Xiaojing
Liu, Xin
Xu, Xun
Liu, Siqi
author_sort Wen, Bo
collection PubMed
description BACKGROUND: Peptide identification based on mass spectrometry (MS) is generally achieved by comparing experimental mass spectra with theoretically digested peptides derived from a reference protein database. This strategy cannot identify peptide and protein sequences that are absent from the reference database. A customized protein database built from RNA-Seq data has therefore been proposed to improve the identification of novel peptides. Accordingly, a comprehensive pipeline that provides an end-to-end solution for novel peptide detection with such a customized protein database is needed. RESULTS: A pipeline implemented as an R package, named PGA, was developed. It automates the processing of tandem mass spectrometry (MS/MS) data acquired from different MS platforms and the construction of customized protein databases from RNA-Seq data, with or without a reference genome as a guide. PGA can identify novel peptides and generate an HTML-based report with a visual interface. Applied to a published dataset, PGA identified 636 novel peptides, including 510 single amino acid polymorphism (SAP) peptides, 2 INDEL peptides, 49 splice junction peptides, and 75 novel transcript-derived peptides. The software is freely available from http://bioconductor.org/packages/PGA/, and example reports are available at http://wenbostar.github.io/PGA/. CONCLUSIONS: PGA, designed to be platform-independent and easy to use, was successfully developed and shown to be capable of identifying novel peptides by searching a customized protein database derived from RNA-Seq data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1133-3) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4912784
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4912784 2016-06-20 PGA: an R/Bioconductor package for identification of novel peptides using a customized database derived from RNA-Seq Wen, Bo Xu, Shaohang Zhou, Ruo Zhang, Bing Wang, Xiaojing Liu, Xin Xu, Xun Liu, Siqi BMC Bioinformatics Software BioMed Central 2016-06-17 /pmc/articles/PMC4912784/ /pubmed/27316337 http://dx.doi.org/10.1186/s12859-016-1133-3 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title PGA: an R/Bioconductor package for identification of novel peptides using a customized database derived from RNA-Seq
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4912784/
https://www.ncbi.nlm.nih.gov/pubmed/27316337
http://dx.doi.org/10.1186/s12859-016-1133-3