PGA: an R/Bioconductor package for identification of novel peptides using a customized database derived from RNA-Seq
BACKGROUND: Peptide identification based upon mass spectrometry (MS) is generally achieved by comparison of the experimental mass spectra with the theoretically digested peptides derived from a reference protein database. Obviously, this strategy could not identify peptide and protein sequences that...
Main authors: | Wen, Bo; Xu, Shaohang; Zhou, Ruo; Zhang, Bing; Wang, Xiaojing; Liu, Xin; Xu, Xun; Liu, Siqi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4912784/ https://www.ncbi.nlm.nih.gov/pubmed/27316337 http://dx.doi.org/10.1186/s12859-016-1133-3 |
_version_ | 1782438325796208640 |
---|---|
author | Wen, Bo Xu, Shaohang Zhou, Ruo Zhang, Bing Wang, Xiaojing Liu, Xin Xu, Xun Liu, Siqi |
author_facet | Wen, Bo Xu, Shaohang Zhou, Ruo Zhang, Bing Wang, Xiaojing Liu, Xin Xu, Xun Liu, Siqi |
author_sort | Wen, Bo |
collection | PubMed |
description | BACKGROUND: Peptide identification based upon mass spectrometry (MS) is generally achieved by comparison of the experimental mass spectra with the theoretically digested peptides derived from a reference protein database. Obviously, this strategy could not identify peptide and protein sequences that are absent from a reference database. A customized protein database on the basis of RNA-Seq data is thus proposed to assist with and improve the identification of novel peptides. Correspondingly, development of a comprehensive pipeline, which provides an end-to-end solution for novel peptide detection with the customized protein database, is necessary. RESULTS: A pipeline with an R package, assigned as a PGA utility, was developed that enables automated treatment to the tandem mass spectrometry (MS/MS) data acquired from different MS platforms and construction of customized protein databases based on RNA-Seq data with or without a reference genome guide. Hence, PGA can identify novel peptides and generate an HTML-based report with a visualized interface. On the basis of a published dataset, PGA was employed to identify peptides, resulting in 636 novel peptides, including 510 single amino acid polymorphism (SAP) peptides, 2 INDEL peptides, 49 splice junction peptides, and 75 novel transcript-derived peptides. The software is freely available from http://bioconductor.org/packages/PGA/, and the example reports are available at http://wenbostar.github.io/PGA/. CONCLUSIONS: The pipeline of PGA, aimed at being platform-independent and easy-to-use, was successfully developed and shown to be capable of identifying novel peptides by searching the customized protein database derived from RNA-Seq data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1133-3) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4912784 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-49127842016-06-20 PGA: an R/Bioconductor package for identification of novel peptides using a customized database derived from RNA-Seq Wen, Bo Xu, Shaohang Zhou, Ruo Zhang, Bing Wang, Xiaojing Liu, Xin Xu, Xun Liu, Siqi BMC Bioinformatics Software BACKGROUND: Peptide identification based upon mass spectrometry (MS) is generally achieved by comparison of the experimental mass spectra with the theoretically digested peptides derived from a reference protein database. Obviously, this strategy could not identify peptide and protein sequences that are absent from a reference database. A customized protein database on the basis of RNA-Seq data is thus proposed to assist with and improve the identification of novel peptides. Correspondingly, development of a comprehensive pipeline, which provides an end-to-end solution for novel peptide detection with the customized protein database, is necessary. RESULTS: A pipeline with an R package, assigned as a PGA utility, was developed that enables automated treatment to the tandem mass spectrometry (MS/MS) data acquired from different MS platforms and construction of customized protein databases based on RNA-Seq data with or without a reference genome guide. Hence, PGA can identify novel peptides and generate an HTML-based report with a visualized interface. On the basis of a published dataset, PGA was employed to identify peptides, resulting in 636 novel peptides, including 510 single amino acid polymorphism (SAP) peptides, 2 INDEL peptides, 49 splice junction peptides, and 75 novel transcript-derived peptides. The software is freely available from http://bioconductor.org/packages/PGA/, and the example reports are available at http://wenbostar.github.io/PGA/. CONCLUSIONS: The pipeline of PGA, aimed at being platform-independent and easy-to-use, was successfully developed and shown to be capable of identifying novel peptides by searching the customized protein database derived from RNA-Seq data. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1133-3) contains supplementary material, which is available to authorized users. BioMed Central 2016-06-17 /pmc/articles/PMC4912784/ /pubmed/27316337 http://dx.doi.org/10.1186/s12859-016-1133-3 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Software Wen, Bo Xu, Shaohang Zhou, Ruo Zhang, Bing Wang, Xiaojing Liu, Xin Xu, Xun Liu, Siqi PGA: an R/Bioconductor package for identification of novel peptides using a customized database derived from RNA-Seq |
title | PGA: an R/Bioconductor package for identification of novel peptides using a customized database derived from RNA-Seq |
title_full | PGA: an R/Bioconductor package for identification of novel peptides using a customized database derived from RNA-Seq |
title_fullStr | PGA: an R/Bioconductor package for identification of novel peptides using a customized database derived from RNA-Seq |
title_full_unstemmed | PGA: an R/Bioconductor package for identification of novel peptides using a customized database derived from RNA-Seq |
title_short | PGA: an R/Bioconductor package for identification of novel peptides using a customized database derived from RNA-Seq |
title_sort | pga: an r/bioconductor package for identification of novel peptides using a customized database derived from rna-seq |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4912784/ https://www.ncbi.nlm.nih.gov/pubmed/27316337 http://dx.doi.org/10.1186/s12859-016-1133-3 |
work_keys_str_mv | AT wenbo pgaanrbioconductorpackageforidentificationofnovelpeptidesusingacustomizeddatabasederivedfromrnaseq AT xushaohang pgaanrbioconductorpackageforidentificationofnovelpeptidesusingacustomizeddatabasederivedfromrnaseq AT zhouruo pgaanrbioconductorpackageforidentificationofnovelpeptidesusingacustomizeddatabasederivedfromrnaseq AT zhangbing pgaanrbioconductorpackageforidentificationofnovelpeptidesusingacustomizeddatabasederivedfromrnaseq AT wangxiaojing pgaanrbioconductorpackageforidentificationofnovelpeptidesusingacustomizeddatabasederivedfromrnaseq AT liuxin pgaanrbioconductorpackageforidentificationofnovelpeptidesusingacustomizeddatabasederivedfromrnaseq AT xuxun pgaanrbioconductorpackageforidentificationofnovelpeptidesusingacustomizeddatabasederivedfromrnaseq AT liusiqi pgaanrbioconductorpackageforidentificationofnovelpeptidesusingacustomizeddatabasederivedfromrnaseq |