
CCPRD: A Novel Analytical Framework for the Comprehensive Proteomic Reference Database Construction of NonModel Organisms

Protein reference databases are a critical part of producing efficient proteomic analyses. However, the method for constructing clean, efficient, and comprehensive protein reference databases of nonmodel organisms is lacking. Existing methods either do not have contamination contro...


Bibliographic Details
Main authors: Guo, Qingxiang, Li, Dan, Zhai, Yanhua, Gu, Zemao
Format: Online Article Text
Language: English
Published: American Chemical Society 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7331046/
https://www.ncbi.nlm.nih.gov/pubmed/32637811
http://dx.doi.org/10.1021/acsomega.0c01278
author Guo, Qingxiang
Li, Dan
Zhai, Yanhua
Gu, Zemao
collection PubMed
description Protein reference databases are a critical part of producing efficient proteomic analyses. However, a method for constructing clean, efficient, and comprehensive protein reference databases for nonmodel organisms is lacking. Existing methods either lack contamination control procedures or rely on three-frame and/or six-frame translation, which sharply increases the search space and the computational resources required. Herein, we propose a framework for constructing a customized comprehensive proteomic reference database (CCPRD) from draft genomes and deep sequencing transcriptomes. Its effectiveness is demonstrated on the nematocyst proteomes of endoparasitic cnidarians (myxozoans). Customized contamination removal procedures successfully identified and removed contaminants in the omic data. The method is effective without overdecontamination, as shown by comparing the CCPRD MS results with those from an artificially contaminated database and from a database in which the contaminants removed from the genomes and transcriptomes were added back. CCPRD outperformed traditional frame-based methods by identifying 35.2–50.7% more peptides and 35.8–43.8% more proteins, with a size reduction of up to 84.6%. A BUSCO analysis showed that the CCPRD maintained a relatively high level of completeness compared to traditional methods. These results confirm the superiority of the CCPRD over existing methods in peptide and protein identification numbers, database size, and completeness. By providing a general framework for generating the reference database, the CCPRD, which does not need a high-quality genome, can potentially be applied to nonmodel organisms and significantly contribute to proteomic research.
format Online
Article
Text
id pubmed-7331046
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-7331046 2020-07-06 CCPRD: A Novel Analytical Framework for the Comprehensive Proteomic Reference Database Construction of NonModel Organisms Guo, Qingxiang Li, Dan Zhai, Yanhua Gu, Zemao ACS Omega American Chemical Society 2020-06-17 /pmc/articles/PMC7331046/ /pubmed/32637811 http://dx.doi.org/10.1021/acsomega.0c01278 Text en Copyright © 2020 American Chemical Society This is an open access article published under an ACS AuthorChoice License (http://pubs.acs.org/page/policy/authorchoice_termsofuse.html), which permits copying and redistribution of the article or any adaptations for non-commercial purposes.
title CCPRD: A Novel Analytical Framework for the Comprehensive Proteomic Reference Database Construction of NonModel Organisms
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7331046/
https://www.ncbi.nlm.nih.gov/pubmed/32637811
http://dx.doi.org/10.1021/acsomega.0c01278