Fit-for-purpose curated database application in mass spectrometry-based targeted protein identification and validation

BACKGROUND: Mass spectrometry (MS) is a very sensitive and specific method for protein identification, biomarker discovery, and biomarker validation. Protein identification is commonly carried out by comparing MS data with public databases. However, with the development of high throughput and accura...

Bibliographic Details
Main Authors: Cheng, Keding; Sloan, Angela; McCorrister, Stuart; Babiuk, Shawn; Bowden, Timothy R; Wang, Gehua; Knox, J David
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4102332/
https://www.ncbi.nlm.nih.gov/pubmed/25011440
http://dx.doi.org/10.1186/1756-0500-7-444
author Cheng, Keding
Sloan, Angela
McCorrister, Stuart
Babiuk, Shawn
Bowden, Timothy R
Wang, Gehua
Knox, J David
collection PubMed
description BACKGROUND: Mass spectrometry (MS) is a very sensitive and specific method for protein identification, biomarker discovery, and biomarker validation. Protein identification is commonly carried out by comparing MS data with public databases. However, with the development of high throughput and accurate genomic sequencing technology, public databases are being overwhelmed with new entries from different species every day. The application of these databases can also be problematic due to factors such as size, specificity, and unharmonized annotation of the molecules of interest. Current databases representing liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based searches focus on enzyme digestion patterns and sequence information and consequently, important functional information can be missed within the search output. Protein variants displaying similar sequence homology can interfere with database identification when only certain homologues are examined. In addition, recombinant DNA technology can result in products that may not be accurately annotated in public databases. Curated databases, which focus on the molecule of interest with clearer functional annotation and sequence information, are necessary for accurate protein identification and validation. Here, four cases of curated database application have been explored and summarized. FINDINGS: The four presented curated databases were constructed with clear goals regarding application and have proven very useful for targeted protein identification and biomarker application in different fields. They include a sheeppox virus database created for accurate identification of proteins with strong antigenicity, a custom database containing clearly annotated protein variants such as tau transcript variant 2 for accurate biomarker identification, a sheep-hamster chimeric prion protein (PrP) database constructed for assay development of prion diseases, and a custom Escherichia coli (E. coli) flagella (H antigen) database produced for MS-H, a new H-typing technique. Clearly annotating the proteins of interest was essential for highly accurate, specific, and sensitive sequence identification, and searching against public databases resulted in inaccurate identification of the sequence of interest, while combining the curated database with a public database reduced both the confidence and sequence coverage of the protein search. CONCLUSION: Curated protein sequence databases incorporating clear annotations are very useful for accurate protein identification and fit-for-purpose application through MS-based biomarker validation.
format Online
Article
Text
id pubmed-4102332
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4102332 2014-07-18 Fit-for-purpose curated database application in mass spectrometry-based targeted protein identification and validation Cheng, Keding Sloan, Angela McCorrister, Stuart Babiuk, Shawn Bowden, Timothy R Wang, Gehua Knox, J David BMC Res Notes Data Note BioMed Central 2014-07-10 /pmc/articles/PMC4102332/ /pubmed/25011440 http://dx.doi.org/10.1186/1756-0500-7-444 Text en Copyright © 2014 Cheng et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Fit-for-purpose curated database application in mass spectrometry-based targeted protein identification and validation
topic Data Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4102332/
https://www.ncbi.nlm.nih.gov/pubmed/25011440
http://dx.doi.org/10.1186/1756-0500-7-444