Whole human genome proteogenomic mapping for ENCODE cell line data: identifying protein-coding regions

Bibliographic Details
Main authors: Khatun, Jainab; Yu, Yanbao; Wrobel, John A; Risk, Brian A; Gunawardena, Harsha P; Secrest, Ashley; Spitzer, Wendy J; Xie, Ling; Wang, Li; Chen, Xian; Giddings, Morgan C
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3607840/
https://www.ncbi.nlm.nih.gov/pubmed/23448259
http://dx.doi.org/10.1186/1471-2164-14-141
author Khatun, Jainab
Yu, Yanbao
Wrobel, John A
Risk, Brian A
Gunawardena, Harsha P
Secrest, Ashley
Spitzer, Wendy J
Xie, Ling
Wang, Li
Chen, Xian
Giddings, Morgan C
collection PubMed
description BACKGROUND: Proteogenomic mapping is an approach that uses mass spectrometry data from proteins to directly map protein-coding genes and could aid in locating translational regions in the human genome. In concert with the ENcyclopedia of DNA Elements (ENCODE) project, we applied proteogenomic mapping to produce proteogenomic tracks for the UCSC Genome Browser, to explore which putative translational regions may be missing from the human genome.
RESULTS: We generated ~1 million high-resolution tandem mass (MS/MS) spectra for Tier 1 ENCODE cell lines K562 and GM12878 and mapped them against the UCSC hg19 human genome, and the GENCODE V7 annotated protein and transcript sets. We then compared the results from the three searches to identify the best-matching peptide for each MS/MS spectrum, thereby increasing the confidence of the putative new protein-coding regions found via the whole genome search. At a 1% false discovery rate, we identified 26,472, 24,406, and 13,128 peptides from the protein, transcript, and whole genome searches, respectively; of these, 481 were found solely via the whole genome search. The proteogenomic mapping data are available on the UCSC Genome Browser at http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeUncBsuProt.
CONCLUSIONS: The whole genome search revealed that ~4% of the uniquely mapping identified peptides were located outside GENCODE V7 annotated exons. The comparison of the results from the disparate searches also identified 15% more spectra than would have been found solely from a protein database search. Therefore, whole genome proteogenomic mapping is a complementary method for genome annotation when performed in conjunction with other searches.
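The comparison step described in RESULTS (keeping the best-matching peptide for each MS/MS spectrum across the protein, transcript, and whole genome searches) might be sketched roughly as follows. This is only an illustration, not the authors' pipeline: the `Match` record, the function names, and the assumption that scores are directly comparable across the three searches (higher is better) are all hypothetical and are not stated in the abstract.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Match(NamedTuple):
    """One peptide-spectrum match (hypothetical record layout)."""
    spectrum_id: str
    peptide: str
    score: float  # assumption: higher is better and comparable across searches
    search: str   # "protein", "transcript", or "genome"


def best_match_per_spectrum(matches: Iterable[Match]) -> Dict[str, Match]:
    """Keep the highest-scoring match for each spectrum.

    On a tie, the match seen first is kept.
    """
    best: Dict[str, Match] = {}
    for m in matches:
        current = best.get(m.spectrum_id)
        if current is None or m.score > current.score:
            best[m.spectrum_id] = m
    return best


def genome_only_peptides(best: Dict[str, Match]) -> Set[str]:
    """Peptides whose winning matches come only from the genome search."""
    from_genome = {m.peptide for m in best.values() if m.search == "genome"}
    from_other = {m.peptide for m in best.values() if m.search != "genome"}
    return from_genome - from_other
```

In practice the winning matches would also be filtered by a false discovery rate threshold (1% in the abstract) before any genome-only peptides were reported; that step is omitted here because the abstract does not say how the rate was estimated.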
format Online
Article
Text
id pubmed-3607840
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3607840 2013-03-27
Whole human genome proteogenomic mapping for ENCODE cell line data: identifying protein-coding regions
BMC Genomics
Research Article
BioMed Central 2013-02-28
/pmc/articles/PMC3607840/ /pubmed/23448259 http://dx.doi.org/10.1186/1471-2164-14-141
Text en
Copyright ©2013 Khatun et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Whole human genome proteogenomic mapping for ENCODE cell line data: identifying protein-coding regions
topic Research Article