Novel gene and gene model detection using a whole genome open reading frame analysis in proteomics

BACKGROUND: Defining the location of genes and the precise nature of gene products remains a fundamental challenge in genome annotation. Interrogating tandem mass spectrometry data using genomic sequence provides an unbiased method to identify novel translation products. A six-frame translation of t...

Bibliographic Details
Main Authors: Fermin, Damian; Allen, Baxter B; Blackwell, Thomas W; Menon, Rajasree; Adamski, Marcin; Xu, Yin; Ulintz, Peter; Omenn, Gilbert S; States, David J
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1557991/
https://www.ncbi.nlm.nih.gov/pubmed/16646984
http://dx.doi.org/10.1186/gb-2006-7-4-r35
author Fermin, Damian
Allen, Baxter B
Blackwell, Thomas W
Menon, Rajasree
Adamski, Marcin
Xu, Yin
Ulintz, Peter
Omenn, Gilbert S
States, David J
author_sort Fermin, Damian
collection PubMed
description BACKGROUND: Defining the location of genes and the precise nature of gene products remains a fundamental challenge in genome annotation. Interrogating tandem mass spectrometry data using genomic sequence provides an unbiased method to identify novel translation products. A six-frame translation of the entire human genome was used as the query database to search for novel blood proteins in the data from the Human Proteome Organization Plasma Proteome Project. Because this target database is orders of magnitude larger than the databases traditionally employed in tandem mass spectra analysis, careful attention to significance testing is required. Confidence of identification is assessed using our previously described Poisson statistic, which estimates the significance of multi-peptide identifications incorporating the length of the matching sequence, number of spectra searched and size of the target sequence database. RESULTS: Applying a false discovery rate threshold of 0.05, we identified 282 significant open reading frames, each containing two or more peptide matches. There were 627 novel peptides associated with these open reading frames that mapped to a unique genomic coordinate placed within the start/stop points of previously annotated genes. These peptides matched 1,110 distinct tandem MS spectra. Peptides fell into four categories based upon where their genomic coordinates placed them relative to annotated exons within the parent gene. CONCLUSION: This work provides evidence for novel alternative splice variants in many previously annotated genes. These findings suggest that annotation of the genome is not yet complete and that proteomics has the potential to further add to our understanding of gene structures.
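The six-frame translation mentioned in the description can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the stop-to-stop definition of an open reading frame, and the `min_length` parameter are assumptions made here for illustration only. The sketch translates both strands of a DNA string in all three reading frames using the standard genetic code and splits each translation at stop codons.

```python
# Illustrative sketch only; not the published method.
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Map frame labels '+1'..'+3' and '-1'..'-3' to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def open_reading_frames(protein, min_length=2):
    """Split a translation at stop codons ('*'); keep segments of at least
    min_length residues (a hypothetical cutoff, not from the source)."""
    return [seg for seg in protein.split("*") if len(seg) >= min_length]


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein, open_reading_frames(protein))
```

Applied to a whole genome, this kind of translation yields a search database far larger than an annotated protein set, which is why the description stresses significance testing.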
format Text
id pubmed-1557991
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1557991 2006-09-02 Genome Biol Research BioMed Central 2006 2006-04-28 /pmc/articles/PMC1557991/ /pubmed/16646984 http://dx.doi.org/10.1186/gb-2006-7-4-r35 Text en Copyright © 2006 Fermin et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Novel gene and gene model detection using a whole genome open reading frame analysis in proteomics
topic Research