
Proteogenomic Analysis of Bacteria and Archaea: A 46 Organism Case Study

Experimental evidence is increasingly being used to reassess the quality and accuracy of genome annotation. Proteomics data used for this purpose, called proteogenomics, can alleviate many of the problematic areas of genome annotation, e.g. short protein validation and start site assignment. We perf...


Bibliographic Details
Main Authors: Venter, Eli, Smith, Richard D., Payne, Samuel H.
Format: Online Article Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3219674/
https://www.ncbi.nlm.nih.gov/pubmed/22114679
http://dx.doi.org/10.1371/journal.pone.0027587
author Venter, Eli
Smith, Richard D.
Payne, Samuel H.
collection PubMed
description Experimental evidence is increasingly being used to reassess the quality and accuracy of genome annotation. Proteomics data used for this purpose, called proteogenomics, can alleviate many of the problematic areas of genome annotation, e.g. short protein validation and start site assignment. We performed a proteogenomic analysis of 46 genomes spanning eight bacterial and archaeal phyla across the tree of life. These diverse datasets facilitated the development of a robust approach for proteogenomics that is functional across genomes varying in %GC, gene content, proteomic sampling depth, phylogeny, and genome size. In addition to finding evidence for 682 novel proteins, 1336 new start sites, and numerous dubious genes, we discovered sites of post-translational maturation in the form of proteolytic cleavage of 1175 signal peptides. The number of novel proteins per genome is highly variable (median 7, mean 15, stdev 20). Moreover, comparison of novel genes with the current genes did not reveal any consistent abnormalities. Thus, we conclude that proteogenomics fulfills a yet to be understood deficiency in gene prediction. With the adoption of new sequencing technologies which have higher error rates than Sanger-based methods and the advances in proteomics, proteogenomics may become even more important in the future.
format Online
Article
Text
id pubmed-3219674
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3219674 2011-11-23 Proteogenomic Analysis of Bacteria and Archaea: A 46 Organism Case Study Venter, Eli Smith, Richard D. Payne, Samuel H. PLoS One Research Article Public Library of Science 2011-11-17 /pmc/articles/PMC3219674/ /pubmed/22114679 http://dx.doi.org/10.1371/journal.pone.0027587 Text en Venter et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Proteogenomic Analysis of Bacteria and Archaea: A 46 Organism Case Study
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3219674/
https://www.ncbi.nlm.nih.gov/pubmed/22114679
http://dx.doi.org/10.1371/journal.pone.0027587