
Proteogenomic Analysis of Polymorphisms and Gene Annotation Divergences in Prokaryotes using a Clustered Mass Spectrometry-Friendly Database


Bibliographic Details
Main Authors: de Souza, Gustavo A., Arntzen, Magnus Ø., Fortuin, Suereta, Schürch, Anita C., Målen, Hiwa, McEvoy, Christopher R. E., van Soolingen, Dick, Thiede, Bernd, Warren, Robin M., Wiker, Harald G.
Format: Text
Language: English
Published: The American Society for Biochemistry and Molecular Biology 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3013451/
https://www.ncbi.nlm.nih.gov/pubmed/21030493
http://dx.doi.org/10.1074/mcp.M110.002527
author de Souza, Gustavo A.
Arntzen, Magnus Ø.
Fortuin, Suereta
Schürch, Anita C.
Målen, Hiwa
McEvoy, Christopher R. E.
van Soolingen, Dick
Thiede, Bernd
Warren, Robin M.
Wiker, Harald G.
author_sort de Souza, Gustavo A.
collection PubMed
description Precise annotation of genes or open reading frames is still a difficult task that results in divergence even for data generated from the same genomic sequence. This has an impact on further proteomic studies, and also compromises the characterization of clinical isolates with many specific genetic variations that may not be represented in the selected database. We recently developed software called multistrain mass spectrometry prokaryotic database builder (MSMSpdbb) that can merge protein databases from several sources and be applied to any prokaryotic organism, in a proteomic-friendly approach. We generated a database for the Mycobacterium tuberculosis complex (using three strains of Mycobacterium bovis and five of M. tuberculosis), and analyzed data collected from two laboratory strains and two clinical isolates of M. tuberculosis. We identified 2561 proteins, of which 24 were present in M. tuberculosis H37Rv samples, but not annotated in the M. tuberculosis H37Rv genome. We were also able to identify 280 nonsynonymous single amino acid polymorphisms and confirm 367 translational start sites. As a proof of concept, we applied the database to whole-genome DNA sequencing data of one of the clinical isolates, which allowed the validation of 116 predicted single amino acid polymorphisms and the annotation of 131 N-terminal start sites. Moreover, we identified regions not present in the original M. tuberculosis H37Rv sequence, indicating strain divergence or errors in the reference sequence. In conclusion, we demonstrated the potential of using a merged database to better characterize laboratory or clinical bacterial strains.
format Text
id pubmed-3013451
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher The American Society for Biochemistry and Molecular Biology
record_format MEDLINE/PubMed
spelling pubmed-3013451 2011-01-18 Mol Cell Proteomics Research The American Society for Biochemistry and Molecular Biology 2011-01 2010-10-28 /pmc/articles/PMC3013451/ /pubmed/21030493 http://dx.doi.org/10.1074/mcp.M110.002527 Text en © 2011 by The American Society for Biochemistry and Molecular Biology, Inc. Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) applies to Author Choice Articles
title Proteogenomic Analysis of Polymorphisms and Gene Annotation Divergences in Prokaryotes using a Clustered Mass Spectrometry-Friendly Database
topic Research