Genome annotation of Anopheles gambiae using mass spectrometry-derived data

Bibliographic Details
Main Authors: Kalume, Dário E; Peri, Suraj; Reddy, Raghunath; Zhong, Jun; Okulate, Mobolaji; Kumar, Nirbhay; Pandey, Akhilesh
Format: Text
Language: English
Published: BioMed Central 2005
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1249570/
https://www.ncbi.nlm.nih.gov/pubmed/16171517
http://dx.doi.org/10.1186/1471-2164-6-128
collection PubMed
description BACKGROUND: A large number of animal and plant genomes have been completely sequenced over the last decade and are now publicly available. Although genomes can be rapidly sequenced, identifying protein-coding genes still remains a problematic task. Availability of protein sequence data allows direct confirmation of protein-coding genes. Mass spectrometry has recently emerged as a powerful tool for proteomic studies. Protein identification using mass spectrometry is usually carried out by searching against databases of known proteins or transcripts. This approach generally does not allow identification of proteins that have not yet been predicted or whose transcripts have not been identified.
RESULTS: We searched 3,967 mass spectra from 16 LC-MS/MS runs of Anopheles gambiae salivary gland homogenates against the Anopheles gambiae genome database. This allowed us to validate 23 known transcripts and 50 novel transcripts. In addition, a novel gene was identified on the basis of peptides that matched a genomic region where no gene was known and no transcript had been predicted. The amino termini of proteins encoded by two predicted transcripts were confirmed based on N-terminally acetylated peptides sequenced by tandem mass spectrometry. Finally, six sequence polymorphisms could be annotated based on experimentally obtained peptide sequences.
CONCLUSION: The peptide sequences from this study were mapped onto the genomic sequence using the distributed annotation system available at Ensembl and can be visualized in the context of all other existing annotations. The strategy described in this paper can be used to correct and confirm genome annotations and permit discovery of novel proteins in a high-throughput manner by mass spectrometry.
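The abstract does not say how the spectra were searched against genomic sequence rather than against a protein database. One common way to make a genome searchable by peptide is to translate it in all six reading frames. Peptides from genes that were never predicted can then still be matched and placed at genomic coordinates. The Python below is a minimal sketch of that idea and is not the authors' pipeline. It assumes the standard genetic code and peptides that lie within a single exon, so peptides spanning a splice junction would be missed. The names `translate` and `find_peptide` are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the TCAG codon ordering ('*' = stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate DNA codon by codon; codons with unknown bases (e.g. 'N') become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def find_peptide(genome: str, peptide: str) -> list[tuple[str, int, int, int]]:
    """Return (strand, frame, start, end) for every six-frame match of `peptide`.

    Hypothetical helper. start/end are 0-based, end-exclusive coordinates on
    the forward strand; frame is the offset within that strand's own reading
    direction. Assumes the peptide is encoded without an intron.
    """
    genome = genome.upper()
    n = len(genome)
    reverse_complement = genome.translate(COMPLEMENT)[::-1]
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement)):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                nt_start = frame + 3 * pos
                nt_end = nt_start + 3 * len(peptide)
                if strand == "-":
                    # Map reverse-complement coordinates back onto the forward strand.
                    nt_start, nt_end = n - nt_end, n - nt_start
                hits.append((strand, frame, nt_start, nt_end))
                pos = protein.find(peptide, pos + 1)
    return hits


if __name__ == "__main__":
    print(find_peptide("ATGGCTAAAGGTTAA", "AKG"))  # [('+', 0, 3, 12)]
```

In the usage example, the peptide AKG is encoded on the forward strand in frame 0 at bases 3 to 12. A real search engine would instead score observed MS/MS spectra against theoretical fragment masses of such translated sequences. This sketch shows only why a genome-level search can place a peptide where no transcript was predicted.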
id pubmed-1249570
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: BMC Genomics (Research Article), BioMed Central, published 2005-09-19; record pubmed-1249570 dated 2005-10-08
Copyright © 2005 Kalume et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article