Re-Annotation of Protein-Coding Genes in 10 Complete Genomes of Neisseriaceae Family by Combining Similarity-Based and Composition-Based Methods

In this paper, we performed a comprehensive re-annotation of protein-coding genes by a systematic method combining composition- and similarity-based approaches in 10 complete bacterial genomes of the family Neisseriaceae. First, 418 hypothetical genes were predicted as non-coding using the compositi...

Bibliographic Details
Main authors: Guo, Feng-Biao, Xiong, Lifeng, Teng, Jade L. L., Yuen, Kwok-Yung, Lau, Susanna K. P., Woo, Patrick C. Y.
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3686433/
https://www.ncbi.nlm.nih.gov/pubmed/23571676
http://dx.doi.org/10.1093/dnares/dst009
author Guo, Feng-Biao
Xiong, Lifeng
Teng, Jade L. L.
Yuen, Kwok-Yung
Lau, Susanna K. P.
Woo, Patrick C. Y.
collection PubMed
description In this paper, we performed a comprehensive re-annotation of protein-coding genes in 10 complete bacterial genomes of the family Neisseriaceae using a systematic method that combines composition- and similarity-based approaches. First, 418 hypothetical genes were predicted to be non-coding by the composition-based method, and 413 of them were eliminated from the gene list. Both the scatter plot and cluster of orthologous groups (COG) fraction analyses supported this result. Second, between 20 and 400 hypothetical proteins in each of the 10 strains were assigned functions based on homology searches. Of the newly assigned functions, 397 are detailed enough to have definite gene names. Third, 106 genes missed by the original annotations were identified by an ab initio gene finder combined with similarity alignment. Transcriptional experiments validated the effectiveness of this method in Laribacter hongkongensis and Chromobacterium violaceum. Some of the 106 newly found genes deserve particular interest. For example, 27 transposases were newly found in Neisseria meningitidis alpha14. In Neisseria gonorrhoeae NCCP11945, four new genes with putative functions and definite names (nusG, rpsN, rpmD and infA) were found; their homologues are usually essential for survival in bacteria. The updated annotations for the 10 Neisseriaceae genomes provide a more accurate prediction of protein-coding genes and more detailed functional information for hypothetical proteins. They will benefit research into the lifestyle, metabolism, environmental adaptation and pathogenicity of the Neisseriaceae species. The re-annotation procedure could be used directly, or after adaptation of the detailed methods, for checking the annotations of any other bacterial or archaeal genomes.
format Online
Article
Text
id pubmed-3686433
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3686433 2013-06-19 Re-Annotation of Protein-Coding Genes in 10 Complete Genomes of Neisseriaceae Family by Combining Similarity-Based and Composition-Based Methods Guo, Feng-Biao Xiong, Lifeng Teng, Jade L. L. Yuen, Kwok-Yung Lau, Susanna K. P. Woo, Patrick C. Y. DNA Res Full Papers In this paper, we performed a comprehensive re-annotation of protein-coding genes in 10 complete bacterial genomes of the family Neisseriaceae using a systematic method that combines composition- and similarity-based approaches. First, 418 hypothetical genes were predicted to be non-coding by the composition-based method, and 413 of them were eliminated from the gene list. Both the scatter plot and cluster of orthologous groups (COG) fraction analyses supported this result. Second, between 20 and 400 hypothetical proteins in each of the 10 strains were assigned functions based on homology searches. Of the newly assigned functions, 397 are detailed enough to have definite gene names. Third, 106 genes missed by the original annotations were identified by an ab initio gene finder combined with similarity alignment. Transcriptional experiments validated the effectiveness of this method in Laribacter hongkongensis and Chromobacterium violaceum. Some of the 106 newly found genes deserve particular interest. For example, 27 transposases were newly found in Neisseria meningitidis alpha14. In Neisseria gonorrhoeae NCCP11945, four new genes with putative functions and definite names (nusG, rpsN, rpmD and infA) were found; their homologues are usually essential for survival in bacteria. The updated annotations for the 10 Neisseriaceae genomes provide a more accurate prediction of protein-coding genes and more detailed functional information for hypothetical proteins. They will benefit research into the lifestyle, metabolism, environmental adaptation and pathogenicity of the Neisseriaceae species. The re-annotation procedure could be used directly, or after adaptation of the detailed methods, for checking the annotations of any other bacterial or archaeal genomes. Oxford University Press 2013-06 2013-04-09 /pmc/articles/PMC3686433/ /pubmed/23571676 http://dx.doi.org/10.1093/dnares/dst009 Text en © The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.
title Re-Annotation of Protein-Coding Genes in 10 Complete Genomes of Neisseriaceae Family by Combining Similarity-Based and Composition-Based Methods
topic Full Papers