
Bioinformatics Analysis Identify Novel OB Fold Protein Coding Genes in C. elegans

BACKGROUND: The C. elegans genome has been extensively annotated by the WormBase consortium that uses state of the art bioinformatics pipelines, functional genomics and manual curation approaches. As a result, the identification of novel genes in silico in this model organism is becoming more challe...


Bibliographic Details
Main Authors: Dargahi, Daryanaz; Baillie, David; Pio, Frederic
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3636199/
https://www.ncbi.nlm.nih.gov/pubmed/23638006
http://dx.doi.org/10.1371/journal.pone.0062204
author Dargahi, Daryanaz
Baillie, David
Pio, Frederic
collection PubMed
description BACKGROUND: The C. elegans genome has been extensively annotated by the WormBase consortium that uses state of the art bioinformatics pipelines, functional genomics and manual curation approaches. As a result, the identification of novel genes in silico in this model organism is becoming more challenging requiring new approaches. The Oligonucleotide-oligosaccharide binding (OB) fold is a highly divergent protein family, in which protein sequences, in spite of having the same fold, share very little sequence identity (5–25%). Therefore, evidence from sequence-based annotation may not be sufficient to identify all the members of this family. In C. elegans, the number of OB-fold proteins reported is remarkably low (n = 46) compared to other evolutionary-related eukaryotes, such as yeast S. cerevisiae (n = 344) or fruit fly D. melanogaster (n = 84). Gene loss during evolution or differences in the level of annotation for this protein family, may explain these discrepancies. METHODOLOGY/PRINCIPAL FINDINGS: This study examines the possibility that novel OB-fold coding genes exist in the worm. We developed a bioinformatics approach that uses the most sensitive sequence-sequence, sequence-profile and profile-profile similarity search methods followed by 3D-structure prediction as a filtering step to eliminate false positive candidate sequences. We have predicted 18 coding genes containing the OB-fold that have remarkably partially been characterized in C. elegans. CONCLUSIONS/SIGNIFICANCE: This study raises the possibility that the annotation of highly divergent protein fold families can be improved in C. elegans. Similar strategies could be implemented for large scale analysis by the WormBase consortium when novel versions of the genome sequence of C. elegans, or other evolutionary related species are being released. This approach is of general interest to the scientific community since it can be used to annotate any genome.
format Online
Article
Text
id pubmed-3636199
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3636199 2013-05-01 Bioinformatics Analysis Identify Novel OB Fold Protein Coding Genes in C. elegans Dargahi, Daryanaz Baillie, David Pio, Frederic PLoS One Research Article Public Library of Science 2013-04-25 /pmc/articles/PMC3636199/ /pubmed/23638006 http://dx.doi.org/10.1371/journal.pone.0062204 Text en © 2013 Dargahi et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Bioinformatics Analysis Identify Novel OB Fold Protein Coding Genes in C. elegans
topic Research Article