Integration of curated databases to identify genotype-phenotype associations

BACKGROUND: The ability to rapidly characterize an unknown microorganism is critical in both responding to infectious disease and biodefense. To do this, we need some way of anticipating an organism's phenotype based on the molecules encoded by its genome. However, the link between molecular co...

Bibliographic Details
Main authors: Goh, Chern-Sing; Gianoulis, Tara A; Liu, Yang; Li, Jianrong; Paccanaro, Alberto; Lussier, Yves A; Gerstein, Mark
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1630430/
https://www.ncbi.nlm.nih.gov/pubmed/17038185
http://dx.doi.org/10.1186/1471-2164-7-257
_version_ 1782130625418887168
author Goh, Chern-Sing
Gianoulis, Tara A
Liu, Yang
Li, Jianrong
Paccanaro, Alberto
Lussier, Yves A
Gerstein, Mark
author_sort Goh, Chern-Sing
collection PubMed
description BACKGROUND: The ability to rapidly characterize an unknown microorganism is critical in both responding to infectious disease and biodefense. To do this, we need some way of anticipating an organism's phenotype based on the molecules encoded by its genome. However, the link between molecular composition (i.e. genotype) and phenotype for microbes is not obvious. While there have been several studies that address this challenge, none have yet proposed a large-scale method integrating curated biological information. Here we utilize a systematic approach to discover genotype-phenotype associations that combines phenotypic information from a biomedical informatics database, GIDEON, with the molecular information contained in National Center for Biotechnology Information's Clusters of Orthologous Groups database (NCBI COGs). RESULTS: Integrating the information in the two databases, we are able to correlate the presence or absence of a given protein in a microbe with its phenotype as measured by certain morphological characteristics or survival in a particular growth media. With a 0.8 correlation score threshold, 66% of the associations found were confirmed by the literature and at a 0.9 correlation threshold, 86% were positively verified. CONCLUSION: Our results suggest possible phenotypic manifestations for proteins biochemically associated with sugar metabolism and electron transport. Moreover, we believe our approach can be extended to linking pathogenic phenotypes with functionally related proteins.
format Text
id pubmed-1630430
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1630430 2006-11-06 Integration of curated databases to identify genotype-phenotype associations Goh, Chern-Sing Gianoulis, Tara A Liu, Yang Li, Jianrong Paccanaro, Alberto Lussier, Yves A Gerstein, Mark BMC Genomics Methodology Article BioMed Central 2006-10-12 /pmc/articles/PMC1630430/ /pubmed/17038185 http://dx.doi.org/10.1186/1471-2164-7-257 Text en Copyright © 2006 Goh et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Integration of curated databases to identify genotype-phenotype associations
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1630430/
https://www.ncbi.nlm.nih.gov/pubmed/17038185
http://dx.doi.org/10.1186/1471-2164-7-257
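
The correlation step summarized in the abstract (relating the presence or absence of a protein across microbes to a phenotype, then keeping pairs above a score threshold such as 0.8 or 0.9) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the record does not say which correlation score the authors used, so the phi coefficient (Pearson correlation on 0/1 vectors) stands in for it, and the names `protein_A`, `trait_X`, and the six-organism profiles are invented toy data, not values from GIDEON or NCBI COGs.

```python
from math import sqrt


def phi_coefficient(x, y):
    """Pearson correlation for two equal-length 0/1 vectors (phi coefficient).

    This is a stand-in score; the source record does not specify the
    authors' actual correlation measure.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # Counts for the 2x2 contingency table.
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # A constant vector carries no information; report no correlation.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


def find_associations(protein_profiles, phenotype_profiles, threshold=0.8):
    """Return (protein, phenotype, score) triples with |score| >= threshold."""
    hits = []
    for protein, presence in protein_profiles.items():
        for phenotype, observed in phenotype_profiles.items():
            score = phi_coefficient(presence, observed)
            if abs(score) >= threshold:
                hits.append((protein, phenotype, score))
    # Strongest associations first (by absolute score).
    return sorted(hits, key=lambda hit: -abs(hit[2]))


# Hypothetical toy data: one entry per organism, 1 = present/observed.
toy_proteins = {
    "protein_A": [1, 1, 1, 0, 0, 0],
    "protein_B": [1, 0, 1, 0, 1, 0],
}
toy_phenotypes = {
    "trait_X": [1, 1, 1, 0, 0, 0],
    "trait_Y": [0, 0, 0, 1, 1, 1],
}

for protein, phenotype, score in find_associations(toy_proteins, toy_phenotypes):
    print(protein, phenotype, round(score, 3))
```

Using the absolute value of the score is a design choice of this sketch: it also reports pairs where the absence of a protein tracks a phenotype. Dropping `abs` would keep only positive associations.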