IntPath--an integrated pathway gene relationship database for model organisms and important pathogens

BACKGROUND: Pathway data are important for understanding the relationship between genes, proteins and many other molecules in living organisms. Pathway gene relationships are crucial information for guidance, prediction, reference and assessment in biochemistry, computational biology, and medicine....

Bibliographic Details
Main Authors: Zhou, Hufeng, Jin, Jingjing, Zhang, Haojun, Yi, Bo, Wozniak, Michal, Wong, Limsoon
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521174/
https://www.ncbi.nlm.nih.gov/pubmed/23282057
http://dx.doi.org/10.1186/1752-0509-6-S2-S2
_version_ 1782252898027044864
author Zhou, Hufeng
Jin, Jingjing
Zhang, Haojun
Yi, Bo
Wozniak, Michal
Wong, Limsoon
author_facet Zhou, Hufeng
Jin, Jingjing
Zhang, Haojun
Yi, Bo
Wozniak, Michal
Wong, Limsoon
author_sort Zhou, Hufeng
collection PubMed
description BACKGROUND: Pathway data are important for understanding the relationship between genes, proteins and many other molecules in living organisms. Pathway gene relationships are crucial information for guidance, prediction, reference and assessment in biochemistry, computational biology, and medicine. Many well-established databases--e.g., KEGG, WikiPathways, and BioCyc--are dedicated to collecting pathway data for public access. However, the effectiveness of these databases is hindered by issues such as incompatible data formats, inconsistent molecular representations, inconsistent molecular relationship representations, inconsistent referrals to pathway names, and incomplete data from different databases. RESULTS: In this paper, we overcome these issues through extraction, normalization and integration of pathway data from several major public databases (KEGG, WikiPathways, BioCyc, etc.). We build a database that not only hosts our integrated pathway gene relationship data for public access but also maintains the necessary updates in the long run. This public repository is named IntPath (Integrated Pathway gene relationship database for model organisms and important pathogens). Four organisms--S. cerevisiae, M. tuberculosis H37Rv, H. sapiens and M. musculus--are included in this version (V2.0) of IntPath. IntPath uses the "full unification" approach to ensure no deletion and no introduced noise in this process. Therefore, IntPath contains much richer pathway-gene and pathway-gene pair relationships and a much larger number of non-redundant genes and gene pairs than any of the single-source databases. The gene relationships of each gene (measured by average node degree) per pathway are significantly richer. The gene relationships in each pathway (measured by average number of gene pairs per pathway) are also considerably richer in the integrated pathways. 
Moderate manual curation is involved to remove errors and noise from the source data (e.g., the gene ID errors in WikiPathways and relationship errors in KEGG). We turn complicated and incompatible XML data formats and inconsistent gene and gene relationship representations from different source databases into normalized and unified pathway-gene and pathway-gene pair relationships neatly recorded in simple tab-delimited text format and MySQL tables, which facilitates convenient automatic computation and large-scale referencing in many related studies. IntPath data can be downloaded in text format or as a MySQL dump. IntPath data can also be retrieved and analyzed conveniently through a web service by local programs or through a web interface by mouse clicks. Several useful analysis tools are also provided in IntPath. CONCLUSIONS: We have overcome in IntPath the issues of compatibility, consistency, and comprehensiveness that often hamper effective use of pathway databases. We have included four organisms in the current release of IntPath. Our methodology and programs described in this work can be easily applied to other organisms, and we will include more model organisms and important pathogens in future releases of IntPath. IntPath maintains regular updates and is freely available at http://compbio.ddns.comp.nus.edu.sg:8080/IntPath.
format Online
Article
Text
id pubmed-3521174
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3521174 2012-12-14 IntPath--an integrated pathway gene relationship database for model organisms and important pathogens Zhou, Hufeng Jin, Jingjing Zhang, Haojun Yi, Bo Wozniak, Michal Wong, Limsoon BMC Syst Biol Proceedings BioMed Central 2012-12-12 /pmc/articles/PMC3521174/ /pubmed/23282057 http://dx.doi.org/10.1186/1752-0509-6-S2-S2 Text en Copyright ©2012 Zhou et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Proceedings
Zhou, Hufeng
Jin, Jingjing
Zhang, Haojun
Yi, Bo
Wozniak, Michal
Wong, Limsoon
IntPath--an integrated pathway gene relationship database for model organisms and important pathogens
title IntPath--an integrated pathway gene relationship database for model organisms and important pathogens
title_full IntPath--an integrated pathway gene relationship database for model organisms and important pathogens
title_fullStr IntPath--an integrated pathway gene relationship database for model organisms and important pathogens
title_full_unstemmed IntPath--an integrated pathway gene relationship database for model organisms and important pathogens
title_short IntPath--an integrated pathway gene relationship database for model organisms and important pathogens
title_sort intpath--an integrated pathway gene relationship database for model organisms and important pathogens
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521174/
https://www.ncbi.nlm.nih.gov/pubmed/23282057
http://dx.doi.org/10.1186/1752-0509-6-S2-S2
work_keys_str_mv AT zhouhufeng intpathanintegratedpathwaygenerelationshipdatabaseformodelorganismsandimportantpathogens
AT jinjingjing intpathanintegratedpathwaygenerelationshipdatabaseformodelorganismsandimportantpathogens
AT zhanghaojun intpathanintegratedpathwaygenerelationshipdatabaseformodelorganismsandimportantpathogens
AT yibo intpathanintegratedpathwaygenerelationshipdatabaseformodelorganismsandimportantpathogens
AT wozniakmichal intpathanintegratedpathwaygenerelationshipdatabaseformodelorganismsandimportantpathogens
AT wonglimsoon intpathanintegratedpathwaygenerelationshipdatabaseformodelorganismsandimportantpathogens