IntPath--an integrated pathway gene relationship database for model organisms and important pathogens

BACKGROUND: Pathway data are important for understanding the relationship between genes, proteins and many other molecules in living organisms. Pathway gene relationships are crucial information for guidance, prediction, reference and assessment in biochemistry, computational biology, and medicine....

Bibliographic Details
Main Authors:	Zhou, Hufeng, Jin, Jingjing, Zhang, Haojun, Yi, Bo, Wozniak, Michal, Wong, Limsoon
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521174/ https://www.ncbi.nlm.nih.gov/pubmed/23282057 http://dx.doi.org/10.1186/1752-0509-6-S2-S2

_version_	1782252898027044864
author	Zhou, Hufeng Jin, Jingjing Zhang, Haojun Yi, Bo Wozniak, Michal Wong, Limsoon
author_sort	Zhou, Hufeng
collection	PubMed
description	BACKGROUND: Pathway data are important for understanding the relationship between genes, proteins and many other molecules in living organisms. Pathway gene relationships are crucial information for guidance, prediction, reference and assessment in biochemistry, computational biology, and medicine. Many well-established databases--e.g., KEGG, WikiPathways, and BioCyc--are dedicated to collecting pathway data for public access. However, the effectiveness of these databases is hindered by issues such as incompatible data formats, inconsistent molecular representations, inconsistent molecular relationship representations, inconsistent referrals to pathway names, and incomplete data across different databases. RESULTS: In this paper, we overcome these issues through extraction, normalization and integration of pathway data from several major public databases (KEGG, WikiPathways, BioCyc, etc.). We build a database that not only hosts our integrated pathway gene relationship data for public access but also maintains the necessary updates in the long run. This public repository is named IntPath (Integrated Pathway gene relationship database for model organisms and important pathogens). Four organisms--S. cerevisiae, M. tuberculosis H37Rv, H. sapiens and M. musculus--are included in this version (V2.0) of IntPath. IntPath uses the "full unification" approach to ensure no deletion and no introduced noise in this process. Therefore, IntPath contains much richer pathway-gene and pathway-gene pair relationships and a much larger number of non-redundant genes and gene pairs than any of the single-source databases. The gene relationships of each gene (measured by average node degree) per pathway are significantly richer. The gene relationships in each pathway (measured by average number of gene pairs per pathway) are also considerably richer in the integrated pathways. Moderate manual curation is involved to remove errors and noise from source data (e.g., the gene ID errors in WikiPathways and relationship errors in KEGG). We turn complicated and incompatible XML data formats and inconsistent gene and gene relationship representations from different source databases into normalized and unified pathway-gene and pathway-gene pair relationships neatly recorded in simple tab-delimited text format and MySQL tables, which facilitates convenient automatic computation and large-scale referencing in many related studies. IntPath data can be downloaded in text format or as a MySQL dump. IntPath data can also be retrieved and analyzed conveniently through a web service by local programs or through a web interface by mouse clicks. Several useful analysis tools are also provided in IntPath. CONCLUSIONS: We have overcome in IntPath the issues of compatibility, consistency, and comprehensiveness that often hamper effective use of pathway databases. We have included four organisms in the current release of IntPath. Our methodology and programs described in this work can be easily applied to other organisms, and we will include more model organisms and important pathogens in future releases of IntPath. IntPath maintains regular updates and is freely available at http://compbio.ddns.comp.nus.edu.sg:8080/IntPath.
format	Online Article Text
id	pubmed-3521174
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3521174 2012-12-14 IntPath--an integrated pathway gene relationship database for model organisms and important pathogens Zhou, Hufeng Jin, Jingjing Zhang, Haojun Yi, Bo Wozniak, Michal Wong, Limsoon BMC Syst Biol Proceedings BioMed Central 2012-12-12 /pmc/articles/PMC3521174/ /pubmed/23282057 http://dx.doi.org/10.1186/1752-0509-6-S2-S2 Text en Copyright ©2012 Zhou et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	IntPath--an integrated pathway gene relationship database for model organisms and important pathogens
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521174/ https://www.ncbi.nlm.nih.gov/pubmed/23282057 http://dx.doi.org/10.1186/1752-0509-6-S2-S2
