IntPath--an integrated pathway gene relationship database for model organisms and important pathogens
BACKGROUND: Pathway data are important for understanding the relationship between genes, proteins and many other molecules in living organisms. Pathway gene relationships are crucial information for guidance, prediction, reference and assessment in biochemistry, computational biology, and medicine....
Main authors: | Zhou, Hufeng; Jin, Jingjing; Zhang, Haojun; Yi, Bo; Wozniak, Michal; Wong, Limsoon |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521174/ https://www.ncbi.nlm.nih.gov/pubmed/23282057 http://dx.doi.org/10.1186/1752-0509-6-S2-S2 |
_version_ | 1782252898027044864 |
---|---|
author | Zhou, Hufeng Jin, Jingjing Zhang, Haojun Yi, Bo Wozniak, Michal Wong, Limsoon |
author_facet | Zhou, Hufeng Jin, Jingjing Zhang, Haojun Yi, Bo Wozniak, Michal Wong, Limsoon |
author_sort | Zhou, Hufeng |
collection | PubMed |
description | BACKGROUND: Pathway data are important for understanding the relationship between genes, proteins and many other molecules in living organisms. Pathway gene relationships are crucial information for guidance, prediction, reference and assessment in biochemistry, computational biology, and medicine. Many well-established databases--e.g., KEGG, WikiPathways, and BioCyc--are dedicated to collecting pathway data for public access. However, the effectiveness of these databases is hindered by issues such as incompatible data formats, inconsistent molecular representations, inconsistent molecular relationship representations, inconsistent referrals to pathway names, and incomprehensive data from different databases. RESULTS: In this paper, we overcome these issues through extraction, normalization and integration of pathway data from several major public databases (KEGG, WikiPathways, BioCyc, etc.). We build a database that not only hosts our integrated pathway gene relationship data for public access but also maintains the necessary updates in the long run. This public repository is named IntPath (Integrated Pathway gene relationship database for model organisms and important pathogens). Four organisms--S. cerevisiae, M. tuberculosis H37Rv, H. sapiens and M. musculus--are included in this version (V2.0) of IntPath. IntPath uses the "full unification" approach to ensure no deletion and no introduced noise in this process. Therefore, IntPath contains much richer pathway-gene and pathway-gene pair relationships and a much larger number of non-redundant genes and gene pairs than any of the single-source databases. The gene relationships of each gene (measured by average node degree) per pathway are significantly richer. The gene relationships in each pathway (measured by average number of gene pairs per pathway) are also considerably richer in the integrated pathways. Moderate manual curation is involved to remove errors and noise from source data (e.g., the gene ID errors in WikiPathways and relationship errors in KEGG). We turn complicated and incompatible XML data formats and inconsistent gene and gene relationship representations from different source databases into normalized and unified pathway-gene and pathway-gene pair relationships neatly recorded in simple tab-delimited text format and MySQL tables, which facilitates convenient automatic computation and large-scale referencing in many related studies. IntPath data can be downloaded in text format or MySQL dump. IntPath data can also be retrieved and analyzed conveniently through web service by local programs or through web interface by mouse clicks. Several useful analysis tools are also provided in IntPath. CONCLUSIONS: We have overcome in IntPath the issues of compatibility, consistency, and comprehensiveness that often hamper effective use of pathway databases. We have included four organisms in the current release of IntPath. Our methodology and programs described in this work can be easily applied to other organisms; and we will include more model organisms and important pathogens in future releases of IntPath. IntPath maintains regular updates and is freely available at http://compbio.ddns.comp.nus.edu.sg:8080/IntPath. |
format | Online Article Text |
id | pubmed-3521174 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3521174 2012-12-14 IntPath--an integrated pathway gene relationship database for model organisms and important pathogens Zhou, Hufeng Jin, Jingjing Zhang, Haojun Yi, Bo Wozniak, Michal Wong, Limsoon BMC Syst Biol Proceedings BACKGROUND: Pathway data are important for understanding the relationship between genes, proteins and many other molecules in living organisms. Pathway gene relationships are crucial information for guidance, prediction, reference and assessment in biochemistry, computational biology, and medicine. Many well-established databases--e.g., KEGG, WikiPathways, and BioCyc--are dedicated to collecting pathway data for public access. However, the effectiveness of these databases is hindered by issues such as incompatible data formats, inconsistent molecular representations, inconsistent molecular relationship representations, inconsistent referrals to pathway names, and incomprehensive data from different databases. RESULTS: In this paper, we overcome these issues through extraction, normalization and integration of pathway data from several major public databases (KEGG, WikiPathways, BioCyc, etc.). We build a database that not only hosts our integrated pathway gene relationship data for public access but also maintains the necessary updates in the long run. This public repository is named IntPath (Integrated Pathway gene relationship database for model organisms and important pathogens). Four organisms--S. cerevisiae, M. tuberculosis H37Rv, H. sapiens and M. musculus--are included in this version (V2.0) of IntPath. IntPath uses the "full unification" approach to ensure no deletion and no introduced noise in this process. Therefore, IntPath contains much richer pathway-gene and pathway-gene pair relationships and a much larger number of non-redundant genes and gene pairs than any of the single-source databases. The gene relationships of each gene (measured by average node degree) per pathway are significantly richer. The gene relationships in each pathway (measured by average number of gene pairs per pathway) are also considerably richer in the integrated pathways. Moderate manual curation is involved to remove errors and noise from source data (e.g., the gene ID errors in WikiPathways and relationship errors in KEGG). We turn complicated and incompatible XML data formats and inconsistent gene and gene relationship representations from different source databases into normalized and unified pathway-gene and pathway-gene pair relationships neatly recorded in simple tab-delimited text format and MySQL tables, which facilitates convenient automatic computation and large-scale referencing in many related studies. IntPath data can be downloaded in text format or MySQL dump. IntPath data can also be retrieved and analyzed conveniently through web service by local programs or through web interface by mouse clicks. Several useful analysis tools are also provided in IntPath. CONCLUSIONS: We have overcome in IntPath the issues of compatibility, consistency, and comprehensiveness that often hamper effective use of pathway databases. We have included four organisms in the current release of IntPath. Our methodology and programs described in this work can be easily applied to other organisms; and we will include more model organisms and important pathogens in future releases of IntPath. IntPath maintains regular updates and is freely available at http://compbio.ddns.comp.nus.edu.sg:8080/IntPath. BioMed Central 2012-12-12 /pmc/articles/PMC3521174/ /pubmed/23282057 http://dx.doi.org/10.1186/1752-0509-6-S2-S2 Text en Copyright ©2012 Zhou et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Proceedings Zhou, Hufeng Jin, Jingjing Zhang, Haojun Yi, Bo Wozniak, Michal Wong, Limsoon IntPath--an integrated pathway gene relationship database for model organisms and important pathogens |
title | IntPath--an integrated pathway gene relationship database for model organisms and important pathogens |
title_full | IntPath--an integrated pathway gene relationship database for model organisms and important pathogens |
title_fullStr | IntPath--an integrated pathway gene relationship database for model organisms and important pathogens |
title_full_unstemmed | IntPath--an integrated pathway gene relationship database for model organisms and important pathogens |
title_short | IntPath--an integrated pathway gene relationship database for model organisms and important pathogens |
title_sort | intpath--an integrated pathway gene relationship database for model organisms and important pathogens |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521174/ https://www.ncbi.nlm.nih.gov/pubmed/23282057 http://dx.doi.org/10.1186/1752-0509-6-S2-S2 |
work_keys_str_mv | AT zhouhufeng intpathanintegratedpathwaygenerelationshipdatabaseformodelorganismsandimportantpathogens AT jinjingjing intpathanintegratedpathwaygenerelationshipdatabaseformodelorganismsandimportantpathogens AT zhanghaojun intpathanintegratedpathwaygenerelationshipdatabaseformodelorganismsandimportantpathogens AT yibo intpathanintegratedpathwaygenerelationshipdatabaseformodelorganismsandimportantpathogens AT wozniakmichal intpathanintegratedpathwaygenerelationshipdatabaseformodelorganismsandimportantpathogens AT wonglimsoon intpathanintegratedpathwaygenerelationshipdatabaseformodelorganismsandimportantpathogens |