Integration of text- and data-mining using ontologies successfully selects disease gene candidates

Bibliographic Details
Main authors: Tiffin, Nicki; Kelso, Janet F.; Powell, Alan R.; Pan, Hong; Bajic, Vladimir B.; Hide, Winston A.
Format: Text
Language: English
Published: Oxford University Press 2005
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1065256/
https://www.ncbi.nlm.nih.gov/pubmed/15767279
http://dx.doi.org/10.1093/nar/gki296
author Tiffin, Nicki
Kelso, Janet F.
Powell, Alan R.
Pan, Hong
Bajic, Vladimir B.
Hide, Winston A.
collection PubMed
description Genome-wide techniques such as microarray analysis, Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS), linkage analysis and association studies are used extensively in the search for genes that cause diseases, and often identify many hundreds of candidate disease genes. Selection of the most probable of these candidate disease genes for further empirical analysis is a significant challenge. Additionally, identifying the genes that cause complex diseases is problematic due to low penetrance of multiple contributing genes. Here, we describe a novel bioinformatic approach that selects candidate disease genes according to their expression profiles. We use the eVOC anatomical ontology to integrate text-mining of biomedical literature and data-mining of available human gene expression data. To demonstrate that our method is successful and widely applicable, we apply it to a database of 417 candidate genes containing 17 known disease genes. We successfully select the known disease gene for 15 out of 17 diseases and reduce the candidate gene set to 63.3% (±18.8%) of its original size. This approach facilitates direct association between genomic data describing gene expression and information from biomedical texts describing disease phenotype, and successfully prioritizes candidate genes according to their expression in disease-affected tissues.
format Text
id pubmed-1065256
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-1065256 2005-03-15 Nucleic Acids Res Article Oxford University Press 2005 2005-03-14 Text en © The Author 2005. Published by Oxford University Press. All rights reserved
title Integration of text- and data-mining using ontologies successfully selects disease gene candidates
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1065256/
https://www.ncbi.nlm.nih.gov/pubmed/15767279
http://dx.doi.org/10.1093/nar/gki296