Gene prioritization and clustering by multi-view text mining

BACKGROUND: Text mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most interesting candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but the effect of...

Bibliographic Details
Main authors: Yu, Shi; Tranchevent, Leon-Charles; De Moor, Bart; Moreau, Yves
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3098068/
https://www.ncbi.nlm.nih.gov/pubmed/20074336
http://dx.doi.org/10.1186/1471-2105-11-28
_version_ 1782203909992873984
author Yu, Shi
Tranchevent, Leon-Charles
De Moor, Bart
Moreau, Yves
author_facet Yu, Shi
Tranchevent, Leon-Charles
De Moor, Bart
Moreau, Yves
author_sort Yu, Shi
collection PubMed
description BACKGROUND: Text mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most interesting candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but the effectiveness of disease-gene identification varies across text mining models. Thus, incorporating more text mining models may help obtain more refined and accurate knowledge. However, how to combine these models effectively remains a challenging question in machine learning. In particular, it is non-trivial to guarantee that the integrated model performs better than the best individual model. RESULTS: We present a multi-view approach to retrieve biomedical knowledge using different controlled vocabularies. These controlled vocabularies are selected on the basis of nine well-known bio-ontologies and are applied to index the vast amounts of gene-based free-text information available in the MEDLINE repository. The text mining result specified by a vocabulary is considered a view, and the multiple views obtained are integrated by multi-source learning algorithms. We investigate the effect of integration in two fundamental computational disease gene identification tasks: gene prioritization and gene clustering. The performance of the proposed approach is systematically evaluated and compared on real benchmark data sets. In both tasks, the multi-view approach demonstrates significantly better performance than the other methods compared. CONCLUSIONS: In practical research, the relevance of a specific vocabulary to the task is usually unknown. In such a case, multi-view text mining is a superior and promising strategy for text-based disease gene identification.
format Text
id pubmed-3098068
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-30980682011-05-20 Gene prioritization and clustering by multi-view text mining Yu, Shi Tranchevent, Leon-Charles De Moor, Bart Moreau, Yves BMC Bioinformatics Methodology Article BACKGROUND: Text mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most interesting candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but the effect of disease-gene identification varies in different text mining models. Thus, the idea of incorporating more text mining models may be beneficial to obtain more refined and accurate knowledge. However, how to effectively combine these models still remains a challenging question in machine learning. In particular, it is a non-trivial issue to guarantee that the integrated model performs better than the best individual model. RESULTS: We present a multi-view approach to retrieve biomedical knowledge using different controlled vocabularies. These controlled vocabularies are selected on the basis of nine well-known bio-ontologies and are applied to index the vast amounts of gene-based free-text information available in the MEDLINE repository. The text mining result specified by a vocabulary is considered as a view and the obtained multiple views are integrated by multi-source learning algorithms. We investigate the effect of integration in two fundamental computational disease gene identification tasks: gene prioritization and gene clustering. The performance of the proposed approach is systematically evaluated and compared on real benchmark data sets. In both tasks, the multi-view approach demonstrates significantly better performance than other comparing methods. CONCLUSIONS: In practical research, the relevance of specific vocabulary pertaining to the task is usually unknown. 
In such case, multi-view text mining is a superior and promising strategy for text-based disease gene identification. BioMed Central 2010-01-14 /pmc/articles/PMC3098068/ /pubmed/20074336 http://dx.doi.org/10.1186/1471-2105-11-28 Text en Copyright ©2010 Yu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Yu, Shi
Tranchevent, Leon-Charles
De Moor, Bart
Moreau, Yves
Gene prioritization and clustering by multi-view text mining
title Gene prioritization and clustering by multi-view text mining
title_full Gene prioritization and clustering by multi-view text mining
title_fullStr Gene prioritization and clustering by multi-view text mining
title_full_unstemmed Gene prioritization and clustering by multi-view text mining
title_short Gene prioritization and clustering by multi-view text mining
title_sort gene prioritization and clustering by multi-view text mining
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3098068/
https://www.ncbi.nlm.nih.gov/pubmed/20074336
http://dx.doi.org/10.1186/1471-2105-11-28
work_keys_str_mv AT yushi geneprioritizationandclusteringbymultiviewtextmining
AT trancheventleoncharles geneprioritizationandclusteringbymultiviewtextmining
AT demoorbart geneprioritizationandclusteringbymultiviewtextmining
AT moreauyves geneprioritizationandclusteringbymultiviewtextmining