Combining Position Weight Matrices and Document-Term Matrix for Efficient Extraction of Associations of Methylated Genes and Diseases from Free Text

BACKGROUND: In a number of diseases, certain genes are reported to be strongly methylated and thus can serve as diagnostic markers in many cases. Scientific literature in digital form is an important source of information about methylated genes implicated in particular diseases. The large volume of...

Bibliographic Details
Main Authors: Bin Raies, Arwa, Mansour, Hicham, Incitti, Roberto, Bajic, Vladimir B.
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3797705/
https://www.ncbi.nlm.nih.gov/pubmed/24147091
http://dx.doi.org/10.1371/journal.pone.0077848
author Bin Raies, Arwa
Mansour, Hicham
Incitti, Roberto
Bajic, Vladimir B.
collection PubMed
description BACKGROUND: In a number of diseases, certain genes are reported to be strongly methylated and thus can serve as diagnostic markers in many cases. Scientific literature in digital form is an important source of information about methylated genes implicated in particular diseases. The large volume of electronic text makes it difficult and impractical to search for this information manually. METHODOLOGY: We developed a novel text mining methodology based on a new concept of position weight matrices (PWMs) for text representation and feature generation. We applied PWMs in conjunction with the document-term matrix to extract associations between methylated genes and diseases from free text with high accuracy. The performance results are based on a large, manually classified dataset. Additionally, we developed a web tool, DEMGD, which automates extraction of these associations from free text. DEMGD presents the extracted associations in summary tables and full reports, in addition to evidence tagging of text with respect to genes, diseases and methylation words. The methodology we developed in this study can be applied to similar association extraction problems from free text. CONCLUSION: The new methodology developed in this study allows for efficient identification of associations between concepts. Our method applied to methylated genes in different diseases is implemented as a web tool, DEMGD, which is freely available at http://www.cbrc.kaust.edu.sa/demgd/. The data is available for online browsing and download.
format Online
Article
Text
id pubmed-3797705
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3797705 2013-10-21 PLoS One Research Article
Public Library of Science 2013-10-16 /pmc/articles/PMC3797705/ /pubmed/24147091 http://dx.doi.org/10.1371/journal.pone.0077848 Text en © 2013 Bin Raies et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Combining Position Weight Matrices and Document-Term Matrix for Efficient Extraction of Associations of Methylated Genes and Diseases from Free Text
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3797705/
https://www.ncbi.nlm.nih.gov/pubmed/24147091
http://dx.doi.org/10.1371/journal.pone.0077848