Combining Position Weight Matrices and Document-Term Matrix for Efficient Extraction of Associations of Methylated Genes and Diseases from Free Text
BACKGROUND: In a number of diseases, certain genes are reported to be strongly methylated and thus can serve as diagnostic markers in many cases. Scientific literature in digital form is an important source of information about methylated genes implicated in particular diseases. The large volume of...
Main authors: | Bin Raies, Arwa; Mansour, Hicham; Incitti, Roberto; Bajic, Vladimir B. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2013 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3797705/ https://www.ncbi.nlm.nih.gov/pubmed/24147091 http://dx.doi.org/10.1371/journal.pone.0077848 |
_version_ | 1782287649891942400 |
---|---|
author | Bin Raies, Arwa; Mansour, Hicham; Incitti, Roberto; Bajic, Vladimir B. |
author_sort | Bin Raies, Arwa |
collection | PubMed |
description | BACKGROUND: In a number of diseases, certain genes are reported to be strongly methylated and thus can serve as diagnostic markers in many cases. Scientific literature in digital form is an important source of information about methylated genes implicated in particular diseases. The large volume of the electronic text makes it difficult and impractical to search for this information manually. METHODOLOGY: We developed a novel text mining methodology based on a new concept of position weight matrices (PWMs) for text representation and feature generation. We applied PWMs in conjunction with the document-term matrix to extract associations between methylated genes and diseases from free text with high accuracy. The performance results are based on a large, manually classified dataset. Additionally, we developed a web tool, DEMGD, which automates extraction of these associations from free text. DEMGD presents the extracted associations in summary tables and full reports, and tags the supporting evidence in the text for genes, diseases, and methylation words. The methodology we developed in this study can be applied to similar association extraction problems from free text. CONCLUSION: The new methodology developed in this study allows for efficient identification of associations between concepts. Our method, applied to methylated genes in different diseases, is implemented as a web tool, DEMGD, which is freely available at http://www.cbrc.kaust.edu.sa/demgd/. The data is available for online browsing and download. |
format | Online Article Text |
id | pubmed-3797705 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-3797705 2013-10-21 PLoS One Research Article Public Library of Science 2013-10-16 /pmc/articles/PMC3797705/ /pubmed/24147091 http://dx.doi.org/10.1371/journal.pone.0077848 Text en © 2013 Bin Raies et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | Combining Position Weight Matrices and Document-Term Matrix for Efficient Extraction of Associations of Methylated Genes and Diseases from Free Text |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3797705/ https://www.ncbi.nlm.nih.gov/pubmed/24147091 http://dx.doi.org/10.1371/journal.pone.0077848 |
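
The description in this record names two building blocks, a document-term matrix and position weight matrices (PWMs), but does not say how the authors construct either one. The Python sketch below only illustrates the two generic ideas on toy input; it is not the DEMGD implementation. The token classes (`GENE`, `METH`, `DIS`, `O`), the uniform background, the pseudocount, and all function names are assumptions made for the example.

```python
import math
from collections import Counter


def document_term_matrix(docs):
    """Return (vocab, rows), where rows[i][j] counts term vocab[j] in docs[i]."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    column = {term: j for j, term in enumerate(vocab)}
    rows = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for term, count in Counter(toks).items():
            row[column[term]] = count
        rows.append(row)
    return vocab, rows


def position_weight_matrix(windows, alphabet, pseudocount=1.0):
    """Log-odds PWM over equal-length windows of symbols.

    pwm[p][s] = log2(P(s at position p) / background), with a uniform
    background over the alphabet (an assumption of this sketch).
    """
    background = 1.0 / len(alphabet)
    total = len(windows) + pseudocount * len(alphabet)
    pwm = []
    for p in range(len(windows[0])):
        counts = Counter(window[p] for window in windows)
        pwm.append({
            s: math.log2(((counts[s] + pseudocount) / total) / background)
            for s in alphabet
        })
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds of one window."""
    return sum(col[sym] for col, sym in zip(pwm, window))


# Toy input, invented for illustration only.
vocab, dtm = document_term_matrix(["a b a", "b c"])

alphabet = ["GENE", "METH", "DIS", "O"]  # hypothetical token classes
windows = [
    ["GENE", "METH", "O", "DIS"],
    ["GENE", "METH", "O", "DIS"],
    ["GENE", "O", "METH", "DIS"],
]
pwm = position_weight_matrix(windows, alphabet)
pattern_score = score(pwm, ["GENE", "METH", "O", "DIS"])
noise_score = score(pwm, ["O", "O", "O", "O"])
```

In a setup like this, the PWM score of a sentence window could be appended to the document-term counts as an extra feature for a classifier; whether the published method combines the two in that way is not stated in this record.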