Navigating the Functional Landscape of Transcription Factors via Non-Negative Tensor Factorization Analysis of MEDLINE Abstracts
In this study, we developed and evaluated a novel text-mining approach, using non-negative tensor factorization (NTF), to simultaneously extract and functionally annotate transcriptional modules consisting of sets of genes, transcription factors (TFs), and terms from MEDLINE abstracts. A sparse 3-mo...
Main Authors: | Roy, Sujoy; Yun, Daqing; Madahian, Behrouz; Berry, Michael W.; Deng, Lih-Yuan; Goldowitz, Daniel; Homayouni, Ramin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2017 |
Subjects: | Bioengineering and Biotechnology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5581332/ https://www.ncbi.nlm.nih.gov/pubmed/28894735 http://dx.doi.org/10.3389/fbioe.2017.00048 |
_version_ | 1783261025503543296 |
---|---|
author | Roy, Sujoy Yun, Daqing Madahian, Behrouz Berry, Michael W. Deng, Lih-Yuan Goldowitz, Daniel Homayouni, Ramin |
author_facet | Roy, Sujoy Yun, Daqing Madahian, Behrouz Berry, Michael W. Deng, Lih-Yuan Goldowitz, Daniel Homayouni, Ramin |
author_sort | Roy, Sujoy |
collection | PubMed |
description | In this study, we developed and evaluated a novel text-mining approach, using non-negative tensor factorization (NTF), to simultaneously extract and functionally annotate transcriptional modules consisting of sets of genes, transcription factors (TFs), and terms from MEDLINE abstracts. A sparse 3-mode term × gene × TF tensor was constructed that contained weighted frequencies of 106,895 terms in 26,781 abstracts shared among 7,695 genes and 994 TFs. The tensor was decomposed into sub-tensors using non-negative tensor factorization (NTF) across 16 different approximation ranks. Dominant entries of each of 2,861 sub-tensors were extracted to form term–gene–TF annotated transcriptional modules (ATMs). More than 94% of the ATMs were found to be enriched in at least one KEGG pathway or GO category, suggesting that the ATMs are functionally relevant. One advantage of this method is that it can discover potentially new gene–TF associations from the literature. Using a set of microarray and ChIP-Seq datasets as gold standard, we show that the precision of our method for predicting gene–TF associations is significantly higher than chance. In addition, we demonstrate that the terms in each ATM can be used to suggest new GO classifications to genes and TFs. Taken together, our results indicate that NTF is useful for simultaneous extraction and functional annotation of transcriptional regulatory networks from unstructured text, as well as for literature based discovery. A web tool called Transcriptional Regulatory Modules Extracted from Literature (TREMEL), available at http://binf1.memphis.edu/tremel, was built to enable browsing and searching of ATMs. |
format | Online Article Text |
id | pubmed-5581332 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-55813322017-09-11 Navigating the Functional Landscape of Transcription Factors via Non-Negative Tensor Factorization Analysis of MEDLINE Abstracts Roy, Sujoy Yun, Daqing Madahian, Behrouz Berry, Michael W. Deng, Lih-Yuan Goldowitz, Daniel Homayouni, Ramin Front Bioeng Biotechnol Bioengineering and Biotechnology In this study, we developed and evaluated a novel text-mining approach, using non-negative tensor factorization (NTF), to simultaneously extract and functionally annotate transcriptional modules consisting of sets of genes, transcription factors (TFs), and terms from MEDLINE abstracts. A sparse 3-mode term × gene × TF tensor was constructed that contained weighted frequencies of 106,895 terms in 26,781 abstracts shared among 7,695 genes and 994 TFs. The tensor was decomposed into sub-tensors using non-negative tensor factorization (NTF) across 16 different approximation ranks. Dominant entries of each of 2,861 sub-tensors were extracted to form term–gene–TF annotated transcriptional modules (ATMs). More than 94% of the ATMs were found to be enriched in at least one KEGG pathway or GO category, suggesting that the ATMs are functionally relevant. One advantage of this method is that it can discover potentially new gene–TF associations from the literature. Using a set of microarray and ChIP-Seq datasets as gold standard, we show that the precision of our method for predicting gene–TF associations is significantly higher than chance. In addition, we demonstrate that the terms in each ATM can be used to suggest new GO classifications to genes and TFs. Taken together, our results indicate that NTF is useful for simultaneous extraction and functional annotation of transcriptional regulatory networks from unstructured text, as well as for literature based discovery. A web tool called Transcriptional Regulatory Modules Extracted from Literature (TREMEL), available at http://binf1.memphis.edu/tremel, was built to enable browsing and searching of ATMs. Frontiers Media S.A. 2017-08-28 /pmc/articles/PMC5581332/ /pubmed/28894735 http://dx.doi.org/10.3389/fbioe.2017.00048 Text en Copyright © 2017 Roy, Yun, Madahian, Berry, Deng, Goldowitz and Homayouni. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Bioengineering and Biotechnology Roy, Sujoy Yun, Daqing Madahian, Behrouz Berry, Michael W. Deng, Lih-Yuan Goldowitz, Daniel Homayouni, Ramin Navigating the Functional Landscape of Transcription Factors via Non-Negative Tensor Factorization Analysis of MEDLINE Abstracts |
title | Navigating the Functional Landscape of Transcription Factors via Non-Negative Tensor Factorization Analysis of MEDLINE Abstracts |
title_full | Navigating the Functional Landscape of Transcription Factors via Non-Negative Tensor Factorization Analysis of MEDLINE Abstracts |
title_fullStr | Navigating the Functional Landscape of Transcription Factors via Non-Negative Tensor Factorization Analysis of MEDLINE Abstracts |
title_full_unstemmed | Navigating the Functional Landscape of Transcription Factors via Non-Negative Tensor Factorization Analysis of MEDLINE Abstracts |
title_short | Navigating the Functional Landscape of Transcription Factors via Non-Negative Tensor Factorization Analysis of MEDLINE Abstracts |
title_sort | navigating the functional landscape of transcription factors via non-negative tensor factorization analysis of medline abstracts |
topic | Bioengineering and Biotechnology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5581332/ https://www.ncbi.nlm.nih.gov/pubmed/28894735 http://dx.doi.org/10.3389/fbioe.2017.00048 |
work_keys_str_mv | AT roysujoy navigatingthefunctionallandscapeoftranscriptionfactorsvianonnegativetensorfactorizationanalysisofmedlineabstracts AT yundaqing navigatingthefunctionallandscapeoftranscriptionfactorsvianonnegativetensorfactorizationanalysisofmedlineabstracts AT madahianbehrouz navigatingthefunctionallandscapeoftranscriptionfactorsvianonnegativetensorfactorizationanalysisofmedlineabstracts AT berrymichaelw navigatingthefunctionallandscapeoftranscriptionfactorsvianonnegativetensorfactorizationanalysisofmedlineabstracts AT denglihyuan navigatingthefunctionallandscapeoftranscriptionfactorsvianonnegativetensorfactorizationanalysisofmedlineabstracts AT goldowitzdaniel navigatingthefunctionallandscapeoftranscriptionfactorsvianonnegativetensorfactorizationanalysisofmedlineabstracts AT homayouniramin navigatingthefunctionallandscapeoftranscriptionfactorsvianonnegativetensorfactorizationanalysisofmedlineabstracts |
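
The description above outlines a pipeline: build a sparse term × gene × TF tensor, factorize it under non-negativity constraints, and read the dominant entries of each component as a module. The following is a minimal sketch of that idea, not the authors' implementation. The record does not say which NTF algorithm or which dominance rule was used, so this sketch assumes a standard multiplicative-update scheme for a non-negative CP decomposition and a hypothetical "at least half of the column maximum" rule. The function names (`ntf`, `gram`, `dominant`, `squared_error`) and the toy tensor are invented for illustration.

```python
import random


def gram(matrix, rank):
    """Return the rank x rank Gram matrix M^T M of a factor matrix."""
    return [[sum(row[r] * row[s] for row in matrix) for s in range(rank)]
            for r in range(rank)]


def ntf(entries, shape, rank, iters=200, seed=0, eps=1e-9):
    """Non-negative CP factorization of a sparse 3-mode tensor.

    entries maps (term, gene, tf) index triples to non-negative weights.
    Returns three factor matrices (lists of rows), one per mode.
    Assumption: multiplicative updates; the source names no algorithm.
    """
    rng = random.Random(seed)
    factors = [[[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
               for n in shape]
    for _ in range(iters):
        for mode in range(3):
            a, b = [m for m in range(3) if m != mode]
            # Numerator: tensor contracted with the other two factor matrices.
            num = [[0.0] * rank for _ in range(shape[mode])]
            for idx, value in entries.items():
                for r in range(rank):
                    num[idx[mode]][r] += (value * factors[a][idx[a]][r]
                                          * factors[b][idx[b]][r])
            # Denominator: current factor times the Hadamard product of Grams.
            ga, gb = gram(factors[a], rank), gram(factors[b], rank)
            for i, row in enumerate(factors[mode]):
                den = [sum(row[s] * ga[s][r] * gb[s][r] for s in range(rank))
                       for r in range(rank)]
                factors[mode][i] = [row[r] * num[i][r] / (den[r] + eps)
                                    for r in range(rank)]
    return factors


def squared_error(entries, shape, factors):
    """Sum of squared differences between the tensor and its approximation."""
    rank = len(factors[0][0])
    total = 0.0
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                approx = sum(factors[0][i][r] * factors[1][j][r]
                             * factors[2][k][r] for r in range(rank))
                total += (entries.get((i, j, k), 0.0) - approx) ** 2
    return total


def dominant(column, fraction=0.5):
    """Indices whose loading reaches `fraction` of the column maximum.

    Assumption: this threshold rule is hypothetical, not from the source.
    """
    top = max(column)
    return [i for i, v in enumerate(column) if top > 0 and v >= fraction * top]


# Made-up toy tensor: 4 terms x 3 genes x 2 TFs with two disjoint blocks.
toy = {(t, g, 0): 1.0 for t in (0, 1) for g in (0, 1)}
toy.update({(t, 2, 1): 2.0 for t in (2, 3)})
shape = (4, 3, 2)

factors = ntf(toy, shape, rank=2)
for r in range(2):
    terms, genes, tfs = (dominant([row[r] for row in f]) for f in factors)
    print(f"component {r}: terms={terms} genes={genes} tfs={tfs}")
```

The tensor is stored as a dictionary of non-zero entries because the source describes it as sparse; only the numerator of each update needs to visit those entries. On real data a maintained library would be a more sensible choice than this pure-Python loop.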