Navigating the Functional Landscape of Transcription Factors via Non-Negative Tensor Factorization Analysis of MEDLINE Abstracts

Bibliographic Details
Main authors: Roy, Sujoy; Yun, Daqing; Madahian, Behrouz; Berry, Michael W.; Deng, Lih-Yuan; Goldowitz, Daniel; Homayouni, Ramin
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2017
Subjects: Bioengineering and Biotechnology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5581332/
https://www.ncbi.nlm.nih.gov/pubmed/28894735
http://dx.doi.org/10.3389/fbioe.2017.00048
author Roy, Sujoy
Yun, Daqing
Madahian, Behrouz
Berry, Michael W.
Deng, Lih-Yuan
Goldowitz, Daniel
Homayouni, Ramin
collection PubMed
description In this study, we developed and evaluated a novel text-mining approach, using non-negative tensor factorization (NTF), to simultaneously extract and functionally annotate transcriptional modules consisting of sets of genes, transcription factors (TFs), and terms from MEDLINE abstracts. A sparse 3-mode term × gene × TF tensor was constructed that contained weighted frequencies of 106,895 terms in 26,781 abstracts shared among 7,695 genes and 994 TFs. The tensor was decomposed into sub-tensors using NTF across 16 different approximation ranks. Dominant entries of each of 2,861 sub-tensors were extracted to form term–gene–TF annotated transcriptional modules (ATMs). More than 94% of the ATMs were found to be enriched in at least one KEGG pathway or GO category, suggesting that the ATMs are functionally relevant. One advantage of this method is that it can discover potentially new gene–TF associations from the literature. Using a set of microarray and ChIP-Seq datasets as a gold standard, we show that the precision of our method for predicting gene–TF associations is significantly higher than chance. In addition, we demonstrate that the terms in each ATM can be used to suggest new GO classifications for genes and TFs. Taken together, our results indicate that NTF is useful for simultaneous extraction and functional annotation of transcriptional regulatory networks from unstructured text, as well as for literature-based discovery. A web tool called Transcriptional Regulatory Modules Extracted from Literature (TREMEL), available at http://binf1.memphis.edu/tremel, was built to enable browsing and searching of ATMs.
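The decomposition step described in the abstract can be illustrated with a minimal sketch. It assumes a rank-R non-negative CP (PARAFAC) model fitted with multiplicative updates, which is one common way to compute an NTF and is not necessarily the algorithm the authors used. The toy tensor, the function names (`khatri_rao`, `ntf`, `dominant`), the iteration count, and the "top n entries per factor" rule for picking dominant entries are all illustrative assumptions, not details taken from the record.

```python
import numpy as np


def khatri_rao(B, C):
    """Column-wise Kronecker product; row (j, k) is stored at index j * K + k."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)


def ntf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Non-negative CP factorization of a 3-mode tensor via multiplicative updates.

    Assumed model: X[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r], all factors >= 0.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, rank))
    B = rng.random((J, rank))
    C = rng.random((K, rank))
    # Mode-n unfoldings whose column order matches khatri_rao above.
    X1 = X.reshape(I, J * K)
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so factors stay >= 0.
        A *= (X1 @ khatri_rao(B, C)) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= (X2 @ khatri_rao(A, C)) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= (X3 @ khatri_rao(A, B)) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C


def dominant(factor, r, top=2):
    """Indices of the `top` largest entries in column r (an assumed selection rule)."""
    return sorted(np.argsort(factor[:, r])[::-1][:top].tolist())


# Toy term x gene x TF tensor with two planted blocks (hypothetical data).
X = np.zeros((6, 5, 4))
X[0:3, 0:2, 0:2] = 1.0   # terms 0-2 co-occur with genes 0-1 and TFs 0-1
X[3:6, 2:5, 2:4] = 1.0   # terms 3-5 co-occur with genes 2-4 and TFs 2-3

A, B, C = ntf(X, rank=2)
X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print("relative reconstruction error:", rel_err)

# Each rank-1 component (sub-tensor) yields one candidate term-gene-TF module.
for r in range(2):
    print("component", r,
          "terms", dominant(A, r, top=3),
          "genes", dominant(B, r, top=2),
          "TFs", dominant(C, r, top=2))
```

In this sketch each rank-1 component plays the role of one sub-tensor, and the highest-weighted terms, genes, and TFs in it form one candidate module. A dense NumPy array is used only to keep the example short; a tensor of the size reported in the abstract would need a sparse representation.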
format Online
Article
Text
id pubmed-5581332
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-5581332 2017-09-11 Navigating the Functional Landscape of Transcription Factors via Non-Negative Tensor Factorization Analysis of MEDLINE Abstracts Roy, Sujoy Yun, Daqing Madahian, Behrouz Berry, Michael W. Deng, Lih-Yuan Goldowitz, Daniel Homayouni, Ramin Front Bioeng Biotechnol Bioengineering and Biotechnology
Frontiers Media S.A. 2017-08-28 /pmc/articles/PMC5581332/ /pubmed/28894735 http://dx.doi.org/10.3389/fbioe.2017.00048 Text en Copyright © 2017 Roy, Yun, Madahian, Berry, Deng, Goldowitz and Homayouni. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Navigating the Functional Landscape of Transcription Factors via Non-Negative Tensor Factorization Analysis of MEDLINE Abstracts
topic Bioengineering and Biotechnology