Prioritization, clustering and functional annotation of MicroRNAs using latent semantic indexing of MEDLINE abstracts
BACKGROUND: The amount of scientific information about MicroRNAs (miRNAs) is growing exponentially, making it difficult for researchers to interpret experimental results. In this study, we present an automated text mining approach using Latent Semantic Indexing (LSI) for prioritization, clustering a...
Main authors: | Roy, Sujoy; Curry, Brandon C.; Madahian, Behrouz; Homayouni, Ramin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5073981/ https://www.ncbi.nlm.nih.gov/pubmed/27766940 http://dx.doi.org/10.1186/s12859-016-1223-2 |
_version_ | 1782461671333167104 |
---|---|
author | Roy, Sujoy; Curry, Brandon C.; Madahian, Behrouz; Homayouni, Ramin |
author_facet | Roy, Sujoy; Curry, Brandon C.; Madahian, Behrouz; Homayouni, Ramin |
author_sort | Roy, Sujoy |
collection | PubMed |
description | BACKGROUND: The amount of scientific information about MicroRNAs (miRNAs) is growing exponentially, making it difficult for researchers to interpret experimental results. In this study, we present an automated text mining approach using Latent Semantic Indexing (LSI) for prioritization, clustering and functional annotation of miRNAs. RESULTS: For approximately 900 human miRNAs indexed in miRBase, text documents were created by concatenating titles and abstracts of MEDLINE citations which refer to the miRNAs. The documents were parsed and a weighted term-by-miRNA frequency matrix was created, which was subsequently factorized via singular value decomposition to extract pair-wise cosine values between the term (keyword) and miRNA vectors in reduced rank semantic space. LSI enables derivation of both explicit and implicit associations between entities based on word usage patterns. Using miR2Disease as a gold standard, we found that LSI identified keyword-to-miRNA relationships with high accuracy. In addition, we demonstrate that pair-wise associations between miRNAs can be used to group them into categories which are functionally aligned. Finally, term ranking by querying the LSI space with a group of miRNAs enabled annotation of the clusters with functionally related terms. CONCLUSIONS: LSI modeling of MEDLINE abstracts provides a robust and automated method for miRNA related knowledge discovery. The latest collection of miRNA abstracts and LSI model can be accessed through the web tool miRNA Literature Network (miRLiN) at http://bioinfo.memphis.edu/mirlin. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1223-2) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5073981 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-50739812016-10-27 Prioritization, clustering and functional annotation of MicroRNAs using latent semantic indexing of MEDLINE abstracts Roy, Sujoy Curry, Brandon C. Madahian, Behrouz Homayouni, Ramin BMC Bioinformatics Proceedings BACKGROUND: The amount of scientific information about MicroRNAs (miRNAs) is growing exponentially, making it difficult for researchers to interpret experimental results. In this study, we present an automated text mining approach using Latent Semantic Indexing (LSI) for prioritization, clustering and functional annotation of miRNAs. RESULTS: For approximately 900 human miRNAs indexed in miRBase, text documents were created by concatenating titles and abstracts of MEDLINE citations which refer to the miRNAs. The documents were parsed and a weighted term-by-miRNA frequency matrix was created, which was subsequently factorized via singular value decomposition to extract pair-wise cosine values between the term (keyword) and miRNA vectors in reduced rank semantic space. LSI enables derivation of both explicit and implicit associations between entities based on word usage patterns. Using miR2Disease as a gold standard, we found that LSI identified keyword-to-miRNA relationships with high accuracy. In addition, we demonstrate that pair-wise associations between miRNAs can be used to group them into categories which are functionally aligned. Finally, term ranking by querying the LSI space with a group of miRNAs enabled annotation of the clusters with functionally related terms. CONCLUSIONS: LSI modeling of MEDLINE abstracts provides a robust and automated method for miRNA related knowledge discovery. The latest collection of miRNA abstracts and LSI model can be accessed through the web tool miRNA Literature Network (miRLiN) at http://bioinfo.memphis.edu/mirlin. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1223-2) contains supplementary material, which is available to authorized users. BioMed Central 2016-10-06 /pmc/articles/PMC5073981/ /pubmed/27766940 http://dx.doi.org/10.1186/s12859-016-1223-2 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Proceedings Roy, Sujoy Curry, Brandon C. Madahian, Behrouz Homayouni, Ramin Prioritization, clustering and functional annotation of MicroRNAs using latent semantic indexing of MEDLINE abstracts |
title | Prioritization, clustering and functional annotation of MicroRNAs using latent semantic indexing of MEDLINE abstracts |
title_full | Prioritization, clustering and functional annotation of MicroRNAs using latent semantic indexing of MEDLINE abstracts |
title_fullStr | Prioritization, clustering and functional annotation of MicroRNAs using latent semantic indexing of MEDLINE abstracts |
title_full_unstemmed | Prioritization, clustering and functional annotation of MicroRNAs using latent semantic indexing of MEDLINE abstracts |
title_short | Prioritization, clustering and functional annotation of MicroRNAs using latent semantic indexing of MEDLINE abstracts |
title_sort | prioritization, clustering and functional annotation of micrornas using latent semantic indexing of medline abstracts |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5073981/ https://www.ncbi.nlm.nih.gov/pubmed/27766940 http://dx.doi.org/10.1186/s12859-016-1223-2 |
work_keys_str_mv | AT roysujoy prioritizationclusteringandfunctionalannotationofmicrornasusinglatentsemanticindexingofmedlineabstracts AT currybrandonc prioritizationclusteringandfunctionalannotationofmicrornasusinglatentsemanticindexingofmedlineabstracts AT madahianbehrouz prioritizationclusteringandfunctionalannotationofmicrornasusinglatentsemanticindexingofmedlineabstracts AT homayouniramin prioritizationclusteringandfunctionalannotationofmicrornasusinglatentsemanticindexingofmedlineabstracts |
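
The description field above outlines an LSI workflow: build a weighted term-by-miRNA matrix, factorize it with singular value decomposition, and compare term and miRNA vectors by cosine in the reduced-rank space. The following is a minimal sketch of that general idea, not the authors' implementation. The toy documents, the log/IDF weighting, the rank `k = 2`, and the helper names `rank_mirnas` and `rank_terms` are all illustrative assumptions; the record does not state which weighting scheme or rank was used.

```python
import numpy as np

# Hypothetical toy corpus: one text per miRNA, standing in for the
# concatenated MEDLINE titles and abstracts described in the record.
docs = {
    "miR-21": "tumor growth apoptosis tumor invasion",
    "miR-155": "tumor inflammation immune lymphoma",
    "miR-1": "cardiac muscle heart hypertrophy",
    "miR-133": "cardiac muscle heart differentiation",
}

mirnas = list(docs)
terms = sorted({w for text in docs.values() for w in text.split()})

# Raw term-by-miRNA count matrix (rows: terms, columns: miRNAs).
counts = np.array(
    [[docs[m].split().count(t) for m in mirnas] for t in terms], dtype=float
)

# Assumed weighting: log-scaled local counts times an IDF-style global weight.
doc_freq = (counts > 0).sum(axis=1)
idf = np.log(len(mirnas) / doc_freq) + 1.0
weighted = np.log1p(counts) * idf[:, None]

# Truncated SVD: keep the k largest singular values (k = 2 is arbitrary here).
k = 2
U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
term_vecs = U[:, :k] * s[:k]    # one row per term in the reduced space
mirna_vecs = Vt[:k].T * s[:k]   # one row per miRNA in the reduced space


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def rank_mirnas(term):
    """Rank miRNAs by cosine similarity to a single keyword."""
    q = term_vecs[terms.index(term)]
    scores = [(m, cosine(q, v)) for m, v in zip(mirnas, mirna_vecs)]
    return sorted(scores, key=lambda pair: -pair[1])


def rank_terms(group):
    """Rank terms by cosine similarity to the centroid of a miRNA group."""
    centroid = mirna_vecs[[mirnas.index(m) for m in group]].mean(axis=0)
    scores = [(t, cosine(centroid, v)) for t, v in zip(terms, term_vecs)]
    return sorted(scores, key=lambda pair: -pair[1])


if __name__ == "__main__":
    print(rank_mirnas("hypertrophy"))
    print(rank_terms(["miR-21", "miR-155"])[:5])
```

In this toy data, "hypertrophy" never appears in the miR-133 text, yet the query still scores miR-133 highly because both cardiac documents share other vocabulary. That is the kind of implicit association the description attributes to LSI. Calling `rank_terms` with a group of miRNAs mirrors the cluster annotation step in the same simplified way.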