Using machine learning algorithms to identify genes essential for cell survival

BACKGROUND: With the explosion of data comes a proportional opportunity to identify novel knowledge with the potential for application in targeted therapies. Despite these huge amounts of data, solutions for treating complex diseases remain elusive. One reason is that these diseases are driven b...

Bibliographic Details
Main authors: Philips, Santosh; Wu, Heng-Yi; Li, Lang
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5629548/
https://www.ncbi.nlm.nih.gov/pubmed/28984184
http://dx.doi.org/10.1186/s12859-017-1799-1
author Philips, Santosh
Wu, Heng-Yi
Li, Lang
collection PubMed
description BACKGROUND: With the explosion of data comes a proportional opportunity to identify novel knowledge with the potential for application in targeted therapies. Despite these huge amounts of data, solutions for treating complex diseases remain elusive. One reason is that these diseases are driven by a network of genes that need to be targeted in order to understand and treat them effectively. Part of the solution lies in mining and integrating information from various disciplines. Here we propose a machine learning method for mining publicly available literature on RNA interference, with the goal of identifying genes essential for cell survival. RESULTS: A total of 32,164 RNA interference abstracts were identified from 10.5 million PubMed abstracts (2001-2015). These abstracts spanned 1467 cancer cell lines and 4373 genes, representing a total of 25,891 cell-gene associations. Of the 1467 cell lines, 88% had between 1 and 25 genes studied. Of the 4373 genes, 96% were studied in between 1 and 25 different cell lines. CONCLUSIONS: Identifying genes that are crucial for cell survival can be a critical piece of information, especially in treating complex diseases such as cancer. The efficacy of a therapeutic intervention is multifactorial in nature, and in many cases therapeutic disruption could come from an unsuspected source. Machine learning algorithms help to narrow down the search and provide information about essential genes in different cancer types. They also provide the building blocks to generate a network of interconnected genes and processes. The information thus gained can be used to generate hypotheses that can be experimentally validated to improve our understanding of what triggers and maintains the growth of cancerous cells.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-017-1799-1) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5629548
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5629548 2017-10-13 Using machine learning algorithms to identify genes essential for cell survival Philips, Santosh Wu, Heng-Yi Li, Lang BMC Bioinformatics Research BioMed Central 2017-10-03 /pmc/articles/PMC5629548/ /pubmed/28984184 http://dx.doi.org/10.1186/s12859-017-1799-1 Text en
© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Using machine learning algorithms to identify genes essential for cell survival
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5629548/
https://www.ncbi.nlm.nih.gov/pubmed/28984184
http://dx.doi.org/10.1186/s12859-017-1799-1