A Computational Framework Based on Ensemble Deep Neural Networks for Essential Genes Identification

Essential genes contain key information of genomes that could be the key to a comprehensive understanding of life and evolution. Because of their importance, studies of essential genes have been considered a crucial problem in computational biology. Computational methods for identifying essential ge...

Bibliographic Details
Main authors: Le, Nguyen Quoc Khanh; Do, Duyen Thi; Hung, Truong Nguyen Khanh; Lam, Luu Ho Thanh; Huynh, Tuan-Tu; Nguyen, Ngan Thi Kim
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7730808/
https://www.ncbi.nlm.nih.gov/pubmed/33260643
http://dx.doi.org/10.3390/ijms21239070
_version_ 1783621769449439232
author Le, Nguyen Quoc Khanh
Do, Duyen Thi
Hung, Truong Nguyen Khanh
Lam, Luu Ho Thanh
Huynh, Tuan-Tu
Nguyen, Ngan Thi Kim
author_sort Le, Nguyen Quoc Khanh
collection PubMed
description Essential genes contain key information of genomes that could be the key to a comprehensive understanding of life and evolution. Because of their importance, studies of essential genes have been considered a crucial problem in computational biology. Computational methods for identifying essential genes have become increasingly popular because they reduce the cost and time of traditional experiments. A few models have addressed this problem, but their performance is still unsatisfactory because of high-dimensional features and the use of traditional machine learning algorithms. Thus, there is a need for a novel model that improves predictive performance on this problem using DNA sequence features. This study took advantage of a natural language processing (NLP) model to learn from biological sequences by treating them as natural language words. To learn from the NLP features, an ensemble deep neural network was then employed as the supervised learning model. The proposed method identified essential genes with sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC), and area under the receiver operating characteristic curve (AUC) values of 60.2%, 84.6%, 76.3%, 0.449, and 0.814, respectively. Overall, it outperformed the single models without ensembling, as well as the state-of-the-art predictors on the same benchmark dataset. This indicates the effectiveness of the proposed method for identifying essential genes in particular and for other sequencing problems in general.
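For readers unfamiliar with the terms in the description, the following is a minimal Python sketch of two ideas it mentions: splitting a DNA sequence into "words", and computing sensitivity, specificity, accuracy, and MCC from confusion-matrix counts. It is not the authors' implementation. The function names, the use of overlapping k-mers, and the default k=3 are assumptions made for illustration; the record does not state how the sequences were tokenized.

```python
import math


def kmer_words(seq, k=3):
    """Split a DNA sequence into overlapping k-mers ("words").

    Assumption: overlapping k-mers with k=3; the record does not
    specify the tokenization actually used in the study.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def _ratio(num, den):
    # Return 0.0 instead of raising when a denominator is zero.
    return num / den if den else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion counts."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_den),
    }


if __name__ == "__main__":
    print(kmer_words("ATGCA"))
    # Hypothetical counts, not taken from the study.
    print(binary_metrics(tp=6, fp=2, tn=8, fn=4))
```

AUC is omitted because it depends on the ranking of predicted scores rather than on a single set of confusion counts.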
format Online
Article
Text
id pubmed-7730808
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7730808 2020-12-12 A Computational Framework Based on Ensemble Deep Neural Networks for Essential Genes Identification Le, Nguyen Quoc Khanh; Do, Duyen Thi; Hung, Truong Nguyen Khanh; Lam, Luu Ho Thanh; Huynh, Tuan-Tu; Nguyen, Ngan Thi Kim Int J Mol Sci Article MDPI 2020-11-28 /pmc/articles/PMC7730808/ /pubmed/33260643 http://dx.doi.org/10.3390/ijms21239070 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title A Computational Framework Based on Ensemble Deep Neural Networks for Essential Genes Identification
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7730808/
https://www.ncbi.nlm.nih.gov/pubmed/33260643
http://dx.doi.org/10.3390/ijms21239070