CNN-MGP: Convolutional Neural Networks for Metagenomics Gene Prediction
Accurate gene prediction in metagenomics fragments is a computationally challenging task due to the short-read length, incomplete, and fragmented nature of the data. Most gene-prediction programs are based on extracting a large number of features and then applying statistical approaches or supervise...
Main Authors: | Al-Ajlan, Amani; El Allali, Achraf |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Berlin Heidelberg 2018 |
Subjects: | Original Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6841655/ https://www.ncbi.nlm.nih.gov/pubmed/30588558 http://dx.doi.org/10.1007/s12539-018-0313-4 |
_version_ | 1783467936071024640 |
---|---|
author | Al-Ajlan, Amani; El Allali, Achraf |
author_facet | Al-Ajlan, Amani; El Allali, Achraf |
author_sort | Al-Ajlan, Amani |
collection | PubMed |
description | Accurate gene prediction in metagenomics fragments is a computationally challenging task because the reads are short and the data are incomplete and fragmented. Most gene-prediction programs extract a large number of features and then apply statistical or supervised classification approaches to predict genes. In our study, we introduce CNN-MGP, a convolutional neural network program for metagenomics gene prediction that predicts genes in metagenomics fragments directly from raw DNA sequences, without manual feature extraction or feature selection stages. CNN-MGP learns the characteristics of coding and non-coding regions and distinguishes coding from non-coding open reading frames (ORFs). We train 10 CNN models on 10 mutually exclusive datasets defined by pre-defined GC content ranges. We extract ORFs from each fragment; the ORFs are then encoded numerically and fed into the CNN model that matches the fragment's GC content. The output of the CNN is the probability that an ORF encodes a gene. Finally, a greedy algorithm selects the final gene list. Overall, CNN-MGP is effective and achieves 91% accuracy on the testing dataset. CNN-MGP demonstrates the ability of deep learning to predict genes in metagenomics fragments, and its accuracy is higher than or comparable to that of state-of-the-art gene-prediction programs that use pre-defined features. |
format | Online Article Text |
id | pubmed-6841655 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Springer Berlin Heidelberg |
record_format | MEDLINE/PubMed |
spelling | pubmed-6841655 2019-11-20 CNN-MGP: Convolutional Neural Networks for Metagenomics Gene Prediction Al-Ajlan, Amani; El Allali, Achraf Interdiscip Sci Original Research Article Springer Berlin Heidelberg 2018-12-27 2019 /pmc/articles/PMC6841655/ /pubmed/30588558 http://dx.doi.org/10.1007/s12539-018-0313-4 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. |
title | CNN-MGP: Convolutional Neural Networks for Metagenomics Gene Prediction |
topic | Original Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6841655/ https://www.ncbi.nlm.nih.gov/pubmed/30588558 http://dx.doi.org/10.1007/s12539-018-0313-4 |
work_keys_str_mv | AT alajlanamani cnnmgpconvolutionalneuralnetworksformetagenomicsgeneprediction AT elallaliachraf cnnmgpconvolutionalneuralnetworksformetagenomicsgeneprediction |
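
The description in this record outlines a pipeline: extract ORFs from a fragment, encode them numerically, route them to a model chosen by GC content, and pick the final genes greedily. The Python sketch below illustrates those steps only and is not the authors' implementation. Several details are assumptions that the record does not state: equal-width GC bins, complete forward-strand ORFs only, one-hot encoding as the numerical encoding, and a "highest probability first, no overlaps" rule for the greedy step. All function names are hypothetical, and the CNN itself is not reproduced; probabilities are supplied by the caller.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"
# Assumed encoding: one 4-element indicator vector per base.
ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
           "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_bin(gc: float, n_bins: int = 10) -> int:
    """Map a GC fraction to one of n_bins equal-width bins (assumed ranges)."""
    return min(int(gc * n_bins), n_bins - 1)


def find_orfs(seq: str, min_len: int = 60) -> List[Tuple[int, int]]:
    """Return (start, end) of complete forward-strand ORFs, end exclusive."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


def one_hot(seq: str) -> List[List[int]]:
    """Encode each base as a 4-element vector; unknown bases become zeros."""
    return [ONE_HOT.get(base, [0, 0, 0, 0]) for base in seq.upper()]


def greedy_select(scored: List[Tuple[int, int, float]],
                  threshold: float = 0.5) -> List[Tuple[int, int, float]]:
    """Keep the most probable ORFs that do not overlap any ORF already kept."""
    chosen: List[Tuple[int, int, float]] = []
    for start, end, prob in sorted(scored, key=lambda c: c[2], reverse=True):
        if prob < threshold:
            break  # remaining candidates score even lower
        if all(end <= s or start >= e for s, e, _ in chosen):
            chosen.append((start, end, prob))
    return sorted(chosen)
```

In use, a fragment would pass through `gc_bin(gc_content(fragment))` to pick a model index, each ORF from `find_orfs` would be encoded with `one_hot` and scored by that model, and the resulting `(start, end, probability)` tuples would go to `greedy_select`.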