Improving the Quantification of DNA Sequences Using Evolutionary Information Based on Deep Learning
It is known that over 98% of the human genome is non-coding, and 93% of disease-associated variants are located in these regions. Therefore, understanding the function of these regions is important. However, this task is challenging as most of these regions are not well understood in terms of their...
Main authors: | Tayara, Hilal; Chong, Kil To |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6952993/ https://www.ncbi.nlm.nih.gov/pubmed/31847308 http://dx.doi.org/10.3390/cells8121635 |
_version_ | 1783486548492156928 |
---|---|
author | Tayara, Hilal Chong, Kil To |
author_facet | Tayara, Hilal Chong, Kil To |
author_sort | Tayara, Hilal |
collection | PubMed |
description | It is known that over 98% of the human genome is non-coding, and 93% of disease-associated variants are located in these regions. Therefore, understanding the function of these regions is important. However, this task is challenging as most of these regions are not well understood in terms of their functions. In this paper, we introduce a novel computational model based on deep neural networks, called DQDNN, for quantifying the function of non-coding DNA regions. This model combines convolution layers for capturing regulatory motifs at multiple scales and recurrent layers for capturing long-term dependencies between the captured motifs. In addition, we show that integrating evolutionary information with raw genomic sequences improves the performance of the predictor significantly. The proposed model outperforms state-of-the-art models both when using raw genomic sequences only and when integrating evolutionary information with raw genomic sequences. More specifically, the proposed model improves performance on 96.9% and 98% of the targets in terms of area under the receiver operating characteristic curve and area under the precision-recall curve, respectively. In addition, the proposed model improved the prioritization of functional variants of expression quantitative trait loci (eQTLs) compared with the state-of-the-art models. |
format | Online Article Text |
id | pubmed-6952993 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-69529932020-01-23 Improving the Quantification of DNA Sequences Using Evolutionary Information Based on Deep Learning Tayara, Hilal Chong, Kil To Cells Article It is known that over 98% of the human genome is non-coding, and 93% of disease associated variants are located in these regions. Therefore, understanding the function of these regions is important. However, this task is challenging as most of these regions are not well understood in terms of their functions. In this paper, we introduce a novel computational model based on deep neural networks, called DQDNN, for quantifying the function of non-coding DNA regions. This model combines convolution layers for capturing regularity motifs at multiple scales and recurrent layers for capturing long term dependencies between the captured motifs. In addition, we show that integrating evolutionary information with raw genomic sequences improves the performance of the predictor significantly. The proposed model outperforms the state-of-the-art ones using raw genomics sequences only and also by integrating evolutionary information with raw genomics sequences. More specifically, the proposed model improves 96.9% and 98% of the targets in terms of area under the receiver operating characteristic curve and the precision-recall curve, respectively. In addition, the proposed model improved the prioritization of functional variants of expression quantitative trait loci (eQTLs) compared with the state-of-the-art models. MDPI 2019-12-14 /pmc/articles/PMC6952993/ /pubmed/31847308 http://dx.doi.org/10.3390/cells8121635 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Tayara, Hilal Chong, Kil To Improving the Quantification of DNA Sequences Using Evolutionary Information Based on Deep Learning |
title | Improving the Quantification of DNA Sequences Using Evolutionary Information Based on Deep Learning |
title_full | Improving the Quantification of DNA Sequences Using Evolutionary Information Based on Deep Learning |
title_fullStr | Improving the Quantification of DNA Sequences Using Evolutionary Information Based on Deep Learning |
title_full_unstemmed | Improving the Quantification of DNA Sequences Using Evolutionary Information Based on Deep Learning |
title_short | Improving the Quantification of DNA Sequences Using Evolutionary Information Based on Deep Learning |
title_sort | improving the quantification of dna sequences using evolutionary information based on deep learning |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6952993/ https://www.ncbi.nlm.nih.gov/pubmed/31847308 http://dx.doi.org/10.3390/cells8121635 |
work_keys_str_mv | AT tayarahilal improvingthequantificationofdnasequencesusingevolutionaryinformationbasedondeeplearning AT chongkilto improvingthequantificationofdnasequencesusingevolutionaryinformationbasedondeeplearning |
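
The description above mentions integrating evolutionary information with raw genomic sequences but does not say how the two are combined. The following is a minimal sketch of one possible input encoding, not the authors' method: each base is one-hot encoded and a per-base conservation score is appended as a fifth channel. The function names, the five-channel layout, and the example scores are all hypothetical.

```python
# Hypothetical sketch: the names and the five-channel layout are assumptions,
# not taken from the catalogued article.
BASES = "ACGT"


def one_hot(seq):
    """Return one row [A, C, G, T] per base; unknown bases (e.g. N) become all zeros."""
    rows = []
    for base in seq.upper():
        rows.append([1.0 if base == b else 0.0 for b in BASES])
    return rows


def add_conservation(seq, scores):
    """Append one assumed per-base conservation score as a fifth channel."""
    if len(seq) != len(scores):
        raise ValueError("need exactly one score per base")
    return [row + [float(s)] for row, s in zip(one_hot(seq), scores)]


# Made-up scores for illustration only.
features = add_conservation("ACGTN", [0.9, 0.1, 0.5, 0.0, 0.3])
for row in features:
    print(row)
```

A matrix of this shape (sequence length by five channels) is the kind of input a convolutional layer could scan for motifs; whether the article uses this layout is not stated in the record.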