Improving the Quantification of DNA Sequences Using Evolutionary Information Based on Deep Learning

Bibliographic Details
Main Authors: Tayara, Hilal; Chong, Kil To
Format: Online Article, Text
Language: English
Published: MDPI, 2019
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6952993/
https://www.ncbi.nlm.nih.gov/pubmed/31847308
http://dx.doi.org/10.3390/cells8121635

Abstract: It is known that over 98% of the human genome is non-coding, and 93% of disease-associated variants are located in these regions. Therefore, understanding the function of these regions is important. However, this task is challenging because the functions of most of these regions are not well understood. In this paper, we introduce a novel computational model based on deep neural networks, called DQDNN, for quantifying the function of non-coding DNA regions. This model combines convolution layers, which capture regulatory motifs at multiple scales, with recurrent layers, which capture long-term dependencies between the captured motifs. In addition, we show that integrating evolutionary information with raw genomic sequences significantly improves the performance of the predictor. The proposed model outperforms the state-of-the-art models both when using raw genomic sequences only and when integrating evolutionary information with raw genomic sequences. More specifically, the proposed model improves performance on 96.9% and 98% of the targets in terms of the area under the receiver operating characteristic curve and the area under the precision-recall curve, respectively. In addition, the proposed model improves the prioritization of functional variants of expression quantitative trait loci (eQTLs) compared with the state-of-the-art models.
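The abstract above makes two claims that are easier to read with something concrete in hand: that evolutionary information is combined with the raw sequence, and that the model "improves" a percentage of the targets by area under the ROC curve. The following is a minimal sketch under stated assumptions, not the DQDNN implementation. It assumes, hypothetically, that evolutionary information arrives as one conservation score per base appended as a fifth channel to a one-hot sequence encoding, and it reads "improves X% of the targets" as the share of binary prediction targets whose per-target AUROC is higher than a baseline's. The helper names `encode_with_conservation`, `auroc`, and `fraction_improved` are illustrative only.

```python
from typing import List, Sequence

BASES = "ACGT"


def encode_with_conservation(seq: str, conservation: Sequence[float]) -> List[List[float]]:
    """One row per position: four one-hot base channels plus one conservation channel.

    Assumption for illustration: evolutionary information is a per-base
    conservation score appended as a fifth channel. Bases other than A/C/G/T
    (e.g. 'N') get an all-zero one-hot part.
    """
    if len(seq) != len(conservation):
        raise ValueError("sequence and conservation track must have equal length")
    rows = []
    for base, score in zip(seq.upper(), conservation):
        one_hot = [1.0 if base == b else 0.0 for b in BASES]
        rows.append(one_hot + [float(score)])
    return rows


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the Mann-Whitney probability that a positive outranks a negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("each target needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fraction_improved(
    labels_per_target: Sequence[Sequence[int]],
    model_scores: Sequence[Sequence[float]],
    baseline_scores: Sequence[Sequence[float]],
) -> float:
    """Share of targets whose per-target AUROC is strictly higher for the model than for the baseline."""
    improved = sum(
        1
        for y, m, b in zip(labels_per_target, model_scores, baseline_scores)
        if auroc(y, m) > auroc(y, b)
    )
    return improved / len(labels_per_target)


if __name__ == "__main__":
    # Toy data: two hypothetical targets, four examples each.
    print(encode_with_conservation("ACGN", [0.9, 0.1, 0.5, 0.0])[0])  # [1.0, 0.0, 0.0, 0.0, 0.9]
    labels = [[1, 1, 0, 0], [1, 0, 1, 0]]
    model = [[0.9, 0.8, 0.3, 0.1], [0.4, 0.6, 0.7, 0.2]]
    baseline = [[0.6, 0.2, 0.4, 0.1], [0.9, 0.1, 0.8, 0.2]]
    print(fraction_improved(labels, model, baseline))  # 0.5
```

The pairwise AUROC loop is quadratic in the number of examples and is meant only to make the definition explicit; a rank-based computation scales better. The precision-recall claim would be checked the same way, with a per-target area under the precision-recall curve in place of `auroc`.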
Journal: Cells (MDPI), published 2019-12-14
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).