AIKYATAN: mapping distal regulatory elements using convolutional learning on GPU


Bibliographic Details
Main authors: Fang, Chih-Hao; Theera-Ampornpunt, Nawanol; Roth, Michael A.; Grama, Ananth; Chaterji, Somali
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6781298/
https://www.ncbi.nlm.nih.gov/pubmed/31590652
http://dx.doi.org/10.1186/s12859-019-3049-1
_version_ 1783457334933061632
author Fang, Chih-Hao
Theera-Ampornpunt, Nawanol
Roth, Michael A.
Grama, Ananth
Chaterji, Somali
author_sort Fang, Chih-Hao
collection PubMed
description BACKGROUND: Sophisticated ML techniques can leverage the data deluge to functionally annotate the regulatory non-coding genome. The challenge lies in selecting the appropriate classifier for the specific functional annotation problem, within the bounds of the hardware constraints and the model’s complexity. In our system, Aikyatan, we annotate distal epigenomic regulatory sites, e.g., enhancers. Specifically, we develop a binary classifier that labels genome sequences as distal regulatory regions or not, given the combinatorial signatures of their histone modifications. This problem is challenging because the regulatory regions are distal to the genes, with diverse signatures across classes (e.g., enhancers and insulators) and even within each class (e.g., different enhancer sub-classes).
RESULTS: We develop a suite of ML models, under the banner Aikyatan, including SVM models, random forest variants, and deep learning architectures, for distal regulatory element (DRE) detection. We demonstrate, with strong empirical evidence, that deep learning approaches have a computational advantage. Moreover, convolutional neural networks (CNNs) provide best-in-class accuracy, superior to the vanilla variant. On the human embryonic cell line H1, the CNN achieves an accuracy of 97.9% with an order of magnitude lower runtime than the kernel SVM. Running on a GPU speeds up training by 21x for the DNN and 30x for the CNN, relative to CPU. Finally, our CNN model enjoys superior prediction performance vis-à-vis the competition. Specifically, Aikyatan-CNN achieved a 40% higher validation rate than CSIANN and the same accuracy as RFECS.
CONCLUSIONS: Our exhaustive experiments using an array of ML tools validate the need for a model that is not only expressive but can also scale with increasing data volumes and diversity. In addition, a subset of these datasets has image-like properties and benefits from spatial pooling of features. Our Aikyatan suite leverages diverse epigenomic datasets that can then be modeled using CNNs with optimized activation and pooling functions. The goal is to capture the salient features of the integrated epigenomic datasets for deciphering the distal (non-coding) regulatory elements, which have been found to be associated with functional variants. Our source code will be made publicly available at: https://bitbucket.org/cellsandmachines/aikyatan.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-3049-1) contains supplementary material, which is available to authorized users.
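The abstract describes a binary classifier that decides whether a genomic window is a distal regulatory region from its histone-modification signals, and notes that convolution and spatial pooling suit this kind of data. The following is a minimal, untrained forward-pass sketch of that idea in plain Python. It is not the authors' architecture: the number of marks, bins, filters, the filter width, the pooling size, the random weights, and every function name here are hypothetical choices made only for illustration.

```python
import math
import random


def conv1d_relu(signal, kernels, biases):
    """Valid 1D convolution along the window, followed by ReLU.

    signal:  list of bins; each bin is a list with one value per histone mark
    kernels: list of filters; each filter is `width` rows of per-mark weights
    returns: one feature map (list of floats) per filter
    """
    n_bins = len(signal)
    maps = []
    for kernel, bias in zip(kernels, biases):
        width = len(kernel)
        fmap = []
        for start in range(n_bins - width + 1):
            total = bias
            for offset in range(width):
                row = signal[start + offset]
                total += sum(w * x for w, x in zip(kernel[offset], row))
            fmap.append(max(0.0, total))  # ReLU activation
        maps.append(fmap)
    return maps


def max_pool(fmap, size):
    """Non-overlapping max pooling; a trailing partial block is dropped."""
    return [max(fmap[i:i + size]) for i in range(0, len(fmap) - size + 1, size)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(n_marks, n_bins, n_filters=2, width=3, pool=3, seed=0):
    """Random, untrained weights (hypothetical sizes, for illustration only)."""
    rng = random.Random(seed)
    kernels = [[[rng.uniform(-0.5, 0.5) for _ in range(n_marks)]
                for _ in range(width)]
               for _ in range(n_filters)]
    pooled_len = (n_bins - width + 1) // pool
    out_weights = [rng.uniform(-0.5, 0.5) for _ in range(n_filters * pooled_len)]
    return {"kernels": kernels, "biases": [0.0] * n_filters, "pool": pool,
            "out_weights": out_weights, "out_bias": 0.0}


def predict(signal, params):
    """Probability that the window is a distal regulatory element."""
    maps = conv1d_relu(signal, params["kernels"], params["biases"])
    pooled = [v for fmap in maps for v in max_pool(fmap, params["pool"])]
    z = params["out_bias"] + sum(w * v for w, v in zip(params["out_weights"], pooled))
    return sigmoid(z)


# Synthetic window: 20 bins x 4 histone marks, values in [0, 1) (made-up data).
data_rng = random.Random(1)
window = [[data_rng.random() for _ in range(4)] for _ in range(20)]
params = init_params(n_marks=4, n_bins=20)
prob = predict(window, params)
print(f"P(distal regulatory element) = {prob:.3f}")
```

Because the weights are random, the printed probability carries no biological meaning; the sketch only shows how convolution, pooling, and a sigmoid output combine into a binary decision. A real model would be trained on labeled windows and, as the abstract reports, benefits substantially from GPU execution.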
format Online
Article
Text
id pubmed-6781298
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6781298 2019-10-17 AIKYATAN: mapping distal regulatory elements using convolutional learning on GPU Fang, Chih-Hao Theera-Ampornpunt, Nawanol Roth, Michael A. Grama, Ananth Chaterji, Somali BMC Bioinformatics Research Article BioMed Central 2019-10-07 /pmc/articles/PMC6781298/ /pubmed/31590652 http://dx.doi.org/10.1186/s12859-019-3049-1 Text en
© The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title AIKYATAN: mapping distal regulatory elements using convolutional learning on GPU
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6781298/
https://www.ncbi.nlm.nih.gov/pubmed/31590652
http://dx.doi.org/10.1186/s12859-019-3049-1