ENNGene: an Easy Neural Network model building tool for Genomics

Bibliographic Details
Main Authors: Chalupová, Eliška; Vaculík, Ondřej; Poláček, Jakub; Jozefov, Filip; Majtner, Tomáš; Alexiou, Panagiotis
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8973509/
https://www.ncbi.nlm.nih.gov/pubmed/35361122
http://dx.doi.org/10.1186/s12864-022-08414-x
author Chalupová, Eliška
Vaculík, Ondřej
Poláček, Jakub
Jozefov, Filip
Majtner, Tomáš
Alexiou, Panagiotis
collection PubMed
description BACKGROUND: The recent big data revolution in Genomics, coupled with the emergence of Deep Learning as a set of powerful machine learning methods, has shifted the standard practices of machine learning for Genomics. Even though Deep Learning methods such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are becoming widespread in Genomics, developing and training such models is outside the ability of most researchers in the field. RESULTS: Here we present ENNGene—Easy Neural Network model building tool for Genomics. This tool simplifies training of custom CNN or hybrid CNN-RNN models on genomic data via an easy-to-use Graphical User Interface. ENNGene allows multiple input branches, including sequence, evolutionary conservation, and secondary structure, and performs all the necessary preprocessing steps, allowing simple input such as genomic coordinates. The network architecture is selected and fully customized by the user, from the number and types of the layers to each layer's precise set-up. ENNGene then deals with all steps of training and evaluation of the model, exporting valuable metrics such as multi-class ROC and precision-recall curve plots or TensorBoard log files. To facilitate interpretation of the predicted results, we deploy Integrated Gradients, providing the user with a graphical representation of an attribution level of each input position. To showcase the usage of ENNGene, we train multiple models on the RBP24 dataset, quickly reaching the state of the art while improving the performance on more than half of the proteins by including the evolutionary conservation score and tuning the network per protein. CONCLUSIONS: As the role of DL in big data analysis in the near future is indisputable, it is important to make it available for a broader range of researchers. 
We believe that an easy-to-use tool such as ENNGene can allow Genomics researchers without a background in Computational Sciences to harness the power of DL to gain better insights into and extract important information from the large amounts of data available in the field. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-022-08414-x.
format Online
Article
Text
id pubmed-8973509
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8973509 2022-04-02 ENNGene: an Easy Neural Network model building tool for Genomics Chalupová, Eliška Vaculík, Ondřej Poláček, Jakub Jozefov, Filip Majtner, Tomáš Alexiou, Panagiotis BMC Genomics Software BioMed Central 2022-03-31 /pmc/articles/PMC8973509/ /pubmed/35361122 http://dx.doi.org/10.1186/s12864-022-08414-x Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title ENNGene: an Easy Neural Network model building tool for Genomics
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8973509/
https://www.ncbi.nlm.nih.gov/pubmed/35361122
http://dx.doi.org/10.1186/s12864-022-08414-x