
DASSI: differential architecture search for splice identification from DNA sequences

BACKGROUND: The data explosion caused by unprecedented advances in genomics is constantly challenging the conventional methods used to interpret the human genome. In recent years, the demand for robust algorithms has brought huge success to the field of Deep Learnin...


Bibliographic Details
Main Authors: Moosa, Shabir; Amira, Prof. Abbes; Boughorbel, Dr. Sabri
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7885202/
https://www.ncbi.nlm.nih.gov/pubmed/33588916
http://dx.doi.org/10.1186/s13040-021-00237-y
_version_ 1783651557675368448
author Moosa, Shabir
Amira, Prof. Abbes
Boughorbel, Dr. Sabri
collection PubMed
description BACKGROUND: The data explosion caused by unprecedented advances in genomics is constantly challenging the conventional methods used to interpret the human genome. In recent years, the demand for robust algorithms has brought huge success to the field of Deep Learning (DL) in solving many difficult tasks in image, speech and natural language processing by automating the manual process of architecture design. This success has been fueled by the development of new DL architectures. Yet genomics poses unique challenges that require the customization and development of new DL models. METHODS: We proposed a new model, DASSI, by adapting a differential architecture search method and applying it to the Splice Site (SS) recognition task on DNA sequences to discover new high-performance convolutional architectures in an automated manner. We evaluated the discovered model against state-of-the-art tools for classifying true and false SS in Homo sapiens (Human), Arabidopsis thaliana (Plant), Caenorhabditis elegans (Worm) and Drosophila melanogaster (Fly). RESULTS: Our experimental evaluation demonstrated that the discovered architecture outperformed baseline models and fixed architectures and showed competitive results against state-of-the-art models used for splice site classification. The proposed model, DASSI, has a compact architecture and showed very good results on a transfer learning task. Benchmarking of execution time and precision for the architecture search and evaluation process showed better performance on recently available GPUs, making it feasible to adopt architecture-search-based methods on large datasets. CONCLUSIONS: We proposed the use of a differential architecture search method (DASSI) to perform SS classification on raw DNA sequences, and discovered new neural network models with a low number of tunable parameters and competitive performance compared with manually engineered architectures. We have extensively benchmarked the DASSI model against other state-of-the-art models and assessed its computational efficiency. The results show the high potential of automated architecture search mechanisms for solving various problems in the field of genomics.
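The abstract does not spell out the search procedure, so the following is only an illustrative sketch of the continuous relaxation that gradient-based architecture search methods generally rely on: a raw DNA sequence is one-hot encoded, and a single "mixed" operation returns the softmax-weighted sum of several candidate operations, so that the weights can later be learned and the strongest candidate kept. Every name here (`one_hot`, `mixed_op`, `derive_op`), the candidate set, the kernel sizes and the random kernels are assumptions chosen for illustration; they are not the DASSI search space or code.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (length, 4) array; other symbols (e.g. N) stay all-zero."""
    index = {base: i for i, base in enumerate(BASES)}
    out = np.zeros((len(seq), 4))
    for pos, base in enumerate(seq.upper()):
        if base in index:
            out[pos, index[base]] = 1.0
    return out


def softmax(a):
    """Numerically stable softmax over a 1D array."""
    e = np.exp(a - np.max(a))
    return e / e.sum()


def conv1d_same(x, kernel):
    """Zero-padded 1D cross-correlation. x: (L, C_in); kernel: (K, C_in, C_out), K odd."""
    k = kernel.shape[0]
    padded = np.pad(x, ((k // 2, k // 2), (0, 0)))
    out = np.zeros((x.shape[0], kernel.shape[2]))
    for i in range(x.shape[0]):
        out[i] = np.tensordot(padded[i:i + k], kernel, axes=([0, 1], [0, 1]))
    return out


def avg_pool3(x):
    """Zero-padded average pooling with window 3 and stride 1."""
    padded = np.pad(x, ((1, 1), (0, 0)))
    return np.stack([padded[i:i + 3].mean(axis=0) for i in range(x.shape[0])])


# Hypothetical candidate set; the kernels are random stand-ins for trained weights.
rng = np.random.default_rng(0)
CANDIDATES = {
    "identity": lambda x: x,
    "conv3": lambda x, w=rng.normal(size=(3, 4, 4)): conv1d_same(x, w),
    "conv5": lambda x, w=rng.normal(size=(5, 4, 4)): conv1d_same(x, w),
    "avg_pool3": avg_pool3,
}


def mixed_op(x, alphas):
    """Continuous relaxation: softmax(alphas)-weighted sum of all candidate outputs."""
    weights = softmax(np.asarray(alphas, dtype=float))
    return sum(w * op(x) for w, op in zip(weights, CANDIDATES.values()))


def derive_op(alphas):
    """Discretization step: keep the candidate with the largest architecture weight."""
    return list(CANDIDATES)[int(np.argmax(alphas))]


x = one_hot("CAGGTAAGT")             # toy sequence, not taken from the paper's data
y = mixed_op(x, [0.0, 0.0, 0.0, 0.0])  # equal weights: every candidate contributes 1/4
```

In a real search the `alphas` would be trained by gradient descent alongside the kernel weights, which requires an automatic differentiation framework; this NumPy version shows only the forward computation and the final selection step.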
format Online
Article
Text
id pubmed-7885202
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7885202 2021-02-17 DASSI: differential architecture search for splice identification from DNA sequences Moosa, Shabir Amira, Prof. Abbes Boughorbel, Dr. Sabri BioData Min Research BioMed Central 2021-02-15 /pmc/articles/PMC7885202/ /pubmed/33588916 http://dx.doi.org/10.1186/s13040-021-00237-y Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title DASSI: differential architecture search for splice identification from DNA sequences
topic Research