Towards a robust out-of-the-box neural network model for genomic data

BACKGROUND: The accurate prediction of biological features from genomic data is paramount for precision medicine and sustainable agriculture. For decades, neural network models have been widely popular in fields like computer vision, astrophysics and targeted marketing given their prediction accurac...

Bibliographic Details
Main authors: Zhang, Zhaoyi; Cheng, Songyang; Solis-Lemus, Claudia
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8994362/
https://www.ncbi.nlm.nih.gov/pubmed/35397517
http://dx.doi.org/10.1186/s12859-022-04660-8
_version_ 1784684092799844352
author Zhang, Zhaoyi
Cheng, Songyang
Solis-Lemus, Claudia
author_facet Zhang, Zhaoyi
Cheng, Songyang
Solis-Lemus, Claudia
author_sort Zhang, Zhaoyi
collection PubMed
description BACKGROUND: The accurate prediction of biological features from genomic data is paramount for precision medicine and sustainable agriculture. For decades, neural network models have been widely popular in fields like computer vision, astrophysics and targeted marketing given their prediction accuracy and their robust performance under big data settings. Yet neural network models have not made a successful transition into the medical and biological world due to the ubiquitous characteristics of biological data such as modest sample sizes, sparsity, and extreme heterogeneity. RESULTS: Here, we investigate the robustness, generalization potential and prediction accuracy of widely used convolutional neural network and natural language processing models with a variety of heterogeneous genomic datasets. Mainly, recurrent neural network models outperform convolutional neural network models in terms of prediction accuracy, overfitting and transferability across the datasets under study. CONCLUSIONS: While the prospect of a robust out-of-the-box neural network model is out of reach, we identify certain model characteristics that translate well across datasets and could serve as a baseline model for translational researchers. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04660-8.
format Online
Article
Text
id pubmed-8994362
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8994362 2022-04-10 Towards a robust out-of-the-box neural network model for genomic data Zhang, Zhaoyi Cheng, Songyang Solis-Lemus, Claudia BMC Bioinformatics Research
BioMed Central 2022-04-09 /pmc/articles/PMC8994362/ /pubmed/35397517 http://dx.doi.org/10.1186/s12859-022-04660-8 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Zhang, Zhaoyi
Cheng, Songyang
Solis-Lemus, Claudia
Towards a robust out-of-the-box neural network model for genomic data
title Towards a robust out-of-the-box neural network model for genomic data
title_full Towards a robust out-of-the-box neural network model for genomic data
title_fullStr Towards a robust out-of-the-box neural network model for genomic data
title_full_unstemmed Towards a robust out-of-the-box neural network model for genomic data
title_short Towards a robust out-of-the-box neural network model for genomic data
title_sort towards a robust out-of-the-box neural network model for genomic data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8994362/
https://www.ncbi.nlm.nih.gov/pubmed/35397517
http://dx.doi.org/10.1186/s12859-022-04660-8
work_keys_str_mv AT zhangzhaoyi towardsarobustoutoftheboxneuralnetworkmodelforgenomicdata
AT chengsongyang towardsarobustoutoftheboxneuralnetworkmodelforgenomicdata
AT solislemusclaudia towardsarobustoutoftheboxneuralnetworkmodelforgenomicdata