Towards a robust out-of-the-box neural network model for genomic data
BACKGROUND: The accurate prediction of biological features from genomic data is paramount for precision medicine and sustainable agriculture. For decades, neural network models have been widely popular in fields like computer vision, astrophysics and targeted marketing given their prediction accurac...
Main authors: | Zhang, Zhaoyi; Cheng, Songyang; Solis-Lemus, Claudia |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8994362/ https://www.ncbi.nlm.nih.gov/pubmed/35397517 http://dx.doi.org/10.1186/s12859-022-04660-8 |
_version_ | 1784684092799844352 |
---|---|
author | Zhang, Zhaoyi Cheng, Songyang Solis-Lemus, Claudia |
author_sort | Zhang, Zhaoyi |
collection | PubMed |
description | BACKGROUND: The accurate prediction of biological features from genomic data is paramount for precision medicine and sustainable agriculture. For decades, neural network models have been widely popular in fields like computer vision, astrophysics and targeted marketing given their prediction accuracy and their robust performance under big data settings. Yet neural network models have not made a successful transition into the medical and biological world due to the ubiquitous characteristics of biological data such as modest sample sizes, sparsity, and extreme heterogeneity. RESULTS: Here, we investigate the robustness, generalization potential and prediction accuracy of widely used convolutional neural network and natural language processing models with a variety of heterogeneous genomic datasets. Mainly, recurrent neural network models outperform convolutional neural network models in terms of prediction accuracy, overfitting and transferability across the datasets under study. CONCLUSIONS: While the perspective of a robust out-of-the-box neural network model is out of reach, we identify certain model characteristics that translate well across datasets and could serve as a baseline model for translational researchers. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04660-8. |
format | Online Article Text |
id | pubmed-8994362 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8994362 2022-04-10 Towards a robust out-of-the-box neural network model for genomic data Zhang, Zhaoyi Cheng, Songyang Solis-Lemus, Claudia BMC Bioinformatics Research BioMed Central 2022-04-09 /pmc/articles/PMC8994362/ /pubmed/35397517 http://dx.doi.org/10.1186/s12859-022-04660-8 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Towards a robust out-of-the-box neural network model for genomic data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8994362/ https://www.ncbi.nlm.nih.gov/pubmed/35397517 http://dx.doi.org/10.1186/s12859-022-04660-8 |