The effect of non-linear signal in classification problems using gene expression

Those building predictive models from transcriptomic data are faced with two conflicting perspectives. The first, based on the inherent high dimensionality of biological systems, supposes that complex non-linear models such as neural networks will better match complex biological systems. The second,...

Bibliographic Details
Main Authors: Heil, Benjamin J., Crawford, Jake, Greene, Casey S.
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10079219/
https://www.ncbi.nlm.nih.gov/pubmed/36972227
http://dx.doi.org/10.1371/journal.pcbi.1010984
author Heil, Benjamin J.
Crawford, Jake
Greene, Casey S.
collection PubMed
description Those building predictive models from transcriptomic data are faced with two conflicting perspectives. The first, based on the inherent high dimensionality of biological systems, supposes that complex non-linear models such as neural networks will better match complex biological systems. The second, imagining that complex systems will still be well predicted by simple dividing lines, prefers linear models that are easier to interpret. We compare multi-layer neural networks and logistic regression across multiple prediction tasks on GTEx and Recount3 datasets and find evidence in favor of both possibilities. We verified the presence of non-linear signal when predicting tissue and metadata sex labels from expression data by removing the predictive linear signal with Limma, and showed that the removal ablated the performance of linear methods but not non-linear ones. However, we also found that the presence of non-linear signal was not necessarily sufficient for neural networks to outperform logistic regression. Our results demonstrate that while multi-layer neural networks may be useful for making predictions from gene expression data, including a linear baseline model is critical because, while biological systems are high-dimensional, effective dividing lines for predictive models may not be.
format Online
Article
Text
id pubmed-10079219
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-10079219 2023-04-07 The effect of non-linear signal in classification problems using gene expression Heil, Benjamin J. Crawford, Jake Greene, Casey S. PLoS Comput Biol Research Article Public Library of Science 2023-03-27 /pmc/articles/PMC10079219/ /pubmed/36972227 http://dx.doi.org/10.1371/journal.pcbi.1010984 Text en © 2023 Heil et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title The effect of non-linear signal in classification problems using gene expression
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10079219/
https://www.ncbi.nlm.nih.gov/pubmed/36972227
http://dx.doi.org/10.1371/journal.pcbi.1010984