Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning
Background: Gene signatures derived from transcriptomic data using machine learning methods have shown promise for biodosimetry testing. These signatures may not be sufficiently robust for large scale testing, as their performance has not been adequately validated on external, independent datasets....
Main authors: | Zhao, Jonathan Z.L., Mucaki, Eliseos J., Rogan, Peter K. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | F1000 Research Limited 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5981198/ https://www.ncbi.nlm.nih.gov/pubmed/29904591 http://dx.doi.org/10.12688/f1000research.14048.2 |
_version_ | 1783327995812904960 |
---|---|
author | Zhao, Jonathan Z.L. Mucaki, Eliseos J. Rogan, Peter K. |
author_facet | Zhao, Jonathan Z.L. Mucaki, Eliseos J. Rogan, Peter K. |
author_sort | Zhao, Jonathan Z.L. |
collection | PubMed |
description | Background: Gene signatures derived from transcriptomic data using machine learning methods have shown promise for biodosimetry testing. These signatures may not be sufficiently robust for large scale testing, as their performance has not been adequately validated on external, independent datasets. The present study develops human and murine signatures with biochemically-inspired machine learning that are strictly validated using k-fold and traditional approaches. Methods: Gene Expression Omnibus (GEO) datasets of exposed human and murine lymphocytes were preprocessed via nearest neighbor imputation and expression of genes implicated in the literature to be responsive to radiation exposure (n=998) were then ranked by Minimum Redundancy Maximum Relevance (mRMR). Optimal signatures were derived by backward, complete, and forward sequential feature selection using Support Vector Machines (SVM), and validated using k-fold or traditional validation on independent datasets. Results: The best human signatures we derived exhibit k-fold validation accuracies of up to 98% (DDB2, PRKDC, TPP2, PTPRE, and GADD45A) when validated over 209 samples and traditional validation accuracies of up to 92% (DDB2, CD8A, TALDO1, PCNA, EIF4G2, LCN2, CDKN1A, PRKCH, ENO1, and PPM1D) when validated over 85 samples. Some human signatures are specific enough to differentiate between chemotherapy and radiotherapy. Certain multi-class murine signatures have sufficient granularity in dose estimation to inform eligibility for cytokine therapy (assuming these signatures could be translated to humans). We compiled a list of the most frequently appearing genes in the top 20 human and mouse signatures. More frequently appearing genes among an ensemble of signatures may indicate greater impact of these genes on the performance of individual signatures. Several genes in the signatures we derived are present in previously proposed signatures. Conclusions: Gene signatures for ionizing radiation exposure derived by machine learning have low error rates in externally validated, independent datasets, and exhibit high specificity and granularity for dose estimation. |
format | Online Article Text |
id | pubmed-5981198 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | F1000 Research Limited |
record_format | MEDLINE/PubMed |
spelling | pubmed-5981198 2018-06-13 Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning Zhao, Jonathan Z.L. Mucaki, Eliseos J. Rogan, Peter K. F1000Res Research Article Background: Gene signatures derived from transcriptomic data using machine learning methods have shown promise for biodosimetry testing. These signatures may not be sufficiently robust for large scale testing, as their performance has not been adequately validated on external, independent datasets. The present study develops human and murine signatures with biochemically-inspired machine learning that are strictly validated using k-fold and traditional approaches. Methods: Gene Expression Omnibus (GEO) datasets of exposed human and murine lymphocytes were preprocessed via nearest neighbor imputation and expression of genes implicated in the literature to be responsive to radiation exposure (n=998) were then ranked by Minimum Redundancy Maximum Relevance (mRMR). Optimal signatures were derived by backward, complete, and forward sequential feature selection using Support Vector Machines (SVM), and validated using k-fold or traditional validation on independent datasets. Results: The best human signatures we derived exhibit k-fold validation accuracies of up to 98% (DDB2, PRKDC, TPP2, PTPRE, and GADD45A) when validated over 209 samples and traditional validation accuracies of up to 92% (DDB2, CD8A, TALDO1, PCNA, EIF4G2, LCN2, CDKN1A, PRKCH, ENO1, and PPM1D) when validated over 85 samples. Some human signatures are specific enough to differentiate between chemotherapy and radiotherapy. Certain multi-class murine signatures have sufficient granularity in dose estimation to inform eligibility for cytokine therapy (assuming these signatures could be translated to humans). We compiled a list of the most frequently appearing genes in the top 20 human and mouse signatures. More frequently appearing genes among an ensemble of signatures may indicate greater impact of these genes on the performance of individual signatures. Several genes in the signatures we derived are present in previously proposed signatures. Conclusions: Gene signatures for ionizing radiation exposure derived by machine learning have low error rates in externally validated, independent datasets, and exhibit high specificity and granularity for dose estimation. F1000 Research Limited 2018-06-15 /pmc/articles/PMC5981198/ /pubmed/29904591 http://dx.doi.org/10.12688/f1000research.14048.2 Text en Copyright: © 2018 Zhao JZL et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Zhao, Jonathan Z.L. Mucaki, Eliseos J. Rogan, Peter K. Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning |
title | Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning |
title_full | Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning |
title_fullStr | Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning |
title_full_unstemmed | Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning |
title_short | Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning |
title_sort | predicting ionizing radiation exposure using biochemically-inspired genomic machine learning |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5981198/ https://www.ncbi.nlm.nih.gov/pubmed/29904591 http://dx.doi.org/10.12688/f1000research.14048.2 |
work_keys_str_mv | AT zhaojonathanzl predictingionizingradiationexposureusingbiochemicallyinspiredgenomicmachinelearning AT mucakieliseosj predictingionizingradiationexposureusingbiochemicallyinspiredgenomicmachinelearning AT roganpeterk predictingionizingradiationexposureusingbiochemicallyinspiredgenomicmachinelearning |