
Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning

Background: Gene signatures derived from transcriptomic data using machine learning methods have shown promise for biodosimetry testing. These signatures may not be sufficiently robust for large scale testing, as their performance has not been adequately validated on external, independent datasets....

Bibliographic Details
Main Authors: Zhao, Jonathan Z.L., Mucaki, Eliseos J., Rogan, Peter K.
Format: Online Article Text
Language: English
Published: F1000 Research Limited 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5981198/
https://www.ncbi.nlm.nih.gov/pubmed/29904591
http://dx.doi.org/10.12688/f1000research.14048.2
_version_ 1783327995812904960
author Zhao, Jonathan Z.L.
Mucaki, Eliseos J.
Rogan, Peter K.
author_facet Zhao, Jonathan Z.L.
Mucaki, Eliseos J.
Rogan, Peter K.
author_sort Zhao, Jonathan Z.L.
collection PubMed
description Background: Gene signatures derived from transcriptomic data using machine learning methods have shown promise for biodosimetry testing. These signatures may not be sufficiently robust for large scale testing, as their performance has not been adequately validated on external, independent datasets. The present study develops human and murine signatures with biochemically-inspired machine learning that are strictly validated using k-fold and traditional approaches. Methods: Gene Expression Omnibus (GEO) datasets of exposed human and murine lymphocytes were preprocessed via nearest neighbor imputation and expression of genes implicated in the literature to be responsive to radiation exposure (n=998) were then ranked by Minimum Redundancy Maximum Relevance (mRMR). Optimal signatures were derived by backward, complete, and forward sequential feature selection using Support Vector Machines (SVM), and validated using k-fold or traditional validation on independent datasets. Results: The best human signatures we derived exhibit k-fold validation accuracies of up to 98% (DDB2, PRKDC, TPP2, PTPRE, and GADD45A) when validated over 209 samples and traditional validation accuracies of up to 92% (DDB2, CD8A, TALDO1, PCNA, EIF4G2, LCN2, CDKN1A, PRKCH, ENO1, and PPM1D) when validated over 85 samples. Some human signatures are specific enough to differentiate between chemotherapy and radiotherapy. Certain multi-class murine signatures have sufficient granularity in dose estimation to inform eligibility for cytokine therapy (assuming these signatures could be translated to humans). We compiled a list of the most frequently appearing genes in the top 20 human and mouse signatures. More frequently appearing genes among an ensemble of signatures may indicate greater impact of these genes on the performance of individual signatures. Several genes in the signatures we derived are present in previously proposed signatures. Conclusions: Gene signatures for ionizing radiation exposure derived by machine learning have low error rates in externally validated, independent datasets, and exhibit high specificity and granularity for dose estimation.
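The forward sequential feature selection with k-fold validation mentioned in the abstract can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' code: the data are synthetic, a nearest-centroid classifier stands in for the SVM, the imputation and mRMR ranking steps are omitted, and every function and variable name is hypothetical.

```python
import random
from statistics import mean

def kfold_indices(n, k, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def centroid_predict(train_X, train_y, x):
    """Nearest-centroid classifier (a simple stand-in for the SVM)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [mean(col) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def cv_accuracy(X, y, features, k=5):
    """Pooled k-fold accuracy using only the given feature columns."""
    correct = 0
    for fold in kfold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        train_X = [[X[i][f] for f in features] for i in train]
        train_y = [y[i] for i in train]
        for i in fold:
            x = [X[i][f] for f in features]
            correct += centroid_predict(train_X, train_y, x) == y[i]
    return correct / len(X)

def forward_select(X, y, candidates, max_features=3):
    """Greedily add the feature that most improves cross-validated accuracy."""
    selected, best = [], 0.0
    while len(selected) < max_features:
        scores = {f: cv_accuracy(X, y, selected + [f])
                  for f in candidates if f not in selected}
        feature, score = max(scores.items(), key=lambda kv: kv[1])
        if score <= best:  # stop when no candidate improves accuracy
            break
        selected.append(feature)
        best = score
    return selected, best

# Synthetic "expression" data (hypothetical): 60 samples, 5 features.
# Only feature 0 depends on the exposure label; the rest are noise.
rng = random.Random(1)
y = [i % 2 for i in range(60)]
X = [[rng.gauss(4.0 * label, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(4)]
     for label in y]

selected, best = forward_select(X, y, candidates=list(range(5)))
print("selected features:", selected, "cv accuracy:", round(best, 3))
```

Backward selection would follow the same pattern in reverse, starting from the full candidate set and removing the feature whose removal most improves (or least degrades) the cross-validated accuracy.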
format Online
Article
Text
id pubmed-5981198
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher F1000 Research Limited
record_format MEDLINE/PubMed
spelling pubmed-5981198 2018-06-13 Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning Zhao, Jonathan Z.L. Mucaki, Eliseos J. Rogan, Peter K. F1000Res Research Article Background: Gene signatures derived from transcriptomic data using machine learning methods have shown promise for biodosimetry testing. These signatures may not be sufficiently robust for large scale testing, as their performance has not been adequately validated on external, independent datasets. The present study develops human and murine signatures with biochemically-inspired machine learning that are strictly validated using k-fold and traditional approaches. Methods: Gene Expression Omnibus (GEO) datasets of exposed human and murine lymphocytes were preprocessed via nearest neighbor imputation and expression of genes implicated in the literature to be responsive to radiation exposure (n=998) were then ranked by Minimum Redundancy Maximum Relevance (mRMR). Optimal signatures were derived by backward, complete, and forward sequential feature selection using Support Vector Machines (SVM), and validated using k-fold or traditional validation on independent datasets. Results: The best human signatures we derived exhibit k-fold validation accuracies of up to 98% (DDB2, PRKDC, TPP2, PTPRE, and GADD45A) when validated over 209 samples and traditional validation accuracies of up to 92% (DDB2, CD8A, TALDO1, PCNA, EIF4G2, LCN2, CDKN1A, PRKCH, ENO1, and PPM1D) when validated over 85 samples. Some human signatures are specific enough to differentiate between chemotherapy and radiotherapy. Certain multi-class murine signatures have sufficient granularity in dose estimation to inform eligibility for cytokine therapy (assuming these signatures could be translated to humans). We compiled a list of the most frequently appearing genes in the top 20 human and mouse signatures. More frequently appearing genes among an ensemble of signatures may indicate greater impact of these genes on the performance of individual signatures. Several genes in the signatures we derived are present in previously proposed signatures. Conclusions: Gene signatures for ionizing radiation exposure derived by machine learning have low error rates in externally validated, independent datasets, and exhibit high specificity and granularity for dose estimation. F1000 Research Limited 2018-06-15 /pmc/articles/PMC5981198/ /pubmed/29904591 http://dx.doi.org/10.12688/f1000research.14048.2 Text en Copyright: © 2018 Zhao JZL et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Zhao, Jonathan Z.L.
Mucaki, Eliseos J.
Rogan, Peter K.
Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning
title Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning
title_full Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning
title_fullStr Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning
title_full_unstemmed Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning
title_short Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning
title_sort predicting ionizing radiation exposure using biochemically-inspired genomic machine learning
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5981198/
https://www.ncbi.nlm.nih.gov/pubmed/29904591
http://dx.doi.org/10.12688/f1000research.14048.2
work_keys_str_mv AT zhaojonathanzl predictingionizingradiationexposureusingbiochemicallyinspiredgenomicmachinelearning
AT mucakieliseosj predictingionizingradiationexposureusingbiochemicallyinspiredgenomicmachinelearning
AT roganpeterk predictingionizingradiationexposureusingbiochemicallyinspiredgenomicmachinelearning