A deep learning model to classify neoplastic state and tissue origin from transcriptomic data

Bibliographic Details
Main Authors: Hong, James; Hachem, Laureen D.; Fehlings, Michael G.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9188604/
https://www.ncbi.nlm.nih.gov/pubmed/35690622
http://dx.doi.org/10.1038/s41598-022-13665-5
Description
Summary: Application of deep learning methods to transcriptomic data has the potential to enhance the accuracy and efficiency of tissue classification and cell state identification. Herein, we developed a multitask deep learning model for tissue classification combining publicly available whole transcriptomic (RNA-seq) datasets of non-neoplastic, neoplastic and peri-neoplastic tissue to classify disease state, tissue origin and neoplastic subclass. RNA-seq data from a total of 10,116 patient samples processed through a common pipeline were used for model training and validation. The model achieved 99% accuracy for disease state classification (ROC-AUC of 0.98) and 97% accuracy for tissue origin (ROC-AUC of 0.99). Moreover, the model achieved an accuracy of 92% (ROC-AUC 0.95) for neoplastic subclassification. This is the first multitask deep learning algorithm developed for tissue classification employing a uniform pipeline analysis of transcriptomic data with multiple tissue classifiers. This model serves as a framework for incorporating large transcriptomic datasets across conditions to facilitate clinical diagnosis and cell-based treatment strategies.
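
The summary reports two metrics for each task, accuracy and ROC-AUC. As a reading aid, the sketch below shows how the two are commonly computed for a binary task. It is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and this record does not say how the multi-class tasks (tissue origin, neoplastic subclass) were aggregated into a single ROC-AUC value.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Binary ROC-AUC via its rank interpretation: the probability that a
    randomly chosen positive sample receives a higher score than a randomly
    chosen negative sample (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the paper): 1 = neoplastic, 0 = non-neoplastic.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

# Accuracy needs hard labels, so a threshold (assumed 0.5 here) is applied first.
preds = [1 if s >= 0.5 else 0 for s in scores]

print(f"accuracy = {accuracy(labels, preds):.3f}")
print(f"ROC-AUC  = {roc_auc(labels, scores):.3f}")
```

Accuracy depends on the chosen threshold, whereas ROC-AUC depends only on how the scores rank positives against negatives. That is why the two figures reported for a task need not move together.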