Machine learning analysis of gene expression data reveals novel diagnostic and prognostic biomarkers and identifies therapeutic targets for soft tissue sarcomas

Based on morphology it is often challenging to distinguish between the many different soft tissue sarcoma subtypes. Moreover, outcome of disease is highly variable even between patients with the same disease. Machine learning on transcriptome sequencing data could be a valuable new tool to understan...

Bibliographic Details
Main Authors: van IJzendoorn, David G. P., Szuhai, Karoly, Briaire-de Bruijn, Inge H., Kostine, Marie, Kuijjer, Marieke L., Bovée, Judith V. M. G.
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6398862/
https://www.ncbi.nlm.nih.gov/pubmed/30785874
http://dx.doi.org/10.1371/journal.pcbi.1006826
author van IJzendoorn, David G. P.
Szuhai, Karoly
Briaire-de Bruijn, Inge H.
Kostine, Marie
Kuijjer, Marieke L.
Bovée, Judith V. M. G.
author_sort van IJzendoorn, David G. P.
collection PubMed
description Based on morphology it is often challenging to distinguish between the many different soft tissue sarcoma subtypes. Moreover, outcome of disease is highly variable even between patients with the same disease. Machine learning on transcriptome sequencing data could be a valuable new tool to understand differences between and within entities. Here we used machine learning analysis to identify novel diagnostic and prognostic markers and therapeutic targets for soft tissue sarcomas. Gene expression data was used from the Cancer Genome Atlas, the Genotype-Tissue Expression project and the French Sarcoma Group. We identified three groups of tumors that overlap in their molecular profiles as seen with unsupervised t-Distributed Stochastic Neighbor Embedding clustering and a deep neural network. The three groups corresponded to subtypes that are morphologically overlapping. Using a random forest algorithm, we identified novel diagnostic markers for soft tissue sarcoma that distinguished between synovial sarcoma and MPNST, and that we validated using qRT-PCR in an independent series. Next, we identified prognostic genes that are strong predictors of disease outcome when used in a k-nearest neighbor algorithm. The prognostic genes were further validated in expression data from the French Sarcoma Group. One of these, HMMR, was validated in an independent series of leiomyosarcomas using immunohistochemistry on tissue microarray as a prognostic gene for disease-free interval. Furthermore, reconstruction of regulatory networks combined with data from the Connectivity Map showed, amongst others, that HDAC inhibitors could be a potentially effective therapy for multiple soft tissue sarcoma subtypes. A viability assay with two HDAC inhibitors confirmed that both leiomyosarcoma and synovial sarcoma are sensitive to HDAC inhibition. In this study we identified novel diagnostic markers, prognostic markers and therapeutic leads from multiple soft tissue sarcoma gene expression datasets. Thus, machine learning algorithms are powerful new tools to improve our understanding of rare tumor entities.
format Online
Article
Text
id pubmed-6398862
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6398862 2019-03-09 Machine learning analysis of gene expression data reveals novel diagnostic and prognostic biomarkers and identifies therapeutic targets for soft tissue sarcomas van IJzendoorn, David G. P. Szuhai, Karoly Briaire-de Bruijn, Inge H. Kostine, Marie Kuijjer, Marieke L. Bovée, Judith V. M. G. PLoS Comput Biol Research Article Based on morphology it is often challenging to distinguish between the many different soft tissue sarcoma subtypes. Moreover, outcome of disease is highly variable even between patients with the same disease. Machine learning on transcriptome sequencing data could be a valuable new tool to understand differences between and within entities. Here we used machine learning analysis to identify novel diagnostic and prognostic markers and therapeutic targets for soft tissue sarcomas. Gene expression data was used from the Cancer Genome Atlas, the Genotype-Tissue Expression project and the French Sarcoma Group. We identified three groups of tumors that overlap in their molecular profiles as seen with unsupervised t-Distributed Stochastic Neighbor Embedding clustering and a deep neural network. The three groups corresponded to subtypes that are morphologically overlapping. Using a random forest algorithm, we identified novel diagnostic markers for soft tissue sarcoma that distinguished between synovial sarcoma and MPNST, and that we validated using qRT-PCR in an independent series. Next, we identified prognostic genes that are strong predictors of disease outcome when used in a k-nearest neighbor algorithm. The prognostic genes were further validated in expression data from the French Sarcoma Group. One of these, HMMR, was validated in an independent series of leiomyosarcomas using immunohistochemistry on tissue microarray as a prognostic gene for disease-free interval. Furthermore, reconstruction of regulatory networks combined with data from the Connectivity Map showed, amongst others, that HDAC inhibitors could be a potentially effective therapy for multiple soft tissue sarcoma subtypes. A viability assay with two HDAC inhibitors confirmed that both leiomyosarcoma and synovial sarcoma are sensitive to HDAC inhibition. In this study we identified novel diagnostic markers, prognostic markers and therapeutic leads from multiple soft tissue sarcoma gene expression datasets. Thus, machine learning algorithms are powerful new tools to improve our understanding of rare tumor entities. Public Library of Science 2019-02-20 /pmc/articles/PMC6398862/ /pubmed/30785874 http://dx.doi.org/10.1371/journal.pcbi.1006826 Text en © 2019 van IJzendoorn et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Machine learning analysis of gene expression data reveals novel diagnostic and prognostic biomarkers and identifies therapeutic targets for soft tissue sarcomas
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6398862/
https://www.ncbi.nlm.nih.gov/pubmed/30785874
http://dx.doi.org/10.1371/journal.pcbi.1006826