Data-driven characterization of molecular phenotypes across heterogeneous sample collections

Existing large gene expression data repositories hold enormous potential to elucidate disease mechanisms, characterize changes in cellular pathways, and stratify patients based on molecular profiles. To achieve this goal, integrative resources and tools are needed that allow comparison of results...

Bibliographic Details
Main authors: Mehtonen, Juha, Pölönen, Petri, Häyrynen, Sergei, Dufva, Olli, Lin, Jake, Liuksiala, Thomas, Granberg, Kirsi, Lohi, Olli, Hautamäki, Ville, Nykter, Matti, Heinäniemi, Merja
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6648337/
https://www.ncbi.nlm.nih.gov/pubmed/31329928
http://dx.doi.org/10.1093/nar/gkz281
author Mehtonen, Juha
Pölönen, Petri
Häyrynen, Sergei
Dufva, Olli
Lin, Jake
Liuksiala, Thomas
Granberg, Kirsi
Lohi, Olli
Hautamäki, Ville
Nykter, Matti
Heinäniemi, Merja
collection PubMed
description Existing large gene expression data repositories hold enormous potential to elucidate disease mechanisms, characterize changes in cellular pathways, and stratify patients based on molecular profiles. To achieve this goal, integrative resources and tools are needed that allow comparison of results across datasets and data types. We propose an intuitive approach for data-driven stratification of molecular profiles and benchmark our methodology using the dimensionality reduction algorithm t-distributed stochastic neighbor embedding (t-SNE) with multi-study and multi-platform data on hematological malignancies. Our approach enables assessing the contribution of biological versus technical variation to sample clustering, direct incorporation of additional datasets into the same low-dimensional representation, comparison of molecular disease subtypes identified from separate t-SNE representations, and characterization of the obtained clusters based on pathway databases and additional data. In this manner, we performed an integrative analysis across multi-omics acute myeloid leukemia studies. Our approach indicated new molecular subtypes with differential survival and drug responsiveness among samples lacking fusion genes, including a novel myelodysplastic syndrome-like cluster and a cluster characterized by CEBPA mutations and differential activity of the S-adenosylmethionine-dependent DNA methylation pathway. In summary, integration across multiple studies can help to identify novel molecular disease subtypes and generate insight into disease biology.
format Online
Article
Text
id pubmed-6648337
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6648337 2019-07-29 Nucleic Acids Res Methods Online
Oxford University Press 2019-07-26 2019-04-24 /pmc/articles/PMC6648337/ /pubmed/31329928 http://dx.doi.org/10.1093/nar/gkz281 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Data-driven characterization of molecular phenotypes across heterogeneous sample collections
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6648337/
https://www.ncbi.nlm.nih.gov/pubmed/31329928
http://dx.doi.org/10.1093/nar/gkz281