
Automated deconvolution of structured mixtures from heterogeneous tumor genomic data


Bibliographic Details
Main authors: Roman, Theodore; Xie, Lu; Schwartz, Russell
Format: Online, Article, Text
Language: English
Published: Public Library of Science 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5695636/
https://www.ncbi.nlm.nih.gov/pubmed/29059177
http://dx.doi.org/10.1371/journal.pcbi.1005815
author Roman, Theodore
Xie, Lu
Schwartz, Russell
collection PubMed
description With increasing appreciation for the extent and importance of intratumor heterogeneity, much attention in cancer research has focused on profiling heterogeneity on a single patient level. Although true single-cell genomic technologies are rapidly improving, they remain too noisy and costly at present for population-level studies. Bulk sequencing remains the standard for population-scale tumor genomics, creating a need for computational tools to separate contributions of multiple tumor clones and assorted stromal and infiltrating cell populations to pooled genomic data. All such methods are limited to coarse approximations of only a few cell subpopulations, however. In prior work, we demonstrated the feasibility of improving cell type deconvolution by taking advantage of substructure in genomic mixtures via a strategy called simplicial complex unmixing. We improve on past work by introducing enhancements to automate learning of substructured genomic mixtures, with specific emphasis on genome-wide copy number variation (CNV) data, as well as the ability to process quantitative RNA expression data, and heterogeneous combinations of RNA and CNV data. We introduce methods for dimensionality estimation to better decompose mixture model substructure; fuzzy clustering to better identify substructure in sparse, noisy data; and automated model inference methods for other key model parameters. We further demonstrate their effectiveness in identifying mixture substructure in true breast cancer CNV data from the Cancer Genome Atlas (TCGA). Source code is available at https://github.com/tedroman/WSCUnmix
format Online
Article
Text
id pubmed-5695636
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5695636 2017-11-30; PLoS Comput Biol; Research Article; Public Library of Science 2017-10-23; Text en
© 2017 Roman et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Automated deconvolution of structured mixtures from heterogeneous tumor genomic data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5695636/
https://www.ncbi.nlm.nih.gov/pubmed/29059177
http://dx.doi.org/10.1371/journal.pcbi.1005815