
Selecting precise reference normal tissue samples for cancer research using a deep learning approach

BACKGROUND: Normal tissue samples are often employed as a control for understanding disease mechanisms; however, collecting matched normal tissues from patients is difficult in many instances. In cancer research, for example, open cancer resources such as TCGA and TARGET do not provide matched t...


Bibliographic Details
Main Authors:	Zeng, William Z. D., Glicksberg, Benjamin S., Li, Yangyan, Chen, Bin
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6357350/ https://www.ncbi.nlm.nih.gov/pubmed/30704474 http://dx.doi.org/10.1186/s12920-018-0463-6

_version_	1783391766238461952
author	Zeng, William Z. D. Glicksberg, Benjamin S. Li, Yangyan Chen, Bin
author_facet	Zeng, William Z. D. Glicksberg, Benjamin S. Li, Yangyan Chen, Bin
author_sort	Zeng, William Z. D.
collection	PubMed
description	BACKGROUND: Normal tissue samples are often employed as a control for understanding disease mechanisms; however, collecting matched normal tissues from patients is difficult in many instances. In cancer research, for example, open cancer resources such as TCGA and TARGET do not provide matched tissue samples for every cancer or cancer subtype. The recent GTEx project has profiled samples from healthy individuals, providing an excellent resource for this field, yet the feasibility of using GTEx samples as the reference remains unanswered. METHODS: We analyze RNA-Seq data processed with the same computational pipeline and systematically evaluate GTEx as a potential reference resource. We use those cancers that have adjacent normal tissues in TCGA as a benchmark for the evaluation. To correlate tumor samples and normal samples, we explore top varying genes, reduced features from principal component analysis, and encoded features from an autoencoder neural network. We first evaluate whether these methods can identify the correct tissue of origin from GTEx for a given cancer and then ask whether disease expression signatures derived from TCGA and from GTEx are consistent. RESULTS: Among 32 TCGA cancers, 18 have fewer than 10 matched adjacent normal tissue samples. Among the three methods, the autoencoder performed best in predicting tissue of origin, with 12 of 14 cancers correctly predicted. The two cancers were misclassified because none of the normal samples from GTEx correlate well with any tumor samples in these cancers. This suggests that GTEx has matched tissues for the majority of cancers, but not all. When using the autoencoder to select proper normal samples for disease signature creation, we found that disease signatures derived from GTEx normal samples selected via the autoencoder are consistent with those derived from adjacent samples from TCGA in many cases. Interestingly, choosing the top 50 most correlated samples regardless of tissue type performed reasonably well or even better in some cancers. CONCLUSIONS: Our findings demonstrate that samples from GTEx can serve as reference normal samples for cancers, especially those that do not have available adjacent tissue samples. A deep-learning-based approach holds promise for selecting proper normal samples. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12920-018-0463-6) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-6357350
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6357350 2019-02-07 Selecting precise reference normal tissue samples for cancer research using a deep learning approach Zeng, William Z. D. Glicksberg, Benjamin S. Li, Yangyan Chen, Bin BMC Med Genomics Research BACKGROUND: Normal tissue samples are often employed as a control for understanding disease mechanisms; however, collecting matched normal tissues from patients is difficult in many instances. In cancer research, for example, open cancer resources such as TCGA and TARGET do not provide matched tissue samples for every cancer or cancer subtype. The recent GTEx project has profiled samples from healthy individuals, providing an excellent resource for this field, yet the feasibility of using GTEx samples as the reference remains unanswered. METHODS: We analyze RNA-Seq data processed with the same computational pipeline and systematically evaluate GTEx as a potential reference resource. We use those cancers that have adjacent normal tissues in TCGA as a benchmark for the evaluation. To correlate tumor samples and normal samples, we explore top varying genes, reduced features from principal component analysis, and encoded features from an autoencoder neural network. We first evaluate whether these methods can identify the correct tissue of origin from GTEx for a given cancer and then ask whether disease expression signatures derived from TCGA and from GTEx are consistent. RESULTS: Among 32 TCGA cancers, 18 have fewer than 10 matched adjacent normal tissue samples. Among the three methods, the autoencoder performed best in predicting tissue of origin, with 12 of 14 cancers correctly predicted. The two cancers were misclassified because none of the normal samples from GTEx correlate well with any tumor samples in these cancers. This suggests that GTEx has matched tissues for the majority of cancers, but not all. When using the autoencoder to select proper normal samples for disease signature creation, we found that disease signatures derived from GTEx normal samples selected via the autoencoder are consistent with those derived from adjacent samples from TCGA in many cases. Interestingly, choosing the top 50 most correlated samples regardless of tissue type performed reasonably well or even better in some cancers. CONCLUSIONS: Our findings demonstrate that samples from GTEx can serve as reference normal samples for cancers, especially those that do not have available adjacent tissue samples. A deep-learning-based approach holds promise for selecting proper normal samples. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12920-018-0463-6) contains supplementary material, which is available to authorized users. BioMed Central 2019-01-31 /pmc/articles/PMC6357350/ /pubmed/30704474 http://dx.doi.org/10.1186/s12920-018-0463-6 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Zeng, William Z. D. Glicksberg, Benjamin S. Li, Yangyan Chen, Bin Selecting precise reference normal tissue samples for cancer research using a deep learning approach
title	Selecting precise reference normal tissue samples for cancer research using a deep learning approach
title_full	Selecting precise reference normal tissue samples for cancer research using a deep learning approach
title_fullStr	Selecting precise reference normal tissue samples for cancer research using a deep learning approach
title_full_unstemmed	Selecting precise reference normal tissue samples for cancer research using a deep learning approach
title_short	Selecting precise reference normal tissue samples for cancer research using a deep learning approach
title_sort	selecting precise reference normal tissue samples for cancer research using a deep learning approach
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6357350/ https://www.ncbi.nlm.nih.gov/pubmed/30704474 http://dx.doi.org/10.1186/s12920-018-0463-6
work_keys_str_mv	AT zengwilliamzd selectingprecisereferencenormaltissuesamplesforcancerresearchusingadeeplearningapproach AT glicksbergbenjamins selectingprecisereferencenormaltissuesamplesforcancerresearchusingadeeplearningapproach AT liyangyan selectingprecisereferencenormaltissuesamplesforcancerresearchusingadeeplearningapproach AT chenbin selectingprecisereferencenormaltissuesamplesforcancerresearchusingadeeplearningapproach
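
Illustrative sketch (not part of the catalog record or the article): the description field says tumor and normal samples are correlated on top varying genes and on reduced features from principal component analysis, and that choosing the top 50 most correlated normal samples worked reasonably well. The Python below is one minimal way that idea could look. It uses PCA rather than the article's autoencoder, and the function name, parameter names, default values other than the top 50, and the median-correlation ranking rule are all assumptions made for illustration.

```python
import numpy as np


def select_reference_normals(tumor, normal, n_top_genes=500, n_components=10, k=50):
    """Rank normal samples by correlation with tumor samples in PCA space.

    tumor:  (n_tumor, n_genes) expression matrix
    normal: (n_normal, n_genes) expression matrix with the same gene order
    Returns the indices (into `normal`) of the k samples whose median
    correlation with the tumor samples is highest. Hypothetical helper;
    n_components should be at least 2 so a correlation is defined.
    """
    combined = np.vstack([tumor, normal])

    # Keep the genes with the largest variance across all samples.
    top = np.argsort(combined.var(axis=0))[::-1][:n_top_genes]
    x = combined[:, top]

    # PCA via SVD of the column-centered matrix.
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    n_components = min(n_components, s.size)
    scores = u[:, :n_components] * s[:n_components]
    t, n = scores[: len(tumor)], scores[len(tumor):]

    # Pearson correlation between every normal and every tumor sample,
    # computed across the retained components.
    tz = (t - t.mean(axis=1, keepdims=True)) / t.std(axis=1, keepdims=True)
    nz = (n - n.mean(axis=1, keepdims=True)) / n.std(axis=1, keepdims=True)
    corr = nz @ tz.T / n_components

    # Assumption: summarize each normal sample by its median correlation.
    ranking = np.argsort(np.median(corr, axis=1))[::-1]
    return ranking[:k]
```

Called with a tumor matrix and a pool of candidate normal samples from several tissues, the function returns row indices into the normal matrix, so the selected samples can be pulled out with `normal[indices]` regardless of their tissue labels.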

