
Cancer as a Tissue Anomaly: Classifying Tumor Transcriptomes Based Only on Healthy Data

Bibliographic Details
Main Authors: Quinn, Thomas P., Nguyen, Thin, Lee, Samuel C., Venkatesh, Svetha
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6614188/
https://www.ncbi.nlm.nih.gov/pubmed/31312210
http://dx.doi.org/10.3389/fgene.2019.00599
author Quinn, Thomas P.
Nguyen, Thin
Lee, Samuel C.
Venkatesh, Svetha
collection PubMed
description Since the turn of the century, researchers have sought to diagnose cancer based on gene expression signatures measured from the blood or biopsy as biomarkers. This task, known as classification, is typically solved using a suite of algorithms that learn a mathematical rule capable of discriminating one group (“cases”) from another (“controls”). However, discriminatory methods can only identify cancerous samples that resemble those that the algorithm already saw during training. As such, discriminatory methods may be ill-suited for the classification of cancer: because the possibility space of cancer is definitively large, the existence of a one-of-a-kind gene expression signature is likely. Instead, we propose using an established surveillance method that detects anomalous samples based on their deviation from a learned normal steady-state structure. By transferring this method to transcriptomic data, we can create an anomaly detector for tissue transcriptomes, a “tissue detector,” that is capable of identifying cancer without ever seeing a single cancer example. As a proof-of-concept, we train a “tissue detector” on normal GTEx samples that can classify TCGA samples with >90% AUC for 3 out of 6 tissues. Importantly, we find that the classification accuracy is improved simply by adding more healthy samples. We conclude this report by emphasizing the conceptual advantages of anomaly detection and by highlighting future directions for this field of study.
format Online
Article
Text
id pubmed-6614188
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6614188 2019-07-16 Front Genet Genetics Frontiers Media S.A. 2019-07-02 Text en Copyright © 2019 Quinn, Nguyen, Lee and Venkatesh.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Cancer as a Tissue Anomaly: Classifying Tumor Transcriptomes Based Only on Healthy Data
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6614188/
https://www.ncbi.nlm.nih.gov/pubmed/31312210
http://dx.doi.org/10.3389/fgene.2019.00599
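The abstract describes the core idea (learn what healthy tissue looks like, flag samples that deviate from it, and evaluate the resulting ranking by AUC) but does not name the surveillance method. The Python sketch below illustrates that idea under stated assumptions: it swaps in a deliberately simple per-gene z-score detector (a diagonal-Gaussian distance) instead of the authors' method, and the function names (`fit_healthy_profile`, `anomaly_score`, `auc`, `simulate`) and the synthetic data are hypothetical, not GTEx or TCGA samples.

```python
import random
import statistics
from typing import List, Sequence, Tuple

# Hypothetical stand-in for the paper's detector: a per-gene z-score model
# (diagonal Gaussian) learned from healthy samples only. It ignores
# gene-gene correlations and is NOT the authors' method.

Profile = Tuple[List[float], List[float]]


def fit_healthy_profile(healthy: Sequence[Sequence[float]]) -> Profile:
    """Learn per-gene mean and standard deviation from healthy samples (needs >= 2)."""
    n_genes = len(healthy[0])
    means: List[float] = []
    sds: List[float] = []
    for g in range(n_genes):
        values = [sample[g] for sample in healthy]
        means.append(statistics.fmean(values))
        # Floor the SD so a gene with no variation cannot cause division by zero.
        sds.append(max(statistics.stdev(values), 1e-6))
    return means, sds


def anomaly_score(sample: Sequence[float], profile: Profile) -> float:
    """Mean squared z-score: how far one sample deviates from the healthy profile."""
    means, sds = profile
    return statistics.fmean(((x - m) / s) ** 2 for x, m, s in zip(sample, means, sds))


def auc(normal_scores: Sequence[float], anomalous_scores: Sequence[float]) -> float:
    """AUC = P(anomalous score > normal score), with ties counted as one half."""
    wins = 0.0
    for a in anomalous_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(anomalous_scores) * len(normal_scores))


def simulate(n_samples: int, n_genes: int, shifted_genes: int,
             rng: random.Random) -> List[List[float]]:
    """Synthetic expression: genes ~ N(5, 1), the first `shifted_genes` shifted by +3."""
    return [
        [rng.gauss(5.0 + (3.0 if g < shifted_genes else 0.0), 1.0) for g in range(n_genes)]
        for _ in range(n_samples)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    healthy_train = simulate(40, 50, 0, rng)  # the only data the detector is fit on
    healthy_test = simulate(20, 50, 0, rng)
    tumor_test = simulate(20, 50, 10, rng)    # never used for fitting
    profile = fit_healthy_profile(healthy_train)
    normal = [anomaly_score(s, profile) for s in healthy_test]
    anomalous = [anomaly_score(s, profile) for s in tumor_test]
    print(f"AUC on held-out data: {auc(normal, anomalous):.3f}")
```

Because only `healthy_train` is used for fitting, the synthetic "tumor" samples are never seen during training, mirroring the one-class setup described in the abstract. A detector for real transcriptomes would likely need to model gene-gene correlation structure (for example, through a PCA or autoencoder reconstruction error), which this diagonal model omits.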