Cancer as a Tissue Anomaly: Classifying Tumor Transcriptomes Based Only on Healthy Data
Since the turn of the century, researchers have sought to diagnose cancer based on gene expression signatures measured from the blood or biopsy as biomarkers. This task, known as classification, is typically solved using a suite of algorithms that learn a mathematical rule capable of discriminating...
Main Authors: | Quinn, Thomas P., Nguyen, Thin, Lee, Samuel C., Venkatesh, Svetha |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2019 |
Subjects: | Genetics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6614188/ https://www.ncbi.nlm.nih.gov/pubmed/31312210 http://dx.doi.org/10.3389/fgene.2019.00599 |
_version_ | 1783433143930322944 |
---|---|
author | Quinn, Thomas P. Nguyen, Thin Lee, Samuel C. Venkatesh, Svetha |
author_sort | Quinn, Thomas P. |
collection | PubMed |
description | Since the turn of the century, researchers have sought to diagnose cancer based on gene expression signatures measured from the blood or biopsy as biomarkers. This task, known as classification, is typically solved using a suite of algorithms that learn a mathematical rule capable of discriminating one group (“cases”) from another (“controls”). However, discriminatory methods can only identify cancerous samples that resemble those that the algorithm already saw during training. As such, discriminatory methods may be ill-suited for the classification of cancer: because the possibility space of cancer is definitively large, the existence of a one-of-a-kind gene expression signature is likely. Instead, we propose using an established surveillance method that detects anomalous samples based on their deviation from a learned normal steady-state structure. By transferring this method to transcriptomic data, we can create an anomaly detector for tissue transcriptomes, a “tissue detector,” that is capable of identifying cancer without ever seeing a single cancer example. As a proof-of-concept, we train a “tissue detector” on normal GTEx samples that can classify TCGA samples with >90% AUC for 3 out of 6 tissues. Importantly, we find that the classification accuracy is improved simply by adding more healthy samples. We conclude this report by emphasizing the conceptual advantages of anomaly detection and by highlighting future directions for this field of study. |
format | Online Article Text |
id | pubmed-6614188 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-6614188 2019-07-16 Cancer as a Tissue Anomaly: Classifying Tumor Transcriptomes Based Only on Healthy Data Quinn, Thomas P. Nguyen, Thin Lee, Samuel C. Venkatesh, Svetha Front Genet Genetics Frontiers Media S.A. 2019-07-02 /pmc/articles/PMC6614188/ /pubmed/31312210 http://dx.doi.org/10.3389/fgene.2019.00599 Text en Copyright © 2019 Quinn, Nguyen, Lee and Venkatesh.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Cancer as a Tissue Anomaly: Classifying Tumor Transcriptomes Based Only on Healthy Data |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6614188/ https://www.ncbi.nlm.nih.gov/pubmed/31312210 http://dx.doi.org/10.3389/fgene.2019.00599 |
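The description field above outlines the general idea of training a detector on healthy samples only, scoring new samples by their deviation from the learned normal structure, and reporting AUC. The following is a minimal sketch of that idea, not the method used in the article: it swaps in a PCA reconstruction-error detector as a stand-in, and the data are synthetic (not GTEx or TCGA). All function names, the number of components, and the simulated "genes" are assumptions made for illustration.

```python
import numpy as np

def fit_normal_model(healthy, n_components=5):
    """Learn a low-dimensional 'normal' subspace from healthy samples only."""
    mean = healthy.mean(axis=0)
    # Rows of vt are the principal directions of the centered healthy data.
    _, _, vt = np.linalg.svd(healthy - mean, full_matrices=False)
    return mean, vt[:n_components]

def anomaly_score(model, samples):
    """Reconstruction error: distance of each sample from the normal subspace."""
    mean, components = model
    centered = samples - mean
    reconstructed = centered @ components.T @ components
    return np.linalg.norm(centered - reconstructed, axis=1)

def auc(scores, labels):
    """AUC via the Mann-Whitney rank statistic (labels: 1 = anomaly, 0 = normal)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Fraction of (anomaly, normal) pairs ranked correctly; ties count as half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Synthetic stand-in data (hypothetical): 50 "genes" driven by 5 latent factors.
rng = np.random.default_rng(0)
loadings = rng.normal(size=(5, 50))

def simulate_healthy(n):
    """Simulated healthy profiles: low-rank structure plus small noise."""
    return rng.normal(size=(n, 5)) @ loadings + 0.1 * rng.normal(size=(n, 50))

train_healthy = simulate_healthy(200)
test_healthy = simulate_healthy(50)
# Simulated "tumor-like" samples leave the healthy subspace via unstructured variation.
test_anomalous = simulate_healthy(50) + rng.normal(scale=1.0, size=(50, 50))

# The model never sees an anomalous sample during fitting.
model = fit_normal_model(train_healthy)
test_scores = anomaly_score(model, np.vstack([test_healthy, test_anomalous]))
test_labels = np.array([0] * 50 + [1] * 50)
print(f"AUC on synthetic data: {auc(test_scores, test_labels):.3f}")
```

Because the detector is fitted only on `train_healthy`, it can in principle flag any sample that departs from the healthy structure, including kinds of departure it has never encountered. The synthetic anomalies here are deliberately easy to separate, so the printed AUC says nothing about performance on real transcriptomes.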