Identification of Hürthle cell cancers: solving a clinical challenge with genomic sequencing and a trio of machine learning algorithms

BACKGROUND: Identification of Hürthle cell cancers by non-operative fine-needle aspiration biopsy (FNAB) of thyroid nodules is challenging. Resultingly, non-cancerous Hürthle lesions were conventionally distinguished from Hürthle cell cancers by histopathological examination of tissue following surg...

Bibliographic Details
Main authors: Hao, Yangyang; Duh, Quan-Yang; Kloos, Richard T.; Babiarz, Joshua; Harrell, R. Mack; Traweek, S. Thomas; Kim, Su Yeon; Fedorowicz, Grazyna; Walsh, P. Sean; Sadow, Peter M.; Huang, Jing; Kennedy, Giulia C.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6450053/
https://www.ncbi.nlm.nih.gov/pubmed/30952205
http://dx.doi.org/10.1186/s12918-019-0693-z
_version_ 1783408974623670272
author Hao, Yangyang
Duh, Quan-Yang
Kloos, Richard T.
Babiarz, Joshua
Harrell, R. Mack
Traweek, S. Thomas
Kim, Su Yeon
Fedorowicz, Grazyna
Walsh, P. Sean
Sadow, Peter M.
Huang, Jing
Kennedy, Giulia C.
author_sort Hao, Yangyang
collection PubMed
description BACKGROUND: Identification of Hürthle cell cancers by non-operative fine-needle aspiration biopsy (FNAB) of thyroid nodules is challenging. Resultingly, non-cancerous Hürthle lesions were conventionally distinguished from Hürthle cell cancers by histopathological examination of tissue following surgical resection. Reliance on histopathological evaluation requires patients to undergo surgery to obtain a diagnosis despite most being non-cancerous. It is highly desirable to avoid surgery and to provide accurate classification of benignity versus malignancy from FNAB preoperatively. In our first-generation algorithm, Gene Expression Classifier (GEC), we achieved this goal by using machine learning (ML) on gene expression features. The classifier is sensitive, but not specific due in part to the presence of non-neoplastic benign Hürthle cells in many FNAB.
RESULTS: We sought to overcome this low-specificity limitation by expanding the feature set for ML using next-generation whole transcriptome RNA sequencing and called the improved algorithm the Genomic Sequencing Classifier (GSC). The Hürthle identification leverages mitochondrial expression and we developed novel feature extraction mechanisms to measure chromosomal and genomic level loss-of-heterozygosity (LOH) for the algorithm. Additionally, we developed a multi-layered system of cascading classifiers to sequentially triage Hürthle cell-containing FNAB, including: 1. presence of Hürthle cells, 2. presence of neoplastic Hürthle cells, and 3. presence of benign Hürthle cells. The final Hürthle cell Index utilizes 1048 nuclear and mitochondrial genes; and Hürthle cell Neoplasm Index leverages LOH features as well as 2041 genes. Both indices are Support Vector Machine (SVM) based. The third classifier, the GSC Benign/Suspicious classifier, utilizes 1115 core genes and is an ensemble classifier incorporating 12 individual models.
CONCLUSIONS: The accurate algorithmic depiction of this complex biological system among Hürthle subtypes results in a dramatic improvement of classification performance; specificity among Hürthle cell neoplasms increases from 11.8% with the GEC to 58.8% with the GSC, while maintaining the same sensitivity of 89%.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12918-019-0693-z) contains supplementary material, which is available to authorized users.
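The cascading triage described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the three classifiers are combined, so the routing rule (the first two indices select which cut-off the final classifier uses), every cut-off value, the averaging of ensemble members, and the toy feature names (`mito_expression`, `loh_fraction`, `core_a`, `core_b`) are assumptions made purely for illustration. The real indices are SVM based and the real ensemble has 12 members; plain functions and three members stand in for them here.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, float]
Scorer = Callable[[Sample], float]

# Hypothetical cut-offs: the abstract reports none of these values.
HURTHLE_CUTOFF = 0.5
NEOPLASM_CUTOFF = 0.5
FINAL_CUTOFFS = {
    "no_hurthle": 0.5,
    "hurthle_non_neoplastic": 0.7,
    "hurthle_neoplastic": 0.4,
}


def ensemble_score(models: List[Scorer], sample: Sample) -> float:
    """Average the member scores (an assumed combination rule)."""
    return mean(model(sample) for model in models)


def triage(sample: Sample, hurthle_index: Scorer, neoplasm_index: Scorer,
           ensemble: List[Scorer]) -> Tuple[str, str]:
    """Run the three stages in order and return (route, final call)."""
    # Stage 1: does the sample appear to contain Hürthle cells?
    if hurthle_index(sample) < HURTHLE_CUTOFF:
        route = "no_hurthle"
    # Stage 2: if so, do they look neoplastic?
    elif neoplasm_index(sample) >= NEOPLASM_CUTOFF:
        route = "hurthle_neoplastic"
    else:
        route = "hurthle_non_neoplastic"
    # Stage 3: Benign/Suspicious call, using the cut-off chosen by the route.
    score = ensemble_score(ensemble, sample)
    call = "Suspicious" if score >= FINAL_CUTOFFS[route] else "Benign"
    return route, call


# Toy stand-ins for the trained models (hypothetical feature names).
def hurthle_index(s: Sample) -> float:
    return s["mito_expression"]


def neoplasm_index(s: Sample) -> float:
    return s["loh_fraction"]


ensemble: List[Scorer] = [
    lambda s: s["core_a"],
    lambda s: s["core_b"],
    lambda s: (s["core_a"] + s["core_b"]) / 2,
]

sample = {"mito_expression": 0.9, "loh_fraction": 0.1,
          "core_a": 0.6, "core_b": 0.6}
print(triage(sample, hurthle_index, neoplasm_index, ensemble))
```

In this toy setup a sample with Hürthle cells but no sign of neoplasia faces a stricter cut-off before it is called Suspicious, which is one plausible way a cascade could raise specificity without lowering sensitivity elsewhere.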
format Online
Article
Text
id pubmed-6450053
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6450053 2019-04-16 BMC Syst Biol Research BioMed Central 2019-04-05 /pmc/articles/PMC6450053/ /pubmed/30952205 http://dx.doi.org/10.1186/s12918-019-0693-z Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Identification of Hürthle cell cancers: solving a clinical challenge with genomic sequencing and a trio of machine learning algorithms
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6450053/
https://www.ncbi.nlm.nih.gov/pubmed/30952205
http://dx.doi.org/10.1186/s12918-019-0693-z