A dropout-regularized classifier development approach optimized for precision medicine test discovery from omics data
BACKGROUND: Modern genomic and proteomic profiling methods produce large amounts of data from tissue and blood-based samples that are of potential utility for improving patient care. However, the design of precision medicine tests for unmet clinical needs from this information in the small cohorts a...
Main authors: | Roder, Joanna; Oliveira, Carlos; Net, Lelia; Tsypin, Maxim; Linstid, Benjamin; Roder, Heinrich |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6567499/ https://www.ncbi.nlm.nih.gov/pubmed/31196002 http://dx.doi.org/10.1186/s12859-019-2922-2 |
_version_ | 1783427091809697792 |
---|---|
author | Roder, Joanna; Oliveira, Carlos; Net, Lelia; Tsypin, Maxim; Linstid, Benjamin; Roder, Heinrich |
author_facet | Roder, Joanna; Oliveira, Carlos; Net, Lelia; Tsypin, Maxim; Linstid, Benjamin; Roder, Heinrich |
author_sort | Roder, Joanna |
collection | PubMed |
description | BACKGROUND: Modern genomic and proteomic profiling methods produce large amounts of data from tissue and blood-based samples that are of potential utility for improving patient care. However, the design of precision medicine tests for unmet clinical needs from this information in the small cohorts available for test discovery remains a challenging task. Obtaining reliable performance assessments at the earliest stages of test development can also be problematic. We describe a novel approach to classifier development designed to create clinically useful tests together with reliable estimates of their performance. The method incorporates elements of traditional and modern machine learning to facilitate the use of cohorts where the number of samples is less than the number of measured patient attributes. It is based on a hierarchy of classification and information abstraction and combines boosting, bagging, and strong dropout regularization. RESULTS: We apply this dropout-regularized combination approach to two clinical problems in oncology using mRNA expression and associated clinical data and compare performance with other methods of classifier generation, including Random Forest. Performance of the new method is similar to or better than the Random Forest in the two classification tasks used for comparison. The dropout-regularized combination method also generates an effective classifier in a classification task with a known confounding variable. Most importantly, it provides a reliable estimate of test performance from a relatively small development set of samples. CONCLUSIONS: The flexible dropout-regularized combination approach is able to produce tests tailored to particular clinical questions and mitigate known confounding effects. It allows the design of molecular diagnostic tests addressing particular clinical questions together with reliable assessment of whether test performance is likely to be fit-for-purpose in independent validation at the earliest stages of development. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2922-2) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6567499 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6567499 2019-06-17 A dropout-regularized classifier development approach optimized for precision medicine test discovery from omics data Roder, Joanna; Oliveira, Carlos; Net, Lelia; Tsypin, Maxim; Linstid, Benjamin; Roder, Heinrich BMC Bioinformatics Methodology Article BACKGROUND: Modern genomic and proteomic profiling methods produce large amounts of data from tissue and blood-based samples that are of potential utility for improving patient care. However, the design of precision medicine tests for unmet clinical needs from this information in the small cohorts available for test discovery remains a challenging task. Obtaining reliable performance assessments at the earliest stages of test development can also be problematic. We describe a novel approach to classifier development designed to create clinically useful tests together with reliable estimates of their performance. The method incorporates elements of traditional and modern machine learning to facilitate the use of cohorts where the number of samples is less than the number of measured patient attributes. It is based on a hierarchy of classification and information abstraction and combines boosting, bagging, and strong dropout regularization. RESULTS: We apply this dropout-regularized combination approach to two clinical problems in oncology using mRNA expression and associated clinical data and compare performance with other methods of classifier generation, including Random Forest. Performance of the new method is similar to or better than the Random Forest in the two classification tasks used for comparison. The dropout-regularized combination method also generates an effective classifier in a classification task with a known confounding variable. Most importantly, it provides a reliable estimate of test performance from a relatively small development set of samples. CONCLUSIONS: The flexible dropout-regularized combination approach is able to produce tests tailored to particular clinical questions and mitigate known confounding effects. It allows the design of molecular diagnostic tests addressing particular clinical questions together with reliable assessment of whether test performance is likely to be fit-for-purpose in independent validation at the earliest stages of development. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2922-2) contains supplementary material, which is available to authorized users. BioMed Central 2019-06-13 /pmc/articles/PMC6567499/ /pubmed/31196002 http://dx.doi.org/10.1186/s12859-019-2922-2 Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Roder, Joanna Oliveira, Carlos Net, Lelia Tsypin, Maxim Linstid, Benjamin Roder, Heinrich A dropout-regularized classifier development approach optimized for precision medicine test discovery from omics data |
title | A dropout-regularized classifier development approach optimized for precision medicine test discovery from omics data |
title_full | A dropout-regularized classifier development approach optimized for precision medicine test discovery from omics data |
title_fullStr | A dropout-regularized classifier development approach optimized for precision medicine test discovery from omics data |
title_full_unstemmed | A dropout-regularized classifier development approach optimized for precision medicine test discovery from omics data |
title_short | A dropout-regularized classifier development approach optimized for precision medicine test discovery from omics data |
title_sort | dropout-regularized classifier development approach optimized for precision medicine test discovery from omics data |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6567499/ https://www.ncbi.nlm.nih.gov/pubmed/31196002 http://dx.doi.org/10.1186/s12859-019-2922-2 |
work_keys_str_mv | AT roderjoanna adropoutregularizedclassifierdevelopmentapproachoptimizedforprecisionmedicinetestdiscoveryfromomicsdata AT oliveiracarlos adropoutregularizedclassifierdevelopmentapproachoptimizedforprecisionmedicinetestdiscoveryfromomicsdata AT netlelia adropoutregularizedclassifierdevelopmentapproachoptimizedforprecisionmedicinetestdiscoveryfromomicsdata AT tsypinmaxim adropoutregularizedclassifierdevelopmentapproachoptimizedforprecisionmedicinetestdiscoveryfromomicsdata AT linstidbenjamin adropoutregularizedclassifierdevelopmentapproachoptimizedforprecisionmedicinetestdiscoveryfromomicsdata AT roderheinrich adropoutregularizedclassifierdevelopmentapproachoptimizedforprecisionmedicinetestdiscoveryfromomicsdata AT roderjoanna dropoutregularizedclassifierdevelopmentapproachoptimizedforprecisionmedicinetestdiscoveryfromomicsdata AT oliveiracarlos dropoutregularizedclassifierdevelopmentapproachoptimizedforprecisionmedicinetestdiscoveryfromomicsdata AT netlelia dropoutregularizedclassifierdevelopmentapproachoptimizedforprecisionmedicinetestdiscoveryfromomicsdata AT tsypinmaxim dropoutregularizedclassifierdevelopmentapproachoptimizedforprecisionmedicinetestdiscoveryfromomicsdata AT linstidbenjamin dropoutregularizedclassifierdevelopmentapproachoptimizedforprecisionmedicinetestdiscoveryfromomicsdata AT roderheinrich dropoutregularizedclassifierdevelopmentapproachoptimizedforprecisionmedicinetestdiscoveryfromomicsdata |