Cargando…

Adaptive Dimensionality Reduction with Semi-Supervision (AdDReSS): Classifying Multi-Attribute Biomedical Data

Medical diagnostics is often a multi-attribute problem, necessitating sophisticated tools for analyzing high-dimensional biomedical data. Mining this data often results in two crucial bottlenecks: 1) high dimensionality of features used to represent rich biological data and 2) small amounts of label...

Descripción completa

Detalles Bibliográficos
Autores principales: Lee, George, Romo Bucheli, David Edmundo, Madabhushi, Anant
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Public Library of Science 2016
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4946789/
https://www.ncbi.nlm.nih.gov/pubmed/27421116
http://dx.doi.org/10.1371/journal.pone.0159088
_version_ 1782443073398112256
author Lee, George
Romo Bucheli, David Edmundo
Madabhushi, Anant
author_facet Lee, George
Romo Bucheli, David Edmundo
Madabhushi, Anant
author_sort Lee, George
collection PubMed
description Medical diagnostics is often a multi-attribute problem, necessitating sophisticated tools for analyzing high-dimensional biomedical data. Mining this data often results in two crucial bottlenecks: 1) high dimensionality of features used to represent rich biological data and 2) small amounts of labelled training data due to the expense of consulting highly specific medical expertise necessary to assess each study. Currently, no approach that we are aware of has attempted to use active learning in the context of dimensionality reduction approaches for improving the construction of low dimensional representations. We present our novel methodology, AdDReSS (Adaptive Dimensionality Reduction with Semi-Supervision), to demonstrate that fewer labeled instances identified via AL in embedding space are needed for creating a more discriminative embedding representation compared to randomly selected instances. We tested our methodology on a wide variety of domains ranging from prostate gene expression, ovarian proteomic spectra, brain magnetic resonance imaging, and breast histopathology. Across these various high dimensional biomedical datasets with 100+ observations each and all parameters considered, the median classification accuracy across all experiments showed AdDReSS (88.7%) to outperform SSAGE, a SSDR method using random sampling (85.5%), and Graph Embedding (81.5%). Furthermore, we found that embeddings generated via AdDReSS achieved a mean 35.95% improvement in Raghavan efficiency, a measure of learning rate, over SSAGE. Our results demonstrate the value of AdDReSS to provide low dimensional representations of high dimensional biomedical data while achieving higher classification rates with fewer labelled examples as compared to without active learning.
format Online
Article
Text
id pubmed-4946789
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-49467892016-08-08 Adaptive Dimensionality Reduction with Semi-Supervision (AdDReSS): Classifying Multi-Attribute Biomedical Data Lee, George Romo Bucheli, David Edmundo Madabhushi, Anant PLoS One Research Article Medical diagnostics is often a multi-attribute problem, necessitating sophisticated tools for analyzing high-dimensional biomedical data. Mining this data often results in two crucial bottlenecks: 1) high dimensionality of features used to represent rich biological data and 2) small amounts of labelled training data due to the expense of consulting highly specific medical expertise necessary to assess each study. Currently, no approach that we are aware of has attempted to use active learning in the context of dimensionality reduction approaches for improving the construction of low dimensional representations. We present our novel methodology, AdDReSS (Adaptive Dimensionality Reduction with Semi-Supervision), to demonstrate that fewer labeled instances identified via AL in embedding space are needed for creating a more discriminative embedding representation compared to randomly selected instances. We tested our methodology on a wide variety of domains ranging from prostate gene expression, ovarian proteomic spectra, brain magnetic resonance imaging, and breast histopathology. Across these various high dimensional biomedical datasets with 100+ observations each and all parameters considered, the median classification accuracy across all experiments showed AdDReSS (88.7%) to outperform SSAGE, a SSDR method using random sampling (85.5%), and Graph Embedding (81.5%). Furthermore, we found that embeddings generated via AdDReSS achieved a mean 35.95% improvement in Raghavan efficiency, a measure of learning rate, over SSAGE. Our results demonstrate the value of AdDReSS to provide low dimensional representations of high dimensional biomedical data while achieving higher classification rates with fewer labelled examples as compared to without active learning. Public Library of Science 2016-07-15 /pmc/articles/PMC4946789/ /pubmed/27421116 http://dx.doi.org/10.1371/journal.pone.0159088 Text en © 2016 Lee et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Lee, George
Romo Bucheli, David Edmundo
Madabhushi, Anant
Adaptive Dimensionality Reduction with Semi-Supervision (AdDReSS): Classifying Multi-Attribute Biomedical Data
title Adaptive Dimensionality Reduction with Semi-Supervision (AdDReSS): Classifying Multi-Attribute Biomedical Data
title_full Adaptive Dimensionality Reduction with Semi-Supervision (AdDReSS): Classifying Multi-Attribute Biomedical Data
title_fullStr Adaptive Dimensionality Reduction with Semi-Supervision (AdDReSS): Classifying Multi-Attribute Biomedical Data
title_full_unstemmed Adaptive Dimensionality Reduction with Semi-Supervision (AdDReSS): Classifying Multi-Attribute Biomedical Data
title_short Adaptive Dimensionality Reduction with Semi-Supervision (AdDReSS): Classifying Multi-Attribute Biomedical Data
title_sort adaptive dimensionality reduction with semi-supervision (address): classifying multi-attribute biomedical data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4946789/
https://www.ncbi.nlm.nih.gov/pubmed/27421116
http://dx.doi.org/10.1371/journal.pone.0159088
work_keys_str_mv AT leegeorge adaptivedimensionalityreductionwithsemisupervisionaddressclassifyingmultiattributebiomedicaldata
AT romobuchelidavidedmundo adaptivedimensionalityreductionwithsemisupervisionaddressclassifyingmultiattributebiomedicaldata
AT madabhushianant adaptivedimensionalityreductionwithsemisupervisionaddressclassifyingmultiattributebiomedicaldata