
Biomarker discovery across annotated and unannotated microarray datasets using semi-supervised learning

The growing body of DNA microarray data has the potential to advance our understanding of the molecular basis of disease. However, annotating microarray datasets with clinically useful information is not always possible, as this often requires access to detailed patient records. In this study we intr...


Bibliographic Details
Main Authors: Harris, Cole; Ghaffari, Noushin
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559897/
https://www.ncbi.nlm.nih.gov/pubmed/18831798
http://dx.doi.org/10.1186/1471-2164-9-S2-S7
author Harris, Cole
Ghaffari, Noushin
author_facet Harris, Cole
Ghaffari, Noushin
author_sort Harris, Cole
collection PubMed
description The growing body of DNA microarray data has the potential to advance our understanding of the molecular basis of disease. However, annotating microarray datasets with clinically useful information is not always possible, as this often requires access to detailed patient records. In this study, we introduce GLAD, a new Semi-Supervised Learning (SSL) method for combining independent annotated datasets and unannotated datasets with the aim of identifying more robust sample classifiers. In our method, independent models are developed using subsets of genes for the annotated and unannotated datasets. These models are evaluated according to a scoring function that incorporates terms for classification accuracy on annotated data and relative cluster separation in unannotated data. Improved models are iteratively generated using a genetic algorithm feature selection technique. Our results show that adding unannotated data to training significantly improves classifier robustness.
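The scoring idea in the description above can be illustrated with a minimal sketch. This is not the authors' GLAD implementation: the record does not give the form of the scoring function, so the nearest-centroid classifier, the 2-means clustering, the weight `w`, the genetic-algorithm settings, and every function name below are assumptions made purely for illustration.

```python
import random

def centroid(rows, genes):
    """Mean expression of the selected genes over a set of samples."""
    return [sum(r[g] for r in rows) / len(rows) for g in genes]

def dist2(row, center, genes):
    """Squared Euclidean distance restricted to the selected genes."""
    return sum((row[g] - c) ** 2 for g, c in zip(genes, center))

def accuracy(X, y, genes):
    """Nearest-centroid accuracy on annotated data (resubstitution, for brevity)."""
    classes = sorted(set(y))
    cents = {c: centroid([x for x, lab in zip(X, y) if lab == c], genes)
             for c in classes}
    hits = sum(min(classes, key=lambda c: dist2(x, cents[c], genes)) == lab
               for x, lab in zip(X, y))
    return hits / len(y)

def separation(U, genes, iters=10):
    """2-means on unannotated data; returns 1 - within/total sum of squares."""
    total = sum(dist2(u, centroid(U, genes), genes) for u in U)
    if total == 0:
        return 0.0
    c0 = [U[0][g] for g in genes]                    # deterministic start
    far = max(U, key=lambda u: dist2(u, c0, genes))  # farthest sample from it
    c1 = [far[g] for g in genes]
    for _ in range(iters):
        a = [u for u in U if dist2(u, c0, genes) <= dist2(u, c1, genes)]
        b = [u for u in U if dist2(u, c0, genes) > dist2(u, c1, genes)]
        if not a or not b:
            return 0.0
        c0, c1 = centroid(a, genes), centroid(b, genes)
    within = (sum(dist2(u, c0, genes) for u in a)
              + sum(dist2(u, c1, genes) for u in b))
    return 1.0 - within / total

def combined_score(X, y, U, genes, w=0.5):
    """Assumed weighting: w * accuracy + (1 - w) * cluster separation."""
    return w * accuracy(X, y, genes) + (1 - w) * separation(U, genes)

def evolve(X, y, U, n_genes, k=2, pop=6, gens=5, w=0.5, seed=0):
    """Tiny genetic algorithm over gene subsets of size k (illustrative only)."""
    rng = random.Random(seed)
    score = lambda g: combined_score(X, y, U, g, w)
    population = [tuple(sorted(rng.sample(range(n_genes), k)))
                  for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=score, reverse=True)
        parents = population[: pop // 2]             # keep the better half
        children = []
        while len(parents) + len(children) < pop:
            p, q = rng.sample(parents, 2)
            child = set(rng.sample(sorted(set(p) | set(q)), k))  # crossover
            if rng.random() < 0.3:                   # mutation: swap one gene
                child.discard(rng.choice(sorted(child)))
                child.add(rng.randrange(n_genes))
            while len(child) < k:
                child.add(rng.randrange(n_genes))
            children.append(tuple(sorted(child)))
        population = parents + children
    return max(population, key=score)

# Toy data (made up): genes 0 and 1 track the class, genes 2 and 3 do not.
X = [[0.0, 0.1, 0.5, 0.9], [0.1, 0.0, 0.9, 0.1], [0.0, 0.0, 0.1, 0.5],
     [1.0, 0.9, 0.5, 0.8], [0.9, 1.0, 0.8, 0.2], [1.0, 1.0, 0.2, 0.5]]
y = [0, 0, 0, 1, 1, 1]
U = [[0.0, 0.0, 0.2, 0.7], [0.1, 0.1, 0.8, 0.3],
     [1.0, 1.0, 0.3, 0.8], [0.9, 0.9, 0.7, 0.2]]
best = evolve(X, y, U, n_genes=4)
```

Here `best` is the highest-scoring gene subset in the final population. A real analysis would estimate accuracy by cross-validation rather than resubstitution and would use far larger gene sets and populations.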
format Text
id pubmed-2559897
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-25598972008-10-04 Biomarker discovery across annotated and unannotated microarray datasets using semi-supervised learning Harris, Cole Ghaffari, Noushin BMC Genomics Research The growing body of DNA microarray data has the potential to advance our understanding of the molecular basis of disease. However, annotating microarray datasets with clinically useful information is not always possible, as this often requires access to detailed patient records. In this study, we introduce GLAD, a new Semi-Supervised Learning (SSL) method for combining independent annotated datasets and unannotated datasets with the aim of identifying more robust sample classifiers. In our method, independent models are developed using subsets of genes for the annotated and unannotated datasets. These models are evaluated according to a scoring function that incorporates terms for classification accuracy on annotated data and relative cluster separation in unannotated data. Improved models are iteratively generated using a genetic algorithm feature selection technique. Our results show that adding unannotated data to training significantly improves classifier robustness. BioMed Central 2008-09-16 /pmc/articles/PMC2559897/ /pubmed/18831798 http://dx.doi.org/10.1186/1471-2164-9-S2-S7 Text en Copyright © 2008 Harris and Ghaffari; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Biomarker discovery across annotated and unannotated microarray datasets using semi-supervised learning
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559897/
https://www.ncbi.nlm.nih.gov/pubmed/18831798
http://dx.doi.org/10.1186/1471-2164-9-S2-S7