
Integrative disease classification based on cross-platform microarray data

BACKGROUND: Disease classification has been an important application of microarray technology. However, most microarray-based classifiers can only handle data generated within the same study, since microarray data generated by different laboratories or with different platforms cannot be compared di...


Bibliographic Details
Main Authors: Liu, Chun-Chi, Hu, Jianjun, Kalakrishnan, Mrinal, Huang, Haiyan, Zhou, Xianghong Jasmine
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648756/
https://www.ncbi.nlm.nih.gov/pubmed/19208125
http://dx.doi.org/10.1186/1471-2105-10-S1-S25
_version_ 1782164980850753536
author Liu, Chun-Chi
Hu, Jianjun
Kalakrishnan, Mrinal
Huang, Haiyan
Zhou, Xianghong Jasmine
collection PubMed
description BACKGROUND: Disease classification has been an important application of microarray technology. However, most microarray-based classifiers can only handle data generated within the same study, since microarray data generated by different laboratories or with different platforms cannot be compared directly due to systematic variations. This issue has severely limited the practical use of microarray-based disease classification. RESULTS: In this study, we tested the feasibility of disease classification by integrating the large number of heterogeneous microarray datasets from the public microarray repositories. Cross-platform data compatibility is created by deriving expression log-rank ratios within datasets. One may then compare vectors of log-rank ratios across datasets. In addition, we systematically map textual annotations of datasets to concepts in the Unified Medical Language System (UMLS), permitting quantitative analysis of the phenotype "distance" between datasets and automated construction of disease classes. We design a new classification approach named ManiSVM, which integrates Manifold data transformation with SVM learning to exploit the data properties. Using leave-one-dataset-out cross-validation, ManiSVM achieved an overall accuracy of 70.7% (68.6% precision and 76.9% recall), with many disease classes achieving accuracy higher than 80%. CONCLUSION: Our results not only demonstrated the feasibility of the integrated disease classification approach, but also showed that the classification accuracy increases with the number of homogeneous training datasets. Thus, the power of the integrative approach will increase with the continuous accumulation of microarray data in public repositories. Our study shows that automated disease diagnosis can be an important and promising application of the enormous amount of costly-to-generate, yet freely available, public microarray data.
format Text
id pubmed-2648756
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2648756 2009-03-03 Integrative disease classification based on cross-platform microarray data Liu, Chun-Chi Hu, Jianjun Kalakrishnan, Mrinal Huang, Haiyan Zhou, Xianghong Jasmine BMC Bioinformatics Research BioMed Central 2009-01-30 /pmc/articles/PMC2648756/ /pubmed/19208125 http://dx.doi.org/10.1186/1471-2105-10-S1-S25 Text en Copyright © 2009 Liu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Integrative disease classification based on cross-platform microarray data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648756/
https://www.ncbi.nlm.nih.gov/pubmed/19208125
http://dx.doi.org/10.1186/1471-2105-10-S1-S25
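
The abstract reports results under leave-one-dataset-out cross-validation. The following is a minimal sketch of how such a split could be built, assuming each sample carries the identifier of the dataset it came from. The function name `leave_one_dataset_out` and the dataset identifiers are hypothetical and are not taken from the paper; the classifier itself (ManiSVM) is not reproduced here.

```python
from collections import defaultdict


def leave_one_dataset_out(dataset_ids):
    """Yield (held_out_id, train_indices, test_indices), once per dataset.

    All samples from the held-out dataset form the test set, so no
    dataset contributes samples to both sides of a fold.
    """
    by_dataset = defaultdict(list)
    for index, dataset_id in enumerate(dataset_ids):
        by_dataset[dataset_id].append(index)

    for held_out in sorted(by_dataset):
        test = by_dataset[held_out]
        train = sorted(
            index
            for dataset_id, indices in by_dataset.items()
            if dataset_id != held_out
            for index in indices
        )
        yield held_out, train, test


# Hypothetical assignment of six samples to three datasets.
dataset_ids = [
    "dataset_a", "dataset_a",
    "dataset_b",
    "dataset_c", "dataset_c", "dataset_c",
]
folds = list(leave_one_dataset_out(dataset_ids))

for held_out, train, test in folds:
    print(held_out, "train:", train, "test:", test)
```

Splitting by dataset rather than by sample keeps study-specific systematic variation out of the training folds, which is the concern the abstract raises about cross-laboratory and cross-platform data.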