Integrative disease classification based on cross-platform microarray data
BACKGROUND: Disease classification has been an important application of microarray technology. However, most microarray-based classifiers can only handle data generated within the same study, since microarray data generated by different laboratories or with different platforms can not be compared di...
Main authors: | Liu, Chun-Chi; Hu, Jianjun; Kalakrishnan, Mrinal; Huang, Haiyan; Zhou, Xianghong Jasmine |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2009 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648756/ https://www.ncbi.nlm.nih.gov/pubmed/19208125 http://dx.doi.org/10.1186/1471-2105-10-S1-S25 |
_version_ | 1782164980850753536 |
---|---|
author | Liu, Chun-Chi; Hu, Jianjun; Kalakrishnan, Mrinal; Huang, Haiyan; Zhou, Xianghong Jasmine |
author_facet | Liu, Chun-Chi; Hu, Jianjun; Kalakrishnan, Mrinal; Huang, Haiyan; Zhou, Xianghong Jasmine |
author_sort | Liu, Chun-Chi |
collection | PubMed |
description | BACKGROUND: Disease classification has been an important application of microarray technology. However, most microarray-based classifiers can only handle data generated within the same study, since microarray data generated by different laboratories or with different platforms cannot be compared directly due to systematic variations. This issue has severely limited the practical use of microarray-based disease classification. RESULTS: In this study, we tested the feasibility of disease classification by integrating the large number of heterogeneous microarray datasets from the public microarray repositories. Cross-platform data compatibility is created by deriving expression log-rank ratios within datasets. One may then compare vectors of log-rank ratios across datasets. In addition, we systematically map textual annotations of datasets to concepts in the Unified Medical Language System (UMLS), permitting quantitative analysis of the phenotype "distance" between datasets and automated construction of disease classes. We design a new classification approach named ManiSVM, which integrates Manifold data transformation with SVM learning to exploit the data properties. Using leave-one-dataset-out cross-validation, ManiSVM achieved an overall accuracy of 70.7% (68.6% precision and 76.9% recall), with many disease classes achieving accuracy higher than 80%. CONCLUSION: Our results not only demonstrated the feasibility of the integrated disease classification approach, but also showed that the classification accuracy increases with the number of homogeneous training datasets. Thus, the power of the integrative approach will increase with the continuous accumulation of microarray data in public repositories. Our study shows that automated disease diagnosis can be an important and promising application of the enormous amount of costly-to-generate, yet freely available, public microarray data. |
format | Text |
id | pubmed-2648756 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2648756 2009-03-03 Integrative disease classification based on cross-platform microarray data Liu, Chun-Chi; Hu, Jianjun; Kalakrishnan, Mrinal; Huang, Haiyan; Zhou, Xianghong Jasmine BMC Bioinformatics Research BioMed Central 2009-01-30 /pmc/articles/PMC2648756/ /pubmed/19208125 http://dx.doi.org/10.1186/1471-2105-10-S1-S25 Text en Copyright © 2009 Liu et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Liu, Chun-Chi Hu, Jianjun Kalakrishnan, Mrinal Huang, Haiyan Zhou, Xianghong Jasmine Integrative disease classification based on cross-platform microarray data |
title | Integrative disease classification based on cross-platform microarray data |
title_sort | integrative disease classification based on cross-platform microarray data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648756/ https://www.ncbi.nlm.nih.gov/pubmed/19208125 http://dx.doi.org/10.1186/1471-2105-10-S1-S25 |
work_keys_str_mv | AT liuchunchi integrativediseaseclassificationbasedoncrossplatformmicroarraydata AT hujianjun integrativediseaseclassificationbasedoncrossplatformmicroarraydata AT kalakrishnanmrinal integrativediseaseclassificationbasedoncrossplatformmicroarraydata AT huanghaiyan integrativediseaseclassificationbasedoncrossplatformmicroarraydata AT zhouxianghongjasmine integrativediseaseclassificationbasedoncrossplatformmicroarraydata |
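
The description above says that cross-platform compatibility comes from deriving expression log-rank ratios within each dataset, and that evaluation uses leave-one-dataset-out cross-validation. The record does not give the exact formulas, so the following is only a minimal Python sketch under stated assumptions: the log-rank ratio is taken here to be the log2 ratio of a gene's mean within-sample rank in disease samples to its mean rank in control samples, and the function names `ranks`, `log_rank_ratios`, and `leave_one_dataset_out` are hypothetical. It does not reproduce ManiSVM or the manifold transformation.

```python
import math


def ranks(values):
    """Return 1-based ranks (1 = lowest value); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def log_rank_ratios(case_samples, control_samples):
    """Per-gene log2(mean case rank / mean control rank) within ONE dataset.

    Assumption: this is one plausible reading of "log-rank ratio", not
    necessarily the definition used in the paper. Each sample is a list of
    expression values in the same gene order.
    """
    def mean_ranks(samples):
        ranked = [ranks(s) for s in samples]
        n_genes = len(ranked[0])
        return [sum(r[g] for r in ranked) / len(ranked) for g in range(n_genes)]

    case = mean_ranks(case_samples)
    control = mean_ranks(control_samples)
    return [math.log2(c / n) for c, n in zip(case, control)]


def leave_one_dataset_out(dataset_ids):
    """Yield (held_out_id, train_indices, test_indices), one fold per dataset."""
    for held_out in sorted(set(dataset_ids)):
        train = [i for i, d in enumerate(dataset_ids) if d != held_out]
        test = [i for i, d in enumerate(dataset_ids) if d == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    # Case and control values are on very different scales, as if measured on
    # two platforms; only the within-sample ranks enter the ratio.
    print(log_rank_ratios([[1.0, 2.0, 3.0]], [[300.0, 200.0, 100.0]]))
    for fold in leave_one_dataset_out(["A", "A", "B", "C"]):
        print(fold)
```

Because only ranks are used, rescaling a sample's raw intensities by any increasing transformation leaves the ratios unchanged, which is the property that makes vectors from different datasets comparable in this sketch. Holding out a whole dataset per fold, rather than individual samples, keeps samples from the same study out of both the training and test sets at once.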