Technology of Informative Feature Selection for Immunosignature Analysis

The main difficulty in practical work with data obtained via immunosignature analysis is high dimensionality and the presence of a significant number of uninformative or false-informative features due to the specific character of the technology. To ensure practically relevant quality of data analysi...

Bibliographic Details
Main authors: Koshechkin, A.A., Romanovich, O.V., Stamate, D., Johnston, S.A., Zamyatin, A.V.
Format: Online Article Text
Language: English
Published: Privolzhsky Research Medical University 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8596259/
https://www.ncbi.nlm.nih.gov/pubmed/34796001
http://dx.doi.org/10.17691/stm2020.12.5.02
author Koshechkin, A.A.
Romanovich, O.V.
Stamate, D.
Johnston, S.A.
Zamyatin, A.V.
collection PubMed
description The main difficulty in practical work with immunosignature data is its high dimensionality and the large number of uninformative or false-informative features that arise from the specifics of the technology. Data analysis and classification of practically relevant quality must take these specifics into account. The aim of the study is to create and test a technology for effectively reducing the dimensionality of immunosignature data that provides practically relevant, high-quality classification with due regard for the properties of the data. MATERIALS AND METHODS. The study used two normalized data sets containing immunosignature analysis results, obtained from a public biomedical repository. A technology for selecting informative features was proposed. It consists of three successive steps: 1) breaking the multiclass task into a series of binary tasks using the “one vs all” strategy; 2) screening out false-informative features for each binary comparison by comparing the medians of the “one” and “all” sets; 3) ranking the remaining features by their informative value and selecting the most informative ones for each binary comparison. To assess the quality of the proposed technology, we used the results of classification performed on the filtered data. The support vector machine method, which has proved itself in high-dimensional data classification problems, was used as the classification model. RESULTS. The effectiveness of the proposed technology for informative feature selection was determined. This technology provides high-quality classification while significantly reducing the feature space. 
The second step eliminates approximately 50% of the features in each data set under consideration, which greatly simplifies subsequent data analysis. After the third step, with the feature space reduced to 15 features, the macro-averaged F1-score is 98.9% for the GSE52580 dataset. For the GSE52581 dataset, with the feature space reduced to 266 features, the macro-averaged F1-score is 91.3%. CONCLUSION. The results demonstrate the promise of the proposed technology for informative feature selection as applied to immunosignature data.
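The three selection steps and the macro-averaged F1-score described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state how the medians are compared or how features are ranked, so the rule used here (keep a feature only when the median of the "one" set exceeds the median of the "all" set, then rank by that median gap) is an assumption, and the names select_features, macro_f1, and top_k are hypothetical. The support vector classifier is omitted so that the sketch needs only the standard library.

```python
from statistics import median


def select_features(X, y, top_k):
    """Return {class_label: [feature indices]} for each one-vs-all comparison.

    X is a list of samples (each a list of feature values); y holds class labels.
    """
    n_features = len(X[0])
    selected = {}
    for label in sorted(set(y)):
        # Step 1: one-vs-all split for this class.
        one = [row for row, cls in zip(X, y) if cls == label]
        rest = [row for row, cls in zip(X, y) if cls != label]
        scored = []
        for j in range(n_features):
            gap = median(r[j] for r in one) - median(r[j] for r in rest)
            # Step 2 (assumed rule): drop the feature unless the "one" median is higher.
            if gap > 0:
                scored.append((gap, j))
        # Step 3 (assumed score): rank by median gap, largest first; ties by index.
        scored.sort(key=lambda item: (-item[0], item[1]))
        selected[label] = [j for _, j in scored[:top_k]]
    return selected


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (made up for illustration): 6 samples, 3 features, 3 classes.
X = [
    [5.0, 1.0, 3.0],
    [6.0, 1.2, 3.1],
    [1.0, 4.0, 3.0],
    [1.2, 4.5, 2.9],
    [1.1, 1.0, 9.0],
    [0.9, 1.1, 8.0],
]
y = ["a", "a", "b", "b", "c", "c"]
print(select_features(X, y, top_k=1))  # {'a': [0], 'b': [1], 'c': [2]}
```

In a fuller pipeline, the selected feature columns for each binary comparison would be passed to a classifier such as a support vector machine, and macro_f1 would be computed on held-out predictions.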
format Online
Article
Text
id pubmed-8596259
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Privolzhsky Research Medical University
record_format MEDLINE/PubMed
spelling pubmed-8596259 2021-11-17 Technology of Informative Feature Selection for Immunosignature Analysis Koshechkin, A.A. Romanovich, O.V. Stamate, D. Johnston, S.A. Zamyatin, A.V. Sovrem Tekhnologii Med Advanced Researches Privolzhsky Research Medical University 2020 2020-10-28 /pmc/articles/PMC8596259/ /pubmed/34796001 http://dx.doi.org/10.17691/stm2020.12.5.02 Text en This is an open access article under the CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).
title Technology of Informative Feature Selection for Immunosignature Analysis
topic Advanced Researches