Efficient technique of microarray missing data imputation using clustering and weighted nearest neighbour

Bibliographic Details
Main authors:	Dubey, Aditya; Rasool, Akhtar
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8692342/ https://www.ncbi.nlm.nih.gov/pubmed/34934107 http://dx.doi.org/10.1038/s41598-021-03438-x

_version_	1784618938084098048
author	Dubey, Aditya; Rasool, Akhtar
author_facet	Dubey, Aditya; Rasool, Akhtar
author_sort	Dubey, Aditya
collection	PubMed
description	For most bioinformatics statistical methods, particularly for gene expression data classification, prognosis, and prediction, a complete dataset is required. The gene sample value can be missing due to hardware failure, software failure, or manual mistakes. The missing data in gene expression research dramatically affects the analysis of the collected data. Consequently, this has become a critical problem that requires an efficient imputation algorithm to resolve the issue. This paper proposed a technique considering the local similarity structure that predicts the missing data using clustering and top K nearest neighbor approaches for imputing the missing value. A similarity-based spectral clustering approach is used that is combined with the K-means. The spectral clustering parameters, cluster size, and weighting factors are optimized, and after that, missing values are predicted. For imputing each cluster’s missing value, the top K nearest neighbor approach utilizes the concept of weighted distance. The evaluation is carried out on numerous datasets from a variety of biological areas, with experimentally inserted missing values varying from 5 to 25%. Experimental results prove that the proposed imputation technique makes accurate predictions as compared to other imputation procedures. In this paper, for performing the imputation experiments, microarray gene expression datasets consisting of information of different cancers and tumors are considered. The main contribution of this research states that local similarity-based techniques can be used for imputation even when the dataset has varying dimensionality and characteristics.
format	Online Article Text
id	pubmed-8692342
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-8692342 2021-12-22 Efficient technique of microarray missing data imputation using clustering and weighted nearest neighbour. Dubey, Aditya; Rasool, Akhtar. Sci Rep. Article.
Nature Publishing Group UK 2021-12-21 /pmc/articles/PMC8692342/ /pubmed/34934107 http://dx.doi.org/10.1038/s41598-021-03438-x Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle	Article Dubey, Aditya Rasool, Akhtar Efficient technique of microarray missing data imputation using clustering and weighted nearest neighbour
title	Efficient technique of microarray missing data imputation using clustering and weighted nearest neighbour
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8692342/ https://www.ncbi.nlm.nih.gov/pubmed/34934107 http://dx.doi.org/10.1038/s41598-021-03438-x
work_keys_str_mv	AT dubeyaditya efficienttechniqueofmicroarraymissingdataimputationusingclusteringandweightednearestneighbour AT rasoolakhtar efficienttechniqueofmicroarraymissingdataimputationusingclusteringandweightednearestneighbour
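
The description field above outlines a two-stage idea: group similar genes first, then fill each missing value from the top K nearest neighbours using weighted distances. The following is only a minimal sketch of the second stage, not the authors' implementation. The function name `weighted_knn_impute`, the inverse-distance weighting, the root-mean-square distance over shared columns, and the optional `labels` argument (standing in for cluster assignments that would come from the spectral clustering and K-means step) are all assumptions made for illustration; the record does not specify the paper's actual weighting scheme or parameters.

```python
import numpy as np

def weighted_knn_impute(X, k=3, labels=None, eps=1e-12):
    """Fill NaN entries of X (rows = genes, columns = samples).

    Hypothetical sketch: each missing entry is replaced by an
    inverse-distance weighted mean of the k nearest rows that have
    that column observed. If `labels` is given, neighbours are taken
    only from the same cluster (assumed to be computed elsewhere).
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    n_rows = X.shape[0]
    if labels is None:
        labels = np.zeros(n_rows, dtype=int)  # treat all rows as one cluster
    labels = np.asarray(labels)

    for i in range(n_rows):
        for j in np.flatnonzero(np.isnan(X[i])):
            candidates = []
            for r in range(n_rows):
                if r == i or labels[r] != labels[i] or np.isnan(X[r, j]):
                    continue
                # Compare rows only on columns observed in both.
                shared = ~np.isnan(X[i]) & ~np.isnan(X[r])
                if not shared.any():
                    continue
                dist = np.sqrt(np.mean((X[i, shared] - X[r, shared]) ** 2))
                candidates.append((dist, X[r, j]))
            if not candidates:
                continue  # no usable neighbour: leave the entry as NaN
            candidates.sort(key=lambda t: t[0])
            dists = np.array([c[0] for c in candidates[:k]])
            values = np.array([c[1] for c in candidates[:k]])
            weights = 1.0 / (dists + eps)  # closer rows count more (assumed scheme)
            filled[i, j] = np.sum(weights * values) / np.sum(weights)
    return filled

# Toy expression matrix: the second gene is missing its third sample.
X_demo = np.array([[1.0, 2.0, 3.0],
                   [1.0, 2.0, np.nan],
                   [10.0, 20.0, 30.0]])
imputed_demo = weighted_knn_impute(X_demo, k=1)
```

Restricting neighbours to one cluster keeps the estimate local, which is the point the description makes about local similarity structure; how the clusters are formed and how K and the weights are tuned would have to be taken from the article itself.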

Similar items