
Big Data Analysis and Application of Liver Cancer Gene Sequence Based on Second-Generation Sequencing Technology

In big data analysis with the rapid improvement of computer storage capacity and the rapid development of complex algorithms, the exponential growth of massive data has also made science and technology progress with each passing day. Based on omics data such as mRNA data, microRNA data, or DNA methy...

Bibliographic Details
Main Authors:	Xiao, Chaohui, Wang, Fuchuan, Jia, Tianye, Pan, Liru, Wang, Zhaohai
Format:	Online Article Text
Language:	English
Published:	Hindawi 2022
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9398858/ https://www.ncbi.nlm.nih.gov/pubmed/36017150 http://dx.doi.org/10.1155/2022/4004130

_version_	1784772409041092608
author	Xiao, Chaohui Wang, Fuchuan Jia, Tianye Pan, Liru Wang, Zhaohai
author_facet	Xiao, Chaohui Wang, Fuchuan Jia, Tianye Pan, Liru Wang, Zhaohai
author_sort	Xiao, Chaohui
collection	PubMed
description	In big data analysis with the rapid improvement of computer storage capacity and the rapid development of complex algorithms, the exponential growth of massive data has also made science and technology progress with each passing day. Based on omics data such as mRNA data, microRNA data, or DNA methylation data, this study uses traditional clustering methods such as kmeans, K-nearest neighbors, hierarchical clustering, affinity propagation, and nonnegative matrix decomposition to classify samples into categories, obtained: (1) The assumption that the attributes are independent of each other reduces the classification effect of the algorithm to a certain extent. According to the idea of multilevel grid, there is a one-to-one mapping from high-dimensional space to one-dimensional. The complexity is greatly simplified by encoding the one-dimensional grid of the hierarchical grid. The logic of the algorithm is relatively simple, and it also has a very stable classification efficiency. (2) Convert the two-dimensional representation of the data into the one-dimensional representation of the binary, realize the dimensionality reduction processing of the data, and improve the organization and storage efficiency of the data. The grid coding expresses the spatial position of the data, maintains the original organization method of the data, and does not make the abstract expression of the data object. (3) The data processing of nondiscrete and missing values provides a new opportunity for the identification of protein targets of small molecule therapy and obtains a better classification effect. (4) The comparison of the three models shows that Naive Bayes is the optimal model. Each iteration is composed of alternately expected steps and maximal steps and then identified and quantified by MS.
format	Online Article Text
id	pubmed-9398858
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Hindawi
record_format	MEDLINE/PubMed
spelling	pubmed-93988582022-08-24 Big Data Analysis and Application of Liver Cancer Gene Sequence Based on Second-Generation Sequencing Technology Xiao, Chaohui Wang, Fuchuan Jia, Tianye Pan, Liru Wang, Zhaohai Comput Math Methods Med Research Article In big data analysis with the rapid improvement of computer storage capacity and the rapid development of complex algorithms, the exponential growth of massive data has also made science and technology progress with each passing day. Based on omics data such as mRNA data, microRNA data, or DNA methylation data, this study uses traditional clustering methods such as kmeans, K-nearest neighbors, hierarchical clustering, affinity propagation, and nonnegative matrix decomposition to classify samples into categories, obtained: (1) The assumption that the attributes are independent of each other reduces the classification effect of the algorithm to a certain extent. According to the idea of multilevel grid, there is a one-to-one mapping from high-dimensional space to one-dimensional. The complexity is greatly simplified by encoding the one-dimensional grid of the hierarchical grid. The logic of the algorithm is relatively simple, and it also has a very stable classification efficiency. (2) Convert the two-dimensional representation of the data into the one-dimensional representation of the binary, realize the dimensionality reduction processing of the data, and improve the organization and storage efficiency of the data. The grid coding expresses the spatial position of the data, maintains the original organization method of the data, and does not make the abstract expression of the data object. (3) The data processing of nondiscrete and missing values provides a new opportunity for the identification of protein targets of small molecule therapy and obtains a better classification effect. (4) The comparison of the three models shows that Naive Bayes is the optimal model. 
Each iteration is composed of alternately expected steps and maximal steps and then identified and quantified by MS. Hindawi 2022-08-16 /pmc/articles/PMC9398858/ /pubmed/36017150 http://dx.doi.org/10.1155/2022/4004130 Text en Copyright © 2022 Chaohui Xiao et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Xiao, Chaohui Wang, Fuchuan Jia, Tianye Pan, Liru Wang, Zhaohai Big Data Analysis and Application of Liver Cancer Gene Sequence Based on Second-Generation Sequencing Technology
title	Big Data Analysis and Application of Liver Cancer Gene Sequence Based on Second-Generation Sequencing Technology
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9398858/ https://www.ncbi.nlm.nih.gov/pubmed/36017150 http://dx.doi.org/10.1155/2022/4004130
