An Earthquake Emergency Web Data Cleaning and Classification Method Based on Word Frequency and Position Weighting

The speed of earthquake emergency web document data cleaning is one of the key factors affecting emergency rescue decision-making. Data classification is the core process of data cleaning, and the efficiency of data classification determines the speed of data cleaning. This article is based on earth...

Bibliographic Details
Main Authors: Liu, Shuai; Huang, Meng; Li, Chenxi; Lv, Wenchao; Wang, Zhonghao
Format: Online Article Text
Language: English
Published: Hindawi 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9482488/
https://www.ncbi.nlm.nih.gov/pubmed/36124117
http://dx.doi.org/10.1155/2022/6555392
_version_ 1784791466169597952
author Liu, Shuai
Huang, Meng
Li, Chenxi
Lv, Wenchao
Wang, Zhonghao
author_facet Liu, Shuai
Huang, Meng
Li, Chenxi
Lv, Wenchao
Wang, Zhonghao
author_sort Liu, Shuai
collection PubMed
description The speed of earthquake emergency web document data cleaning is one of the key factors affecting emergency rescue decision-making. Data classification is the core process of data cleaning, and the efficiency of data classification determines the speed of data cleaning. Building on earthquake emergency Web document data and HTML structural features, and combining the TF-IDF algorithm with an information calculation model, this article improves the word frequency factor and location factor parameters and proposes P-TF-IDF, a weighted frequency algorithm for earthquake emergency Web documents. By filtering out less frequent words and optimizing the FastText model, N-gram feature word vectors effectively improve the efficiency of Web document data classification. For the classified text data, missing data recognition rules, data classification rules, and data repair rules are used to design an artificial intelligence-based earthquake emergency network information data cleaning framework that detects invalid values in data sets, completes data comparison and redundancy judgment, cleans up data conflicts and data errors, and generates a complete data set without duplication. The data cleaning framework not only completes the fusion of earthquake emergency network information but also provides a data foundation for the visualization of earthquake emergency data.
format Online
Article
Text
id pubmed-9482488
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-94824882022-09-18 An Earthquake Emergency Web Data Cleaning and Classification Method Based on Word Frequency and Position Weighting Liu, Shuai Huang, Meng Li, Chenxi Lv, Wenchao Wang, Zhonghao Comput Intell Neurosci Research Article The speed of earthquake emergency web document data cleaning is one of the key factors affecting emergency rescue decision-making. Data classification is the core process of data cleaning, and the efficiency of data classification determines the speed of data cleaning. This article is based on earthquake emergency Web document data and HTML structural features, combined with TF-IDF Algorithm and information calculation model, improves the word frequency factor and location factor parameters, and proposes the weighted frequency algorithm P-TF-IDF for earthquake emergency Web documents. To filter out less frequent words and optimize the FastText model, N-gram Feature word vectors effectively improve the efficiency of Web document data classification; for text classification data, use missing data recognition rules, data classification rules, and data repair rules to design an artificial intelligence-based earthquake emergency network information data cleaning framework to detect invalid data sets value, complete data comparison and redundancy judgment, clean up data conflicts and data errors, and generate a complete data set without duplication. The data cleaning framework not only completes the fusion of earthquake emergency network information but also provides a data foundation for the visualization of earthquake emergency data. Hindawi 2022-09-10 /pmc/articles/PMC9482488/ /pubmed/36124117 http://dx.doi.org/10.1155/2022/6555392 Text en Copyright © 2022 Shuai Liu et al. 
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Liu, Shuai
Huang, Meng
Li, Chenxi
Lv, Wenchao
Wang, Zhonghao
An Earthquake Emergency Web Data Cleaning and Classification Method Based on Word Frequency and Position Weighting
title An Earthquake Emergency Web Data Cleaning and Classification Method Based on Word Frequency and Position Weighting
title_full An Earthquake Emergency Web Data Cleaning and Classification Method Based on Word Frequency and Position Weighting
title_fullStr An Earthquake Emergency Web Data Cleaning and Classification Method Based on Word Frequency and Position Weighting
title_full_unstemmed An Earthquake Emergency Web Data Cleaning and Classification Method Based on Word Frequency and Position Weighting
title_short An Earthquake Emergency Web Data Cleaning and Classification Method Based on Word Frequency and Position Weighting
title_sort earthquake emergency web data cleaning and classification method based on word frequency and position weighting
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9482488/
https://www.ncbi.nlm.nih.gov/pubmed/36124117
http://dx.doi.org/10.1155/2022/6555392
work_keys_str_mv AT liushuai anearthquakeemergencywebdatacleaningandclassificationmethodbasedonwordfrequencyandpositionweighting
AT huangmeng anearthquakeemergencywebdatacleaningandclassificationmethodbasedonwordfrequencyandpositionweighting
AT lichenxi anearthquakeemergencywebdatacleaningandclassificationmethodbasedonwordfrequencyandpositionweighting
AT lvwenchao anearthquakeemergencywebdatacleaningandclassificationmethodbasedonwordfrequencyandpositionweighting
AT wangzhonghao anearthquakeemergencywebdatacleaningandclassificationmethodbasedonwordfrequencyandpositionweighting
AT liushuai earthquakeemergencywebdatacleaningandclassificationmethodbasedonwordfrequencyandpositionweighting
AT huangmeng earthquakeemergencywebdatacleaningandclassificationmethodbasedonwordfrequencyandpositionweighting
AT lichenxi earthquakeemergencywebdatacleaningandclassificationmethodbasedonwordfrequencyandpositionweighting
AT lvwenchao earthquakeemergencywebdatacleaningandclassificationmethodbasedonwordfrequencyandpositionweighting
AT wangzhonghao earthquakeemergencywebdatacleaningandclassificationmethodbasedonwordfrequencyandpositionweighting