An Earthquake Emergency Web Data Cleaning and Classification Method Based on Word Frequency and Position Weighting
The speed of earthquake emergency web document data cleaning is one of the key factors affecting emergency rescue decision-making. Data classification is the core process of data cleaning, and the efficiency of data classification determines the speed of data cleaning. This article is based on earth...
Main authors: | Liu, Shuai; Huang, Meng; Li, Chenxi; Lv, Wenchao; Wang, Zhonghao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9482488/ https://www.ncbi.nlm.nih.gov/pubmed/36124117 http://dx.doi.org/10.1155/2022/6555392 |
_version_ | 1784791466169597952 |
---|---|
author | Liu, Shuai Huang, Meng Li, Chenxi Lv, Wenchao Wang, Zhonghao |
author_facet | Liu, Shuai Huang, Meng Li, Chenxi Lv, Wenchao Wang, Zhonghao |
author_sort | Liu, Shuai |
collection | PubMed |
description | The speed of cleaning earthquake emergency web document data is one of the key factors affecting emergency rescue decision-making. Data classification is the core process of data cleaning, and its efficiency determines the speed of data cleaning. Based on earthquake emergency Web document data and HTML structural features, this article combines the TF-IDF algorithm with an information calculation model, improves the word frequency factor and location factor parameters, and proposes the weighted frequency algorithm P-TF-IDF for earthquake emergency Web documents. By filtering out less frequent words and optimizing the FastText model, N-gram feature word vectors effectively improve the efficiency of Web document data classification. For the classified text data, missing data recognition rules, data classification rules, and data repair rules are used to design an artificial intelligence-based data cleaning framework for earthquake emergency network information. The framework detects invalid values in data sets, completes data comparison and redundancy judgment, cleans up data conflicts and data errors, and generates a complete data set without duplication. The data cleaning framework not only completes the fusion of earthquake emergency network information but also provides a data foundation for the visualization of earthquake emergency data. |
format | Online Article Text |
id | pubmed-9482488 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-9482488 2022-09-18 An Earthquake Emergency Web Data Cleaning and Classification Method Based on Word Frequency and Position Weighting Liu, Shuai Huang, Meng Li, Chenxi Lv, Wenchao Wang, Zhonghao Comput Intell Neurosci Research Article The speed of cleaning earthquake emergency web document data is one of the key factors affecting emergency rescue decision-making. Data classification is the core process of data cleaning, and its efficiency determines the speed of data cleaning. Based on earthquake emergency Web document data and HTML structural features, this article combines the TF-IDF algorithm with an information calculation model, improves the word frequency factor and location factor parameters, and proposes the weighted frequency algorithm P-TF-IDF for earthquake emergency Web documents. By filtering out less frequent words and optimizing the FastText model, N-gram feature word vectors effectively improve the efficiency of Web document data classification. For the classified text data, missing data recognition rules, data classification rules, and data repair rules are used to design an artificial intelligence-based data cleaning framework for earthquake emergency network information. The framework detects invalid values in data sets, completes data comparison and redundancy judgment, cleans up data conflicts and data errors, and generates a complete data set without duplication. The data cleaning framework not only completes the fusion of earthquake emergency network information but also provides a data foundation for the visualization of earthquake emergency data. Hindawi 2022-09-10 /pmc/articles/PMC9482488/ /pubmed/36124117 http://dx.doi.org/10.1155/2022/6555392 Text en Copyright © 2022 Shuai Liu et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Liu, Shuai Huang, Meng Li, Chenxi Lv, Wenchao Wang, Zhonghao An Earthquake Emergency Web Data Cleaning and Classification Method Based on Word Frequency and Position Weighting |
title | An Earthquake Emergency Web Data Cleaning and Classification Method Based on Word Frequency and Position Weighting |
title_full | An Earthquake Emergency Web Data Cleaning and Classification Method Based on Word Frequency and Position Weighting |
title_fullStr | An Earthquake Emergency Web Data Cleaning and Classification Method Based on Word Frequency and Position Weighting |
title_full_unstemmed | An Earthquake Emergency Web Data Cleaning and Classification Method Based on Word Frequency and Position Weighting |
title_short | An Earthquake Emergency Web Data Cleaning and Classification Method Based on Word Frequency and Position Weighting |
title_sort | earthquake emergency web data cleaning and classification method based on word frequency and position weighting |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9482488/ https://www.ncbi.nlm.nih.gov/pubmed/36124117 http://dx.doi.org/10.1155/2022/6555392 |
work_keys_str_mv | AT liushuai anearthquakeemergencywebdatacleaningandclassificationmethodbasedonwordfrequencyandpositionweighting AT huangmeng anearthquakeemergencywebdatacleaningandclassificationmethodbasedonwordfrequencyandpositionweighting AT lichenxi anearthquakeemergencywebdatacleaningandclassificationmethodbasedonwordfrequencyandpositionweighting AT lvwenchao anearthquakeemergencywebdatacleaningandclassificationmethodbasedonwordfrequencyandpositionweighting AT wangzhonghao anearthquakeemergencywebdatacleaningandclassificationmethodbasedonwordfrequencyandpositionweighting AT liushuai earthquakeemergencywebdatacleaningandclassificationmethodbasedonwordfrequencyandpositionweighting AT huangmeng earthquakeemergencywebdatacleaningandclassificationmethodbasedonwordfrequencyandpositionweighting AT lichenxi earthquakeemergencywebdatacleaningandclassificationmethodbasedonwordfrequencyandpositionweighting AT lvwenchao earthquakeemergencywebdatacleaningandclassificationmethodbasedonwordfrequencyandpositionweighting AT wangzhonghao earthquakeemergencywebdatacleaningandclassificationmethodbasedonwordfrequencyandpositionweighting |