A proficiency assessment of integrating machine learning (ML) schemes on Lahore water ensemble

A synthesis of statistical inference and machine learning (ML) tools has been employed to gain comprehensive insight into a coarse data set. Water component data for 16 central distribution locations in Lahore, the capital of the second most populous province of Pakistan, have been analyzed to gauge...

Bibliographic Details
Main author: Shahid, Nazish
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10060416/
https://www.ncbi.nlm.nih.gov/pubmed/36991152
http://dx.doi.org/10.1038/s41598-023-32280-6
author Shahid, Nazish
collection PubMed
description A synthesis of statistical inference and machine learning (ML) tools was employed to gain comprehensive insight into a coarse data set. Water component data for 16 central distribution locations in Lahore, the capital of the second most populous province of Pakistan, were analyzed to gauge the current state of the city's water. Moreover, a classification of surplus-response variables through tolerance manipulation was incorporated to examine the dimensionality of the data. By the same token, the influence of discarding superfluous variables, identified through the clustering behavior of constituents, was investigated. The approach of building a spectrum of concurring results by applying comparable methods was tested. To test the suitability of each statistical method before its execution on a large data set, a set of ML schemes has been proposed. The unsupervised learning tools pca, factoran and clusterdata were implemented to establish the elemental character of water at selected locations. Location LAH-13 was highlighted for a Total Dissolved Solids (TDS) concentration outside the normal range. The classification of lower- and higher-variability parameters by Sample Mean (XBAR) control identified a set of least correlated variables: pH, As, Total Coliforms and E. coli. The analysis identified four locations, LAH-06, LAH-10, LAH-13 and LAH-14, with a propensity for extreme concentrations. An execution of factoran demonstrated that a specific tolerance of independent variability, 0.005, could be employed to reduce the dimension of a system without loss of fundamental information. A high cophenetic coefficient, c = 0.9582, validated an accurate division of variables with similar characteristics into clusters. The current approach of mutually validating ML and statistical analysis (SA) schemes will assist in preparing the groundwork for state-of-the-art (SOTA) analysis. The advantage of our approach is that the related SOTA analysis will further refine the predictive precision between two comparable methods, unlike SOTA analysis between two random ML methods. In conclusion, this study identified locations LAH-03, LAH-06, LAH-12, LAH-13, LAH-14 and LAH-15 as having compromised water quality in the region.
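The description mentions a Sample Mean (XBAR) control classification, but this record gives no procedure for it. The following is only a minimal sketch of how an X-bar chart can flag sampling sites, assuming equal-size subgroups of repeated readings per site and a pooled within-subgroup standard deviation as the sigma estimate (a simplification with no small-sample bias correction). The function names, the site labels S1 to S6 and all readings are hypothetical illustrations, not values or methods taken from the article.

```python
from math import sqrt
from statistics import mean, variance


def xbar_limits(subgroups, k=3.0):
    """Return (center, lower, upper, means) for a simple X-bar chart.

    subgroups: list of equal-length lists of repeated measurements.
    Sigma is estimated as the pooled within-subgroup standard deviation
    (assumption: no small-sample bias correction is applied).
    """
    n = len(subgroups[0])
    if n < 2 or any(len(g) != n for g in subgroups):
        raise ValueError("subgroups must share one size of at least 2")
    means = [mean(g) for g in subgroups]            # one mean per site
    center = mean(means)                            # grand mean (center line)
    sigma = sqrt(mean([variance(g) for g in subgroups]))
    half_width = k * sigma / sqrt(n)                # k-sigma limits on the mean
    return center, center - half_width, center + half_width, means


def out_of_control(labels, subgroups, k=3.0):
    """Return the labels whose subgroup mean falls outside the limits."""
    _, lower, upper, means = xbar_limits(subgroups, k)
    return [lab for lab, m in zip(labels, means) if m < lower or m > upper]


# Hypothetical readings (arbitrary units), three per hypothetical site.
labels = ["S1", "S2", "S3", "S4", "S5", "S6"]
readings = [
    [10.0, 11.0, 12.0],
    [11.0, 12.0, 10.0],
    [12.0, 10.0, 11.0],
    [10.0, 12.0, 11.0],
    [11.0, 10.0, 12.0],
    [16.0, 17.0, 18.0],
]
print(out_of_control(labels, readings))
```

With these made-up readings, only the site whose mean sits far from the grand mean is returned. A production control chart would normally use a bias-corrected sigma estimate and would be run once per measured parameter.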
format Online
Article
Text
id pubmed-10060416
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10060416 2023-03-31 A proficiency assessment of integrating machine learning (ML) schemes on Lahore water ensemble Shahid, Nazish Sci Rep Article Nature Publishing Group UK 2023-03-29 /pmc/articles/PMC10060416/ /pubmed/36991152 http://dx.doi.org/10.1038/s41598-023-32280-6 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title A proficiency assessment of integrating machine learning (ML) schemes on Lahore water ensemble
topic Article