Universal Linear Fit Identification: A Method Independent of Data, Outliers and Noise Distribution Model and Free of Missing or Removed Data Imputation

Data processing requires a robust linear fit identification method. In this paper, we introduce a non-parametric robust linear fit identification method for time series. The method uses an indicator 2/n to identify linear fit, where n is the number of terms in a series. The ratio R_max of a_max − a...

Bibliographic Details
Main Authors:	Adikaram, K. K. L. B., Hussein, M. A., Effenberger, M., Becker, T.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2015
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4646355/ https://www.ncbi.nlm.nih.gov/pubmed/26571035 http://dx.doi.org/10.1371/journal.pone.0141486

collection	PubMed
description	Data processing requires a robust linear fit identification method. In this paper, we introduce a non-parametric robust linear fit identification method for time series. The method uses an indicator 2/n to identify linear fit, where n is the number of terms in a series. The ratio R_max of a_max − a_min to S_n − a_min·n and the ratio R_min of a_max − a_min to a_max·n − S_n are always equal to 2/n, where a_max is the maximum element, a_min is the minimum element and S_n is the sum of all elements. If any series expected to follow y = c consists of data that do not agree with the y = c form, R_max > 2/n and R_min > 2/n imply that the maximum and minimum elements, respectively, do not agree with linear fit. We define threshold values for outlier and noise detection as 2/n * (1 + k_1) and 2/n * (1 + k_2), respectively, where k_1 > k_2 and 0 ≤ k_1 ≤ n/2 − 1. Given this relation and a transformation technique, which transforms data into the form y = c, we show that removing all data that do not agree with linear fit is possible. Furthermore, the method is independent of the number of data points, missing data, removed data points and nature of distribution (Gaussian or non-Gaussian) of outliers, noise and clean data. These are major advantages over the existing linear fit methods. Since having a perfect linear relation between two variables in the real world is impossible, we used artificial data sets with extreme conditions to verify the method. The method detects the correct linear fit when the percentage of data agreeing with linear fit is less than 50%, and the deviation of data that do not agree with linear fit is very small, of the order of ±10^(−4)%. The method results in incorrect detections only when numerical accuracy is insufficient in the calculation process.
format	Online Article Text
id	pubmed-4646355
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-4646355 2015-11-25 PLoS One Research Article Public Library of Science 2015-11-16 /pmc/articles/PMC4646355/ /pubmed/26571035 http://dx.doi.org/10.1371/journal.pone.0141486 Text en © 2015 Adikaram et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
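The 2/n indicator described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: it only computes the two ratios as the abstract defines them and compares them with the threshold 2/n * (1 + k). The function names `linear_fit_ratios` and `flag_extremes` are hypothetical, the single tolerance parameter `k` stands in for the abstract's k_1 or k_2, and the transformation of data into the form y = c and the repeated removal of flagged points are left out.

```python
from typing import Sequence, Tuple


def linear_fit_ratios(series: Sequence[float]) -> Tuple[float, float]:
    """Return (R_max, R_min) as defined in the abstract.

    R_max = (a_max - a_min) / (S_n - a_min * n)
    R_min = (a_max - a_min) / (a_max * n - S_n)
    """
    n = len(series)
    a_max, a_min, s_n = max(series), min(series), sum(series)
    if a_max == a_min:
        # Every element is equal, so both denominators are zero
        # and the ratios are undefined.
        raise ValueError("ratios are undefined for a constant series")
    spread = a_max - a_min
    return spread / (s_n - a_min * n), spread / (a_max * n - s_n)


def flag_extremes(series: Sequence[float], k: float = 0.0) -> Tuple[bool, bool]:
    """Return (max_flagged, min_flagged).

    An extreme is flagged when its ratio exceeds 2/n * (1 + k).
    `k` is an assumed stand-in for the abstract's k_1 or k_2.
    """
    n = len(series)
    threshold = 2.0 / n * (1.0 + k)
    r_max, r_min = linear_fit_ratios(series)
    return r_max > threshold, r_min > threshold


if __name__ == "__main__":
    # An evenly spaced series: both ratios equal 2/n.
    print(linear_fit_ratios([1, 2, 3, 4, 5]))
    # A y = c series with one high value: only the maximum is flagged.
    print(flag_extremes([5, 5, 5, 5, 9]))
```

For the evenly spaced series both ratios come out at 2/5, because S_n = n * (a_min + a_max) / 2 makes each denominator equal to n * (a_max − a_min) / 2. In the second series R_max is 4/4 = 1, above 2/5, while R_min is 4/16 = 0.25, so only the maximum element is flagged as disagreeing with y = c.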

Similar Items