
Robustification of Linear Regression and Its Application in Genome-Wide Association Studies

Regression analysis is one of the most popular statistical techniques that attempt to explore the relationships between a response (dependent) variable and one or more explanatory (independent) variables. To test the overall significance of regression, F-statistic is used if the parameters are estim...


Bibliographic Details
Main Authors: Alamin, Md., Sultana, Most. Humaira, Xu, Haiming, Mollah, Md. Nurul Haque
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7295010/
https://www.ncbi.nlm.nih.gov/pubmed/32582288
http://dx.doi.org/10.3389/fgene.2020.00549
_version_ 1783546583861690368
author Alamin, Md.
Sultana, Most. Humaira
Xu, Haiming
Mollah, Md. Nurul Haque
author_facet Alamin, Md.
Sultana, Most. Humaira
Xu, Haiming
Mollah, Md. Nurul Haque
author_sort Alamin, Md.
collection PubMed
description Regression analysis is one of the most popular statistical techniques for exploring the relationships between a response (dependent) variable and one or more explanatory (independent) variables. To test the overall significance of regression, the F-statistic is used if the parameters are estimated by the least-squares estimators (LSEs), while the likelihood ratio test (LRT) statistic is used if the parameters are estimated by the maximum likelihood estimators (MLEs). However, both procedures produce misleading results and often fail to provide good fits to the reasonable space of the dataset in the presence of outlying observations. Moreover, outliers occur very frequently in real datasets, including molecular OMICS datasets. Hence, an effort is made in this study to robustify MLE-based regression analysis by maximizing the β-likelihood function. The tuning parameter β is selected by cross-validation. For β = 0, the proposed method reduces to the classical MLE-based regression analysis. We inspect the performance of the proposed method using both synthetic and real data analysis. The simulation results indicate that the proposed method performs better than traditional methods, in terms of parameter estimates and mean square errors, in the presence of both outliers and high-leverage points. The results of relative efficiency analysis show that, for a normal error distribution with high dimension and outliers, the proposed estimator is less affected than popular estimators including S, MM, and fast-S. Real data analysis results also demonstrate that the proposed method is robust to data contamination, overcoming the drawbacks of the traditional methods. Genome-wide association studies (GWAS) by the proposed method identify the vital gene influencing hypertension and iron level in the liver and spleen of mice. Furthermore, we have identified 15 and 21 significant SNPs for chalkiness degree and chalkiness percentage, respectively, by GWAS based on the proposed method. Variants of these SNPs might provide new resources for grain quality traits and could be used in further molecular and physiological analyses to improve rice grain quality. These results offer an important basis for further understanding of robust regression analysis, which might be applied in various fields, including business, genetics, and bioinformatics.
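The idea summarized in the description (down-weighting outlying observations through a tuning parameter β, with β = 0 recovering the classical fit) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The exponential weights `exp(-beta * r^2 / (2 * sigma2))`, the `(1 + beta)` variance correction, the function name `beta_weighted_fit`, and the simulated data are all assumptions borrowed from the general minimum β-divergence literature; this record does not confirm any of them, and the cross-validation step for choosing β is not shown.

```python
import numpy as np

def beta_weighted_fit(X, y, beta=0.2, n_iter=200, tol=1e-10):
    """Hypothetical helper: iteratively reweighted least squares in which each
    observation is weighted by exp(-beta * r_i^2 / (2 * sigma2)).
    With beta = 0 every weight is 1, so the result is the ordinary LSE/MLE."""
    Xd = np.column_stack([np.ones(len(y)), X])        # prepend an intercept column
    theta = np.linalg.lstsq(Xd, y, rcond=None)[0]     # start from the classical fit
    resid = y - Xd @ theta
    sigma2 = np.mean(resid ** 2)                      # classical MLE of the error variance
    for _ in range(n_iter):
        # Observations with large residuals receive exponentially small weights.
        w = np.exp(-beta * resid ** 2 / (2.0 * sigma2))
        sw = np.sqrt(w)
        # Weighted least squares via row scaling by sqrt(w).
        theta_new = np.linalg.lstsq(Xd * sw[:, None], y * sw, rcond=None)[0]
        resid = y - Xd @ theta_new
        # Assumed variance update; the (1 + beta) factor offsets the shrinkage
        # caused by down-weighting under a normal error model.
        sigma2 = (1.0 + beta) * np.sum(w * resid ** 2) / np.sum(w)
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:
            break
    return theta, sigma2

# Simulated data (an assumption for illustration): y = 1 + 2x + noise,
# with 10% of the responses shifted upward by 20.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + 0.5 * rng.normal(size=200)
y[:20] += 20.0

theta_ols, _ = beta_weighted_fit(x, y, beta=0.0)   # classical fit
theta_rob, _ = beta_weighted_fit(x, y, beta=0.2)   # beta-weighted fit
print("classical (beta = 0):", theta_ols)
print("beta-weighted (beta = 0.2):", theta_rob)
```

In this simulated setting the classical intercept is pulled toward the shifted responses, while the β-weighted estimate stays near the generating values, because the contaminated points end up with weights close to zero.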
format Online
Article
Text
id pubmed-7295010
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-72950102020-06-23 Robustification of Linear Regression and Its Application in Genome-Wide Association Studies Alamin, Md. Sultana, Most. Humaira Xu, Haiming Mollah, Md. Nurul Haque Front Genet Genetics Frontiers Media S.A. 2020-06-08 /pmc/articles/PMC7295010/ /pubmed/32582288 http://dx.doi.org/10.3389/fgene.2020.00549 Text en Copyright © 2020 Alamin, Sultana, Xu and Mollah. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Alamin, Md.
Sultana, Most. Humaira
Xu, Haiming
Mollah, Md. Nurul Haque
Robustification of Linear Regression and Its Application in Genome-Wide Association Studies
title Robustification of Linear Regression and Its Application in Genome-Wide Association Studies
title_full Robustification of Linear Regression and Its Application in Genome-Wide Association Studies
title_fullStr Robustification of Linear Regression and Its Application in Genome-Wide Association Studies
title_full_unstemmed Robustification of Linear Regression and Its Application in Genome-Wide Association Studies
title_short Robustification of Linear Regression and Its Application in Genome-Wide Association Studies
title_sort robustification of linear regression and its application in genome-wide association studies
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7295010/
https://www.ncbi.nlm.nih.gov/pubmed/32582288
http://dx.doi.org/10.3389/fgene.2020.00549
work_keys_str_mv AT alaminmd robustificationoflinearregressionanditsapplicationingenomewideassociationstudies
AT sultanamosthumaira robustificationoflinearregressionanditsapplicationingenomewideassociationstudies
AT xuhaiming robustificationoflinearregressionanditsapplicationingenomewideassociationstudies
AT mollahmdnurulhaque robustificationoflinearregressionanditsapplicationingenomewideassociationstudies