Robustification of Linear Regression and Its Application in Genome-Wide Association Studies
Regression analysis is one of the most popular statistical techniques for exploring the relationships between a response (dependent) variable and one or more explanatory (independent) variables. To test the overall significance of a regression, the F-statistic is used if the parameters are estim...
Main authors: | Alamin, Md.; Sultana, Most. Humaira; Xu, Haiming; Mollah, Md. Nurul Haque |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2020 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7295010/ https://www.ncbi.nlm.nih.gov/pubmed/32582288 http://dx.doi.org/10.3389/fgene.2020.00549 |
_version_ | 1783546583861690368 |
---|---|
author | Alamin, Md. Sultana, Most. Humaira Xu, Haiming Mollah, Md. Nurul Haque |
author_facet | Alamin, Md. Sultana, Most. Humaira Xu, Haiming Mollah, Md. Nurul Haque |
author_sort | Alamin, Md. |
collection | PubMed |
description | Regression analysis is one of the most popular statistical techniques for exploring the relationships between a response (dependent) variable and one or more explanatory (independent) variables. To test the overall significance of a regression, the F-statistic is used if the parameters are estimated by the least-squares estimators (LSEs), while the likelihood ratio test (LRT) statistic is used if they are estimated by the maximum likelihood estimators (MLEs). However, both procedures produce misleading results and often fail to provide good fits to the reasonable space of the dataset in the presence of outlying observations. Moreover, outliers occur very frequently in real datasets, including molecular OMICS datasets. Hence, an effort is made in this study to robustify MLE-based regression analysis by maximizing the β-likelihood function. The tuning parameter β is selected by cross-validation. For β = 0, the proposed method reduces to the classical MLE-based regression analysis. We inspect the performance of the proposed method using both synthetic and real data analysis. The simulation results indicate that the proposed method performs better than traditional methods, in terms of parameter estimates and mean square errors, in the presence of both outliers and high leverage points. The results of relative efficiency analysis show that the proposed estimator is relatively less affected than the popular estimators, including S, MM, and fast-S, for the normal error distribution in the case of high dimension and outliers. Also, the real data analysis results demonstrate that the proposed method shows robust properties with respect to data contamination, overcoming the drawback of the traditional methods. Genome-wide association studies (GWAS) by the proposed method identify the vital gene influencing hypertension and iron level in the liver and spleen of mice. Furthermore, we have identified 15 and 21 significant SNPs for chalkiness degree and chalkiness percentage, respectively, by GWAS based on the proposed method. The variants of these SNPs might provide new resources for grain quality traits and could be used for further molecular and physiological analysis to improve the quality of rice grain. These results offer an important basis for further understanding of robust regression analysis, which might be applied in various fields, including business, genetics, and bioinformatics. |
format | Online Article Text |
id | pubmed-7295010 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7295010 2020-06-23 Robustification of Linear Regression and Its Application in Genome-Wide Association Studies Alamin, Md. Sultana, Most. Humaira Xu, Haiming Mollah, Md. Nurul Haque Front Genet Genetics Frontiers Media S.A. 2020-06-08 /pmc/articles/PMC7295010/ /pubmed/32582288 http://dx.doi.org/10.3389/fgene.2020.00549 Text en Copyright © 2020 Alamin, Sultana, Xu and Mollah. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Alamin, Md. Sultana, Most. Humaira Xu, Haiming Mollah, Md. Nurul Haque Robustification of Linear Regression and Its Application in Genome-Wide Association Studies |
title | Robustification of Linear Regression and Its Application in Genome-Wide Association Studies |
title_full | Robustification of Linear Regression and Its Application in Genome-Wide Association Studies |
title_fullStr | Robustification of Linear Regression and Its Application in Genome-Wide Association Studies |
title_full_unstemmed | Robustification of Linear Regression and Its Application in Genome-Wide Association Studies |
title_short | Robustification of Linear Regression and Its Application in Genome-Wide Association Studies |
title_sort | robustification of linear regression and its application in genome-wide association studies |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7295010/ https://www.ncbi.nlm.nih.gov/pubmed/32582288 http://dx.doi.org/10.3389/fgene.2020.00549 |
work_keys_str_mv | AT alaminmd robustificationoflinearregressionanditsapplicationingenomewideassociationstudies AT sultanamosthumaira robustificationoflinearregressionanditsapplicationingenomewideassociationstudies AT xuhaiming robustificationoflinearregressionanditsapplicationingenomewideassociationstudies AT mollahmdnurulhaque robustificationoflinearregressionanditsapplicationingenomewideassociationstudies |