Biostatistics Series Module 6: Correlation and Linear Regression

Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are...

Bibliographic Details
Main Authors: Hazra, Avijit; Gogtay, Nithya
Format: Online Article Text
Language: English
Published: Medknow Publications & Media Pvt Ltd 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5122272/
https://www.ncbi.nlm.nih.gov/pubmed/27904175
http://dx.doi.org/10.4103/0019-5154.193662
author Hazra, Avijit
Gogtay, Nithya
collection PubMed
description Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient (r). If the normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ), may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it returns a P < 0.05. A 95% confidence interval of the correlation coefficient can also be calculated for an idea of the correlation in the population. The value r² denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation (y = a + bx), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of a linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous.
format Online
Article
Text
id pubmed-5122272
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Medknow Publications & Media Pvt Ltd
record_format MEDLINE/PubMed
spelling pubmed-5122272 2016-11-30 Biostatistics Series Module 6: Correlation and Linear Regression Hazra, Avijit Gogtay, Nithya Indian J Dermatol IJD® Module on Biostatistics and Research Methodology for the Dermatologist - Module Editor: Saumya Panda Medknow Publications & Media Pvt Ltd 2016 /pmc/articles/PMC5122272/ /pubmed/27904175 http://dx.doi.org/10.4103/0019-5154.193662 Text en Copyright: © Indian Journal of Dermatology http://creativecommons.org/licenses/by-nc-sa/3.0 This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License, which allows others to remix, tweak, and build upon the work non-commercially, as long as the author is credited and the new creations are licensed under the identical terms.
title Biostatistics Series Module 6: Correlation and Linear Regression
topic IJD® Module on Biostatistics and Research Methodology for the Dermatologist - Module Editor: Saumya Panda
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5122272/
https://www.ncbi.nlm.nih.gov/pubmed/27904175
http://dx.doi.org/10.4103/0019-5154.193662
work_keys_str_mv AT hazraavijit biostatisticsseriesmodule6correlationandlinearregression
AT gogtaynithya biostatisticsseriesmodule6correlationandlinearregression
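
The quantities named in the description (Pearson's r, Spearman's rho, the coefficient of determination r², and the least-squares line y = a + bx) can be illustrated with a minimal sketch. This is not code from the article: the function names (`pearson_r`, `average_ranks`, `spearman_rho`, `least_squares_line`) and the sample data are hypothetical, chosen only for illustration, and the sketch uses the standard textbook formulas with plain Python so that it has no external dependencies.

```python
from math import sqrt


def _centered_sums(x, y):
    """Return means and sums of squares/cross-products of the deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return mx, my, sxy, sxx, syy


def pearson_r(x, y):
    """Pearson's correlation coefficient r for paired numeric data."""
    _, _, sxy, sxx, syy = _centered_sums(x, y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


def least_squares_line(x, y):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    mx, my, sxy, sxx, _ = _centered_sums(x, y)
    b = sxy / sxx
    a = my - b * mx
    return a, b


if __name__ == "__main__":
    # Made-up paired observations, for illustration only.
    x = [1, 2, 3, 4]
    y = [1, 4, 9, 16]
    r = pearson_r(x, y)
    a, b = least_squares_line(x, y)
    print("Pearson r:", r)
    print("r squared (coefficient of determination):", r ** 2)
    print("Spearman rho:", spearman_rho(x, y))
    print("Regression line: y = %.3f + %.3f x" % (a, b))
```

In this made-up data set y rises monotonically but not linearly with x, so the rank-based coefficient is higher than Pearson's r. The sketch does not compute the P value or the 95% confidence interval, and it does not replace the scatter plot that the description calls essential before any correlation-regression analysis.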