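Variance is the most basic one. It measures the spread or dispersion of a single variable's values around its mean: $\mathrm{Var}(X) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$, where $\bar{x}$ is the mean of $X$ (the sample variance divides by $n-1$ instead of $n$). The standard deviation is $\mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)}$.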
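A minimal Python sketch of the variance formula (dividing by $n$), using only the standard library. The data values are made up for illustration. The standard-library calls are used only to cross-check the hand-written version.

```python
import math
import statistics

def population_variance(xs):
    """Var(X) = (1/n) * sum((x_i - mean)^2), dividing by n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example values

var = population_variance(data)
sd = math.sqrt(var)  # standard deviation = sqrt(variance)
print(var, sd)  # 4.0 2.0

# statistics.pvariance uses the same 1/n formula;
# statistics.variance is the sample version that divides by n - 1.
print(statistics.pvariance(data), statistics.variance(data))
```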
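There is also a concept called the coefficient of variation (CV). It is the ratio of the standard deviation to the mean and is often expressed as a percentage: $\mathrm{CV} = \frac{\mathrm{SD}}{\text{Mean}} \times 100\%$. Because the units cancel, it can be considered a standardized, unit-free measure of dispersion, which lets you compare variability across variables measured on different scales.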
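A small sketch of the coefficient of variation in Python. It assumes the population SD (matching the $1/n$ variance formula) and refuses a non-positive mean, since CV is only meaningful for data with a positive mean. The height values are hypothetical; the point is that rescaling the units leaves CV unchanged.

```python
import statistics

def coefficient_of_variation(xs):
    """CV = SD / Mean * 100%, using the population SD (1/n)."""
    mean = statistics.fmean(xs)
    if mean <= 0:
        # CV is not meaningful when the mean is zero or negative
        raise ValueError("CV requires a positive mean")
    return statistics.pstdev(xs) / mean * 100

# Hypothetical heights of the same people, in cm and in m.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
heights_m = [h / 100 for h in heights_cm]

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)
# The SDs differ by a factor of 100, but the CVs agree because CV is unit-free.
print(round(cv_cm, 2), round(cv_m, 2))  # 8.32 8.32
```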
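Covariance measures the degree to which two variables change together, i.e., whether they tend to increase or decrease in tandem: $\mathrm{Cov}(X,Y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$. If $x_i$ and $y_i$ are both above or both below their means, the product is positive and pushes $\mathrm{Cov}(X,Y)$ up. If one is above its mean while the other is below, the product is negative and pulls $\mathrm{Cov}(X,Y)$ down; when such pairs dominate, the covariance is negative, meaning the variables tend to move in opposite directions.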
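A minimal Python sketch of the covariance formula (dividing by $n$), showing how the sign reflects whether two made-up series move together or in opposite directions.

```python
def population_covariance(xs, ys):
    """Cov(X, Y) = (1/n) * sum((x_i - x_mean) * (y_i - y_mean))."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n

x = [1, 2, 3, 4, 5]
y_same = [2, 4, 6, 8, 10]      # rises with x: every product is >= 0
y_opposite = [10, 8, 6, 4, 2]  # falls as x rises: every product is <= 0

print(population_covariance(x, y_same))      # 4.0
print(population_covariance(x, y_opposite))  # -4.0
```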
Correlation is standardized covariance. The formula for correlation is $\mathrm{cor}(y_1, y_2) = \frac{\mathrm{cov}(y_1, y_2)}{\sqrt{\mathrm{var}(y_1)\,\mathrm{var}(y_2)}}$. If the variables ($y_1$ and $y_2$) are already standardized (mean = 0, SD = 1), then $\mathrm{cor}(y_1, y_2) = \mathrm{cov}(y_1, y_2)$. Note that in simple linear regression, $R^2 = \mathrm{cor}(y, \hat{y})^2$, which also equals $\mathrm{cor}(x, y)^2$. If you have a vcv (variance-covariance) matrix, you can turn it into a correlation matrix with stats::cov2cor in R.
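The claims in the correlation paragraph can be checked numerically with a short plain-Python sketch. The data are made up; `cov_to_cor` is a hypothetical helper that mimics what R's stats::cov2cor computes (divide each entry by the product of the two SDs), not a port of its source. Using $1/n$ or $1/(n-1)$ does not matter here because the scaling cancels in every ratio.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population covariance (divides by n); note var(y) == cov(y, y)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def cor(xs, ys):
    """cor(y1, y2) = cov(y1, y2) / sqrt(var(y1) * var(y2))."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

def standardize(xs):
    """Rescale to mean 0 and SD 1 (population SD, consistent with cov)."""
    m, sd = mean(xs), math.sqrt(cov(xs, xs))
    return [(x - m) / sd for x in xs]

def cov_to_cor(vcv):
    """Hypothetical helper: entry (i, j) divided by sd_i * sd_j."""
    k = len(vcv)
    sds = [math.sqrt(vcv[i][i]) for i in range(k)]
    return [[vcv[i][j] / (sds[i] * sds[j]) for j in range(k)] for i in range(k)]

y1 = [1.0, 2.0, 4.0, 5.0, 8.0]
y2 = [2.0, 3.0, 3.0, 6.0, 9.0]

r = cor(y1, y2)
r_std = cov(standardize(y1), standardize(y2))  # equals r on standardized data

vcv = [[cov(y1, y1), cov(y1, y2)], [cov(y2, y1), cov(y2, y2)]]
cor_matrix = cov_to_cor(vcv)  # diagonal is 1, off-diagonal is r

# Simple linear regression y2 ~ y1 by OLS: slope = cov / var.
slope = cov(y1, y2) / cov(y1, y1)
intercept = mean(y2) - slope * mean(y1)
y2_hat = [intercept + slope * v for v in y1]
sse = sum((a - b) ** 2 for a, b in zip(y2, y2_hat))
sst = sum((a - mean(y2)) ** 2 for a in y2)
r_squared = 1 - sse / sst

print(round(r, 4), round(r_std, 4), round(cor_matrix[0][1], 4))
print(round(r_squared, 4), round(cor(y2, y2_hat) ** 2, 4), round(r ** 2, 4))
```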
Both variation and covariation are broader terms compared to variance and covariance, which are precise statistical terms with defined formulas.
Another set of somewhat related concepts is Variables, Covariables and Covariates.
Variable is a general term used to describe any characteristic, measurement, or attribute that can take on different values, such as age, height, weight, blood pressure, genotype, etc. It is used in almost every context in statistics and data analysis.
A covariate is a variable that may be predictive of the outcome and is included in a model to adjust, or control, for its effect. In mixed models, it is usually included as one of the fixed-effect terms. It is a variable you want to control for but do not actually care about (sounds like a random effect, right?!). For example, in a study examining the effect of a new drug on blood pressure, age and baseline blood pressure might be treated as covariates. Note that this term can sometimes imply that the variable is categorical (or am I hallucinating?). For example, in GCTA, if your covariate is categorical, you use --covar. However, if it is continuous, you use --qcovar, where q stands for quantitative.
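To illustrate what "adjusting for a covariate" does, here is a toy plain-Python sketch with one categorical covariate (a hypothetical grouping variable such as study site). It uses within-group demeaning, which by the Frisch-Waugh-Lovell theorem gives the same slope as including the group as dummy (fixed-effect) terms in an ordinary least-squares model. It is an illustration of the idea only, not how GCTA or any mixed-model software handles --covar internally, and the data are made up.

```python
from collections import defaultdict

def slope(x, y):
    """OLS slope of y on x (with intercept): Cov(x, y) / Var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def demean_within(values, groups):
    """Subtract each group's mean from its members (absorbs a categorical covariate)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

# Hypothetical data: within each site, y rises by exactly 1 per unit of x,
# but site "B" has larger x values and a much lower baseline y.
x = [1, 2, 3, 6, 7, 8]
group = ["A", "A", "A", "B", "B", "B"]
y = [10, 11, 12, 2, 3, 4]

crude = slope(x, y)  # ignores the covariate: the site difference masks the trend
adjusted = slope(demean_within(x, group), demean_within(y, group))
print(round(crude, 3), round(adjusted, 3))  # -1.349 1.0
```

Ignoring the covariate even flips the sign of the estimated effect here, which is exactly the kind of distortion that controlling for a covariate is meant to remove.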
Covariable is synonymous with Covariate but less common.