Inbreeding Coefficient

The inbreeding coefficient is usually denoted F. As we explained in the IBD vs. IBS post, F is the probability that two alleles are identical by descent (IBD). If the two alleles are in the same diploid individual, then F is the inbreeding coefficient of that individual at this locus.

Inbreeding coefficient at locus level

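It is defined as F = 1 - O(f(Aa))/E(f(Aa)), where O(f(Aa)) is the observed frequency of heterozygotes and E(f(Aa)) is the frequency expected under Hardy-Weinberg equilibrium (HWE). The definition can be generalized to multi-allelic loci and to polyploids. See more in this post.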
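The definition above can be sketched in a few lines of Python for a biallelic locus. This is only an illustration: the function name `locus_inbreeding` and the genotype counts are made up for the example, and the allele frequency is estimated from the same sample.

```python
def locus_inbreeding(n_AA, n_Aa, n_aa):
    """Locus-level F = 1 - observed het frequency / expected het frequency under HWE."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A in this sample
    obs_het = n_Aa / n                # O(f(Aa))
    exp_het = 2 * p * (1 - p)         # E(f(Aa)) under HWE
    return 1 - obs_het / exp_het


# Hypothetical genotype counts (AA, Aa, aa)
f_hwe = locus_inbreeding(25, 50, 25)      # exact HWE proportions
f_deficit = locus_inbreeding(40, 20, 40)  # fewer heterozygotes than expected
print(f_hwe, f_deficit)
```

With counts in exact HWE proportions F comes out at zero, and a deficit of heterozygotes gives a positive F.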

Inbreeding coefficient at individual level

In the output of plink --het, however, each sample gets its own F. PLINK uses O(HOM), the number of observed homozygous loci (3rd column), E(HOM), the number of expected homozygous loci (4th column), and N(NM), the total number of non-missing loci (5th column), to calculate F (6th column) as (O(HOM) - E(HOM)) / (N(NM) - E(HOM)). The higher the F, the more inbred the individual, that is, the more homozygous than expected. O(HOM) and N(NM) are easy to count. For a locus with a MAF of p, the expected homozygosity is 1 - 2p(1-p) under HWE; E(HOM) is the sum of this quantity over all loci where the individual has a genotype (no missing data). If there is no missing data, E(HOM) is the same for all individuals.
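The calculation described above can be sketched as follows. This is a minimal illustration of the formula as stated in this post, not PLINK's actual code; PLINK may differ in details such as how allele frequencies are estimated. The function name `het_f`, the genotype coding (minor-allele counts 0/1/2, `None` for missing) and the toy data are assumptions made for the example.

```python
def het_f(genotypes, maf):
    """Per-individual F from observed vs. expected homozygous loci.

    genotypes: minor-allele counts per locus (0, 1, 2), or None if missing
    maf:       minor allele frequency per locus, in the same order
    """
    o_hom = 0      # O(HOM): observed homozygous loci
    e_hom = 0.0    # E(HOM): expected homozygous loci under HWE
    n_nm = 0       # N(NM): non-missing loci
    for g, p in zip(genotypes, maf):
        if g is None:
            continue                      # skip missing genotypes
        n_nm += 1
        if g != 1:
            o_hom += 1                    # 0 or 2 copies means homozygous
        e_hom += 1 - 2 * p * (1 - p)      # expected homozygosity at this locus
    f = (o_hom - e_hom) / (n_nm - e_hom)
    return o_hom, e_hom, n_nm, f


# Hypothetical toy data: four loci, all with MAF 0.5
maf = [0.5, 0.5, 0.5, 0.5]
print(het_f([0, 2, 2, 0], maf))     # homozygous at every locus
print(het_f([1, 1, 0, 2], maf))     # as many homozygous loci as expected
print(het_f([0, None, 2, 1], maf))  # one missing genotype lowers N(NM) and E(HOM)
```

The third call shows why E(HOM) differs between individuals only when some genotypes are missing.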

GRM

I was using gcta --make-grm, and the values on the diagonal are supposed to be 1+F. However, I found a huge discrepancy between these values and the F reported by plink --het, and I was wondering why. I tried to follow the Methods described in Yang 2011 NG on a toy dataset. I understand that the two tools are doing very different things, but the general trend should be the same given the same dataset.
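For a toy dataset, the GRM diagonal can be computed by hand. The sketch below assumes the diagonal formula as I read it from the GCTA methods, A_jj = 1 + mean over loci of (x^2 - (1+2p)x + 2p^2) / (2p(1-p)), where x is the number of copies of the reference allele and p is its frequency. The function name `grm_diagonal` and the toy values are made up, and the output of gcta itself may differ (for example in how missing genotypes or allele frequencies are handled).

```python
def grm_diagonal(genotypes, freqs):
    """Diagonal GRM element (1 + F) for one individual, under the assumed formula.

    genotypes: reference-allele counts per locus (0, 1, 2), or None if missing
    freqs:     reference-allele frequency per locus, in the same order
    """
    total = 0.0
    n = 0
    for x, p in zip(genotypes, freqs):
        if x is None:
            continue                                  # skip missing genotypes
        n += 1
        total += (x * x - (1 + 2 * p) * x + 2 * p * p) / (2 * p * (1 - p))
    return 1 + total / n


# Hypothetical toy data
print(grm_diagonal([0, 2, 2, 0], [0.5] * 4))  # homozygous everywhere, p = 0.5
print(grm_diagonal([1, 1, 0, 2], [0.5] * 4))  # half heterozygous, p = 0.5
print(grm_diagonal([2], [0.1]))               # homozygous for a rare allele
```

Under this formula a heterozygote contributes -1, while a homozygote contributes p/(1-p) or (1-p)/p depending on which allele it carries. The homozygosity count, by contrast, adds 1 for any homozygote, so the two estimates agree when p = 0.5 but can diverge widely when allele frequencies are far from 0.5.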

Huan Fan /
Published under (CC) BY-NC-SA in categories notes  tagged with ML 