$ \chi^2 = \sum_{i = 1}^{k}\frac{(o_i - e_i)^2}{e_i} $

To test the [[Probabilistic Independence|independence]] of two **qualitative** [[Variable|variables]] of classification, set up a [[Chi-Square Test|chi-square test for goodness of fit]] on the independence of the two variables, using the deviation $\chi^2$ between the theoretical, independent [[Contingency Table|contingency table]] and the empirical contingency table as the test [[Statistic|statistic]].

| | $X_1$ | $X_2$ | $X_3$ |
| --- | --- | --- | --- |
| $Y_1$ | | | |
| $Y_2$ | | | |

If the two variables are independent (see [[Conditional Probability|conditional probability]]), then $P(X_a) = P(X_a|Y_b)$ and $P(Y_b) = P(Y_b|X_a)$, meaning that the expected [[Frequency|frequency]] for $X_aY_b$ is $nP(X_aY_b) = nP(X_a)P(Y_b)$:

$ e = nP(X_a)P(Y_b) = n\frac{n_a}{n}\frac{n_b}{n} = \frac{n_an_b}{n} = \mathrm{\frac{Column \times Row}{Total}} $

Establish the [[Null Hypothesis|null hypothesis]] that the two variables are independent. Since the [[Chi-Square Distribution|chi-square distribution]] takes only nonnegative values (it lies entirely to the right of zero), this is a right-tailed test:

$ \begin{align*} &H_0: \chi^2 = 0\ (\text{Independent}) \\ &H_1: \chi^2 > 0\ (\text{Dependent}) \end{align*} $

Since each frequency follows the [[Binomial Random Variable|binomial distribution]], for sufficiently large [[Count|sample sizes]], $o_i$ is [[Continuous Approximation of Discrete Distributions|approximately normally distributed]], justifying the use of the [[Chi-Square Distribution|chi-square distribution]], as $(o_i - e_i)^2$ becomes the square of a [[Normal Distribution|normal distribution]]. Use the [[Chi-Square Distribution|chi-square distribution]] with $(\text{Row} - 1)(\text{Column} - 1)$ degrees of freedom.

Note that the chi-square test of independence only indicates the significance of the relationship, not its strength.
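The computation above can be sketched in plain Python. The observed counts below are hypothetical, chosen only to illustrate the expected-frequency formula $e = \frac{n_an_b}{n}$ and the $(\text{Row} - 1)(\text{Column} - 1)$ degrees of freedom:

```python
# Sketch: chi-square test of independence on a 2x3 contingency table.
# Rows are Y_1, Y_2; columns are X_1, X_2, X_3 (hypothetical counts).
observed = [
    [20, 30, 25],  # Y_1
    [30, 20, 25],  # Y_2
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)  # grand total

# chi^2 = sum over cells of (o - e)^2 / e,
# where e = (row total * column total) / grand total under H0.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi2 += (o - e) ** 2 / e

# Degrees of freedom: (Rows - 1)(Columns - 1).
df = (len(observed) - 1) * (len(col_totals) - 1)
print(chi2, df)  # → 4.0 2
```

The resulting $\chi^2$ is then compared against the right tail of the chi-square distribution with `df` degrees of freedom at the chosen significance level.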