Checking for Normality Using Chi-square Tests
Chi-square (χ²) tests can be used to determine whether a set of data can be adequately modeled by a specified distribution. The chi-square test divides the data into nonoverlapping intervals called boundaries. It compares the number of observations in each boundary with the number expected under the distribution being tested, in this case the normal distribution. This test is sometimes called “the goodness-of-fit test.”
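Counting the observations that fall in each boundary is the basic bookkeeping step of the test. The following is a minimal Python sketch, assuming the boundaries are described by a list of interior boundary limits (the function name `observed_counts` is hypothetical) and assuming that a value exactly equal to a limit is counted in the boundary above it:

```python
from bisect import bisect_right


def observed_counts(data, limits):
    """Count the observations in each boundary (hypothetical helper).

    `limits` holds the interior boundary limits in ascending order; k - 1
    limits split the number line into k nonoverlapping boundaries.
    Assumed convention: a value equal to a limit goes to the boundary above it.
    """
    counts = [0] * (len(limits) + 1)
    for x in data:
        # bisect_right returns how many limits are <= x, i.e. the boundary index
        counts[bisect_right(limits, x)] += 1
    return counts
```

For example, `observed_counts([1, 2, 3, 4, 5, 6], [2.5, 4.5])` returns `[2, 2, 2]`.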
The boundaries are chosen for convenience, with five being a commonly used number. The boundary limits are used to generate a probability for the expected frequency. For the normal distribution, this is done by calculating a z value from each boundary limit and the average and standard deviation of the data set, in the following manner:
1. List the data set in ascending order.
2. Determine the number of boundaries (variable k) to be used in this test.
3. Let m_i be the number of sample values observed in boundary i.
4. Calculate a z value for each boundary limit. Each of the two outermost boundaries is open-ended and has a single z value; each inside boundary has two z values, one for its lower limit and one for its upper limit.
5. Calculate the expected frequency for each boundary by determining its probability P_i = f(z) from the standard normal distribution and multiplying that probability by the total number of observations (n) in the data set.
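6. Determine the contribution of each boundary to the total chi-square value, and sum the contributions, using the formula

$$\chi^2 = \sum_{i=1}^{k} \frac{(m_i - nP_i)^2}{nP_i}$$

where n is the total number of observations; the statistic has k – 1 DOF.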
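Steps 4 and 5 can be written out explicitly. Assuming the k – 1 boundary limits are labeled L_1 < L_2 < … < L_{k−1} (labels introduced here for illustration), x̄ and s are the average and standard deviation of the data, and Φ is the standard normal cumulative distribution function, the z values and boundary probabilities are:

$$z_j = \frac{L_j - \bar{x}}{s}, \qquad j = 1, \dots, k-1$$

$$P_1 = \Phi(z_1), \qquad P_i = \Phi(z_i) - \Phi(z_{i-1}) \quad (1 < i < k), \qquad P_k = 1 - \Phi(z_{k-1})$$

The outermost boundaries each use one z value and the inside boundaries use two, and the probabilities P_1 through P_k sum to 1, so the expected frequencies nP_i sum to n.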
The hypothesis is rejected, indicating that the distribution is not normal, when χ² ≥ χ²_α, where χ²_α is obtained from a χ² table for α = 1 – confidence; k is the number of boundaries, and DOF is the degrees of freedom. Selected values of the χ² table are given in Table 5.3.
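The whole procedure can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation, and it rests on several assumptions: the caller supplies the k – 1 interior boundary limits, the average and sample standard deviation come from the standard library `statistics` module, the standard normal probabilities come from `statistics.NormalDist`, and the critical value χ²_α is looked up in a table (such as Table 5.3) and passed in rather than computed. The function names `chi_square_normality` and `is_rejected` are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, stdev


def chi_square_normality(data, limits):
    """Return (chi2, dof, observed, expected) for a normality check.

    `limits` are the k - 1 interior boundary limits in ascending order
    (hypothetical interface). Values equal to a limit are counted in the
    boundary above it (assumed convention).
    """
    ordered = sorted(data)  # Step 1: list the data in ascending order
    n = len(ordered)
    xbar, s = mean(ordered), stdev(ordered)  # average and sample std. deviation
    k = len(limits) + 1  # Step 2: k boundaries from k - 1 limits

    # Step 3: m_i, the number of observations in each boundary
    observed = [0] * k
    for x in ordered:
        observed[bisect_right(limits, x)] += 1

    # Step 4: one z value per boundary limit
    z = [(limit - xbar) / s for limit in limits]

    # Step 5: P_i from the standard normal CDF; outer boundaries are open-ended
    cdf = [0.0] + [NormalDist().cdf(v) for v in z] + [1.0]
    expected = [n * (cdf[i + 1] - cdf[i]) for i in range(k)]

    # Step 6: sum of (m_i - n*P_i)^2 / (n*P_i), with k - 1 DOF as in the text
    chi2 = sum((m - e) ** 2 / e for m, e in zip(observed, expected))
    return chi2, k - 1, observed, expected


def is_rejected(chi2, critical_value):
    """Reject normality when chi2 >= the tabulated critical value chi2_alpha."""
    return chi2 >= critical_value
```

As a usage example, with five boundaries (k = 5, so 4 DOF) and 95% confidence (α = 0.05), the tabulated critical value is 9.488, so normality would be rejected when `is_rejected(chi2, 9.488)` is true. The sketch uses k – 1 DOF as stated above; note that when the mean and standard deviation are estimated from the same data, many references reduce the DOF by one for each estimated parameter (k – 3).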