Chapter 10. Analysis of Variance
Chapter Objectives
In this chapter, readers will learn to do the following:
- Specify the chi-square distribution and apply it
- Specify the F-distribution and apply it
- Conduct the ANOVA using F-tests
In previous chapters, readers learned about several distributions, including the normal distribution, Student’s distribution (t-distribution), and binomial distribution of variables. We specified criteria for each of these distributions. It was also shown that the normal distribution can be used to analyze the binomial variables under certain conditions. In this chapter, we will introduce two more distributions: chi-square and F-distributions. We will also explain how the F-distribution can be used for the analysis of variance (ANOVA).
10.1 Chi-Square Distribution
In Chapter 4, we provided the definition of binomial variables and analyzed some examples. One of them was the famous example of tossing a fair coin, with head and tail outcomes. Based on common sense, we found that the probability of getting a head equals 0.5. If we repeat this experiment with 300 fair coins, on average, we will observe a head for 150 coins. Using statistics terminology, we could write that the expected number of heads is [latex]300\cdot 0.5=150[/latex]. Now, let’s recall another example that was used earlier in this book: that of tossing a die. Since the die has six faces, we expect six possible outcomes (faces) when tossing it. In other words, the probability of observing each of the faces, enumerated from 1 to 6, equals 1/6. For instance, if we toss 300 ideal dice, on average, face 5 will be observed for [latex]300\cdot \frac{1}{6}=50[/latex] dice. Unlike binomial variables, the number of possible outcomes in this example exceeds 2. Note that all 300 experiments were independent. In statistics, by convention, we call these types of variables multinomial.
In both examples, all simple events are equally likely to occur; the probability of observing the head was equal to the probability of observing the tail, and the probability of observing any of the faces from 1 to 6 was the same. In general, the probabilities of simple events may not be equal. Let’s analyze one more example about the binomial and multinomial variables.
Example 10.1
Some universities offer courses operating on various grading systems. One of those grading systems is known as binary. For these courses, students receive either a P (passing) or an F (failing). Another popular grading system is called alpha grade. According to this system, students’ overall course performance can be evaluated using 12 categories: A+, A, A-, B+, B, B-, C+, C, C-, D+, D, and F. Based on years of statistics, universities recommend proportions of students for each of these categories. The academic records of one Canadian university (which we will refer to as University X) indicate the following proportions for binary and alpha grade systems in undergraduate courses.
Table 10.1

It is expected that the actual proportions will differ from the historical numbers provided in Table 10.1. Assume that a professor at University X evaluated the performance of her students using the alpha grade system and received slightly different proportions.

Let’s evaluate the dispersion between the observed results and the historical data. It would be reasonable (as we did before) to square the differences between the observed and historical proportions, standardize them by dividing by the historical proportion, and add these terms:

In statistics, the quantity, calculated using the formula
\[\sum^n_{i=1}{\frac{{\left(x_i-{\overline{x}}_i\right)}^2}{{\overline{x}}_i}}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (\mathrm{F}10.1)\]
describes the standardized dispersion between the expected and observed data for the selected sample with a size of n. In our example, this quantity equals 0.09518.
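As a minimal sketch of formula (F10.1), the short Python function below adds up the squared differences between observed and expected values, each divided by the expected value. The observed counts are hypothetical and chosen only for illustration: they imagine 300 tosses of a die, for which the expected count of each face is 50, as computed earlier in this section. The function name `standardized_dispersion` is our own.

```python
def standardized_dispersion(observed, expected):
    """Sum of (observed - expected)^2 / expected, as in formula (F10.1)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical face counts for 300 tosses of a die (illustration only).
observed_counts = [48, 52, 55, 45, 50, 50]
# Expected count of each face: 300 * (1/6) = 50.
expected_counts = [300 / 6] * 6

dispersion = standardized_dispersion(observed_counts, expected_counts)
print(round(dispersion, 2))
```

For these hypothetical counts, the squared differences add up to 58, so the quantity equals 58/50 = 1.16.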
In previous chapters, we used variance [latex]s^2=\frac{\sum^n_{i=1}{{\left(x_i-\overline{x}\right)}^2}}{n-1}[/latex] to estimate this dispersion for the sample of size n. It is understandable that this variance itself will change from sample to sample. Analysis of variance, often called ANOVA, is a method that is largely used in statistics to analyze data. In this chapter, we will briefly explain the basics of ANOVA and provide some simple examples.
It would be reasonable to assume that the distribution of the sample variance begins with [latex]s^2=0[/latex], since the variance cannot be negative. This can also be shown theoretically. However, this proof requires more mathematical knowledge and is beyond the objectives of this book. In previous chapters, we standardized samples to use the universal tables of distributions: z-score for normal distribution (table A3, Appendix) and t-score for Student’s distribution (table A4, Appendix). The standardized statistic for the distribution of the sample variance is called the chi-square variable and is defined using the following formula, considering that the population is normal:
\[{\chi }^2=\frac{\left(n-1\right)s^2}{{\sigma }^2}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (\mathrm{F}10.2)\]
where [latex]σ^2[/latex] is the population variance.

Figure 10.1. Chi-square distribution curves for various values of the degrees of freedom
The distribution of this variable is called the chi-square probability distribution with the degrees of freedom df = n – 1. The exact expression of the probability density function of chi-square contains the so-called gamma function, which is too complicated to explain in this book. Instead, we will refer to the graph of the chi-square distribution and to a table of cumulative probabilities with pre-calculated values (table A5, Appendix). Figure 10.1 presents the chi-square distribution for various values of the degrees of freedom, or df. We must consider that, by the nature of the probability distribution, the total area below each of the curves equals 1, regardless of the value of df. Careful readers might notice that the shapes of the t-distribution curves also vary with df. The probability that the chi-square exceeds a certain value, [latex]{\chi }^2_C[/latex], equals c, which corresponds to the area on the right of [latex]{\chi }^2_C[/latex] (fig. 10.2) for a given sample size.

Figure 10.2. The probability that the chi-square exceeds a certain value,[latex]{\chi }^2_C[/latex], equals c
Table 10.2

Table 10.2 provides the values of chi-square that leave a given area (cumulative probability) on the right for the given degrees of freedom. Note that the table shown here is just an extract of the full table, which is provided in the appendix of this book (table A5, Appendix).
Example 10.2
(a) For [latex]df=12[/latex], determine the chi-square value, which leaves the area equal to 0.95 on the right.
(b) For [latex]df=9[/latex], determine the chi-square value, which leaves the area 0.10 on the left.
(c) What is the probability that, based on the 15 measurements from the normally distributed population, chi-square value exceeds 8?
Solution:
To solve this question, we will refer to the chi-square value distribution table in the appendix (table A5, Appendix).
(a) To determine the required chi-square value, we find the number at the intersection of row “12” and column “0.95,” which is 5.226 (fig. 10.3).

Figure 10.3. A fragment of the chi-square distribution table for example 10.2(a)
Therefore, [latex]{\chi }^2_{0.95}=5.226[/latex].
(b) Since the chi-square value leaves 0.10 on the left, the area to its right equals [latex]1-0.10=0.90[/latex]. Using the chi-square value distribution table (table A5, Appendix), we can find that the corresponding chi-square value is [latex]{\chi }^2=4.168[/latex].
(c) It is given that the sample size is [latex]n=15[/latex]. Hence, [latex]df=15-1=14[/latex]. The row “14” of the chi-square value distribution table (table A5, Appendix) does not contain exactly 8. The closest number is 7.790. This number belongs to column 0.90 (fig. 10.4).

Figure 10.4. A fragment of the chi-square distribution table for example 10.2(c)
Therefore, [latex]P\left(χ^2\ge 8\right)\approx 0.90[/latex].
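Table lookups of this kind can be cross-checked numerically. The sketch below relies on a standard identity that holds only when the number of degrees of freedom is even, df = 2k: the right-tail area equals [latex]e^{-x/2}\sum^{k-1}_{j=0}{\frac{{\left(x/2\right)}^j}{j!}}[/latex]. This identity is not derived in this book, and the function name `chi2_right_tail_even_df` is ours. Both df = 12 and df = 14 from example 10.2 are even, so parts (a) and (c) can be checked this way; part (b), with df = 9, cannot.

```python
import math

def chi2_right_tail_even_df(x, df):
    """Right-tail area P(chi-square > x) for an even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    if x < 0:
        raise ValueError("chi-square values cannot be negative")
    half = x / 2
    k = df // 2
    # Partial sum of the series: (x/2)^j / j! for j = 0 .. k-1
    partial_sum = sum(half ** j / math.factorial(j) for j in range(k))
    return math.exp(-half) * partial_sum

# Example 10.2(a): the area to the right of 5.226 for df = 12
area_a = chi2_right_tail_even_df(5.226, 12)
# Example 10.2(c): the area to the right of 8 for df = 14
area_c = chi2_right_tail_even_df(8, 14)
print(round(area_a, 3), round(area_c, 3))
```

The area for part (a) comes out very close to 0.95, and the area for part (c) is about 0.889, which agrees with the table-based approximation of 0.90.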
After learning to work with the chi-square distribution table (table A5, Appendix), we can solve application problems. In previous chapters, we used hypothesis testing to verify various statements. The same tool could be applied to make an inference for the population variance using the chi-square distribution, as you can see in the next example.
Example 10.3
According to the information on the Insurdinary company’s website, the household income in Canada for 2023 ranges between $60,000 and $120,000 (https://www.insurdinary.ca/average-household-income-in-canada, obtained on December 24, 2023). A graduate student at the University of Regina found this statement surprising because it was posted on April 6, 2023 (in the fourth month of the year, that is, eight months before the year ended!). She administered an online survey across Canada to test this information and recorded the data collected from 35 households for 2023. She calculated the sample mean and the sample standard deviation; the latter, which is the value needed here, was $12,000. Use the information provided by the student and test Insurdinary’s statement.
Solution:
Assume that the household income distribution is normal. Since, according to Insurdinary, the range of Canadian household income is [latex]R=$120,000-$60,000=$60,000[/latex], the standard deviation of the population can be estimated as [latex]σ\approx \frac{R}{4}=\frac{60000}{4}=15000[/latex]. Hence, the approximate value of the population variance equals 225000000, whereas the sample variance is [latex]s^2={12000}^2=144000000[/latex]. Now we can conduct the hypothesis testing. It would be reasonable to construct the null and alternative hypotheses as follows:
\[H_0:σ^2=225000000\]
\[H_a:σ^2>225000000\]
The chi-square value, evaluated for the sample, is
\[χ^2=\frac{\left(n-1\right)s^2}{σ^2}=\frac{\left(35-1\right)·144000000}{225000000}=21.76\]
Let's use [latex]α=0.05[/latex] and the critical value method to test the null hypothesis of this one-tail problem. According to the chi-square distribution table (table A5, Appendix), the critical value for [latex]α=0.05[/latex] and [latex]df=34[/latex] is 48.602.

Figure 10.5. Chi-square distribution curve for example 10.3
Since the test statistic does not lie in the rejection region, we do not have enough evidence to reject the null hypothesis. In other words, the statement posted on the Insurdinary website is reasonable according to the results of the conducted survey. Note that if the sample variance were much larger, the corresponding chi-square value would lie in the rejection region, which would mean that the household income range is larger than stated by Insurdinary.
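The arithmetic of example 10.3 can be retraced with a few lines of Python. This is only a sketch of the steps described above: the critical value is copied from the chi-square table rather than computed, and the function and variable names are ours.

```python
def chi_square_statistic(n, sample_variance, hypothesized_variance):
    """Chi-square statistic (n - 1) * s^2 / sigma^2, as in formula (F10.2)."""
    return (n - 1) * sample_variance / hypothesized_variance

# Values taken from example 10.3
income_range = 120_000 - 60_000           # R = 60,000
sigma_estimate = income_range / 4         # sigma is approximated by R / 4
hypothesized_variance = sigma_estimate ** 2
sample_variance = 12_000 ** 2
n = 35

chi_square = chi_square_statistic(n, sample_variance, hypothesized_variance)

# Critical value for alpha = 0.05 and df = 34, read from the chi-square table
critical_value = 48.602
reject_null = chi_square > critical_value  # right-tailed test

print(round(chi_square, 2), reject_null)
```

The statistic equals 21.76, which is below the critical value, so `reject_null` is `False`.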
Summarizing this subsection, we can conclude that, unlike the z-distribution and t-distribution, the chi-square distribution is an asymmetrical distribution of non-negative variables (lifetime distribution). In example 10.3, we applied the chi-square distribution with n – 1 degrees of freedom to test the variance of one normally distributed population using a sample of size n. In the following subsection, we will test the equality of variances of two normally distributed populations using two independent random samples with given degrees of freedom.
10.2 F-Distribution
In previous chapters, you learned how to compare two normally distributed populations using the means of two randomly selected samples (one sample from each population). Sometimes, it is important to compare the variability of two populations, such as
- Age ranges of students registered in two different programs
- Error ranges of two measurement methods
- Income ranges in two different provinces
Consider two normally distributed populations with variances [latex]σ^2_1[/latex] and [latex]σ^2_2[/latex]. We can compare these variances in many ways. For instance, we could analyze the difference [latex]σ^2_1-σ^2_2[/latex] or the ratio [latex]\frac{σ^2_1}{σ^2_2}[/latex]. Each of these approaches has its “pros” and “cons.” In both cases, we select one sample from each population and compare the sample variances [latex]s^2_1[/latex] and [latex]s^2_2[/latex]. In this chapter, we will explain how the ratio of sample variances can be used to compare two populations. Obviously, different samples can be selected, so the ratio of sample variances varies from one pair of samples to another. Hence, each value of the ratio occurs with some probability. In statistics, the probability distribution of the ratio of sample variances is known as the F-distribution. One can start the analysis of this probability distribution with two straightforward remarks:
- If the ratio [latex]\frac{s^2_1}{s^2_2}[/latex] approximately equals 1, then within a given confidence, the variances of populations are equal.
- If the ratio [latex]\frac{s^2_1}{s^2_2}[/latex] is too large or very close to 0, then within a given confidence, the variances of populations differ.

Figure 10.6. F-distribution curves for pairs of samples with different degrees of freedom
Readers might notice that these curves are very similar to curves obtained for the chi-square distribution.
F-distribution curves, too, are asymmetrical, and the F-values are non-negative since they are determined as the ratio of two squares. The probability that the ratio of variances of two samples randomly selected from two different populations exceeds a certain value, [latex]F_c[/latex], equals c, which corresponds to the area on the right of [latex]F_c[/latex] (fig. 10.7).

Figure 10.7. The probability that the ratio of variances of two samples randomly selected from two different populations exceeds a certain value, [latex]F_c[/latex], equals c
The mathematical formula of the F-distribution probability density function is too complicated to be provided in this book. As we did previously for various probability distributions, we will use a table of pre-calculated values of cumulative probabilities to solve the F-distribution problems (Table 10.3, extraction).
Table 10.3

Example 10.4
(a) For [latex]df_1=15[/latex] and [latex]df_2=8[/latex], determine the critical value of F, which leaves the area equal to 0.95 on the right.
(b) For [latex]df_1=7[/latex] and [latex]df_2=20[/latex], determine the critical value of F, which leaves the area equal to 0.99 on the left.
(c) What is the probability that the ratio of the variance of a sample with a size of 4 over the variance of a sample with a size of 10 is less than 1/10?
Solution
To solve this question, we will refer to the F-distribution table in the appendix.
(a) First, we select the column “15” and the row containing the tail area 0.95 and [latex]df_2=8[/latex] (fig. 10.8).

Figure 10.8. A fragment of the F-distribution table for example 10.4(a)
The number 0.38 at the intersection of the selected column and row is the required value of F.
(b) We are looking for a value of F, leaving 0.99 on the left, which corresponds to the area [latex]1-0.99=0.01[/latex] on the right.

Figure 10.9. A fragment of the F-distribution table for example 10.4(b)
Applying the strategy used in part (a), we determine [latex]F=3.70[/latex] (fig. 10.9).
(c) It is given that the sample sizes are [latex]n_1=4[/latex] and [latex]n_2=10[/latex]. Hence, the corresponding degrees of freedom are [latex]df_1=4-1=3[/latex] and [latex]df_2=10-1=9[/latex]. The ratio of sample variances equals [latex]\frac{s^2_1}{s^2_2}=\frac{1}{10}=0.1[/latex]. Consequently, we select the column “3” and the rows corresponding to [latex]df_2=9[/latex] (fig. 10.10).

Figure 10.10. A fragment of the F-distribution table for example 10.4(c)
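At the intersection of the selected column and rows, we see five numbers: 0.19, 0.11, 0.07, 0.04, and 0.02. Since 0.11 is the closest to the given value, 0.1, the area to the right of 0.1 is approximately 0.95. Therefore, the probability that the ratio is less than 0.1, which is the area on the left, is approximately [latex]1-0.95=0.05[/latex].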
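A simulation can serve as a sanity check on the table lookup in part (c): the area to the right of 0.1 should come out near 0.95, and the area to the left of 0.1 near 0.05. The sketch below is not part of the table-based method. It assumes that both populations are standard normal (so their variances are equal), and the function names, the number of trials, and the seed are our own choices.

```python
import random

def sample_variance(values):
    """Sample variance with n - 1 in the denominator."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def simulate_variance_ratio_below(threshold, n1, n2, trials=20000, seed=1):
    """Estimate P(s1^2 / s2^2 < threshold) for two independent normal samples
    drawn from populations with equal variances."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    below = 0
    for _ in range(trials):
        sample1 = [rng.gauss(0, 1) for _ in range(n1)]
        sample2 = [rng.gauss(0, 1) for _ in range(n2)]
        if sample_variance(sample1) / sample_variance(sample2) < threshold:
            below += 1
    return below / trials

# Example 10.4(c): sample sizes 4 and 10, ratio threshold 0.1
left_area = simulate_variance_ratio_below(0.1, n1=4, n2=10)
right_area = 1 - left_area
print(round(left_area, 2), round(right_area, 2))
```

Because the table entry closest to 0.1 is 0.11 rather than 0.1 itself, the simulated left area is expected to be slightly below 0.05.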
Note that the classic tables do not provide F-values for every combination of degrees of freedom. Fortunately, many free Internet resources exist to determine these numbers for almost any degrees of freedom. Below, we will use one of those free resources and refer to it.
F-Tests Analysis of Variance
In previous chapters, we compared two populations by conducting hypothesis testing. As was mentioned above, the use of the F-distribution creates new opportunities for testing the equality of populations based on the analysis of two or more variances (ANOVA). ANOVA is a large branch of statistics. Here, we will give a brief explanation of testing the equality of two variances and of ANOVA.
F-Tests of the Equality of Two Variances
You already learned that the variance is one of the numerical measures of data dispersion. In practice, very often, the variance characterizes the accuracy of measurements. In fact, this is why a comparison of variances of two populations can be considered as testing the equality of methods used for conducting the measurements for each population. Assume that the variances of two normally distributed populations are [latex]σ^2_1[/latex] and [latex]σ^2_2[/latex]. To test the equality of applied measurement methods (i.e., variances), we will use a null hypothesis.
\[H_0:\ σ^2_1=σ^2_2\]
Depending on the objectives of the analysis, we can take the alternative hypothesis as
[latex]H_a:\ σ^2_1\neq σ^2_2[/latex] (two-tailed problem),
[latex]H_a:\ σ^2_1>σ^2_2[/latex] (right-tailed problem),
or
[latex]H_a:\ σ^2_1<σ^2_2[/latex] (left-tailed problem)
Following the previously explained hypothesis-testing procedure, we will use the sample statistic [latex]\frac{s^2_1}{s^2_2}[/latex] and the F-distribution table to make the conclusion regarding the equality of variances.
Example 10.5
Within one of our community-based research projects, we studied the physical parameters of Indigenous artifacts and collected oral stories (A. Sardarli, A. Volodin, Kh. Osmanli, E. Siegfried, Statistical Analysis of Physical Parameters of Indigenous Artifacts, Lobachevskii Journal of Mathematics, 42, 2022, 3224–329). We conducted measurements using a scanning electron microscope (SEM). The SEM measures the proportions of various chemical substances on the surface of artifacts. Assume that a researcher carried out measurements on an artifact using two different SEMs and determined the normalized amount of carbon. The measurement results are presented below.
Table 10.4

Apply the F-tests analysis of variance to check if the accuracy of microscopes differs. Use [latex]α=0.1[/latex].
Solution:
First, we need to determine the variances of the given samples, as was done in chapter 2. We leave the calculations to readers. Using the formula (F2.7a), one can easily obtain the variances [latex]s^2_1=0.115[/latex] and [latex]s^2_2=0.131[/latex] for the measurements carried out on SEM1 and SEM2, respectively. The corresponding degrees of freedom are [latex]df_1=5-1=4[/latex] and [latex]df_2=8-1=7[/latex].
Now, we can follow the hypothesis testing procedure. The null hypothesis will be taken as
\[H_0:\ σ^2_1=σ^2_2\]
Then, the alternative hypothesis will be
\[H_a:\ σ^2_1\neq σ^2_2\]
Let’s use the rejection region approach to solve this two-tailed problem. Since [latex]\frac{α}{2}=\frac{0.1}{2}=0.05[/latex], the critical values will be [latex]F_{0.05}[/latex] and [latex]F_{0.95}[/latex] for the right and left tails, respectively (consider that [latex]1-\frac{α}{2}=1-0.05=0.95[/latex]). Using the F-distribution table in the appendix, we find that [latex]F_{0.05}=4.12[/latex] and [latex]F_{0.95}=0.16[/latex].
For given samples, [latex]F=\frac{s^2_1}{s^2_2}=\frac{0.115}{0.131}=0.88[/latex], which is not in the rejection region (fig. 10.11).

Figure 10.11. F-distribution curve for example 10.5
Consequently, we do not have enough evidence to reject the null hypothesis. Therefore, at the 10% level of significance, the data do not provide sufficient evidence that the accuracies of the two microscopes differ.
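The decision rule of example 10.5 can be summarized in a short Python sketch. The sample variances and both critical values are copied from the example; nothing is looked up or computed from a distribution here, and the function name `two_tailed_f_test` is ours.

```python
def two_tailed_f_test(var1, var2, lower_critical, upper_critical):
    """Return the ratio F = var1 / var2 and whether it falls in either rejection region."""
    f_ratio = var1 / var2
    reject_null = f_ratio < lower_critical or f_ratio > upper_critical
    return f_ratio, reject_null

# Sample variances and critical values as given in example 10.5
f_ratio, reject_null = two_tailed_f_test(
    var1=0.115,            # SEM1, df1 = 4
    var2=0.131,            # SEM2, df2 = 7
    lower_critical=0.16,   # F_{0.95}: leaves the area 0.95 on its right
    upper_critical=4.12,   # F_{0.05}: leaves the area 0.05 on its right
)
print(round(f_ratio, 2), reject_null)
```

The ratio 0.88 lies between the two critical values, so the null hypothesis is not rejected.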
10.3 ANOVA Using F-Tests
In the previous example, we tested the equality of variances of two samples selected from two populations. Often, we need to compare three or more populations. Here are some examples:
- Effectiveness of various COVID-19 vaccines against infections
- Prices of regular gas in Canadian provinces on a certain day
- Retention rate of students of different programs of a certain university
- Onset of action of various medications
Theoretically, it can be shown that the F-distribution could be used as a convenient tool to test the equality of means of three or more populations. As we have already done many times, we will select samples to test the equality of populations. Here, we will analyze the simplest case; one sample will be selected from each population. In statistics, this approach to statistical analysis is known as one-way ANOVA.
Consider k populations with means [latex]μ_1,\ μ_2,\ \cdots ,μ_k[/latex] and variance [latex]σ^2[/latex], which satisfy the following conditions:
- All k populations are normal
- All k populations have the same common variance
Note that these two conditions are essential for applying the F-distribution for ANOVA. We can check the equality of these means using hypothesis testing. Obviously, the null hypothesis will be considered as
\[H_0:\ μ_1=\ μ_2=\ \cdots =μ_k\]
Consequently, the alternative hypothesis will address the possibility that at least one of the means differs from others.
[latex]H_a:[/latex] not all means are equal
Assume that one sample has been randomly selected from each of the k normal populations, and that the sample sizes, means, and variances are, respectively,
\[n_1,\ n_2,\cdots ,\ n_k\]
\[{\overline{x}}_1,\ {\overline{x}}_2,\ \cdots ,\ {\overline{x}}_k\]
\[s^2_1,\ s^2_2,\ \cdots ,\ s^2_k\]
To engage the F-distribution in this analysis, we first need to define some quantities.
Total sample size:
\[n=\sum^k_{i=1}{n_i}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (\mathrm{F}10.3)\]
Mean of the total sample:
\[\overline{x}=\frac{\sum^k_{i=1}{n_i{\overline{x}}_i}}{n}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (\mathrm{F}10.4)\]
Mean square for treatment:
\[MST=\frac{\sum^k_{i=1}{n_i{\left({\overline{x}}_i-\overline{x}\right)}^2}}{k-1}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (\mathrm{F}10.5)\]
Mean square for error:
\[MSE=\frac{\sum^k_{i=1}{\left(n_i-1\right)s^2_i}}{n-k}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (\mathrm{F}10.6)\]
Intuitively, we can see that MST represents the dispersion of the sample means around the mean of the total sample, whereas MSE pools the sample variances and therefore describes the dispersion within the samples. It can be shown that if the given k populations are normal with equal variances, the k randomly selected samples (one sample from each population) are independent, and the null hypothesis is true, the ratio [latex]\frac{MST}{MSE}[/latex] has the F-distribution with degrees of freedom [latex]df_1=k-1[/latex] and [latex]df_2=n-k[/latex]. Once the appropriate probability distribution is found, we can test the hypothesis discussed above. This time, we will use
\[F=\frac{MST}{MSE}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (\mathrm{F}10.7)\]
for our sample statistic (as we used the z-score for the normal distribution and t-scores for the given degrees of freedom). Upon inspection, one can state that the larger the dispersion of the sample means, the larger the MST and the F statistic. If the population means are the same, then MST and MSE both estimate the common variance [latex]σ^2[/latex], which makes F close to 1. In other words, only large values of F will be in the rejection region. Therefore, this hypothesis test will be right-tailed.
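To make formulae (F10.3) to (F10.7) concrete, here is a minimal Python sketch that starts from raw samples. The three small samples are hypothetical and were chosen so that the arithmetic is easy to follow by hand; the function name `one_way_anova_f` is ours.

```python
import statistics

def one_way_anova_f(groups):
    """Compute MST, MSE, and F = MST / MSE from raw samples, following (F10.3)-(F10.7)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [statistics.mean(g) for g in groups]
    # Sample variances, with n_i - 1 in the denominator
    variances = [statistics.variance(g) for g in groups]

    n = sum(sizes)                                                   # (F10.3)
    grand_mean = sum(n_i * m for n_i, m in zip(sizes, means)) / n    # (F10.4)
    mst = sum(n_i * (m - grand_mean) ** 2
              for n_i, m in zip(sizes, means)) / (k - 1)             # (F10.5)
    mse = sum((n_i - 1) * v
              for n_i, v in zip(sizes, variances)) / (n - k)         # (F10.6)
    return mst, mse, mst / mse                                       # (F10.7)

# Hypothetical samples (illustration only)
samples = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
mst, mse, f_value = one_way_anova_f(samples)
print(mst, mse, f_value)
```

For these samples, the means are 2, 3, and 7, the mean of the total sample is 4, and every sample variance equals 1, so MST = 42/2 = 21, MSE = 6/6 = 1, and F = 21.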
Example 10.6
In example 6.4, we already referred to one of our projects conducted in the Peepeekisis and Kahkewistahaw First Nations communities (A. Sardarli, Use of Indigenous Knowledge in Modeling the Water Quality Dynamics in Peepeekisis and Kahkewistahaw First Nations Communities, Pimatiswin: A Journal of Aboriginal and Indigenous Community Health 11(1), 2013, 55–63).

Figure 10.12. Calling Lakes (Saskatchewan, Canada); obtained from open resources
Both communities are situated on the shores of the Calling Lakes, which consist of four interconnected lakes: Pasqua, Echo, Mission, and Katepwa Lakes (fig. 10.12). Acidity is known as one of the indicators of water quality. In chemistry, the acidity level of water is historically denoted as the pH level (“potential of hydrogen”). Consider that a chemist collected four water samples (one sample from each lake) for pH level measurements and recorded the experimental results as follows:
Table 10.5

At the level of 5% significance, test if the data provided in table 10.5 are sufficient to conclude that the pH levels differ in Pasqua, Echo, Mission, and Katepwa Lakes.
Solution:
Assume that all four populations are normal and sampling was independent. These assumptions allow us to use the F-distribution approximation to conduct the hypothesis testing. As we discussed above, the null hypothesis can be taken as
\[H_0:\ μ_1=μ_2=μ_3=μ_4\]
where [latex]μ_1,μ_2,μ_3[/latex] and [latex]μ_4[/latex] are means of pH levels in Pasqua, Echo, Mission, and Katepwa Lakes, respectively.
Then, the alternative hypothesis must state that
[latex]H_a:[/latex] not all four lakes’ pH levels are the same.
Also, consider that [latex]α=0.05[/latex].
Let’s calculate the quantities required for the F-test of ANOVA using formulae (F10.3) – (F10.7):
\[n=\sum^4_{i=1}{n_i}=8+8+10+14=40\]
\[\overline{x}=\frac{\sum^4_{i=1}{n_i{\overline{x}}_i}}{n}=\frac{\left(8\bullet 8.2\right)+\left(8\bullet 7.9\right)+\left(10\bullet 7.4\right)+\left(14\bullet 8.0\right)}{40}=\frac{314.8}{40}=7.87\]
\[MST=\frac{\sum^4_{i=1}{n_i{\left({\overline{x}}_i-\overline{x}\right)}^2}}{4-1}=\frac{8{\left(8.2-7.87\right)}^2+8{\left(7.9-7.87\right)}^2+10{\left(7.4-7.87\right)}^2+14{\left(8.0-7.87\right)}^2}{3}=\frac{3.324}{3}\approx 1.11\]
\[MSE=\frac{\sum^4_{i=1}{\left(n_i-1\right)s^2_i}}{40-4}=\frac{\left(8-1\right)\left(0.2\right)+\left(8-1\right)\left(0.3\right)+\left(10-1\right)\left(0.2\right)+(14-1)(0.4)}{36}=\frac{10.5}{36}\approx 0.29\]
\[F=\frac{MST}{MSE}=\frac{1.108}{0.2917}\approx 3.80\]
and the degrees of freedom are
\[{df}_1=k-1=4-1=3\]
\[{df}_2=40-4=36\]
Using the F-distribution calculator (https://www.danielsoper.com/statcalc/default.aspx), we can find that the critical value of F leaving the area of 0.05 on the right is [latex]F_{0.05}=2.87[/latex].

Figure 10.13. F-distribution curve for example 10.6
The F-value calculated for the sample statistics lies in the rejection region. Consequently, we have sufficient evidence to reject the null hypothesis. Therefore, not all four lakes’ pH levels are equal.
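The calculations of example 10.6 can be retraced from the summary statistics alone. The sketch below uses the sample sizes, means, and variances that appear in the calculations above and the critical value quoted in the text; the function name `anova_from_summary` is ours. Because the mean of the total sample is not rounded before it is used, the last digit of F may differ slightly from a hand calculation that rounds intermediate results.

```python
def anova_from_summary(sizes, means, variances):
    """One-way ANOVA quantities from per-sample sizes, means, and variances."""
    k = len(sizes)
    n = sum(sizes)
    grand_mean = sum(n_i * m for n_i, m in zip(sizes, means)) / n
    mst = sum(n_i * (m - grand_mean) ** 2
              for n_i, m in zip(sizes, means)) / (k - 1)
    mse = sum((n_i - 1) * v for n_i, v in zip(sizes, variances)) / (n - k)
    return {"df1": k - 1, "df2": n - k, "grand_mean": grand_mean,
            "MST": mst, "MSE": mse, "F": mst / mse}

# Summary statistics used in the calculations of example 10.6
result = anova_from_summary(
    sizes=[8, 8, 10, 14],
    means=[8.2, 7.9, 7.4, 8.0],
    variances=[0.2, 0.3, 0.2, 0.4],
)

critical_value = 2.87  # F_{0.05} for df1 = 3 and df2 = 36, as quoted in the text
reject_null = result["F"] > critical_value  # right-tailed test

print({name: round(value, 3) for name, value in result.items()}, reject_null)
```

The computed F is approximately 3.80, which exceeds the critical value, in line with the conclusion of the example.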
Note that the F-test itself does not tell us which population means differ from others. To specify a certain population, we need to conduct a so-called contrast test, which is beyond the scope of this book.
Chapter 10 Summary
- Chi-square distribution
- Chi-square test of variance
- F-distribution
- F-tests analysis of variance
  - F-tests of the equality of two variances
  - ANOVA using F-tests
EXERCISES
F-Distribution
1. The following data show the tail lengths (in cm) of field mice from three different locations (7 mice from Southern Saskatchewan, 9 mice from Northern Saskatchewan, 10 mice from Alberta):
| Location | Tail lengths (cm) |
| Southern Saskatchewan | 16.1, 15.9, 13.2, 11.9, 14.6, 15.2, 14.7 |
| Northern Saskatchewan | 14.4, 14.9, 14.1, 16.2, 12.3, 15.5, 17.2, 19.8, 10.4 |
| Alberta | 16.2, 17.1, 18.4, 18.3, 14.2, 16.1, 18.1, 17.0, 16.8, 16.5 |
An ANOVA procedure is used to determine if the average tail length is the same for mice in these three locations. At 5% significance, what is the critical F-Value?
2. (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) An ANOVA procedure is used for data which were obtained from five populations. Five samples each comprised of 20 observations were taken from the five populations. The numerator and denominator (respectively) degrees of freedom for the critical value of F are
a) 5 and 20
b) 4 and 20
c) 5 and 100
d) 4 and 95
e) none of the above answers are correct.
Select the correct answer.
3. (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) The F ratio in a completely randomized ANOVA is the ratio of
a) MSTR/MSE
b) MST/MSE
c) MSE/MSTR
d) MSE/MST
e) none of the above answers are correct
Select the correct answer.
4. (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) The critical F value with 6 numerator and 60 denominator degrees of freedom at α = 0.05 is
a) 3.74
b) 2.25
c) 2.37
d) none of the above answers are correct
Select the correct answer.
F-Tests Analysis of Variance
1. Surgical patients are classified as elective, urgent or emergency. From hospital records it is known that 50% of surgery patients are elective, 30% are urgent and 20% are emergencies. It is also known that the average length of stay (LOS) of elective patients is 1.1 days. For urgent patients the average LOS is 3.1 days. For emergency patients the average LOS is 6.7 days.
a) What is the expected LOS for a surgery patient?
b) If 200 patients are admitted for surgery in a month, how many days in total would you expect the patients to stay?
2. A student advisor at the University of Regina wants to determine if the average number of hours per week spent on independent study differs among students in five different faculties: Arts, Science, Engineering, Business, and Kinesiology. A random sample of 12 students is selected from each faculty, for a total of 60 students. Each student reports the number of hours they study per week. The data are summarized in the table below.

Test, at the 5% level of significance, whether the data provide sufficient evidence to conclude that there is a difference in the average study hours among the five faculties.
The Chi-square distribution is a probability distribution that is widely used in statistics for hypothesis testing and confidence interval estimation. It is a continuous probability distribution that arises in the context of testing hypotheses about the variance of a normally distributed population.
Analysis of variance (ANOVA) is a statistical technique used to compare means of three or more groups to determine if there are statistically significant differences between them.