Chapter 8. Hypothesis Testing
Chapter Objectives
In this chapter, readers will learn to do the following:
- Define and apply hypothesis testing for large- and small-sample means
- Define and apply the rejection region and p-value methods of hypothesis testing
- Apply hypothesis testing to a large-sample proportion
- Apply hypothesis testing to the difference between two populations
- Apply hypothesis testing to matched pairs
8.1 The Philosophy of Hypothesis Testing
We hope the reader has watched at least one "detective" movie, in which a brave detective solves a mystery using analytical skills. Usually, the first thing the detective does is make a list of suspects. Then the detective removes from the list those who have an alibi for the time when the crime happened. In other words, the detective eliminates the versions of the crime that do not seem reasonable given the collected facts and evidence. The theory behind this approach was born thousands of years ago, in ancient Greece, in the public dialogues of philosophers. Socrates is believed to have been the first to use this method in his arguments. The Socratic method is based on eliminating hypotheses that lead to contradictions. The word “hypothesis” comes from Ancient Greek and can be literally translated as "putting or placing under." Nowadays, the word is used in different contexts, mainly in the sense of a proposed explanation for a phenomenon or event. Scientists formulate hypotheses to explain the results of experiments and observations that cannot be described by existing theories. Later, in the fourteenth century, another philosopher, William of Ockham, suggested that if two competing hypotheses make the same prediction, we should prefer the one that requires the fewest assumptions (Ball, 2016). This principle is sometimes summarized as "the simplest explanation is the most likely." We do not intend to go deep into philosophical reflections in this chapter and leave it to readers to seek out further sources if they wish to enjoy the beauty of ancient philosophy.
The statistical test of a hypothesis is very similar to the process in a modern courtroom, which follows the principle of the presumption of innocence. According to this principle, every accused person is considered innocent until proven guilty. Therefore, at the beginning of the process, by convention, the accused is assumed to be innocent. We call this statement the null hypothesis and denote it as H0. A prosecutor (the title of this role differs between countries) opposes this statement with an alternative hypothesis, insisting that the accused is guilty, and presents to the judges the collected evidence against the null hypothesis. If the prosecutor provides sufficient evidence, the judges reject the null hypothesis and declare the accused guilty. Otherwise, they conclude that there was not enough evidence to reject the null hypothesis. In this allegory, statisticians play the role of prosecutors.
Example 8.1
Within one of our community-based projects, Indigenous Elders told us that the number of eggs in shorebird nests depends on the lake water quality (A. Sardarli, K. Budsaba, T. Ngamkham, A. Volodin, K. Baidoo, "Modeling of Water Quality Dynamics Using Indigenous Knowledge", Thailand Statistician, Vol. 8(2) July 2010, 207-222). Assume that, according to the Elders’ many years of experience, the average number of eggs in a shorebird nest is five. We might need to check whether the number of eggs in shorebird nests has changed over the course of our study. We state that the mean number of eggs in shorebird nests equals 5. In statistics, we call this statement the null hypothesis and express it as \( H_0: \mu = 5 \). Since this value reflects previous years, one might suspect that during our study the mean is no longer 5. This suggestion is called the alternative hypothesis and is denoted as \( H_a: \mu \neq 5 \). Sometimes we might be interested in finding out specifically whether this number has increased or decreased. For those cases, the alternative hypotheses would be \( H_a: \mu > 5 \) or \( H_a: \mu < 5 \), respectively. In statistics, this procedure is called hypothesis testing. To check the null hypothesis in this example, we randomly select a sample of nests and use sample statistics to analyze whether the population mean is still 5 or has changed. If we have enough evidence against the null hypothesis, we reject it and conclude that the average number of eggs in one nest is no longer 5; it could be either more or less than 5. Otherwise, we fail to reject the null hypothesis. Failing to reject does not prove that the null hypothesis is true: with more data, we might have found enough evidence to reject it. Below we describe the five-step procedure of hypothesis testing in detail.
Note that the null hypothesis represents the status quo (long-time experience, common sense), whereas the alternative hypothesis represents something new or unusual that needs to be verified.
Five Steps of Hypothesis Testing
Hypothesis testing follows a five-step procedure.
Step 1. Develop the null hypothesis \( H_0 \) and the alternative hypothesis \( H_a \).
Step 2. Identify the type of distribution and select the relevant test statistic and the significance level.
Step 3. Determine the decision rule to reject or fail to reject the null hypothesis. (Below, we will provide more details about the decision rules.)
Step 4. Execute essential computations and find the value of the test statistic.
Step 5. Compare the value obtained from step 4 with the constraints determined in step 3 and make a decision: reject or do not reject the null hypothesis.
It is very important to state the decision made in step 5 in clear, accurate wording, especially if you are solving an applied problem and the results of your analysis might be used by non-statisticians.
In example 8.1, we developed the null hypothesis based on information available before our analysis. In any hypothesis test, we must specify a null ("no effect" or "status quo") hypothesis before we perform the test. We assume the null hypothesis is true, or at least the most plausible explanation, before we do the test; the test can only provide evidence against the null hypothesis, never prove it. The alternative hypothesis is the hypothesis we set out to test: the claim for which we seek evidence.
To select the relevant test statistic, we need to identify the type of distribution. In previous chapters, we reviewed various distributions and the criteria for identifying them. In this chapter, we will analyze examples where either the normal distribution or Student’s t-distribution can be applied. Under certain conditions, the normal approximation can also be used for non-normal distributions. In these cases, we will follow the previously used procedures to identify the distribution types.
Before computing the test statistic, we also need to choose the significance level, α. Later we will show that α is the probability of rejecting the null hypothesis when it is true.
Decision Rule
After constructing the null and alternative hypotheses and choosing the significance level of the test, we determine the points on the distribution of the test statistic that separate the values leading to rejection of the null hypothesis in favour of the alternative hypothesis from those that do not. In this book, we will learn two methods, the rejection region method and the p-value method, that can be used to analyze sampled data and make a decision regarding the null hypothesis.
Rejection Region Method
Assume that we decide to perform a hypothesis test at the significance level α, that is, with (1-α)∙100% confidence. The significance level α corresponds to the area of the so-called rejection region under the distribution curve of the test statistic when the null hypothesis is true. The cut-off value (or values) bounding the rejection region is called a critical value. In this book, we will denote the critical value by C (or C and C'). Observations that fall inside the rejection region are called "statistically significant" at the α level.

Figure 8.1. The rejection region and critical values for one- and two-tail problems
Figure 8.1 visualizes the following cases:
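- If \( H_0: \mu = \mu_0 \) and \( H_a: \mu < \mu_0 \), we reject the null hypothesis if \( \bar{x} < C \).
- If \( H_0: \mu = \mu_0 \) and \( H_a: \mu > \mu_0 \), we reject the null hypothesis if \( \bar{x} > C \).
- If \( H_0: \mu = \mu_0 \) and \( H_a: \mu \neq \mu_0 \), we reject the null hypothesis if \( \bar{x} < C \) or \( \bar{x} > C' \).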
In previous chapters, we saw that working with the standardized distribution curve is more practical. If the normal distribution is applicable, we can use z-scores instead of the critical values and the sample mean. In this case, \( z = \frac{\bar{x}-\mu_0}{\sigma / \sqrt{n}} \) if \( \sigma \) is known, or \( z = \frac{\bar{x}-\mu_0}{s / \sqrt{n}} \) if \( \sigma \) is not known and the sample is large. The graphs in Figure 8.1 are then modified as shown in Figure 8.2.

Figure 8.2. The rejection region and corresponding z-scores for one- and two-tail problems
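As a rough illustration, the three rejection rules above can be restated in z-units in Python using only the standard library. The helper names `critical_values` and `reject_h0` are our own (hypothetical), and `statistics.NormalDist().inv_cdf` stands in for reading table A3; treat this as a sketch rather than a reference implementation.

```python
from statistics import NormalDist

# Illustrative helpers (names are our own, not from any library).
STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, sd 1


def critical_values(alpha, alternative):
    """Return the critical z-score(s) bounding the rejection region.

    alternative: "less" (Ha: mu < mu0), "greater" (Ha: mu > mu0),
    or "two-sided" (Ha: mu != mu0).
    """
    if alternative == "less":
        return (STD_NORMAL.inv_cdf(alpha),)
    if alternative == "greater":
        return (STD_NORMAL.inv_cdf(1 - alpha),)
    if alternative == "two-sided":
        z = STD_NORMAL.inv_cdf(1 - alpha / 2)  # area alpha/2 in each tail
        return (-z, z)
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


def reject_h0(z, alpha, alternative):
    """Apply the rejection-region rule to a computed z statistic."""
    crit = critical_values(alpha, alternative)
    if alternative == "less":
        return z < crit[0]
    if alternative == "greater":
        return z > crit[0]
    # two-sided: reject if z falls in either tail
    return z < crit[0] or z > crit[1]


# Two-tailed test at alpha = 0.10
print([round(c, 3) for c in critical_values(0.10, "two-sided")])
```

The computed values agree with the table value 1.645 to three decimals; a table is simply a precomputed version of `inv_cdf`.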
Example 8.2
One of the indicators of water quality is the pH level, which, in chemistry, is a measure of how acidic or basic water is. In Canada, the operational guideline for pH in finished drinking water is a range of 7.0 to 10.5 (Guidelines for Canadian Drinking Water Quality: Guideline Technical Document – pH, obtained from: https://www.canada.ca/en/health-canada/services/publications/healthy-living/guidelines-canadian-drinking-water-quality-guideline-technical-document-ph.html). Residents of a small town were concerned that their drinking water could be affected by an acid-producing chemical plant in the neighbourhood. The mayor told them that, according to historical monitoring data, the mean pH level is 8.2. The residents invited an independent statistician and asked her to check this statement at the 10% level of significance. To do so, the statistician randomly selected the pH levels of 40 months and computed the sample mean x̅=8.4 and the sample standard deviation s=0.5. Does she have enough evidence to reject the mayor's statement?
Solution:
To solve the question, we will follow the five-step procedure explained above.
Step 1. Based on the provided information, we construct the null hypothesis: \( H_0: \mu = 8.2 \)
Since residents are concerned about any significant change in pH level (increase or decrease), the alternative hypothesis is two-sided: \( H_a: \mu \neq 8.2 \)
Step 2. Since the sample size \( n = 40 \) exceeds 30, we can use the normal distribution approach for the test statistic: \( z = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} \) at the level of significance \( \alpha = 0.1 \).
Step 3. The rejection region will be two tails, on the left and on the right, with the areas \( \frac{\alpha}{2} = \frac{0.1}{2} = 0.05 \)
The corresponding z-score can be obtained from the normal distribution table (table A3, Appendix): \( z_{\alpha/2} = 1.645 \). In other words, the rejection region is \( z \le -1.645 \) or \( z \ge 1.645 \).
Step 4. For the selected sample,
\( z = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} = \frac{8.4-8.2}{0.5/\sqrt{40}} = 2.53 \)
Step 5. The z-score obtained from the test statistic, 2.53, lies in the rejection region (Fig. 8.3). This means that the statistician has enough evidence to reject the null hypothesis. Therefore, she can report to the residents that the data do not support the mayor's statement: at the 10% level of significance, the mean pH level differs from 8.2.

Figure 8.3. The rejection region and z-score evaluated for the sample given in example 8.2
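Steps 3 to 5 of example 8.2 can be checked with a few lines of Python. This is a minimal sketch that uses the example's numbers and the standard library's `statistics.NormalDist` in place of table A3; the variable names are our own.

```python
import math
from statistics import NormalDist

# Data from example 8.2
x_bar = 8.4   # sample mean pH
mu0 = 8.2     # hypothesized mean pH (mayor's statement)
s = 0.5       # sample standard deviation
n = 40        # number of sampled months
alpha = 0.10  # significance level

# Step 3: two-tailed critical value z_{alpha/2}
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Step 4: test statistic (large sample, so s is used in place of sigma)
z = (x_bar - mu0) / (s / math.sqrt(n))

# Step 5: reject H0 if z falls in either tail
reject = abs(z) >= z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```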
p-Value Method
Instead of comparing the sample mean x̅ (or its z-score) with the critical value, we can compare the area of the rejection region, α, with the area of the tail region beyond the z-score of the sample. The latter area is called the p-value: the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. If the p-value is less than α, the z-score of the sample lies in the rejection region; otherwise, it lies outside the rejection region. Let's solve the problem in example 8.2 using the p-value method.
Example 8.3
To solve the problem using the p-value method, we will follow the procedure below.
Steps 1 and 2 will stay unchanged (see example 8.2).
Step 3. We have already established that this is a two-tail problem. Hence, the p-value is the sum of the areas of the two tails beyond the z-score of the sample statistic, to the right and to the left (fig. 8.4). If the p-value is less than α=0.1, this will be considered sufficient evidence to reject the null hypothesis. Otherwise, we will not reject the mayor's statement that the mean pH level is 8.2.
Step 4. We have already determined z=2.53 for the selected sample (see example 8.2). The area to the left of z=-2.53, found from the normal distribution table (table A3, Appendix), equals 0.0057. Because the normal distribution is symmetric, the area to the right of z=2.53 is the same. Therefore, the p-value is 2∙0.0057=0.0114.
Step 5. Since the p-value 0.0114 is less than α=0.1, there is enough evidence to reject the null hypothesis. This brings us to the same result as in example 8.2: the data do not support the mayor's statement. Note that even at α=0.05 or α=0.02 we would still reject the mayor's statement, because the p-value is smaller than both of these values.

Figure 8.4. The rejection region and p-value evaluated for the sample given in example 8.3
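The p-value computation in example 8.3 can be sketched the same way. The function name `p_value` is our own, and `NormalDist().cdf` replaces the table lookup, so the result may differ from the table-based 0.0114 in the last digit.

```python
from statistics import NormalDist


def p_value(z, alternative):
    """p-value of a z statistic under the standard normal null distribution."""
    cdf = NormalDist().cdf(z)
    if alternative == "less":
        return cdf          # area to the left of z
    if alternative == "greater":
        return 1 - cdf      # area to the right of z
    if alternative == "two-sided":
        # sum of the two tail areas beyond |z|
        return 2 * min(cdf, 1 - cdf)
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


# Example 8.3: z = 2.53, two-tailed test at alpha = 0.10
p = p_value(2.53, "two-sided")
print(round(p, 4), "reject H0:", p < 0.10)
```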
After analyzing examples 8.2 and 8.3, it is worth considering the meaning of the significance level α and the p-value more deeply. In hypothesis testing, we need to make a decision: either to reject the null hypothesis or to state that there is not sufficient evidence to reject it. Because this decision is based on sample data and made at a given significance level α, there is always some probability of making a wrong decision.
The wrong decision could take place due to two types of errors:
- Rejecting the null hypothesis when it is actually true (for example, convicting a suspect while he/she is innocent)
- Not rejecting the null hypothesis when it is actually untrue (for example, declaring a suspect innocent while he/she is guilty)
In statistics, the first error listed above is called a type I or alpha error, and the second error a type II or beta error. The following table (called a decision table) represents all possible outcomes.
Table 8.1
| Decision | Actual: \( H_0 \) true (suspect is innocent) | Actual: \( H_0 \) false (suspect is guilty) |
|---|---|---|
| Rejecting \( H_0 \) | Type I error (suspect is innocent but is convicted) | Correct decision (suspect is guilty and is convicted) |
| Failing to reject \( H_0 \) | Correct decision (suspect is innocent and is not convicted) | Type II error (suspect is guilty but is not convicted) |
Let's visualize the type I error using the graphs in figure 8.5. Without loss of generality, we consider the two-tail case.

Figure 8.5. Illustration of the type I error
If the true mean is \( \mu_0 \) but, by chance, the sample mean \( \bar{x} \) falls in the rejection region, following the formal five-step procedure we would reject the null hypothesis; in other words, a type I error would occur. The total area under the curve is 1, and the area of the rejection region equals \( \alpha \). Consequently, when the null hypothesis is true, the probability that the sample mean lies in the rejection region is
\[
P(\text{The sample mean lies in the rejection region}) = \frac{\alpha}{1} = \alpha
\]
Therefore, \( \alpha \) is the probability of a type I error.
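A small simulation can make this concrete. The sketch below assumes a normal population whose true mean equals \( \mu_0 \) and whose standard deviation is known; the values \( \mu_0 = 8.2 \), \( \sigma = 0.5 \), \( n = 40 \) are borrowed from example 8.2 purely for illustration, and the seed and number of trials are arbitrary choices. Because the null hypothesis is true in every simulated sample, every rejection is a type I error, and the rejection rate should come out close to \( \alpha = 0.10 \).

```python
import math
import random


def simulated_type_i_rate(mu0=8.2, sigma=0.5, n=40, z_crit=1.645,
                          trials=10_000, seed=1):
    """Estimate P(reject H0 | H0 true) for a two-tailed z-test by simulation.

    Assumption: normal population with true mean mu0 and known sigma
    (illustrative values from example 8.2).
    """
    rng = random.Random(seed)        # seeded for reproducibility
    se = sigma / math.sqrt(n)        # standard error with sigma known
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        z = (x_bar - mu0) / se
        if abs(z) >= z_crit:         # sample mean fell in the rejection region
            rejections += 1
    return rejections / trials


# z_crit = 1.645 corresponds to alpha = 0.10 for a two-tailed test
print(simulated_type_i_rate())
```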
Note that in examples 8.2 and 8.3, \( \alpha \) was given. Often, when statisticians start analyzing data, \( \alpha \) is not provided. In those cases, by convention, we use \( \alpha = 0.05 \). In example 8.3, the p-value was much less than \( \alpha \), and the decision was clear. In real projects, statisticians deal with a variety of cases, and sometimes it is not clear whether to reject the null hypothesis or not. However, a decision must still be made with some degree of certainty. By convention, researchers distinguish four levels of significance and follow these guidelines:
- Highly significant: p-value is less than 0.01. Reject the null hypothesis.
- Significant: p-value is between 0.01 and 0.05. Reject the null hypothesis.
- Moderately significant: p-value is between 0.05 and 0.10. If an immediate decision is required, do not reject the null hypothesis. However, it would be reasonable to continue studies subject to the availability of time and data collection resources.
- Not statistically significant: p-value is greater than 0.10. Do not reject the null hypothesis.
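The four guidelines above can be written as a simple lookup. In this sketch, the function name `significance_level` is our own, and the handling of the boundaries is an assumption: the text does not say which category a p-value of exactly 0.01, 0.05, or 0.10 belongs to, so each cut-off is assigned to the less significant category.

```python
def significance_level(p):
    """Classify a p-value using the four conventional categories.

    Assumption: each cut-off belongs to the less significant category
    (e.g. p = 0.05 counts as moderately significant).
    """
    if not 0 <= p <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < 0.01:
        return "highly significant: reject H0"
    if p < 0.05:
        return "significant: reject H0"
    if p < 0.10:
        return "moderately significant: do not reject H0 (consider more data)"
    return "not statistically significant: do not reject H0"


# p-value from example 8.3
print(significance_level(0.0114))  # significant: reject H0
```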
Here are some more examples to help you classify the types of errors in applications of hypothesis testing.
- Assume that a new drug is being tested to verify whether it lowers blood pressure compared to a placebo. First, we construct the null and alternative hypotheses:
  - H0: The new drug does not lower blood pressure.
  - Ha: The new drug lowers blood pressure.
  If the test indicates that the drug lowers blood pressure when it actually does not, this is a type I error. Conversely, if the test indicates that the new drug does not lower blood pressure when it actually does, the error is a type II error. In this case, we would reject a good drug because of a wrong decision.
- Assume that a randomly selected vaccinated person is tested to verify whether he is infected with COVID-19. Again, we first construct the null and alternative hypotheses:
  - H0: The person is not infected.
  - Ha: The person is infected.
  If the test indicates that the person is infected (rejecting the null hypothesis) when in reality he is not, the error is a type I error. If the test indicates that the person is not infected (failing to reject H0) when in reality he is infected with COVID-19, the error is a type II error; it could result in a lack of necessary treatment and a worsening of the person's condition.
Data collection is an expensive and time-consuming process. In previous chapters, we briefly discussed the challenges of data collection. Sometimes it is impossible to obtain a sample large enough to use the normal approximation. In chapter 7, we learned about Student's t-distribution, which is suitable for small samples under certain conditions. This distribution can be used for hypothesis testing when the population is normal but a large sample is not available. The five-step procedure of hypothesis testing remains the same.
Below, we apply Student’s t-distribution to hypothesis testing with a small sample.
Example 8.4
Carrie, an auto blogger, was told that the price of gasoline in Regina in June 2023 was normally distributed with a mean of 159.8 cents per litre. To check whether this information was accurate, Carrie sampled 16 service stations and obtained a sample mean of 161.6 cents per litre with a sample standard deviation of 2.2 cents per litre. She was concerned that the actual mean price was higher than 159.8 cents per litre. We will help Carrie find out whether the released information can be trusted.
Solution:
Step 1. Construct the null and alternative hypotheses:
\[
H_0: \mu = 159.8
\]
Since Carrie is concerned about the possibility of higher prices, we set the alternative hypothesis as:
\[
H_a: \mu > 159.8
\]
Step 2. It is given that the population is normal. However, the sample size is small, \( n = 16 < 30 \), so we use the t-distribution to test the sample. Since the level of significance is not given, by convention, we use \( \alpha = 0.05 \).
Step 3. This is a one-tail problem. We can use either the rejection region or the p-value method to solve it. Let's determine the t-score corresponding to the critical value of the rejection region (note that it is on the right of the distribution curve, since \( H_a: \mu > 159.8 \)) and compare it with the t-score calculated from the sample statistics. The critical value corresponding to \( \alpha = 0.05 \) and \( \text{df} = 16-1 = 15 \) is:
\[
t_{\alpha} = 1.753
\]
Step 4. The t-score for the given sample statistics can be evaluated as:
\[
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{161.6 - 159.8}{2.2 / \sqrt{16}} = 3.273
\]

Figure 8.6. The rejection region and t-score evaluated for the sample given in example 8.4
Step 5. Since the t-score of 3.273, evaluated for the given sample, lies in the rejection region (fig. 8.6), we reject the null hypothesis. In other words, at the 5% level of significance, Carrie has sufficient evidence to argue that the mean gas price in June 2023 was higher than 159.8 cents per litre. Note that this does not mean there is a 5% chance that the released information was accurate; rather, the test was designed so that, if the true mean were indeed 159.8 cents per litre, it would wrongly reject the null hypothesis (a type I error) only 5% of the time.
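The arithmetic of example 8.4 can be reproduced as follows. Python's standard library has no t-distribution, so this sketch hard-codes the tabulated critical value \( t_{0.05} = 1.753 \) for df = 15; if SciPy is available, `scipy.stats.t.ppf(0.95, 15)` should return approximately the same number.

```python
import math

# Data from example 8.4
x_bar = 161.6   # sample mean price, cents per litre
mu0 = 159.8     # hypothesized mean price, cents per litre
s = 2.2         # sample standard deviation, cents per litre
n = 16          # number of sampled service stations

# Critical value t_{0.05} for df = n - 1 = 15, read from the t table
# (hard-coded because the standard library has no t-distribution)
t_crit = 1.753

# Test statistic for a small sample from a normal population
t = (x_bar - mu0) / (s / math.sqrt(n))

# Right-tailed test (Ha: mu > 159.8): reject H0 if t exceeds the critical value
reject = t > t_crit
print(f"t = {t:.3f}, reject H0: {reject}")
```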
8.2 Large-Sample Hypothesis Testing for One Proportion
Now, we are ready to extend the application of hypothesis testing to discrete variables whose distributions can be approximated by the normal distribution. Below, we use large-sample hypothesis testing for a binomial proportion when the conditions \( np > 5 \) and \( nq > 5 \) are satisfied, where \( n \) is the sample size, \( p \) is the true proportion, and \( q = 1-p \). Earlier, we determined the standard error as:
\[
SE = \sqrt{\frac{pq}{n}}
\]
Then, the large-sample hypothesis testing for \( p \) will be conducted using the following five-step procedure:
Step 1. Construct the null hypothesis:
\[
H_0: p = p_0
\]
Construct the alternative hypothesis (choose one depending on the test type):
For a two-tailed test: \( H_a: p \neq p_0 \)
For a right-tailed test: \( H_a: p > p_0 \)
For a left-tailed test: \( H_a: p < p_0 \)
Step 2. Verify if the normal distribution approximation can be applied (\( np_0 > 5 \) and \( nq_0 > 5 \)) and define the significance level \( \alpha \). Note that sometimes the experiment can be designed such that the sample meets the desired distribution conditions (see example 8.5).
Step 3. Determine the decision rule (rejection region or p-value) for the test statistic computed from the selected/given sample:
\[
z = \frac{\bar{p} - p_0}{SE}
\]
where
\[
SE = \sqrt{\frac{p_0 q_0}{n}}, \quad \bar{p} = \frac{x}{n}
\]
and \( x \) is the number of successes in \( n \) binomial trials.
Step 4. Execute the essential computations and find the value of the test statistic \( z \).
Step 5. Compare the computed value of \( z \) with the critical value(s) determined from the significance level and make a decision: reject or do not reject the null hypothesis.
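The procedure above can be sketched as a single Python function. The name `one_proportion_z_test` and its arguments are our own; the function checks the chapter's conditions \( np_0 > 5 \) and \( nq_0 > 5 \), uses the standard error under \( H_0 \), and returns the z statistic with its p-value. The usage line applies it to the survey data of example 8.5 (18 of 32 voters, \( H_a: p < 0.62 \)).

```python
import math
from statistics import NormalDist


def one_proportion_z_test(x, n, p0, alternative="two-sided"):
    """Large-sample z-test of H0: p = p0 for x successes in n trials.

    Returns (z, p_value). Raises ValueError when n*p0 or n*q0 is not
    above 5, the normal-approximation condition used in this chapter.
    """
    q0 = 1 - p0
    if n * p0 <= 5 or n * q0 <= 5:
        raise ValueError("normal approximation not justified: need n*p0 > 5 and n*q0 > 5")
    p_bar = x / n                    # sample proportion
    se = math.sqrt(p0 * q0 / n)      # standard error under H0
    z = (p_bar - p0) / se
    cdf = NormalDist().cdf(z)
    if alternative == "less":
        p_value = cdf
    elif alternative == "greater":
        p_value = 1 - cdf
    elif alternative == "two-sided":
        p_value = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
    return z, p_value


# Survey data from example 8.5: H0: p = 0.62, Ha: p < 0.62
z, p = one_proportion_z_test(18, 32, 0.62, alternative="less")
print(f"z = {z:.2f}, p-value = {p:.4f}")
```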
Example 8.5
A councillor of a band office wanted to run for chief in the elections in her First Nation community and asked her assistant to estimate her chances of winning. The councillor told her assistant that in the last council elections she had received 62% of the votes, and that she would like to make sure that in the upcoming elections she would not receive less than this percentage, within a margin of 2 percentage points. The assistant decided to use hypothesis testing to estimate the councillor’s chances.
Step 1. The assistant set up the null and alternative hypotheses:
\[
H_0: p = 0.62, \quad H_a: p < 0.62
\]
Step 2. The assistant received the permission of the band office to administer a survey in the community. He planned to design the sample so that it would meet the requirements of the normal approximation. Since no significance level was given, by convention he chose \(\alpha = 0.05\). Since this is a left-tail problem, the critical value is \(-z_{0.05} = -1.645\). Then, he estimated the minimum sample size using:
\[
n = \frac{p_0 q_0}{B^2} z_\alpha^2
\]
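Considering that \(p_0 = 0.62, q_0 = 1-0.62=0.38, B=0.02, z_{0.05} = 1.645\),
\[
n = \frac{0.62 \cdot 0.38}{0.02^2} \cdot 1.645^2 = 1593.85 \approx 1594
\]
A margin of 2 percentage points would therefore require about 1594 respondents. The survey below uses a much smaller sample of 32 voters, for which the margin is about \(1.645\sqrt{0.62 \cdot 0.38/32} \approx 0.14\), or 14 percentage points. Since \(n p_0 = 32 \cdot 0.62 = 19.84 > 5\) and \(n q_0 = 32 \cdot 0.38 = 12.16 > 5\), the normal approximation can still be used for this proportion.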
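The sample-size formula \( n = p_0 q_0 z_\alpha^2 / B^2 \) is easy to evaluate by machine, which helps avoid slips such as forgetting to square \( B \). In this minimal sketch, the function name `min_sample_size` is our own, and the result is rounded up because a sample size must be a whole number at least as large as the formula's value.

```python
import math


def min_sample_size(p0, bound, z):
    """Smallest n with z * sqrt(p0 * q0 / n) <= bound, i.e. n = p0*q0*z^2 / B^2 rounded up."""
    q0 = 1 - p0
    return math.ceil(p0 * q0 * z ** 2 / bound ** 2)


# Values from example 8.5: p0 = 0.62, B = 0.02, z_{0.05} = 1.645
print(min_sample_size(0.62, 0.02, 1.645))  # 1594
```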
Step 3. The assistant randomly selected 32 voters from the community and asked them if they would vote for the councillor in the chief elections. Of these, 18 voters expressed their support. Therefore,
\[
\bar{p} = \frac{x}{n} = \frac{18}{32} = 0.5625 \approx 0.56
\]
and
\[
SE = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{0.62 \cdot 0.38}{32}} = 0.09
\]
Step 4. For the selected sample:
\[
z = \frac{\bar{p} - p_0}{SE} = \frac{0.56 - 0.62}{0.09} = -0.67
\]
The p-value for the selected sample, obtained from the normal distribution table (table A3, Appendix), is 0.2514.
Step 5. Since the p-value \(0.2514 > \alpha = 0.05\), the assistant does not have enough evidence to reject the null hypothesis. Therefore, he can report to the councillor that the survey results provide no significant evidence that her support has fallen below 62%, so, based on the available information, she appears to have a good chance of winning the chief elections.
8.3 Hypothesis Testing for the Difference Between Two Populations
In the previous chapter, you learned how to estimate the difference between two population means and between two population proportions, analyzing data sets with continuous and binomial variables. Below, we explain how hypothesis testing can be used to compare two populations. Attentive readers will notice that we follow the same procedure as before; only the object of the investigation changes: instead of a mean (or proportion), we test the difference between two means (or proportions). As in the previous chapter, two randomly selected samples (one from each population) are used to test the difference between the populations, and the statistics are calculated using the formulae from the previous chapter.
Large-Sample Hypothesis Testing for Two Means
Consider two populations with means \(\mu_1\) and \(\mu_2\) and standard deviations \(\sigma_1\) and \(\sigma_2\), respectively. Assume that two samples (one from each population) have been randomly selected, with the following statistics:
Table 8.2
| Statistic | Sample 1 | Sample 2 |
|---|---|---|
| Mean | \( \bar{x}_1 \) | \( \bar{x}_2 \) |
| Standard deviation | \( s_1 \) | \( s_2 \) |
| Sample size | \( n_1 \) | \( n_2 \) |
If both populations are normal or, by the Central Limit Theorem, if both sample sizes are at least 30, we can use the normal distribution approximation to analyze the difference between the means of these populations. We will use the provided statistics to test the difference of population means (\(\mu_1-\mu_2\)). The standard error will be calculated using the formula
\[
SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
\]
where \(\sigma_1\) and \(\sigma_2\) are the standard deviations of populations 1 and 2, respectively. If the standard deviations of the populations are not given, we will determine the standard error using the standard deviations of samples:
\[
SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
\]
Above, we described the five-step technique to conduct hypothesis testing for one population. A similar procedure can be used to test the difference between two populations.
Step 1. Develop the null hypothesis \(H_0: \mu_1 - \mu_2 = D\) and the alternative hypothesis, either \(H_a: \mu_1 - \mu_2 \neq D\), \(H_a: \mu_1 - \mu_2 > D\), or \(H_a: \mu_1 - \mu_2 < D\). In most cases, we test whether there is any difference between the means of the two populations, so \(D = 0\).
Step 2. Identify the type of distribution and select the relevant test statistic and the significance level. In this section, we provide an example with two large samples; later, we also consider the normal approximation for proportions.
Step 3. Determine the decision rule to reject or fail to reject the null hypothesis, using either the rejection region or the p-value method explained above.
Step 4. Perform the necessary computations and find the value of the test statistic; when both sample sizes are 30 or larger,
\[
z = \frac{(\bar{x}_1 - \bar{x}_2) - D}{SE}
\]
Step 5. Compare the value obtained in step 4 with the constraints determined in step 3 and make a decision: reject or do not reject the null hypothesis.
Example 8.6
We will use the data from example 7.13 to test, at the 2% level of significance (98% confidence), whether the acting durations of two medications differ.
Table 8.3
| Statistic | Medication 1 | Medication 2 |
|---|---|---|
| Sample mean of the acting duration, in minutes | 280 | 272 |
| Standard deviation, in minutes | 30 | 20 |
| Sample size | 60 | 50 |
Solution:
Step 1. First, we construct the null hypothesis considering that D=0:
\[
H_0: \mu_1 - \mu_2 = 0
\]
We are not asked to determine which medication acts longer; we just need to find out if there is any difference between the acting durations. Therefore, the alternative hypothesis will be:
\[
H_a: \mu_1 - \mu_2 \neq 0
\]
Step 2. Since both sample sizes exceed 30, we can use the normal distribution approach.
Step 3. Let’s use the p-value technique, considering that \(\alpha = 0.02\).
Step 4. Compute the test statistic:
\[
z = \frac{\bar{x}_1 - \bar{x}_2}{SE} = \frac{280 - 272}{\sqrt{\frac{30^2}{60} + \frac{20^2}{50}}} = 1.67
\]
Using the normal distribution table (table A3, Appendix), we can determine the p-value for this two-tail problem:
\[
\text{p-value} = 2 \cdot (1 - 0.9525) = 2 \cdot 0.0475 = 0.0950
\]
Step 5. Since the p-value 0.0950 is greater than \(\alpha = 0.02\), we do not have enough evidence to reject the null hypothesis. Therefore, at the 2% level of significance, we conclude that there is no significant difference between the mean acting durations of the two medications. Note that hypothesis testing leads to the same conclusion as the interval estimation method in the previous chapter.
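Example 8.6 can be checked with a short two-sample sketch. The function name `two_mean_z_test` is our own; following the text, it uses the sample standard deviations in the standard error and returns a two-tailed p-value. The exact p-value differs slightly from the table-based 0.0950 because \( z \) is not rounded to 1.67.

```python
import math
from statistics import NormalDist


def two_mean_z_test(x1, s1, n1, x2, s2, n2, d=0.0):
    """Large-sample two-tailed z-test of H0: mu1 - mu2 = d.

    Uses the sample standard deviations in the standard error, as in the text.
    Returns (z, two-tailed p-value).
    """
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    z = ((x1 - x2) - d) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # both tails beyond |z|
    return z, p_value


# Example 8.6 (table 8.3), alpha = 0.02
z, p = two_mean_z_test(280, 30, 60, 272, 20, 50)
print(f"z = {z:.2f}, p-value = {p:.4f}, reject H0: {p < 0.02}")
```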
Large-Sample Hypothesis Testing for Two Proportions
Below we will explain how hypothesis testing can be used to analyze the difference between two proportions. Consider two binomial populations with parameters \(p_1\) and \(p_2\), respectively. Assume that two samples (one from each population) have been randomly selected from the given populations with the following statistics:
Table 8.4
| Statistic | Sample 1 | Sample 2 |
|---|---|---|
| Number of successes | \( x_1 \) | \( x_2 \) |
| Sample size | \( n_1 \) | \( n_2 \) |
The corresponding sample proportions can be evaluated, respectively, as
\[
\bar{p}_1 = \frac{x_1}{n_1} \quad \text{and} \quad \bar{p}_2 = \frac{x_2}{n_2}.
\]
Consequently, the difference between the sample proportions is
\[
\bar{p}_1 - \bar{p}_2 = \frac{x_1}{n_1} - \frac{x_2}{n_2}.
\]
The standard error for two samples is evaluated as
\[
SE = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}},
\]
where \(q_1 = 1 - p_1\) and \(q_2 = 1 - p_2\).
As was explained in the previous chapter, if the population proportions are not given, the standard error can be estimated using the sample proportions:
\[
SE = \sqrt{\frac{\bar{p}_1 \bar{q}_1}{n_1} + \frac{\bar{p}_2 \bar{q}_2}{n_2}},
\]
where \(\bar{q}_1 = 1 - \bar{p}_1\) and \(\bar{q}_2 = 1 - \bar{p}_2\).
The sampling distribution of \(\bar{p}_1 - \bar{p}_2\) can be approximated by a normal distribution using the Central Limit Theorem if the following conditions are satisfied:
\[
n_1 \bar{p}_1 > 5, \quad n_1 \bar{q}_1 > 5, \quad n_2 \bar{p}_2 > 5, \quad \text{and} \quad n_2 \bar{q}_2 > 5.
\]
The hypothesis testing procedure for two proportions is similar to the procedure for the difference between the means of two populations with large samples, described above.
Step 1. Develop the null hypothesis
\[
H_0: p_1 - p_2 = 0
\]
and the alternative hypothesis, either
\[
H_a: p_1 - p_2 \neq 0, \quad H_a: p_1 - p_2 > 0, \quad \text{or} \quad H_a: p_1 - p_2 < 0.
\]
Step 2. Verify that the normal approximation can be used (the conditions \(n_1 \bar{p}_1 > 5\), \(n_1 \bar{q}_1 > 5\), \(n_2 \bar{p}_2 > 5\), and \(n_2 \bar{q}_2 > 5\)) and define the significance level.
Step 3. Determine the decision rule to reject or fail to reject the null hypothesis.
Step 4. Perform the necessary computations and find the value of the test statistic, provided the normal approximation conditions are satisfied:
\[
z = \frac{\bar{p}_1 - \bar{p}_2}{SE}.
\]
Step 5. Compare the value obtained from Step 4 with the constraints determined in Step 3 and make a decision: reject or do not reject the null hypothesis.
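The five steps can be sketched in Python using only the standard library, with `statistics.NormalDist` supplying the standard normal CDF in place of a printed table. This is a minimal sketch, assuming the unpooled standard error given above; the function name `two_proportion_z_test` and the labels `"greater"`, `"less"`, and `"two-sided"` for the three alternative hypotheses are our own choices. Some textbooks instead pool the two samples under \(H_0\); this sketch follows the unpooled formula used in this section.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided", alpha=0.05):
    """Large-sample z test of H0: p1 - p2 = 0 with the unpooled standard error.

    alternative is "greater" (Ha: p1 - p2 > 0), "less" (Ha: p1 - p2 < 0),
    or "two-sided" (Ha: p1 - p2 != 0). Returns (z, p_value, reject_h0).
    """
    # Step 2: n*p_bar = x and n*q_bar = n - x must all exceed 5
    if min(x1, n1 - x1, x2, n2 - x2) <= 5:
        raise ValueError("normal approximation conditions are not satisfied")
    p1, p2 = x1 / n1, x2 / n2
    # Step 4: standard error estimated from the sample proportions, then z
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2) / se
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":      # right-tailed: area to the right of z
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":       # left-tailed: area to the left of z
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":  # area in both tails beyond |z|
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    # Steps 3 and 5: p-value decision rule
    return z, p_value, p_value < alpha


# Illustrative (made-up) counts: 30 of 60 versus 20 of 50
z, p_value, reject = two_proportion_z_test(30, 60, 20, 50, alternative="two-sided")
print(f"z = {z:.2f}, p-value = {p_value:.3f}, reject H0: {reject}")
```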
Example 8.7
A band office of a First Nation asked the community gas station manager to check if a higher proportion of gas customers make purchases in the convenience store on weekends. The manager randomly selected the records of 50 weekend and 80 weekday gas customers. According to the records, 40 of the selected weekend and 48 of the selected weekday gas customers made purchases in the convenience store. Conduct the hypothesis testing to answer the band office’s request.
Solution:
To answer the question, we follow the five-step procedure for hypothesis testing, treating weekend and weekday gas customers as two populations, where \(p_1\) and \(p_2\) represent the proportions of gas customers who made purchases in the convenience store on weekends and weekdays, respectively.
Step 1. Develop the null and alternative hypotheses:
\[
H_0: p_1 - p_2 = 0
\]
\[
H_a: p_1 - p_2 > 0
\]
Step 2. Verify the normal approximation and define the significance level. The sample proportions are:
\[
\bar{p}_1 = \frac{40}{50} = 0.8, \quad \bar{p}_2 = \frac{48}{80} = 0.6
\]
The corresponding sample failure proportions are:
\[
\bar{q}_1 = 1 - 0.8 = 0.2, \quad \bar{q}_2 = 1 - 0.6 = 0.4
\]
Verification:
\[
n_1 \bar{p}_1 = 50 \cdot 0.8 = 40 > 5, \quad n_1 \bar{q}_1 = 50 \cdot 0.2 = 10 > 5
\]
\[
n_2 \bar{p}_2 = 80 \cdot 0.6 = 48 > 5, \quad n_2 \bar{q}_2 = 80 \cdot 0.4 = 32 > 5
\]
All four conditions hold, so the normal approximation can be applied. We use the conventional significance level \(\alpha = 0.05\).
Step 3. Since \(H_a: p_1 - p_2 > 0\), this is a right-tailed test. We use the p-value method: reject \(H_0\) if the p-value is less than \(\alpha\).
Step 4. Compute the test statistic:
\[
z = \frac{\bar{p}_1 - \bar{p}_2}{SE} = \frac{0.8 - 0.6}{\sqrt{\frac{0.8 \cdot 0.2}{50} + \frac{0.6 \cdot 0.4}{80}}} = \frac{0.2}{\sqrt{0.0032 + 0.0030}} = \frac{0.2}{0.0787} = 2.54
\]
Using the normal distribution table (table A3, Appendix), the p-value (area to the right of 2.54) is:
\[
p\text{-value} = 1 - 0.9945 = 0.0055
\]
Step 5. Decision:
Since \(p\text{-value} < \alpha\), we reject the null hypothesis and accept the alternative hypothesis. Therefore, with 95% confidence, the manager can report that a higher proportion of weekend gas customers made purchases in the convenience store.
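As an arithmetic check on Example 8.7, the short Python script below recomputes the standard error, the test statistic, and the right-tail p-value, using `statistics.NormalDist` from the standard library instead of table A3. It is a sketch that follows the unpooled standard error used in this section.

```python
from math import sqrt
from statistics import NormalDist

# Example 8.7 data: weekend customers (sample 1) and weekday customers (sample 2)
x1, n1 = 40, 50
x2, n2 = 48, 80
alpha = 0.05

p1_hat, p2_hat = x1 / n1, x2 / n2
se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
z = (p1_hat - p2_hat) / se
p_value = 1 - NormalDist().cdf(z)  # right-tailed: area to the right of z

print(f"SE = {se:.4f}, z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

The script prints `SE = 0.0787, z = 2.54, p-value = 0.0055` followed by `reject H0`.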
Small-Sample Hypothesis Testing for Two Means
We have already solved several problems in which the normal approximation could not be used to make inferences from samples. In those cases, when the data were approximately normally distributed, we used the t-distribution. In the previous chapter, we showed how Student’s t-distribution can be applied to estimate the difference between two population means when the sample sizes are less than 30 and the population variances are unknown.
Now we will solve a similar problem using hypothesis testing. Assuming the two population variances are equal, the pooled variance can be determined using the formula:
$$
s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \quad (F7.25)
$$
It can be shown that in this case the test statistic
$$
t = \frac{\bar{x}_1 - \bar{x}_2 - D}{\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \quad (F8.1)
$$
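has a Student’s t-distribution, where \(n_1\) and \(n_2\) are the sample sizes, \(\bar{x}_1\) and \(\bar{x}_2\) are the sample means, \(s_1\) and \(s_2\) are the sample standard deviations of the first and second samples, respectively, and \(D\) is the hypothesized difference \(\mu_1 - \mu_2\) under the null hypothesis (\(D = 0\) when testing for no difference).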
The total number of degrees of freedom for the two samples is
$$
df = (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2
$$
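Formulas (F7.25) and (F8.1) and the degrees of freedom can be combined into one small Python function. This is a sketch under the same assumptions as the formulas (normal populations with equal variances); the name `pooled_t_statistic` and the argument `d0`, which stands for \(D\), are hypothetical. It returns only the statistic; the critical value still comes from the t-distribution table.

```python
from math import sqrt


def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2, d0=0.0):
    """Return (pooled variance, t statistic, degrees of freedom).

    The pooled variance follows (F7.25); the t statistic follows (F8.1),
    with d0 playing the role of D, the hypothesized difference mu1 - mu2.
    """
    df = n1 + n2 - 2
    s2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    t = (mean1 - mean2 - d0) / sqrt(s2 * (1 / n1 + 1 / n2))
    return s2, t, df


# Illustrative (made-up) summary statistics
s2, t, df = pooled_t_statistic(12, 2, 5, 10, 2, 5)
print(f"s^2 = {s2}, t = {t:.3f}, df = {df}")  # s^2 = 4.0, t = 1.581, df = 8
```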
Now let’s test the difference between the two populations provided in Example 7.15 using hypothesis testing.
Example 8.8
A statistics professor teaches a STAT 100 course in two sections: section 1 in an in-classroom format and section 2 in an online format. He decided to estimate the difference between the mean mid-term exam scores (the maximum score is 40) in the two sections. He randomly selected 10 exam papers from section 1 and 8 exam papers from section 2 and calculated the sample means and standard deviations presented in table 7.9.
Table 7.9
| | Section 1 | Section 2 |
| Mean of scores (out of 40) | 30 | 24 |
| Standard deviation of scores | 4 | 6 |
| Sample size | 10 | 8 |
Apply hypothesis testing to check whether there is a difference between the performance of online and in-classroom students. Use \(\alpha = 0.01\).
Solution:
Step 1.
Develop the null and alternative hypotheses:
$$
H_0: \mu_1 - \mu_2 = 0
$$
$$
H_a: \mu_1 - \mu_2 \neq 0
$$
Step 2.
We will use the t-distribution because both sample sizes are less than 30 and the population variances are unknown. This is a two-tailed problem because we are not asked to determine which section’s students performed better.
Step 3.
Use the rejection region technique (the p-value method is also applicable). With \(\alpha = 0.01\),
$$
\frac{\alpha}{2} = 0.005, \quad df = 10 + 8 - 2 = 16
$$
From the t-distribution table (table A4, Appendix), the critical value is
$$
t_{0.005, 16} = 2.921
$$
so the rejection region is \(t < -2.921\) or \(t > 2.921\).
Step 4.
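From Example 7.15, the pooled variance (formula F7.25) is
$$
s^2 = \frac{(10 - 1)\cdot 4^2 + (8 - 1)\cdot 6^2}{10 + 8 - 2} = \frac{396}{16} = 24.75 \approx 25
$$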
$$
Hence, the test statistic is
$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
= \frac{30 - 24}{\sqrt{25\left(\frac{1}{10} + \frac{1}{8}\right)}}
= \frac{6}{2.372} = 2.530
$$
Step 5.
Since the calculated t-value does not lie in the rejection region (fig. 8.7), we do not have enough evidence to reject the null hypothesis.

Figure 8.7. The rejection region and t-score evaluated for the samples given in example 8.8
Therefore, at the 1% significance level, the data do not provide sufficient evidence of a difference between the mean mid-term scores of online and in-classroom students.
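The Example 8.8 computation can be replayed in a few lines of Python. This sketch uses the unrounded pooled variance 24.75, so its t-value (about 2.543) differs slightly from the 2.530 obtained with \(s^2\) rounded to 25. The critical value \(t_{0.005,16} = 2.921\) is copied from a standard t-table, because the Python standard library has no t-distribution quantile function.

```python
from math import sqrt

# Example 8.8 summary statistics (table 7.9)
mean1, sd1, n1 = 30, 4, 10  # section 1, in-classroom
mean2, sd2, n2 = 24, 6, 8   # section 2, online

df = n1 + n2 - 2
s2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df  # pooled variance (F7.25)
t = (mean1 - mean2) / sqrt(s2 * (1 / n1 + 1 / n2))  # (F8.1) with D = 0

# Two-tailed critical value t_{0.005, 16} for alpha = 0.01, from a t-table
t_crit = 2.921
print(f"df = {df}, s^2 = {s2:.2f}, t = {t:.3f}")
print("reject H0" if abs(t) > t_crit else "do not reject H0")
```

The script prints `df = 16, s^2 = 24.75, t = 2.543` and `do not reject H0`, the same decision as in the worked solution.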
8.4 Hypothesis Testing for Matched Pairs
In the previous chapter, we examined the differences between populations using the matched pairs design. As you saw, designing the pairs is quite a complicated process. After the samples are selected, various techniques can be used to analyze the paired differences, including hypothesis testing. Let’s consider the example below.
Example 8.9
It is well known that even a low level of alcohol consumption blunts alertness and reduces drivers’ motor coordination. In particular, an increase in blood alcohol concentration slows reaction time and reduces control, which impairs a person’s ability to maintain proper lane position and to brake appropriately. Studies show that a blood alcohol concentration (BAC) of 0.08 (approximately four drinks for an 80-kilogram person) increases reaction time by 120 milliseconds (How Alcohol Impairs Your Ability to Drive, Michigan Medicine, University of Michigan, July 29, 2016, obtained from: https://www.michiganmedicine.org/health-lab/how-alcohol-impairs-your-ability-drive). [Note: At a speed of 100 kilometres per hour, a car travels about 3.3 metres in 120 milliseconds. In other words, four drinks can increase a car’s stopping distance by more than 3 metres.]
In many countries, 0.08 is the legal limit for BAC. A company producing alcoholic beverages claims that consuming one unit of its product changes reaction time by less than 120 milliseconds. To test this claim, an independent researcher recruited male cab-driver volunteers weighing between 70 and 90 kilograms. All volunteers performed the same task before and after drinking one unit of the provided alcoholic beverage. The table below provides the reaction times (in milliseconds) of 10 volunteers before and after consuming the beverage.
Table 8.5
| Participant # | Before drinking (ms) | After drinking (ms) |
| 1 | 694 | 972 |
| 2 | 697 | 794 |
| 3 | 798 | 833 |
| 4 | 745 | 986 |
| 5 | 617 | 872 |
| 6 | 720 | 829 |
| 7 | 668 | 791 |
| 8 | 736 | 885 |
| 9 | 690 | 991 |
| 10 | 768 | 816 |
Based on the data in Table 8.5, let \(d_i\) be the difference in reaction time (after minus before) for participant \(i\). The sample mean and the standard deviation of the differences can be calculated using formulas (F7.26) and (F7.27), respectively:
$$
\bar{d} = \frac{\sum d_i}{n} = \frac{1636}{10} = 163.6
$$
$$
s_d = \sqrt{\frac{\sum d_i^2 - \frac{1}{n}(\sum d_i)^2}{n-1}} = \sqrt{\frac{353140 - \frac{1}{10} \cdot 1636^2}{10-1}} = 97.5
$$
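The sums behind (F7.26) and (F7.27) can be verified in Python. This sketch takes each difference as after minus before, as in the text, and cross-checks the shortcut formula against `statistics.stdev`, which also divides by \(n - 1\).

```python
from math import sqrt
from statistics import stdev

# Table 8.5 reaction times (ms), participants 1 to 10
before = [694, 697, 798, 745, 617, 720, 668, 736, 690, 768]
after = [972, 794, 833, 986, 872, 829, 791, 885, 991, 816]

d = [a - b for a, b in zip(after, before)]  # d_i = after - before
n = len(d)
sum_d = sum(d)
sum_d2 = sum(x * x for x in d)

d_bar = sum_d / n                                # (F7.26)
s_d = sqrt((sum_d2 - sum_d**2 / n) / (n - 1))    # shortcut form of (F7.27)
s_d_check = stdev(d)                             # library sample standard deviation

print(d)
print(sum_d, sum_d2)
print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.1f}")
```

The script prints the differences `[278, 97, 35, 241, 255, 109, 123, 149, 301, 48]`, the sums `1636 353140`, and `d_bar = 163.6, s_d = 97.5`.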
Now we can use hypothesis testing to verify the company’s statement.
Step 1.
The literature review shows that the same amount of alcohol changes reaction time by 120 milliseconds. Consequently, the null hypothesis is:
$$
H_0: \mu_d = 120
$$
Considering the company’s statement, the alternative hypothesis is:
$$
H_a: \mu_d < 120
$$
Step 2.
Assuming the differences are approximately normally distributed, and since the sample size is less than 30, we will use Student’s t-distribution for this one-tailed problem.
Step 3. For the given sample size:
$$
df = 10 - 1 = 9
$$
We conduct the test at 99% confidence, so:
$$
\alpha = 0.01
$$
From the t-distribution table (table A4, Appendix), the critical value for the left-tail rejection region is:
$$
t_{\text{critical}} = -2.821
$$
Step 4.
The t-score for the selected sample of pairs is:
$$
t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} = \frac{163.6 - 120}{97.5 / \sqrt{10}} = 1.414
$$
Step 5. Since the calculated t-value does not lie in the rejection region (fig. 8.8), we do not have enough evidence to reject the null hypothesis.

Figure 8.8. The rejection region and t-score evaluated for the sample given in example 8.9
Therefore, at the 1% significance level, the data do not support the company’s claim: there is not enough evidence that drinking one unit of the company’s beverage changes reaction time by less than 120 milliseconds. In fact, the sample mean change (163.6 milliseconds) is greater than 120 milliseconds.
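Steps 3 to 5 of Example 8.9 can be reproduced in Python from the rounded summary values used in the text. In this sketch the left-tail critical value \(t_{0.01,9} = -2.821\) is copied from a t-table, since the standard library has no t-distribution quantile function.

```python
from math import sqrt

# Summary statistics of the differences d_i (after - before) from Example 8.9
d_bar = 163.6  # sample mean difference (ms)
s_d = 97.5     # sample standard deviation of the differences (ms)
n = 10         # number of pairs
mu_0 = 120     # hypothesized mean difference under H0 (ms)

# Left-tail critical value t_{0.01, 9}, copied from a t-table
t_crit = -2.821

t = (d_bar - mu_0) / (s_d / sqrt(n))
print(f"df = {n - 1}, t = {t:.3f}")
# Ha: mu_d < 120, so H0 is rejected only if t falls below the critical value
print("reject H0" if t < t_crit else "do not reject H0")
```

The script prints `df = 9, t = 1.414` and `do not reject H0`. Because the t-value is positive, it lies on the opposite side of the distribution from the left-tail rejection region.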
Chapter 8 Summary
- The philosophy of hypothesis testing
- Five steps of hypothesis testing
- Decision rule
- Rejection region method
- p-value method
- Large-sample hypothesis testing for one proportion
- Hypothesis testing for the difference between two population means
- Large-sample hypothesis testing for two means
- Large-sample hypothesis testing for two proportions
- Small-sample hypothesis testing for two means
- Hypothesis testing for matched pairs
EXERCISES
Hypothesis. Philosophy of Hypothesis Testing
- (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) In hypothesis testing, the hypothesis tentatively assumed to be true is
a) The alternative hypothesis
b) The null hypothesis
c) Either the null or the alternative
d) All of the above answers are correct
e) None of the above answers are correct
- (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) In hypothesis testing, if the null hypothesis is rejected,
a) No conclusions can be drawn from the test
b) The alternative hypothesis must also be rejected
c) The data must have been accumulated incorrectly
d) The sample size has been too small
e) None of the above answers are correct
- (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) The level of significance is the
a) Maximum allowable probability of Type II error
b) Maximum allowable probability of Type I error
c) Same as the confidence coefficient
d) Same as the p-value
e) None of the above answers are correct
- (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) The error of rejecting a true null hypothesis is
a) A Type I error
b) A Type II error
c) Can be either a or b, depending on the situation
d) Committed when not enough information is available
e) None of the above answers are correct
- (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) In hypothesis testing, if the null hypothesis is rejected when the alternative hypothesis is true,
a) A Type I error has been committed
b) A Type II error has been committed
c) Either a Type I or Type II error has been committed.
d) The correct decision has been made
e) None of the above answers are correct
Five Steps of Hypothesis Testing
(Introduction to Statistics, 2nd Ed., Test Bank, Anderson, D. R., Sweeney, D. J., Williams, T. A., 1991)
1. The following hypotheses are being tested at a level of significance of \( \alpha \):
\[
H_0: \mu \ge 100
\]
\[
H_a: \mu < 100
\]
The null hypothesis will be rejected if the test statistic
a) \( z > z_{\alpha} \)
b) \( z < z_{\alpha} \)
c) \( z < -z_{\alpha} \)
d) \( z < 100 \)
e) None of the above answers are correct
(Introduction to Statistics, 2nd Ed., Test Bank, Anderson, D. R., Sweeney, D. J., Williams, T. A., 1991)
2. When the p-value is used for hypothesis testing, the null hypothesis is rejected if
a) \( \text{P-value} < \alpha \)
b) \( \alpha < \text{P-value} \)
c) \( \text{P-value} > \alpha \)
d) \( \text{P-value} = \alpha \)
e) None of the above answers are correct
3. (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) Which of the following does not need to be known in order to compute the p-value?
a) Knowledge of whether the test is one-tailed or two-tailed
b) The value of the test statistic
c) The level of significance
d) All of the above are needed
e) None of the above answers are correct
4. (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) If a null hypothesis is not rejected at the 5% level, it
a) Will also not be rejected at the 1% level
b) Will always be rejected at the 1% level
c) Will sometimes be rejected at the 1% level
d) Not enough information is given to answer this question
e) None of the above answers are correct
5. (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) A two-tailed test is performed at 95% confidence. The p-value is determined to be 0.09. The null hypothesis
a) Must be rejected
b) Should not be rejected
c) Both a and b could be correct, depending on the sample size
d) Has been designed incorrectly
e) None of the above answers are correct.
Large Sample Hypothesis Testing for one sample mean
1. A social agency wants to determine if the average income of single parents with 2 children is below the poverty level. The poverty level for single parents with 2 children is \$22,500 in the community where the social agency is located. The agency samples 600 single-parent families with 2 children. They find that the average income is \$21,600, with a standard deviation of \$8,000. Is there reason to believe that the average income of the population of single parents is below the poverty limit? (Test at \( \alpha = 0.05 \))
2. Administrators at a post-secondary institution expect that the average starting salary of graduates will be \$30,000. To test this hypothesis, the administrators survey a random sample of 16 students who graduated within the last year. The average starting salary was \$28,700 with a standard deviation of \$7,000. Is there reason to believe that the average starting salary of all graduates is \$30,000? (Test at \( \alpha = 0.05 \))
3. A cereal manufacturer claims that there are at least 5 raisins added for every ounce of cereal in a box. Thirty-six boxes of cereal are inspected, and it is found that the average number of raisins per ounce is 4.9 with a standard deviation of 1.2 raisins. Is the cereal manufacturer's claim correct? (Test at \( \alpha = 0.01 \))
4. The average time for a sample of 40 three-year-olds to solve a puzzle is 4.47 minutes with a standard deviation of 1.3 minutes. The puzzle manufacturer states that the average three-year-old can solve the puzzle in less than 6 minutes. Do the data contradict the manufacturer's claim? (Test at \( \alpha = 0.05 \))
5. A professor believes that the average age of students is 20.3. A sample of 25 students yields an average age of 21.8 years, with a standard deviation of 2.0 years. Is there any reason to doubt the professor's belief? (Test at \( \alpha = 0.01 \))
6. A college administrator believes that the average age of students is 23. The administrator takes a sample of 90 students and discovers that the average age of these students is 21, with a standard deviation of 1.2 years. Is the administrator's belief justified? (Test at \( \alpha = 0.05 \))
Large Sample Hypothesis Testing for one sample proportion
1. A company offers high school students a cram course that will help them get into Ivy League universities. The company charges students \$10,000 for this service. The company claims that 80% of its students are accepted to an Ivy League university. A skeptic wants to see if this claim is true. She samples 70 students who have paid for this service. Sixty-two of these students were accepted by Ivy League universities. Is the company’s claim correct? (Test at \( \alpha = 0.05 \))
2. The manufacturer of computer chips wants to know what proportion of chips is faulty. Typical industry standards are 5% faulty. A sample of 700 chips is inspected, and 53 are faulty. Is there any reason to believe that the proportion of faulty chips is different from the industry standard? (Test at \( \alpha = 0.01 \))
3. The city council believes that over 30% of all stores are disobeying the by-laws concerning late-night shopping. An inspector checks 50 stores and reports that 20 were disobeying the by-laws. Is there reason to accept the city council's belief? (Test at \( \alpha = 0.05 \))
4. Normally 2% of newborn babies have a certain birth defect. In one year a sample of 2500 births resulted in 30 babies with the birth defect. Is there any reason to believe that the percentage of birth defects is lower than the normal value? (Test at \( \alpha = 0.05 \))
5. A past census indicated that 37% of married couples divorced within 10 years of getting married. A sample of 50 couples was taken. These couples had been married for 10 years. Fifteen of the couples were now divorced. Is there reason to believe that the proportion of divorces is going down? (Test at \( \alpha = 0.05 \))
Small Sample Hypothesis Testing for the mean
1. The diameters of 13 burial mounds of First Nations’ people in a remote area of southern Saskatchewan were measured. The sample mean and standard deviation of those measurements were 26.6 and 6.5 feet, respectively. Do these data substantiate the conjecture that the population mean diameter is larger than 21.0 feet? Test at \( \alpha = 0.01 \). (Assume the underlying population distribution is normal.)
2. The following are the recovery times for 12 patients using a new drug:
3.9 4.2 4.2 3.8 3.6 4.1 4.4 3.4 4.0 3.5 3.7 4.2
Prior to the use of the new drug, the mean recovery time was 3.6 days. Are the researchers justified in concluding that the new drug increases the recovery time? (Test at \( \alpha = 0.05 \))
3. An office supply company believes that bookcases of a certain brand are shorter than the 127.7 cm stated by the manufacturer. Five bookcases are measured, and their average height is 104.18 cm with a standard deviation of 5.8 cm. Is there reason to believe that the bookcases are shorter than 127.7 cm? (Test at \( \alpha = 0.05 \))
4. According to an industry spokesperson, banks give 1.77% of their profits to charity. A survey of 8 banks yields the following percentages of income that were given to charity:
1.16 0.98 1.70 1.95 1.32 1.27 1.84 1.23
Is there any reason to believe the spokesperson's claim is incorrect? (Test at \( \alpha = 0.05 \))
5. A professor believes that the average age of students is 20.3. A sample of 25 students yields an average age of 21.8 years, with a standard deviation of 2.0 years. Is there any reason to doubt the professor's belief? (Test at \( \alpha = 0.01 \))
6. The level of mercury in fish in a certain lake is assumed to be 200 parts per billion. The following are mercury levels, in parts per billion, from a sample of fish taken from the lake:
170 230 180 195 210 190 190 200 210 190 180
Is the average mercury level for lake fish different from the assumed value of 200 parts per billion? (Test at \( \alpha = 0.05 \))
Comparison Analysis of two large samples using Hypothesis Testing. Means and Proportions (PA Samples)
7. Researchers want to determine whether the same amount of alcohol is more likely to intoxicate women than men. They believe this may be the case because women on average have a lower body weight than men, and the same amount of alcohol may have a greater effect on people with a lower body weight. To test this hypothesis, the researchers gather a sample of 30 men and a sample of 40 women. Each man and woman is then asked to drink the same amount of alcohol in a half-hour period. After one hour, the blood alcohol concentration (BAC) of each individual is measured. The researchers use a higher BAC level as their measure of intoxication. The following are the results of the experiment:
| | Men | Women |
| Sample size | 30 | 40 |
| Mean BAC | 0.072 | 0.092 |
| Standard deviation | 0.020 | 0.025 |
Are women more likely than men to be intoxicated if they drink the same amount of alcohol?
8. Health workers believe that if seniors practice balance exercises for 10 minutes a day they will be less likely to be hospitalized for falls. In order to test this hypothesis the health workers sampled seniors who had taken the exercise program and those who had not. The following are the number of seniors in each group that suffered a fall requiring hospitalization:
| | No exercises | Exercises |
| Number of seniors | 100 | 100 |
| Number of falls | 10 | 6 |
Do seniors who practice balance exercise have less chance of being hospitalized than seniors who do not practice balance exercises?
9. A researcher believes that the birth weight of babies of non-smokers will be greater than the birth weight of babies of smokers. Below are the results of her research:
| | Non-smokers | Smokers |
| Sample size | 100 | 100 |
| Mean birth weight | 4.1 kg | 3.8 kg |
| Standard deviation | 0.3 kg | 0.2 kg |
Is the researcher's belief justified?
10. A transportation company wanted to compare the operating costs of gasoline-powered and propane-powered trucks. The company put 100 gasoline trucks and 100 propane trucks in service. The average operating cost was 6.70 cents per kilometre for the gasoline trucks and 6.54 cents per kilometre for the propane trucks. The variance of the operating costs was 0.36 for the gasoline trucks and 0.40 for the propane trucks. Is there sufficient evidence to indicate a difference in the mean operating costs of the two types of trucks? (Test at α = 0.05)
11. A drug company conducts a test comparing its new experimental insecticide with its standard brand. Under controlled conditions, the standard brand kills 425 of a sample of 500 mosquitoes within one minute. The experimental spray kills 459 of 500 mosquitoes in the same time. Does this suggest that the experimental insecticide is significantly more effective than the standard brand? (Test at α = 0.01)
12. Five years ago 407 of a sample of 1000 voters opposed any form of rifle legislation. This year 575 out of a sample of 1500 voters opposed it. Is there any significant change in opposition? (Test at α = 0.05)
13. Two random samples of people in two different age groups (above 60 years old and below 40 years old) who live in Moose Jaw were taken to study how many hours they sleep per night. In the first sample, of size 250, from the age group above 60, 150 people sleep less than 8 hours per night. In the second sample, also of size 250, from the age group below 40, 172 people sleep less than 8 hours per night. Do these data demonstrate that the proportion of persons who sleep less than 8 hours per night is significantly lower for the age group above 60 than for the age group below 40? Use α = 0.05 and report the p-value.
Comparison Analysis of two small samples using Hypothesis Testing. Means (Indigenous examples about water quality. Two communities)
1. Do physicians tend to give different numbers of prescriptions to men than to women for the same medical condition? In order to test this hypothesis, researchers went through medical records and created a sample of 8 men and 8 women who had identical medical conditions. The results of their research are summarized below:
| | Men | Women |
| Sample size | 8 | 8 |
| Mean number of prescriptions | 4.2 | 4.6 |
| Standard deviation | 0.4 | 0.4 |
Do physicians give different numbers of prescriptions to men than to women?
2. Medical researchers want to know if a new drug reduces the concentration of a virus in the bloodstream. The researchers randomly sample individuals who have contracted the virus, and measure the concentration of the virus before and after each individual has received the drug. The results are listed below:
| Concentration before | Concentration after |
| 92 | 84 |
| 37 | 26 |
| 81 | 77 |
| 41 | 15 |
| 62 | 57 |
| 50 | 46 |
Does the new drug reduce the concentration of the virus? (Test at α = 0.01)
3. In an initial study, ten randomly selected potential consumers were asked to taste Brand X and rate it on a scale of 1 (very poor) to 10 (outstanding). Another 10 potential consumers were asked to taste Brand Y and rate it on the same scale. The following statistics were calculated:
| | Brand X | Brand Y |
| Mean | 8.6 | 6.8 |
| Standard deviation | 1.1 | 1.3 |
| Sample size | 10 | 10 |
Are the ratings significantly different for the two products? (Test at α = 0.01)
4. Physicians were concerned that a certain treatment would cause weight loss. They measured the weight in kilograms of patients before and after treatment. Did the patients lose weight? (Test at α = 0.05)
| Patient | Before | After |
| 1 | 92.6 | 88.0 |
| 2 | 67.7 | 64.3 |
| 3 | 74.7 | 59.8 |
| 4 | 73.0 | 64.0 |
| 5 | 93.5 | 78.4 |
| 6 | 70.2 | 59.6 |
| 7 | 74.5 | 63.3 |
| 8 | 53.9 | 48.9 |
| 9 | 83.0 | 74.0 |
| 10 | 47.3 | 38.0 |
| 11 | 92.4 | 85.7 |
| 12 | 65.4 | 55.1 |
5. A lawyer believes that northern residents of the province receive different lengths of prison sentences than southern residents. The lawyer takes a sample of 20 northerners and 20 southerners who have been convicted of the same crime. The results are in days of prison sentence, and are summarized below:
| | Northern residents | Southern residents |
| Mean | 151.82 days | 146.71 days |
| Standard deviation | 5.91 days | 8.42 days |
Assume the 2 sampled populations are normally distributed. Is the lawyer's belief justified? (Test at α = 0.01)
6. Below are the monthly home electricity costs for samples of households in two cities:
City 1: 32.9 24.4 23.8 32.1 38.9 35.6 27.4 44.8 37.5 40.9 17.6 40.3
City 2: 30.4 28.1 34.1 19.6 23.4 35.7 40.2 26.4
Is there any significant difference between average monthly costs in the two cities? (Test at α = 0.05)
7. BC apple growers wanted to determine the effect of an advertising campaign. Prior to the advertising campaign, a sample of 14 grocery stores purchased an average of 7.39 tons of apples annually, with a standard deviation of 1.28 tons. After the advertising campaign, a sample of 11 stores purchased an average of 9.12 tons of apples, with a standard deviation of 1.06 tons. Did the advertising campaign result in a significant increase in the sale of apples?
Matching pairs
1. Does poverty affect children’s ability to learn? Researchers tested this hypothesis by randomly sampling children between the ages of 8 and 10. The researchers then classified the children’s family as being either poor or not poor. They also went over student records and classified each student’s academic performance. There were 3 categories of performance: below the expected age level, at the expected age level, and above the expected age level. The following are the results of the analysis:
| | Poor family | Not poor |
| Below age level | 10 | 10 |
| At age level | 20 | 80 |
| Above age level | 10 | 30 |
Is there a relationship between poverty and children’s ability to learn?
2. Administrators at a technical institute believe that students will apply to programs in the following proportions:
Health 25%
Computer 15%
Shop 40%
Other 20%
A sample of 200 student applications is examined, and the students are found to apply for the following programs:
Health 55
Computer 25
Shop 90
Other 30
Sample size 200
Are the administrators correct in their assumptions regarding the proportion of admissions in each program?
3. The marketing department of a toothpaste company wants to see if customers have any preferences between 6 different brands of toothpaste. The observed results are given below. Do the data present sufficient evidence that some brands are preferred over others? (Test at α = 0.01)
| Brand | A | B | C | D | E | F |
| Frequency | 84 | 110 | 146 | 152 | 61 | 47 |
4. At one university new students usually enroll in arts, sciences and the professions in the following proportions:
Arts 40% Sciences 25% Professions 35%
This year a sample of new students was found to have 120 individuals enrolled in arts, 60 in sciences and 110 in professions. Is there any reason to believe that the proportions are different this year than in the past? (Test at α = 0.05)
5. The population of a certain American city is 45% white, 20% Hispanic, 20% black, and 15% other ethnic groups. A sample of 400 people who used public hospitals revealed that 160 were white, 90 were Hispanic, 110 were black, and the rest were from other ethnic groups. Is there reason to believe that the use of public hospitals by ethnic group is in proportion to the number of people in each ethnic group? (Test at α = 0.05)
6. In the last provincial election candidates in a certain riding received the following popular votes:
| NDP | 4511 |
| Sask. Party | 6033 |
| Liberals | 1955 |
| Other | 844 |
A poll of voter preferences is now made in the same riding, with the following results:
| NDP | 67 |
| Sask. Party | 34 |
| Liberals | 28 |
| Other | 3 |
Is there any reason to believe that voter preferences have changed? (Test at α = 0.05)
7. A large city carried out a year-long study of how people commute to work. The proportions of all commuters using different methods of transit are listed below:
| | Bus | Train | Car | Other |
| Proportion | 0.25 | 0.15 | 0.50 | 0.10 |
The city government then launched a campaign to encourage commuters to switch from cars to public transport. A total of 80 commuters were sampled after the campaign, and their current means of transportation to work was recorded. The results are as follows:
| | Bus | Train | Car | Other |
| Number | 26 | 15 | 32 | 7 |
Has the campaign influenced how people commute to work? (Test at α = 0.01)
Hypothesis testing is a fundamental concept in statistics that involves making decisions based on data from samples. It is a method used to evaluate two mutually exclusive statements about a population parameter. The two statements are the null hypothesis (H0) and the alternative hypothesis (Ha).
Null hypothesis (H0): The null hypothesis represents the default assumption or the status quo. It states that there is no difference or effect in the population. In essence, it assumes that any observed difference in the sample is due to random variation.
Alternative hypothesis (Ha): The alternative hypothesis is what the researcher is trying to establish. It states that there is a difference, effect, or relationship in the population.
The p-value method is another approach in hypothesis testing that provides a way to make decisions about the null hypothesis based on sample data.
Matched pairs, also known as paired samples or repeated measures, are research designs in statistics that involve the comparison of two sets of measurements taken from the same individual or subject under different conditions. In matched pairs analysis, each individual in the sample is measured twice, once under each condition or treatment, and the measurements are paired together for analysis. This design is commonly used in research studies to control for individual differences and increase the precision of estimates by reducing variability.
The decision rule in hypothesis testing is a guideline for determining whether to reject the null hypothesis based on the evidence provided by the data. It tells researchers when the evidence is strong enough to reject the null hypothesis in favour of the alternative hypothesis. The decision rule is typically based on the significance level (α) chosen at the beginning of the hypothesis testing process.
The rejection region method is a common approach in hypothesis testing and is used to make decisions about the null hypothesis based on sample data.
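The rejection region method and the p-value method are two ways of applying the same decision rule, so for a given test statistic and significance level they reach the same decision. The Python sketch below illustrates this for a z statistic; the function name `decide` and the tail labels `"right"`, `"left"`, and `"two"` are hypothetical, and `statistics.NormalDist` replaces the normal table.

```python
from statistics import NormalDist


def decide(z, alpha, tail):
    """Return (reject by rejection region, reject by p-value) for a z statistic."""
    nd = NormalDist()  # standard normal distribution
    if tail == "right":
        z_crit = nd.inv_cdf(1 - alpha)      # reject if z > z_alpha
        reject_region = z > z_crit
        p_value = 1 - nd.cdf(z)
    elif tail == "left":
        z_crit = nd.inv_cdf(alpha)          # reject if z < -z_alpha
        reject_region = z < z_crit
        p_value = nd.cdf(z)
    elif tail == "two":
        z_crit = nd.inv_cdf(1 - alpha / 2)  # reject if |z| > z_{alpha/2}
        reject_region = abs(z) > z_crit
        p_value = 2 * (1 - nd.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'right', 'left', or 'two'")
    return reject_region, p_value < alpha


print(decide(2.54, 0.05, "right"))  # (True, True)
print(decide(1.5, 0.05, "two"))     # (False, False)
```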