Chapter 5. Normal Distributions

Chapter Objectives

In this chapter, readers will learn to do the following:

• Define the normal distribution
• Identify the normal distribution curve
• Determine probabilities using the Standard Normal Distribution table
• Solve the backward problems using the Standard Normal Distribution table
• Define the criteria for using the normal distribution approximation to solve problems with binomial variables
• Solve problems on binomial variables using the normal distribution approximation

In previous chapters, we analyzed the distribution of discrete variables. However, when solving statistical problems, we often need to work with continuous variables. In this chapter, you will learn about a continuous probability distribution for a real-valued random variable that is called the normal distribution. In some resources, the normal distribution is also presented as the Gauss or Laplace-Gauss distribution.

Carl Friedrich Gauss (1777–1855) was a German mathematician and physicist who contributed significantly to many areas of mathematics and science.

Pierre-Simon, Marquis de Laplace (1749–1827) was a French scholar known for his essential works in engineering, science, and mathematics.

The general form of the probability density function of the normal distribution is

[latex]f\left(x\right)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\qquad\text{(F5.1)}[/latex]

where σ is the standard deviation, and μ is the mean of the normal variable.
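The density (F5.1) can also be evaluated numerically. The following is a minimal sketch in Python, not part of the textbook's method: the function names `normal_pdf` and `area_under_pdf` and the values μ = 4 and σ = 1 are assumptions chosen for illustration. It evaluates the formula at the mean and estimates the total area under the curve with a simple midpoint rule.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density (F5.1) of a normal variable with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def area_under_pdf(mu, sigma, lo, hi, steps=20_000):
    """Midpoint-rule estimate of the area under the density between lo and hi."""
    width = (hi - lo) / steps
    heights = (normal_pdf(lo + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return sum(heights) * width

# Illustrative values (an assumption, not taken from the text): mu = 4, sigma = 1
peak = normal_pdf(4.0, 4.0, 1.0)               # height of the curve at x = mu
total = area_under_pdf(4.0, 1.0, -6.0, 14.0)   # mu +/- 10 sigma covers essentially all the area
print(round(peak, 4), round(total, 6))
```

The estimated total area should be very close to 1, which is the property of the density that the rest of the chapter relies on.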

Several characteristics make the normal distribution very important for statisticians. It is an excellent approximation to many populations, such as the height of trees in the forest or the weight of grizzly bears. This textbook will mainly work with cumulative normal probability tables (table A3, Appendix) and normal distribution curves constructed using this function.

5.1. Normal Distribution Curve

In solving problems relating to normally distributed variables, we will often use the normal distribution graph called the normal distribution curve. As one can see from the formula (F5.1), the shape of the normal distribution is determined entirely by the values of μ and σ. A normal distribution curve is symmetrical and mound-shaped, so this graph is also known as a bell curve in statistics (fig. 5.1).

image

Figure 5.1. Bell curve

The area under the normal distribution curve over an interval represents the probability that a randomly selected value falls within that interval; the interval can be closed or open. Based on the definition of probability, we can state that the total area below the normal distribution curve (for simplicity, we might use the term “normal curve”) equals 1. Most observations in a normal distribution are close to the mean, with gradually fewer observations further away. Since we deal with a continuous variable, the segment under a single point of the curve plays the role of the bars that we observed in previous chapters for discrete variables. The height of such a segment is the distance between the point and the horizontal axis, but its width is zero, so its area equals 0. Therefore, the probability of any single value is 0.

We find it reasonable to list some common properties of all normal curves:

1. The total area below the normal curve equals 1.
2. The area below a single point is 0.
3. The areas to the right and left from the mean μ equal 0.5.

In chapter 2, we stated some properties of mound-shaped distributions in the form of the empirical rule:

  • Approximately 68% of the data lie within the interval (μ-σ, μ+σ)
  • Approximately 95% of the data lie within the interval (μ-2σ, μ+2σ)
  • Approximately 99.7% of the data lie within the interval (μ-3σ, μ+3σ)

Since the normal distribution curve is mound-shaped, these rules also apply to normally distributed variables. Later, we will evaluate these statements more precisely. For now, let us consider an example of a normally distributed variable and solve it using the empirical rule.

Example 5.1

The lengths of musical clips posted on an Internet website are normally distributed with a mean of 4 minutes and a standard deviation of 1 minute. Determine the following:

(a) The probability that the length of a randomly selected clip is shorter than 2 minutes.

(b) The probability that the length of a randomly selected clip is longer than 5 minutes.

(c) The probability that a randomly selected clip lasts from 1 to 7 minutes.

Solution:

It is helpful to solve this question referring to the normal distribution curve (fig. 5.2).

image

Figure 5.2. Normal distribution curve for example 5.1

(a)  According to the empirical rule, the length of 95% of clips varies from 4-2(1)=2 to 4+2(1)=6 minutes. Hence, 100% - 95% = 5% of clips last either less than 2 minutes or longer than 6 minutes. Since the curve is symmetrical with respect to the mean, half of those clips, 5%/2 = 2.5%, last less than 2 minutes. Therefore, the probability that the length of a randomly selected clip is shorter than 2 minutes approximately equals 0.025.

(b)  According to the empirical rule, the length of 68% of clips varies from 4-1(1)=3 to 4+1(1)=5 minutes. Hence, 100% - 68% = 32% of clips last either less than 3 minutes or longer than 5 minutes. Since the curve is symmetrical with respect to the mean, half of those clips, 32%/2 = 16%, last longer than 5 minutes. Therefore, the probability that the length of a randomly selected clip is longer than 5 minutes approximately equals 0.16.

(c)  According to the empirical rule, the length of 99.7% of clips varies from 4-3(1)=1 to 4+3(1)=7 minutes. Therefore, the probability that the length of a randomly selected clip is between 1 and 7 minutes approximately equals 0.997.
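The empirical-rule answers of example 5.1 can be compared with the values given by the normal model itself. The sketch below is an illustration only; it assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`, and the variable names are hypothetical.

```python
from statistics import NormalDist

clips = NormalDist(mu=4, sigma=1)    # clip length in minutes, as in example 5.1

p_a = clips.cdf(2)                   # (a) shorter than 2 minutes
p_b = 1 - clips.cdf(5)               # (b) longer than 5 minutes
p_c = clips.cdf(7) - clips.cdf(1)    # (c) between 1 and 7 minutes

for label, empirical, model in [("a", 0.025, p_a), ("b", 0.16, p_b), ("c", 0.997, p_c)]:
    print(f"({label}) empirical rule: {empirical:.3f}   normal model: {model:.4f}")
```

The two columns should agree to about two decimal places, which is the level of accuracy the empirical rule is meant to provide.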

5.2. Z-Score and Standard Normal Distribution Tables

Although problems involving normally distributed variables can be solved using normal distribution curves, it is impractical to draw a separate curve for each data set. It would be more convenient to standardize the curve to make it useful for all normally distributed data sets. As we can see from the solutions of examples 2.11 and 5.1, the probabilities were evaluated based on the position of a variable on the horizontal axis of the normal curve. These positions were measured in terms of standard deviation. So, it would be logical to rescale the horizontal axis of the normal curve by replacing the unit of the actual variable with the standard deviation. This is very similar to changing units in time measurements. For instance, suppose the duration of a multi-episode movie is 150 minutes, and each episode lasts 50 minutes. To express the movie's length, we can say, "This movie lasts three episodes," considering that 150 minutes/50 minutes = 3.

In statistics, the standardized normal variable is denoted by z and called the z-score, determined as

[latex]z=\frac{x-\mu}{\sigma}\qquad\text{(F5.2)}[/latex]

where x is a normal random variable, μ is the mean of the normal distribution, and σ is its standard deviation.

Sometimes, we might need to express x using the mean and standard deviation.

[latex]x=\mu+z\sigma\qquad\text{(F5.3)}[/latex]

By inspecting the formula (F5.2), we can state that

(1) The z-score of the mean equals zero
(2) If a random variable exceeds the mean, its z-score is positive
(3) If a random normal variable is less than the mean, its z-score is negative
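The formulas (F5.2) and (F5.3) and the three statements above can be checked with a few lines of Python. This is a minimal sketch; the function names `z_score` and `from_z` are hypothetical, and the values μ = 4, σ = 1 are borrowed from example 5.1 purely for illustration.

```python
def z_score(x, mu, sigma):
    """Formula (F5.2): position of x measured in standard deviations from the mean."""
    return (x - mu) / sigma

def from_z(z, mu, sigma):
    """Formula (F5.3): recover the original value x from its z-score."""
    return mu + z * sigma

mu, sigma = 4, 1                      # values from example 5.1, used only as an illustration
print(z_score(4, mu, sigma))          # the mean itself: z-score is zero
print(z_score(5, mu, sigma))          # above the mean: positive z-score
print(z_score(2, mu, sigma))          # below the mean: negative z-score
print(from_z(z_score(2, mu, sigma), mu, sigma))  # converting back returns the original value
```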

Based on these statements, we can draw the standard normal distribution curve (fig. 5.3).

image

Figure 5.3. The standard normal distribution curve

In the previous chapter, we used cumulative tables to solve problems about binomial variables. Using the formula (F5.1), similar tables have been constructed for normal variables (table A3, Appendix). The first column of table A3 indicates the first two digits of the z-score, and the first row gives the third digit. The number at the intersection of a specific column and row represents the area to the left from the corresponding value of the z-score.

Since the standard normal curve is a particular case of normal curves, all features of normal curves are true for the standard normal curve too.

1. The total area below the standard normal curve equals 1.
2. The area below a single z-score is 0.
3. The areas to the right and left from the centre 0 equal 0.5.

The second property of the standard normal curve implies that

• \(P(Z\le a)=P(Z<a)\)

• \(P(Z\ge a)=P(Z>a)\)

• \(P(a\le Z\le b)=P(a<Z\le b)=P(a\le Z<b)=P(a<Z<b)\)

Let us provide a formal proof for the first statement. According to the addition rule of probability, \(P(Z\le a)=P(Z<a)+P(Z=a)\). But \(P(Z=a)=0\), based on the second property of the normal curve. Therefore, \(P(Z\le a)=P(Z<a)\).

Example 5.2

Use table A3 (Appendix) to determine the following probabilities:

(a) [latex]\displaystyle P(Z\le -1.83)[/latex]

(b) [latex]\displaystyle P(Z\ge 1.62)[/latex]

(c) [latex]\displaystyle P(0.74\le Z\le 2.83)[/latex]

Solution:

(a)  Graphically, the probability [latex]\displaystyle P(Z\le -1.83)[/latex] corresponds to the area below the standard normal distribution curve to the left from z = – 1.83 (fig. 5.4).

image

Figure 5.4. Standard normal distribution curve for example 5.2(a)

This area can be found at the intersection of the row “–1.8” and the column “0.03” (table 5.1).

 

Table 5.1. The fragment of the standard normal distribution table for example 5.2(a)

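| z | .00 | .01 | .02 | .03 | .04 | .05 | .06 | .07 | .08 | .09 |
|---|---|---|---|---|---|---|---|---|---|---|
| -1.9 | .0287 | .0281 | .0274 | .0268 | .0262 | .0256 | .0250 | .0244 | .0239 | .0233 |
| -1.8 | .0359 | .0351 | .0344 | .0336 | .0329 | .0322 | .0314 | .0307 | .0301 | .0294 |
| -1.7 | .0446 | .0436 | .0427 | .0418 | .0409 | .0401 | .0392 | .0384 | .0375 | .0367 |
| -1.6 | .0548 | .0537 | .0526 | .0516 | .0505 | .0495 | .0485 | .0475 | .0465 | .0455 |
| -1.5 | .0668 | .0655 | .0643 | .0630 | .0618 | .0606 | .0594 | .0582 | .0571 | .0559 |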

Therefore, [latex]\displaystyle P(Z\le -1.83)=0.0336[/latex].

(b)  Graphically, the probability [latex]\displaystyle P(Z\ge 1.62)[/latex] corresponds to the area below the standard normal distribution curve to the right from z = 1.62 (fig. 5.5).

image

Figure 5.5. Standard normal distribution curve for example 5.2(b)

Table 5.2 provides the area below the curve to the left from z = 1.62, which equals 0.9474.

Table 5.2. The fragment of the standard normal distribution table for example 5.2(b)

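| z | .00 | .01 | .02 | .03 | .04 | .05 | .06 | .07 | .08 | .09 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1.5 | .9332 | .9345 | .9357 | .9370 | .9382 | .9394 | .9406 | .9418 | .9429 | .9441 |
| 1.6 | .9452 | .9463 | .9474 | .9484 | .9495 | .9505 | .9515 | .9525 | .9535 | .9545 |
| 1.7 | .9554 | .9564 | .9573 | .9582 | .9591 | .9599 | .9608 | .9616 | .9625 | .9633 |
| 1.8 | .9641 | .9649 | .9656 | .9664 | .9671 | .9678 | .9686 | .9693 | .9699 | .9706 |
| 1.9 | .9713 | .9719 | .9726 | .9732 | .9738 | .9744 | .9750 | .9756 | .9761 | .9767 |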

 

Since the total area below the curve is 1, the area to the right from z = 1.62 can be determined as 1 – 0.9474 = 0.0526. Therefore, [latex]\displaystyle P(Z\ge 1.62)=0.0526[/latex].

(c)  Graphically, the probability [latex]\displaystyle P(0.74\le Z\le 2.83)[/latex] corresponds to the area below the standard normal curve between z = 0.74 and z = 2.83 (fig. 5.6).

image

Figure 5.6. Standard normal distribution curve for example 5.2(c)

Using table A3 (Appendix), one can find the areas to the left from z = 0.74 and z = 2.83, which are 0.7704 and 0.9977, respectively. The area between these two values of z-scores is 0.9977 – 0.7704 = 0.2273. Therefore, [latex]\displaystyle P(0.74\le Z\le 2.83)=0.2273[/latex].
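The table lookups of example 5.2 can be reproduced in software. The sketch below is one possible way to do it, not the method of this textbook: it relies on the standard identity Φ(z) = ½[1 + erf(z/√2)] for the area to the left of z, and the function name `phi` is an assumption.

```python
import math

def phi(z):
    """Area under the standard normal curve to the left of z (what table A3 tabulates)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p_a = phi(-1.83)               # (a) area to the left of z = -1.83
p_b = 1.0 - phi(1.62)          # (b) area to the right of z = 1.62
p_c = phi(2.83) - phi(0.74)    # (c) area between z = 0.74 and z = 2.83
print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

The three results should agree with the table-based answers of parts (a) to (c) to about four decimal places.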

5.3 Calculations of Probabilities and Outcomes of Normal Distribution

In previous chapters, we noted that the probability, proportion, and relative frequency have the same nature. Based on this statement, we can use the standard normal distribution curve and table A3 (Appendix) to evaluate the required distribution proportions if the variable is normal. Moreover, we can determine the borders (z-scores) of proportions using the same tools. Finding a z-score for a given probability or proportion is called a backward problem.

Example 5.3

Determine a z-score that

(a)  leaves an area of 0.9678 on the left

(b)  leaves an area of 0.9612 on the right

Solution:

As noted above, we will use table A3 (Appendix) to solve the following problem:

(a)  First, we look for "0.9678" in table 5.3. After locating this number, we can determine the z-score, which leaves this area on the left.  The first two digits of this z-score come from the first column, “1.8,” and the second digit after the decimal point comes from the first row, “0.05.” Therefore, the z-score that leaves the area of 0.9678 on the left equals 1.85.

Table 5.3. The fragment of the standard normal distribution table for example 5.3(a)

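| z | .00 | .01 | .02 | .03 | .04 | .05 | .06 | .07 | .08 | .09 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1.5 | .9332 | .9345 | .9357 | .9370 | .9382 | .9394 | .9406 | .9418 | .9429 | .9441 |
| 1.6 | .9452 | .9463 | .9474 | .9484 | .9495 | .9505 | .9515 | .9525 | .9535 | .9545 |
| 1.7 | .9554 | .9564 | .9573 | .9582 | .9591 | .9599 | .9608 | .9616 | .9625 | .9633 |
| 1.8 | .9641 | .9649 | .9656 | .9664 | .9671 | .9678 | .9686 | .9693 | .9699 | .9706 |
| 1.9 | .9713 | .9719 | .9726 | .9732 | .9738 | .9744 | .9750 | .9756 | .9761 | .9767 |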

(b)  Since table A3 (Appendix) provides the area to the left of a z-score, we first find the area on the left of this z-score: 1 – 0.9612 = 0.0388. Now the problem can be solved as in part (a): we locate the area “0.0388” in the body of table A3. However, the table does not contain this number, so we look for the number closest to 0.0388. Note that two numbers are equally close to it, 0.0392 and 0.0384 (table 5.4).

Table 5.4. The fragment of the standard normal distribution table for example 5.3(b)

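| z | .00 | .01 | .02 | .03 | .04 | .05 | .06 | .07 | .08 | .09 |
|---|---|---|---|---|---|---|---|---|---|---|
| -1.9 | .0287 | .0281 | .0274 | .0268 | .0262 | .0256 | .0250 | .0244 | .0239 | .0233 |
| -1.8 | .0359 | .0351 | .0344 | .0336 | .0329 | .0322 | .0314 | .0307 | .0301 | .0294 |
| -1.7 | .0446 | .0436 | .0427 | .0418 | .0409 | .0401 | .0392 | .0384 | .0375 | .0367 |
| -1.6 | .0548 | .0537 | .0526 | .0516 | .0505 | .0495 | .0485 | .0475 | .0465 | .0455 |
| -1.5 | .0668 | .0655 | .0643 | .0630 | .0618 | .0606 | .0594 | .0582 | .0571 | .0559 |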

Corresponding z-scores are – 1.76 and – 1.77, respectively. Then, the z-score leaving 0.9612 on the right (equivalently, leaving 0.0388 on the left) can be determined as the average of these numbers:

[latex]\displaystyle z=\frac{-1.76+(-1.77)}{2}=-1.765[/latex]
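A backward problem can also be solved with an inverse cumulative distribution function instead of searching the body of the table. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist` offers an `inv_cdf` method; it is an illustration, and the variable names are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()              # standard normal: mean 0, standard deviation 1

z_a = std_normal.inv_cdf(0.9678)       # (a) area 0.9678 on the left
z_b = std_normal.inv_cdf(1 - 0.9612)   # (b) area 0.9612 on the right, so 0.0388 on the left
print(round(z_a, 2), round(z_b, 3))
```

Because the software does not round the z-score to two decimals, the result for part (b) falls between the two table entries, close to the averaged value found above.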

To solve the problems on normally distributed values, we will first standardize the variable and then evaluate the probabilities and proportions using the strategies discussed.

Example 5.4

Jane’s email app records daily the time (in minutes) she spends composing messages. The analysis shows that this variable is normally distributed with a mean of 40 minutes and a standard deviation of 10 minutes.

(a)  On what proportion of days does Jane spend less than 50 minutes composing emails?

(b)  What is the probability that Jane spends more than 20 minutes daily composing emails?

(c)  What is the probability that Jane spends 20–50 minutes daily composing emails?

(d)  Determine the proportion of days within two standard deviations of the mean time.

Solution:

(a)  First, we have to find the z-score corresponding to 50 minutes.

[latex]\displaystyle z=\frac{x-\mu}{\sigma}=\frac{50-40}{10}=1[/latex]

Now we can evaluate the proportion of days within which Jane spends less than 50 minutes composing email messages, using table A3 (Appendix).

[latex]\displaystyle P(X<50)=P(Z<1)=0.8413[/latex]

(b) \(P(X>20)=P\left(Z>\frac{20-40}{10}\right)=P(Z>-2)=1-P(Z<-2)=1-0.0228=0.9772\)

(c) \(P(20<X<50)=P\left(\frac{20-40}{10}<Z<\frac{50-40}{10}\right)=P(-2<Z<1)=P(Z<1)-P(Z<-2)=0.8413-0.0228=0.8185\)

 

(d)  The interval within two standard deviations of the mean can be determined as [latex]\displaystyle \mu-2\sigma \le X \le \mu+2\sigma[/latex]. Using the given values of the mean and standard deviation, we can evaluate the corresponding time interval as follows:

[latex]\displaystyle 40-2\cdot10 \le X \le 40+2\cdot10[/latex]

or

[latex]\displaystyle 20 \le X \le 60[/latex]

Hence,

    [latex]\displaystyle \begin{aligned} P(20\le X\le60) &= P\!\left(\frac{20-40}{10}\le Z\le\frac{60-40}{10}\right) \\ &= P(-2\le Z\le2) \\ &= 0.9772-0.0228 \\ &= 0.9544 \end{aligned}[/latex]

Note:

  • In part (d), we can use either ≤ or < symbols since the probability of one single value equals zero.
  • Part (d) could be solved without using the mean and standard deviation. (We leave it to readers to explain why.)
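The four answers of example 5.4 can be cross-checked without standardizing by hand. The sketch below is an illustration that assumes Python 3.8 or later (`statistics.NormalDist`); the variable names are hypothetical.

```python
from statistics import NormalDist

minutes = NormalDist(mu=40, sigma=10)          # daily composing time, as in example 5.4

p_a = minutes.cdf(50)                          # (a) less than 50 minutes
p_b = 1 - minutes.cdf(20)                      # (b) more than 20 minutes
p_c = minutes.cdf(50) - minutes.cdf(20)        # (c) between 20 and 50 minutes
p_d = minutes.cdf(60) - minutes.cdf(20)        # (d) within two standard deviations of the mean

for label, value in zip("abcd", (p_a, p_b, p_c, p_d)):
    print(f"({label}) {value:.4f}")
```

The software works with unrounded areas, so the last digit may differ by one from the table-based answers.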

Example 5.5

Organizers of a musical competition decided that the top 80% of participants would qualify for the next stage. A participant can receive a maximum of 12 points from each of the six jury members. The participants' total scores were approximately normally distributed, with a mean of 64 and a variance of 16. Determine the lowest passing score for the next stage of the competition.

Solution:

Using statistical terminology, we can rephrase the provided information and state that the proportion of participants passing to the next stage of the competition is 80% = 0.8. So, referring to figure 5.7, we have to determine the score (denoted by [latex]x_C[/latex]) that leaves an area of 0.8 on the right (higher scores). In other words, we have to solve a backward problem similar to example 5.3.

image

Figure 5.7. Standard normal distribution curve for example 5.5

First, we can refer to table A3 (Appendix) to determine the cut-off z-score, which leaves 0.8 on the right, hence 1 – 0.8 = 0.2 on the left: [latex]\displaystyle z_C=-0.84[/latex]. Using the formula (F5.3), and considering that [latex]\displaystyle \sigma=\sqrt{\sigma^2}=\sqrt{16}=4[/latex], we evaluate the cut-off score as follows:

[latex]\displaystyle x_C=\mu+z_C\sigma=64+(-0.84)\cdot4=60.64\approx61[/latex]

Therefore, we can conclude that participants with scores of 61 or higher will continue the competition.

Example 5.5.1

Refer to example 5.5 and determine the interval within which the middle 80% of scores lie.

Solution:

The middle 80% of scores leaves out 20% of the scores: half of them, or 10%, lie below the lower cut-off score, and the other half, or 10%, lie above the upper cut-off score (fig. 5.7.1).

image

Figure 5.7.1. Standard normal distribution curve for example 5.5.1

The z-scores, which cut off 0.1 from the left and from the right of the standard normal distribution, found from the normal distribution table (table A3, Appendix) are – 1.28 and 1.28, respectively. Using the formula (F5.3) and the provided mean and standard deviation, we can determine the cut-off scores as follows:

[latex]\displaystyle x_{10\%}=\mu+z_{0.1}\sigma=64+(-1.28)\cdot4=58.88\approx59[/latex]

[latex]\displaystyle x_{90\%}=\mu+z_{0.9}\sigma=64+(1.28)\cdot4=69.12\approx69[/latex]

Therefore, performances scored from 59 to 69 are considered typical in this competition.
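The cut-off scores of examples 5.5 and 5.5.1 can be obtained directly from the inverse cumulative distribution of the score variable. The following sketch is an illustration, assuming Python 3.8 or later (`statistics.NormalDist`); the variable names are hypothetical.

```python
import math
from statistics import NormalDist

scores = NormalDist(mu=64, sigma=math.sqrt(16))   # variance 16 gives standard deviation 4

x_c = scores.inv_cdf(1 - 0.80)    # example 5.5: 0.8 on the right means 0.2 on the left
low = scores.inv_cdf(0.10)        # example 5.5.1: 10% of scores lie below this value
high = scores.inv_cdf(0.90)       # example 5.5.1: 10% of scores lie above this value
print(round(x_c), round(low), round(high))
```

After rounding to whole points, the three cut-offs should match the scores found with table A3.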

 

5.4 The Normal Approximation of the Binomial Probability Distribution

In chapter 4, we discussed the distribution of discrete variables. As you can see, the probabilities of discrete variables can be evaluated using formulae, tables, and various computing software. Sometimes the calculation process becomes very time-consuming. The tables for binomial distribution are available only for specific values of sample size n and probability of success p. In contrast, evaluating normal probabilities is much easier using cumulative probability tables. Although normal probability is a feature of continuous variables, normal approximation can solve problems with discrete variables under some conditions and with some accuracy. This section will explain how the normal approximation can be applied to the binomial probability distribution.

Let us consider the probability distribution of a binomial variable X for various trials and success probabilities.

imageimage

imageimage

image

Figure 5.8. The binomial probability distribution for various sample sizes (n) and success probabilities (p)

One can see that as the products np and n(1-p) increase, the graphs become more and more similar to the bell shape typical of the normal probability distribution. Based on advanced statistics theory (which is beyond the scope of this book), it can be shown that the normal distribution approximation can be applied to the binomial distribution under the following conditions:

[latex]\displaystyle np>5 \text{ and } nq>5 \qquad \text{(F5.4)}[/latex]

where n is the number of trials, p is the probability of success, and q = 1-p is the probability of failure.

We can determine the mean and standard deviation of the binomial variable using the formulae [latex]\displaystyle \mu=np[/latex] and [latex]\displaystyle \sigma=\sqrt{npq}[/latex] , respectively, and then solve the problems using the normal probability distribution approximation if the binomial data set meets the criteria (F5.4).
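The check of the criteria (F5.4) and the calculation of μ and σ are easy to automate. The sketch below is only an illustration; the function name `normal_approximation` and the second test case (n = 10, p = 0.1) are assumptions that do not come from the text.

```python
import math

def normal_approximation(n, p):
    """Return (applicable, mu, sigma) for a binomial variable with n trials and success probability p."""
    q = 1 - p
    applicable = n * p > 5 and n * q > 5   # criteria (F5.4)
    mu = n * p                             # mean of the binomial variable
    sigma = math.sqrt(n * p * q)           # standard deviation of the binomial variable
    return applicable, mu, sigma

ok, mu, sigma = normal_approximation(15, 0.6)     # n and p used in the examples of section 5.5
small_ok, _, _ = normal_approximation(10, 0.1)    # hypothetical case with np = 1
print(ok, round(mu, 3), round(sigma, 3), small_ok)
```

For n = 15 and p = 0.6 both products exceed 5, so the approximation is allowed; for the hypothetical case the product np is too small and the check fails.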

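The criteria (F5.4) are closely related to, though not identical with, the following rule of thumb:

If the interval [μ-3σ, μ+3σ], where μ = np and σ = √(npq), lies within the interval [0, n] for the given binomial data, the normal approximation can be applied. Dividing by n, the same condition states that the interval [p-3√(pq/n), p+3√(pq/n)] lies within the interval [0, 1]. The justification of this statement can be found in more advanced books.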

5.5 Continuity Correction Principle

The area below the normal curve is not exactly equal to the corresponding areas of the bars of the binomial histogram. To improve the approximation, we need to adjust our calculations; in statistics, this adjustment is called continuity correction. Even with the adjustment, the results are accurate only to a certain degree, and the estimation of this accuracy is beyond the scope of this book. By convention, the following principles are used for the continuity correction in the normal approximation of binomial variables.

| Binomial distribution | Normal approximation with continuity correction |
|---|---|
| P(X<x) | P(X<x-0.5) |
| P(X≤x) | P(X<x+0.5) |
| P(X>x) | P(X>x+0.5) |
| P(X≥x) | P(X>x-0.5) |

It can be shown that these principles are valid if the conditions defined by (F5.4) are satisfied. The following example can help us understand the main idea of continuity correction.

Example 5.6

Consider a binomial data set consisting of 15 trials with the probability of success p = 0.6. For this data set, [latex]\displaystyle np=15\cdot0.6=9>5[/latex] and [latex]\displaystyle nq=n(1-p)=15\cdot(1-0.6)=6>5[/latex] . Thus, we can apply the normal approximation to evaluate probabilities. The histogram of the binomial probability and the corresponding normal curve are presented in figure 5.9.

image

(a) \(P_{\text{Binomial}}(X<7)=0.0951;\; P_{\text{Normal}}(X<6.5)=0.0934\)

image

(b) \(P_{\text{Binomial}}(X\le7)=0.2131;\; P_{\text{Normal}}(X<7.5)=0.2148\)

Figure 5.9. Normal approximation for binomial data set

Let us evaluate the exact probability [latex]\displaystyle P_{\text{Binomial}}(X<7)[/latex] and its normal approximation. Graphically, the exact value of the probability [latex]\displaystyle P_{\text{Binomial}}(X<7)[/latex] is equal to the sum of the areas of the bars corresponding to 0, 1, 2, 3, 4, 5, and 6 (note that the bars 0, 1, and 2 are not visible in the figure because the corresponding probabilities are negligibly small). The exact value of this probability is calculated using the formula (F4.4).

[latex]\displaystyle \begin{aligned} P_{\text{Binomial}}(X<7) &= P(X=0)+P(X=1)+P(X=2)+P(X=3)+P(X=4)+P(X=5) \\ &\quad + P(X=6)=0.0951 \end{aligned}[/latex]

To evaluate the probability [latex]\displaystyle P_{\text{Normal}}(X<7)[/latex], first we need to determine the mean and standard deviation.

[latex]\displaystyle \mu=np=15\cdot0.6=9 \\ \sigma=\sqrt{npq}=\sqrt{15\cdot0.6\cdot0.4}=1.897[/latex]

Now, we can evaluate the probability [latex]\displaystyle P_{\text{Normal}}(X<7)[/latex] using table A3 (Appendix).

[latex]\displaystyle P_{\text{Normal}}(X<7)=P_{\text{Normal}}\!\left(Z<\frac{7-9}{1.897}\right)=P_{\text{Normal}}(Z<-1.05)=0.1469[/latex]

The area below the normal curve, restricted by 7—i.e., [latex]\displaystyle P_{\text{Normal}}(X<7)=0.1469[/latex]—includes some extra region (the dashed area in fig. 5.9). To exclude this area, we need to evaluate the normal approximation as [latex]\displaystyle P_{\text{Normal}}(X<6.5)[/latex] . Using table A3 (Appendix), we can find [latex]\displaystyle P_{\text{Normal}}(X<6.5)[/latex] as follows:

[latex]\displaystyle P_{\text{Normal}}(X<6.5)=P_{\text{Normal}}\!\left(Z<\frac{6.5-9}{1.897}\right)=P_{\text{Normal}}(Z<-1.32)=0.0934[/latex]

Comparing the estimations [latex]\displaystyle P_{\text{Normal}}(X<7)=0.1469[/latex] and [latex]\displaystyle P_{\text{Normal}}(X<6.5)=0.0934[/latex] with the exact value [latex]\displaystyle P_{\text{Binomial}}(X<7)=0.0951[/latex] we can conclude that the continuity correction significantly improves the accuracy of the normal approximation.

Now, let us evaluate the [latex]\displaystyle P_{\text{Binomial}}(X\le 7)[/latex] using the normal approximation. Graphically, this probability is equal to the sum of areas of bars 0, 1, 2, 3, 4, 5, 6, and 7. The value of the probability calculated using the formula (F4.4) equals

[latex]\displaystyle \begin{aligned} P_{\text{Binomial}}(X\le7) &= P(X=0)+P(X=1)+P(X=2)+P(X=3)+P(X=4)+P(X=5) \\ &\quad + P(X=6)+P(X=7)=0.2131 \end{aligned}[/latex]

The corresponding region below the normal curve includes the dashed area. Using the continuity correction, we obtain

[latex]\displaystyle P_{\text{Normal}}(X<7.5)=P\!\left(Z<\frac{7.5-9}{1.897}\right)=P(Z<-0.79)=0.2148[/latex]

instead of [latex]\displaystyle P_{\text{Normal}}(X<7)=0.1469[/latex]. The corrected value is much closer to the exact probability [latex]\displaystyle P_{\text{Binomial}}(X\le7)=0.2131[/latex].
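The comparison made in example 5.6 can be repeated numerically. The sketch below is an illustration under stated assumptions: it needs Python 3.8 or later (`math.comb`, `statistics.NormalDist`), the helper `binom_cdf` is hypothetical, and it sums the usual binomial probabilities rather than reading them from a table.

```python
import math
from statistics import NormalDist

n, p = 15, 0.6

def binom_cdf(k):
    """Exact P(X <= k) for the binomial variable, summing the individual probabilities."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

exact_lt7 = binom_cdf(6)            # P(X < 7) is the same as P(X <= 6)
plain_lt7 = approx.cdf(7)           # normal approximation without continuity correction
corrected_lt7 = approx.cdf(6.5)     # P(X < 7)  becomes P(X < 7 - 0.5)

exact_le7 = binom_cdf(7)            # P(X <= 7)
corrected_le7 = approx.cdf(7.5)     # P(X <= 7) becomes P(X < 7 + 0.5)

print(f"P(X<7):  exact {exact_lt7:.4f}, plain {plain_lt7:.4f}, corrected {corrected_lt7:.4f}")
print(f"P(X<=7): exact {exact_le7:.4f}, corrected {corrected_le7:.4f}")
```

The normal values may differ from the table-based ones in the third or fourth decimal because the z-scores are not rounded to two decimals here; the conclusion that the corrected values lie much closer to the exact probabilities should be unaffected.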

Example 5.7

In example 4.2, we evaluated the probabilities of hitting the target, given that the probability of success is p = 0.6. Assume that in this example, the same hunter, Joseph, tries to hit the target 15 times (number of trials).

(a)  Use the cumulative binomial probability table (table A1, Appendix) to determine the probability that Joseph hits the target 9, 10, 11, or 12 times.

(b)  Check if this problem can be solved using the normal probability approximation. If so, determine the probability that Joseph hits the target 9, 10, 11, or 12 times.

(c)  Compare the answers for parts (a) and (b).

(d)  Use the cumulative binomial probability table (table A1, Appendix) to determine the probability that Joseph hits the target less than 10 times.

(e)  If this problem can be solved using the normal probability approximation, use this method to determine the probability that Joseph hits less than 10 times.

(f)  Compare the answers for parts (d) and (e).

Solution:

  (a)  We will refer to table A1 (Appendix) to determine the probability for n = 15, p = 0.6:

[latex]\displaystyle P(9,10,11\ \text{or}\ 12)=P(X\le12)-P(X\le8)=0.9729-0.3902=0.5827[/latex]

  (b)  For given p = 0.6, q = 1 – 0.6 = 0.4. Then [latex]\displaystyle np=15\cdot0.6=9>5[/latex] and [latex]\displaystyle nq=15\cdot0.4=6>5[/latex].

Therefore, the given binomial data set meets the criteria of the normal approximation. Figure 5.9, which presents the probability distribution for n = 15, p = 0.6, and the normal curve, also shows that, with some accuracy, the normal distribution approximation can be used to solve this problem. The mean and the standard deviation can be evaluated as follows:

[latex]\displaystyle \mu=np=15\cdot0.6=9 \\ \sigma=\sqrt{npq}=\sqrt{15\cdot0.6\cdot0.4}=1.897[/latex]

Now, we are ready to evaluate the probability using the normal approximation and the continuity correction principle.

[latex]\displaystyle \begin{aligned} P(X=9,10,11\ \text{or}\ 12) &= P(8.5\le X\le12.5) = P(X\le12.5)-P(X\le8.5) \\ &\approx P\!\left(Z\le\frac{12.5-9}{1.897}\right) - P\!\left(Z\le\frac{8.5-9}{1.897}\right) = P(Z\le1.85)-P(Z\le-0.26) \\ &= 0.9678-0.3974=0.5704 \end{aligned}[/latex]

  (c)  Comparing the answers to parts (a) and (b), 0.5827 and 0.5704, we can state that the results are close: the normal approximation differs from the exact value by about 0.012.

(d)  Using table A1 (Appendix),

[latex]\displaystyle P(X<10)=P(X\le9)=0.5968[/latex]

  (e)  Since the criteria (F5.4) are satisfied, we can use the normal approximation to evaluate the probability.

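[latex]\displaystyle P(X<10)\approx P\!\left(Z\le\frac{9.5-9}{1.897}\right)=P(Z\le0.26)=0.6026[/latex]

  (f)  Comparing the answers to parts (d) and (e), 0.5968 and 0.6026, we can state that the results are close: the normal approximation differs from the exact value by about 0.006.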
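The exact and approximate answers of example 5.7 can be computed side by side. The sketch below is an illustration, not the table-based method used in the solution; it assumes Python 3.8 or later, and the helper `pmf` and the variable names are hypothetical.

```python
import math
from statistics import NormalDist

n, p = 15, 0.6

def pmf(k):
    """Binomial probability of exactly k hits in n attempts."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

exact_9_to_12 = sum(pmf(k) for k in range(9, 13))       # exact P(X = 9, 10, 11 or 12)
normal_9_to_12 = approx.cdf(12.5) - approx.cdf(8.5)     # continuity-corrected approximation

exact_lt10 = sum(pmf(k) for k in range(0, 10))          # exact P(X < 10) = P(X <= 9)
normal_lt10 = approx.cdf(9.5)                           # continuity-corrected approximation

print(f"9 to 12 hits:  exact {exact_9_to_12:.4f}, normal {normal_9_to_12:.4f}")
print(f"fewer than 10: exact {exact_lt10:.4f}, normal {normal_lt10:.4f}")
```

Since the z-scores are not rounded to two decimals, the normal values may differ slightly from those obtained with table A3, but in both cases the approximation should stay within about 0.01 to 0.02 of the exact probability.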


 

EXERCISES

 

Normal Distribution Curve
1. Why do statisticians prefer to work with the standard normal distribution rather than the normal distribution?

2. (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) Which of the following is not a characteristic of the normal probability distribution?
a) Symmetry
b) Total area under the curve is always equal to 1.
c) 99.72% of the time, the random variable assumes a value within plus or minus 1 standard deviation of its mean
d) The mean is equal to the median, which is also equal to the mode
e) None of the answers is correct

3. (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) Larger values of the standard deviation result in a normal curve that is

a) Shifted to the right
b) Shifted to the left
c) Narrower and more peaked
d) Wider and flatter
e) None of the above answers are correct

4. (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) For a normal distribution, a negative value of z indicates

a) a mistake made in the computations, because z is always positive
b) the area corresponding to the z is negative
c) the z is to the left of the mean
d) the z is to the right of the mean
e) none of the above answers are correct.

5. (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) A normal probability distribution

a) Is a continuous probability distribution.
b) Is a discrete probability distribution.
c) Can be either discrete or continuous
d) Must have a standard deviation of 1
e) None of the above answers are correct.

Probability and Area (Examples)
1. Calculate the area under the normal curve that lies:

a) above \( z = 2 \)

b) less than \( z = -1.5 \)

c) between \( z = \pm 2.23 \)

d) outside \( z = \pm 1.85 \)

e) less than \( z = 1.44 \)

2. For the normal curve, what is the probability that a value lies:

(a) below z=1.07
(b) above z=0.29
(c) between z= -1.91 and z = 0.45

3. What proportion of the area under the normal curve lies:

(a) between \( \pm 2.5 \) standard deviations from the mean

(b) to the right of \( z = 2 \)

(c) to the left of \( z = -1.5 \)

(d) outside of \( \pm 1.7 \) standard deviations from the mean

4. What proportion of the area under the normal curve lies:

(a) outside of 2.1 standard deviations from the mean
(b) to the right of z = -2
(c) to the left of z = 1.3
(d) between z=-1.2 and z = .66

5. If the population weights have a mound-shaped distribution, what proportion of scores are found between the following z scores:

 

a) between \( z = 1 \) and \( z = 2 \)

b) between \( z = 2 \) and \( z = 3 \)

c) between \( z = -3 \) and \( z = +1 \)

d) between \( z = -2 \) and \( z = +1 \)

e) greater than \( z = 2 \)

f) less than \( z = -3 \)

g) outside \( z = \pm 2 \)

h) outside \( z = \pm 3 \)

Z-score and Standard Normal Distribution

 

1. During the last provincial election, 20% of eligible voters did not turn out to vote. If 400 people had been randomly sampled, what is the probability that more than 75 individuals in the sample would not have voted?

Ans: .7123

\[
\mu = 80
\]

\[
\sigma = 8
\]

X =75.5 z=-.5625
P = .2123+.5 = .7123

2. A firm sends their employees to a computer programming course. The mean and standard deviation of the final exam scores are 80 and 5. The distribution of scores is normally distributed.

a) A trainee scored 70 on the test. Compute the z-score for this trainee.
b) What proportion of test takers scored better than 70?
c) The firm wants to reward all individuals who have scores in the top 10%. What exam score will separate the top 10% of scores from the bottom 90%?

3. Find the value of z such that:

(a) 16% of the area under the normal curve is to the left of z
(b) 78% of the area is to the left of z
(c) 52% of the area is to the right of z
(d) 1.2% of the area is to the left of z
(e) 13% of the area is to the right of z

4. A normally distributed population has a mean of 24 years and a standard deviation of 2 years. What is the probability that any one person will be:

a. over 24 years of age
b. between 20 and 28 years old
c. between 20 and 26 years old
d. younger than 22
e. older than 25

5. Most four-year automobile leases allow up to 60,000 km. If the lessee goes beyond this amount, a penalty is added to the lease cost. Suppose the distribution of kilometers driven on four-year leases follow the normal distribution. The mean is 52,000 km and the standard deviation is 5,000 km.

a) What percent of leases will result in a penalty because of excess mileage?
b) If the automobile company wanted to change the terms of the lease so that 25 percent of the leases went over the limit, where should the new upper limit be set?

 

6. A population is normally distributed with a mean of 8.0 and a standard deviation of 1.6. Find the value of z for:

(a) X = 9.6
(b) X = 8.0
(c) X = 3.3
(d) X = 12.1
(e) X = 7.84

7. What is the distinction between a normal distribution and the standard normal distribution?

Calculations of Probabilities and Outcomes of Normal Distribution

1. Seventeen percent of Regina high school students are involved in a competitive sport. If a sample of 200 students is surveyed, what is the probability that at least 40 are involved in a competitive sport?

2. A town consists of 17% children. A sample of 500 individuals is taken. What is the probability that over 20% of the sample consists of children?

Ans: \(0.0322\)

\[
\mu = np = 500(0.17) = 85
\]

\[
\sigma = \sqrt{npq} = \sqrt{500(0.17)(0.83)} = 8.3994047
\]

\[
P(>20\% \text{ children}) = P(>0.20 \times 500) = P(>100)
\]

\[
X = 100.5, \quad z = 1.8453689
\]

\[
P = 0.5 - 0.4678 = 0.0322
\]

3. A firm sends their employees to a computer programming course. The mean and standard deviation of the final exam scores are 80 and 5. The distribution of scores is normally distributed.

a) A trainee scored 70 on the test. Compute the z-score for this trainee.
b) What proportion of test takers scored better than 70?
c) The firm wants to reward all individuals who have scores in the top 10%. What exam score will separate the top 10% of scores from the bottom 90%?

4. An instructor is administering a final examination. She tells her class that she will give an A to the 10% of the students who earn the highest grades. Past experience with the same examination has shown that the mean grade is 75 and the standard deviation is 8. If the present class runs true to form, what grade would a student need in order to earn an A?

5. Suppose that 15% of the trees in Wascana Park have severe leaf damage from air pollution.

a) If 5 trees are selected at random, find the probability that 3 of them have severe leaf damage.
b) If 1000 trees are selected, use the normal approximation to the binomial distribution to find the probability that at least 120 of them have severe leaf damage.

6. A survey of 300 seniors was taken in Regina. It is known from the last census that 56% of all seniors in Regina were women. What is the probability that at least half of the sample will consist of women?

 


License


Introductory Statistics Copyright © 2026 by Arzu Sardarli and Andrei Volodin is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.