Chapter 1. Graphing Descriptions of Data
Chapter Objectives
In this chapter, readers will learn to:
- Define and classify variables (quantitative, qualitative, continuous, discrete, ordinal, nominal).
- Classify and describe various types of graphs to present data.
- Define frequency, relative frequency, and cumulative frequency.
- Construct a relative frequency histogram.
- Construct a frequency polygon, relative frequency distribution and ogive.
- Describe and classify types of histogram shapes.
1.1. Variables
In general, a variable is a varying quantitative or qualitative characteristic of an object. The word “variable” has a Latin etymology. It comes from the word variābilis, which means “capable of changing.” In mathematics, the variable is a set of values, usually represented as numerals. Variables are used, in particular, in specifying mathematical expressions. Indian mathematician and astronomer Brahmagupta (598–668 CE) used colours to represent variables in mathematical expressions. Later, in the seventeenth century, European mathematicians used letters (x, y, z) to denote variables.
For example, monthly gasoline prices (fig. 1.1a), the daily average temperatures, the weight of fish caught in a lake, and the colour of a rug (fig. 1.1b) can be considered variables.

Figure 1.1. (a) Gasoline prices across Canada (obtained from the National Joint Council website on January 28, 2024)

Figure 1.1. (b) A digital depiction of the maple leaf tartan, a national symbol of Canada. Thread count: 38 dark green, 6 dark red, 6 dark green, 30 dark red, 26 light brown, 30 dark red, 6 dark green, 6 dark red, 38 dark green, 12 gold, 12 light brown, 12 dark brown (obtained from Wikimedia Commons on April 19, 2026)
The concept of a variable is commonly used in mathematics, science, economics, and programming. For example, in statistics, we will specify two types of variables: quantitative and qualitative.
Quantitative Variables
Quantitative variables take on numerical values. If the variable X can take on any value in its range, it is a continuous variable. If X can take on only distinct, separated values that can be counted, then X is a discrete variable. Usually, the number of such values is finite; however, certain discrete variables can take on a countably infinite number of values. Consider the following examples.
Example 1.1
Let X be the height of a tree in a forest. X can be any positive real number in a specific range. Therefore, X is a continuous variable.
Example 1.2
A university runs classes only if at least 15 students are registered. Let N be the number of students registered in the class STAT 100. N can take only natural (counting) numbers greater than 14. Therefore, N is a discrete variable.
Example 1.3
Over the years, the Canadian Challenge has been recognized as a world-class sporting event. It is the nation’s longest sled dog race, which starts, runs, and finishes in Canada. It has attracted teams from across Canada, the United States, Australia, Germany, Serbia, and Belgium. An added highlight is the eight-dog race from Prince Albert to La Ronge, Saskatchewan. Let Y be the number of dogs participating in this race. Since each team runs eight dogs, Y can take only multiples of 8: 8, 16, 24, and so on. Therefore, Y is a discrete variable.
Now, we can summarize the classification of variables in statistics, as shown in figure 1.2.
Figure 1.2. Types of variables
Qualitative Variables
Qualitative variables take values in mutually exclusive categories, which may or may not have an intrinsic natural order. If there is some natural order, then the data is said to have an ordinal level of measurement. Examples of ordinal data are placings in a contest, letter grades, and rankings of performance (excellent, good, poor).
Suppose the categories do not have a natural order. In that case, the data is said to have a nominal level of measurement. Examples of nominal data are colours, political parties, religion, or marital status.
Example 1.4
Random telephone interviews (landline and cell) of 1,004 Canadians aged 18 years and over were conducted, ending August 13, 2021. The interviewees stated which political party they planned to support in the federal election to be held on September 20, 2021. The records showed that 33.4% of interviewees supported the Liberals, 28.4% the Conservatives, 20.7% the NDP, 7.9% the Green Party of Canada, 6.3% the Bloc Québécois, and 1.9% the People’s Party of Canada. In this example, the preference of interviewees varies over the six parties. Therefore, the preference of interviewees can be considered a qualitative variable.
Example 1.5
A car salesperson decided to determine the preference of customers for vehicle colours and recorded them in order of each sale in the course of a day:
Red, red, blue, yellow, red, silver, silver, silver, red, blue, blue
In this example, the colour of sold cars varies. Therefore, colour is a qualitative variable.
Example 1.6
Amy recorded the mother languages of her classmates and created the following table:
| Languages (in alphabetical order) | Students |
| Cree | 15 |
| Dakota | 8 |
| Dene | 11 |
| English | 9 |
| French | 2 |
| German | 1 |
| Lakota | 7 |
| Russian | 1 |
| Turkish | 1 |
| Ukrainian | 3 |
| Urdu | 2 |
We leave it to students to determine the variable in this example.
1.2. Graphs
Graphs and, to a lesser extent, tables give a visual summary of a variable. Ideally, there is an indication of the central (or “average”) value of the variable as well as an indication of the amount and pattern of variability (“spread”). The level of data restricts the type of graphs/tables that can be used. Many computer software packages exist for visualizing the collected data, such as Excel, R, SigmaPlot, and Maple. Below, we will provide some classification of graphs used in statistics for presenting qualitative and quantitative variables.
Graphs for Qualitative Data
In statistics, pie charts and bar charts are considered suitable for visualizing qualitative variables.
Pie Charts
A pie chart is made up of a circle divided into several slices representing different categories. The area of each slice is representative of the proportion of the data that corresponds to that category. Pie charts are effective when the aim is to display the relative size of the categories.
We will present the data from example 1.4 using a pie chart. However, first, we must construct a table.
| Party | Percent | Angle |
| Conservatives | 28.4 | (28.4 × 360)/100 = 102.2° |
| Liberals | 33.4 | 120.2° |
| NDP | 20.7 | 74.5° |
| Bloc Québécois | 6.3 | 22.7° |
| Green Party of Canada | 7.9 | 28.4° |
| People’s Party of Canada | 1.9 | 6.8° |
| Other | 1.4 | 5.0° |
| Total | 100.0 | 360.0° |
Now we are ready to construct the pie chart.

Figure 1.3. Pie chart for data provided in example 1.4
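The angle column of the table for example 1.4 can be checked with a few lines of code. The following is a minimal sketch in Python, not part of the original solution; the names `percents` and `slice_angle` are illustrative assumptions. It applies the same rule as the first row of the table, angle = (percent × 360)/100.

```python
# Percentages from the pie chart table for example 1.4 (including "Other").
percents = {
    "Conservatives": 28.4,
    "Liberals": 33.4,
    "NDP": 20.7,
    "Bloc Québécois": 6.3,
    "Green Party of Canada": 7.9,
    "People's Party of Canada": 1.9,
    "Other": 1.4,
}


def slice_angle(percent):
    """Central angle (in degrees) of a slice that holds `percent` of the data."""
    return percent * 360 / 100


# Round to one decimal place, as in the table.
angles = {party: round(slice_angle(p), 1) for party, p in percents.items()}

for party, angle in angles.items():
    print(f"{party:26s} {percents[party]:5.1f}%  {angle:6.1f} degrees")
```

Because each angle is rounded separately, the rounded angles may add up to slightly less or more than 360°, while the unrounded angles add up to exactly 360°.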
Bar Charts
In statistics, we often use bars to represent data. Depending on the data types and purpose of the presentation, various formats of bars can be used for charts.
The data from example 1.4, presented above in a pie chart, can also be visualized using vertical bars (fig. 1.4).

Figure 1.4. Vertical bar chart for the data provided in example 1.4
Example 1.7
Figure 1.5 presents the results of our water quality studies in the Kahkewistahaw First Nation community conducted in 2010 (Sardarli et al., 2010). Within the studies, community members using Indigenous Knowledge evaluated water quality in Calling Lakes (Saskatchewan, Canada) for the period 1978–2008 and made their evaluation and projection for specific years. Thirty community members of Kahkewistahaw First Nation were asked to compare the water quality in 1999 and 2009. They were asked to answer the question “How was the quality of water in Calling Lakes ten years ago in comparison with our days (2009)?,” choosing one of the following options: much better, better, about the same, worse, and much worse. Respondents’ answers were recorded in a table format.
| Answers | Numbers of respondents |
| Much better | 1 |
| Better | 3 |
| About the same | 8 |
| Worse | 15 |
| Much worse | 3 |
The collected data can be presented using horizontal bars.

Figure 1.5. Horizontal bar chart for the data provided in example 1.7
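A horizontal bar chart can also be sketched in plain text, which is a quick way to check a hand-drawn chart. The snippet below is only an illustration in Python using the counts from example 1.7; the function name `text_bar_chart` and the `#` symbol are assumptions made for this sketch.

```python
# Answers and numbers of respondents from example 1.7.
responses = {
    "Much better": 1,
    "Better": 3,
    "About the same": 8,
    "Worse": 15,
    "Much worse": 3,
}


def text_bar_chart(counts, symbol="#"):
    """Return one text line per category; the bar length equals the count."""
    width = max(len(label) for label in counts)
    return [f"{label:<{width}} | {symbol * n} {n}" for label, n in counts.items()]


lines = text_bar_chart(responses)
print("\n".join(lines))
```

The categories are kept in their natural order (from "much better" to "much worse") because the answers are ordinal data.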
Graphs for Quantitative Data
Line charts, dot plots, and stem and leaf plots are more convenient for presenting data with quantitative variables.
Line Charts
A line chart is a graph that presents quantitative variables as a series of data points connected by straight line segments. Line charts are used in many fields, such as statistics, mathematics, science, and economics. For example, a line chart is often used to visualize a trend in data over time.
Example 1.8
Figure 1.6 represents the monthly average temperature in Saskatchewan from January to December 2000.

Figure 1.6. Line chart for the monthly average temperature in Saskatchewan from January to December 2000
This data set is an example of so-called time series data. Time series analysis is an interesting branch of statistics. Students can learn more about time series analysis and forecasting in higher-level statistics courses.
Dot Plots
In statistics, dot plots are used for representing quantitative data. On this graph, each observation is shown as a dot placed at the corresponding position above a scaled horizontal line.
Example 1.9
A statistics teacher prepared 12 tests for her class. The number of questions for each test is given below:
5, 5, 2, 4, 4, 4, 7, 10, 3, 3, 4, 4
Figure 1.7 represents this data in the form of a dot plot.

Figure 1.7. Dot plot for the data provided in example 1.9
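The construction of a dot plot amounts to counting how often each value occurs and stacking that many dots above it. The following Python sketch illustrates this for the data of example 1.9; the helper name `dot_plot` and the use of `*` for dots are assumptions of this sketch.

```python
from collections import Counter

# Number of questions in each of the 12 tests (example 1.9).
questions = [5, 5, 2, 4, 4, 4, 7, 10, 3, 3, 4, 4]
counts = Counter(questions)


def dot_plot(counts):
    """Return the text rows of a dot plot, top row first, axis labels last."""
    values = range(min(counts), max(counts) + 1)
    tallest = max(counts.values())
    rows = []
    for level in range(tallest, 0, -1):
        # Put a dot in this row wherever the value occurs at least `level` times.
        marks = ["*" if counts[v] >= level else " " for v in values]
        rows.append("".join(f"{m:>3}" for m in marks))
    rows.append("".join(f"{v:>3}" for v in values))
    return rows


rows = dot_plot(counts)
print("\n".join(rows))
```

Values that never occur (6, 8, and 9 here) still get a position on the axis, so gaps in the data remain visible.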
Example 1.10
The Academy Awards, also known as the Oscars, recognize the achievements and merit of artists in the film industry worldwide. The awards were first presented in 1929. One of the awards was created to honour the best directors. Here are the ages of 96 winners in this category in the order of receipt (obtained from the website www.oscars.org):
The dot plot shown in figure 1.8 represents the ages of 96 Academy Award winners in the Best Director category from 1929 to 2021.

Figure 1.8. Dot plot for the data provided in example 1.10
For instance, we can obtain the following information from the above dot plot:
- From 1929 to 2021, among directors, 44 was the most “winning” age for the award (seven recipients).
- Only one winner during these years won the award when they were older than 70.
Stem and Leaf Plots
Stem and leaf plots are a valuable way of ordering data so we can study their characteristics. They simultaneously organize the data for further analysis and present it in table and chart form.
Example 1.11
Jana checked the durations of her phone calls during the week and recorded them in minutes.
| 19.2 | 19.8 | 18 | 19.2 | 19.5 | 17.3 |
| 20 | 20.3 | 19.6 | 18.5 | 18.1 | 19.7 |
| 18.4 | 17.6 | 21.2 | 19.7 | 22.2 | 19.1 |
| 21.1 | 19.3 | 20.8 | 21.2 | 21 | 18.7 |
| 19.8 | 18.7 | 22.1 | 17.2 | 18.4 | 21.4 |
We take the whole-number part of each value as the stem.
| Stem |
| 17 |
| 18 |
| 19 |
| 20 |
| 21 |
| 22 |
In the next column, we place the tenths in the order in which they were recorded.
| Stem | Tenths |
| 17 | 3, 6, 2 |
| 18 | 0, 5, 1, 4, 7, 7, 4 |
| 19 | 2, 8, 2, 5, 6, 7, 7, 1, 3, 8 |
| 20 | 0, 3, 8 |
| 21 | 2, 1, 2, 0, 4 |
| 22 | 2, 1 |
Now we can form the leaves by putting the tenths in ascending order.
| Stem | Leaf |
| 17 | 236 |
| 18 | 0144577 |
| 19 | 1223567788 |
| 20 | 038 |
| 21 | 01224 |
| 22 | 12 |
From this stem and leaf plot, we can conclude that Jana mainly talked for 19–20 minutes by phone during this particular week. She never talked longer than 23 minutes, and never less than 17 minutes.
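The three steps of example 1.11 (choose the stems, attach the tenths, sort the leaves) can be sketched in code. This is a minimal Python illustration, assuming that every value has at most one decimal place; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# Durations of Jana's phone calls in minutes (example 1.11).
durations = [
    19.2, 19.8, 18, 19.2, 19.5, 17.3,
    20, 20.3, 19.6, 18.5, 18.1, 19.7,
    18.4, 17.6, 21.2, 19.7, 22.2, 19.1,
    21.1, 19.3, 20.8, 21.2, 21, 18.7,
    19.8, 18.7, 22.1, 17.2, 18.4, 21.4,
]


def stem_and_leaf(values):
    """Map each stem (whole part) to its sorted list of leaves (tenths)."""
    plot = defaultdict(list)
    for value in values:
        # Work in whole tenths to avoid floating-point surprises.
        tenths = round(value * 10)
        stem, leaf = divmod(tenths, 10)
        plot[stem].append(leaf)
    return {stem: sorted(leaves) for stem, leaves in sorted(plot.items())}


plot = stem_and_leaf(durations)
for stem, leaves in plot.items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

The number of leaves on each stem is the frequency of that stem, so the longest row (stem 19) shows where most of the calls fall.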
1.3. Relative Frequency Histogram
Frequency, Relative Frequency, Cumulative Frequency
In statistics, the frequency is the number of events or data that occurred or were recorded during the experiment or study. In example 1.10, the age 44 is recorded seven times. Therefore, we state that the frequency of the age 44 is 7.
In statistics, we more often use relative frequency. A relative frequency is determined as a ratio of the frequency over the total number of measurements:
[latex]\displaystyle \text{Relative frequency}=\frac{\text{frequency}}{N}[/latex]
Later in this book, we will also use cumulative frequency to analyze data sets. The cumulative frequency of a value is the number of observations less than or equal to that value in a data set arranged in ascending order. The cumulative frequency is calculated using a frequency distribution table.
Example 1.12
An administrative assistant recorded the number of students registered in mathematics classes in her department:
| 67 | 34 | 34 | 70 | 46 | 57 | 70 | 34 | 41 | 22 | 41 | 22 | 34 |
| 34 | 73 | 78 | 34 | 46 | 34 | 84 | 36 | 74 | 73 | 36 | 26 | |
Construct the frequency table for the provided data.
Solution:
First, we put the numbers in ascending order:
22, 22, 26, 34, 34, 34, 34, 34, 34, 34, 36, 36, 41, 41, 46, 46, 57, 67, 70, 70, 73, 73, 74, 78, 84
The data set contains 25 observations: n = 25
| Number of registered students | Frequency | Cumulative frequency | Relative frequency |
| 22 | 2 | 2 | 2/25 = 0.08 |
| 26 | 1 | 2+1 = 3 | 1/25 = 0.04 |
| 34 | 7 | 2+1+7 = 10 | 7/25 = 0.28 |
| 36 | 2 | 2+1+7+2 = 12 | 2/25 = 0.08 |
| 41 | 2 | 2+1+7+2+2 = 14 | 2/25 = 0.08 |
| 46 | 2 | 2+1+7+2+2+2 = 16 | 2/25 = 0.08 |
| 57 | 1 | 2+1+7+2+2+2+1 = 17 | 1/25 = 0.04 |
| 67 | 1 | 2+1+7+2+2+2+1+1 = 18 | 1/25 = 0.04 |
| 70 | 2 | 2+1+7+2+2+2+1+1+2 = 20 | 2/25 = 0.08 |
| 73 | 2 | 2+1+7+2+2+2+1+1+2+2 = 22 | 2/25 = 0.08 |
| 74 | 1 | 2+1+7+2+2+2+1+1+2+2+1 = 23 | 1/25 = 0.04 |
| 78 | 1 | 2+1+7+2+2+2+1+1+2+2+1+1 = 24 | 1/25 = 0.04 |
| 84 | 1 | 2+1+7+2+2+2+1+1+2+2+1+1+1 = 25 | 1/25 = 0.04 |
| Total | 25 | | 1 |
Based on the frequency table above, we can make some conclusions that are true for any data set:
- The sum of frequencies equals the number of observations in the data set.
- The sum of relative frequencies equals 1.
- The last cumulative frequency equals the number of observations in the data set.
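The frequency table of example 1.12 and the three conclusions above can be checked with a short program. The following Python sketch is one possible way to do it, not the method used in the book; the function name `frequency_table` is an assumption.

```python
from collections import Counter
from itertools import accumulate

# Numbers of students registered in mathematics classes (example 1.12).
registered = [
    67, 34, 34, 70, 46, 57, 70, 34, 41, 22, 41, 22, 34,
    34, 73, 78, 34, 46, 34, 84, 36, 74, 73, 36, 26,
]


def frequency_table(data):
    """Rows of (value, frequency, cumulative frequency, relative frequency)."""
    counts = Counter(data)
    values = sorted(counts)                  # distinct values in ascending order
    freqs = [counts[v] for v in values]
    cumulative = list(accumulate(freqs))     # running totals of the frequencies
    n = len(data)
    return [(v, f, c, f / n) for v, f, c in zip(values, freqs, cumulative)]


table = frequency_table(registered)
for value, freq, cum, rel in table:
    print(f"{value:3d} {freq:3d} {cum:3d} {rel:5.2f}")
```

In this sketch, the running total produced by `accumulate` plays the role of the sums 2+1, 2+1+7, and so on written out in the cumulative frequency column.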
The frequencies and relative frequencies can be presented by graphs called histograms and relative frequency histograms. A histogram is a bar chart in which the height of each bar is determined by the number of observations in the corresponding class. To construct a histogram, we first divide the data into classes. By convention, we use between 5 and 12 classes of equal length. Figure 1.9 shows how the dot plot can be transformed into a histogram.

(a)

(b)
Figure 1.9. (a) Dot plot and (b) histogram constructed from the data provided in example 1.9.
As one can see, in this example, the data are divided into five classes. The number of measurements in each class is counted using the so-called left-inclusion method. This method considers including the left-boundary data and excluding the right-boundary data for each class. Hereafter, we will use square brackets for inclusive and parentheses for exclusive boundaries. The class intervals in figure 1.9 can be shown as [1, 3), [3, 5), [5, 7), [7, 9), and [9, 11).
To determine the length of intervals, first, we find the range, R, of the data as the difference between the largest and smallest observations. Then, we divide the range into the specified number of equally spaced intervals.
Constructing a Relative Frequency Histogram
Example 1.13
From 2010 to 2012, as part of a research project supported by the First Nations University of Canada, we surveyed members of Kahkewistahaw First Nation. The questionnaire contained a question about the number of residents in each household. The research assistant recorded the numbers of residents in 41 households:
| 10 | 3 | 5 | 1 | 6 | 5 | 6 | 7 | 5 | 2 | 5 | 8 |
| 12 | 5 | 8 | 4 | 3 | 5 | 8 | 3 | 1 | 8 | 1 | 8 |
| 7 | 7 | 2 | 5 | 3 | 6 | 6 | 3 | 4 | 3 | 3 | 4 |
| 2 | 4 | 2 | 4 | 6 |
Below, we provide step-by-step instructions for constructing a relative frequency histogram using the data.
Step 1. Construction of Classes
The range of the collected data is R = 12 – 1 = 11. Let us divide the data into six classes. So, the length of each class equals 11/6 ≈ 1.83. For convenience, we round the class width up to 2. Therefore, we get the following classes: [1, 3), [3, 5), [5, 7), [7, 9), [9, 11), and [11, 13).
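Step 1 can be sketched in code as follows. This Python snippet is an illustration under two stated assumptions: the first class starts at the smallest observation, and the class width is rounded up to the next whole number. The function name `make_classes` is hypothetical.

```python
import math

# Numbers of residents in 41 households (example 1.13).
households = [
    10, 3, 5, 1, 6, 5, 6, 7, 5, 2, 5, 8,
    12, 5, 8, 4, 3, 5, 8, 3, 1, 8, 1, 8,
    7, 7, 2, 5, 3, 6, 6, 3, 4, 3, 3, 4,
    2, 4, 2, 4, 6,
]


def make_classes(data, k):
    """Return k left-inclusive classes [lower, upper) of equal whole-number width."""
    lowest = min(data)
    data_range = max(data) - lowest      # R = largest - smallest
    width = math.ceil(data_range / k)    # round the class width up
    return [(lowest + i * width, lowest + (i + 1) * width) for i in range(k)]


classes = make_classes(households, 6)
print(classes)
```

If R/k happens to be a whole number, rounding up changes nothing and the largest observation lands on the excluded right boundary of the last class; in that case a slightly larger width or an extra class is needed.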
Step 2. Relative Frequency Table
Now, we must determine each class’s relative frequencies using the left-inclusion method.
| Class number | Class intervals | Frequencies of classes | Relative frequencies of classes |
| 1 | [1, 3) | 7 | 7/41 = 0.171 |
| 2 | [3, 5) | 12 | 12/41 = 0.293 |
| 3 | [5, 7) | 12 | 12/41 = 0.293 |
| 4 | [7, 9) | 8 | 8/41 = 0.195 |
| 5 | [9, 11) | 1 | 1/41 = 0.024 |
| 6 | [11, 13) | 1 | 1/41 = 0.024 |
| Sum | | 41 | 1.000 |
The sum of relative frequencies in fraction notation is as follows: 7/41 + 12/41 + 12/41 + 8/41 + 1/41 + 1/41 = 1
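The left-inclusion method used in Step 2 translates directly into the condition lower ≤ x < upper. The following Python sketch counts the observations of example 1.13 class by class; the name `class_frequencies` is an assumption made for this illustration.

```python
# Numbers of residents in 41 households (example 1.13).
households = [
    10, 3, 5, 1, 6, 5, 6, 7, 5, 2, 5, 8,
    12, 5, 8, 4, 3, 5, 8, 3, 1, 8, 1, 8,
    7, 7, 2, 5, 3, 6, 6, 3, 4, 3, 3, 4,
    2, 4, 2, 4, 6,
]

# Classes from Step 1, written as (lower limit, upper limit).
classes = [(1, 3), (3, 5), (5, 7), (7, 9), (9, 11), (11, 13)]


def class_frequencies(data, classes):
    """Count observations per class: left boundary included, right excluded."""
    return [sum(1 for x in data if lower <= x < upper) for lower, upper in classes]


frequencies = class_frequencies(households, classes)
n = len(households)
relative = [f / n for f in frequencies]

for (lower, upper), f, r in zip(classes, frequencies, relative):
    print(f"[{lower}, {upper})  {f:2d}  {r:.3f}")
```

An observation equal to a boundary, such as 3, is counted in [3, 5) and not in [1, 3), which is exactly what the square bracket and the parenthesis indicate.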
Step 3. Graphing the Relative Frequency Histogram
The following relative frequency histogram is plotted using Excel. Later in this book, we will provide detailed instructions for plotting the relative frequency histogram using this software.

Figure 1.10. Relative frequency histogram constructed from data provided in example 1.13
As mentioned above, sometimes, we need to refer to cumulative frequencies. In example 1.12, we determined cumulative frequencies for observations. Similarly, we can evaluate cumulative frequencies for classes if it is more convenient to divide the data set into classes. Below, we present the extended frequency table for the data set given in example 1.13.
| Class number | Class intervals | Frequencies of classes | Relative frequencies of classes | Cumulative frequencies of classes |
| 1 | [1, 3) | 7 | 7/41 = 0.171 | 7 |
| 2 | [3, 5) | 12 | 12/41 = 0.293 | 7+12 = 19 |
| 3 | [5, 7) | 12 | 12/41 = 0.293 | 7+12+12 = 31 |
| 4 | [7, 9) | 8 | 8/41 = 0.195 | 7+12+12+8 = 39 |
| 5 | [9, 11) | 1 | 1/41 = 0.024 | 7+12+12+8+1 = 40 |
| 6 | [11, 13) | 1 | 1/41 = 0.024 | 7+12+12+8+1+1 = 41 |
| Sum | | 41 | 1.000 | |
Example 1.14
The more data values are available, the more classes it is recommended to use. For instance, we will arrange the data from example 1.10 into 11 classes.
| 35 | 33 | 44 | 35 | 32 | 38 | 48 | 37 | 42 | 39 | 39 | 41 |
| 51 | 47 | 48 | 40 | 57 | 46 | 39 | 44 | 38 | 42 | 41 | 42 |
| 47 | 59 | 46 | 45 | 36 | 52 | 50 | 56 | 57 | 54 | 43 | 47 |
| 55 | 35 | 65 | 51 | 59 | 36 | 62 | 44 | 50 | 36 | 45 | 52 |
| 36 | 44 | 41 | 42 | 40 | 47 | 44 | 44 | 59 | 43 | 53 | 51 |
| 40 | 48 | 46 | 43 | 36 | 48 | 62 | 47 | 43 | 40 | 43 | 43 |
| 52 | 34 | 38 | 48 | 69 | 42 | 74 | 51 | 64 | 50 | 53 | 52 |
| 58 | 38 | 44 | 58 | 52 | 51 | 52 | 32 | 53 | 57 | 50 |
Step 1. Construction of Classes
The range of the collected data is R = 74 – 32 = 42. Since we chose 11 classes, the length of each class equals 42/11 ≈ 3.818. For convenience, we round the class width up to 4 and start the first class at 30. Therefore, we get the following classes: [30, 34), [34, 38), [38, 42), [42, 46), [46, 50), [50, 54), [54, 58), [58, 62), [62, 66), [66, 70), and [70, 75). The last class is extended to 75 so that it includes the largest observation, 74.
Step 2. Relative Frequency Table
| Class number | Class intervals | Frequencies of classes | Relative frequencies of classes |
| 1 | [30, 34) | 3 | 3/96 = 0.03 |
| 2 | [34, 38) | 10 | 10/96 = 0.10 |
| 3 | [38, 42) | 15 | 15/96 = 0.16 |
| 4 | [42, 46) | 20 | 20/96 = 0.21 |
| 5 | [46, 50) | 13 | 13/96 = 0.14 |
| 6 | [50, 54) | 18 | 18/96 = 0.19 |
| 7 | [54, 58) | 6 | 6/96 = 0.06 |
| 8 | [58, 62) | 5 | 5/96 = 0.05 |
| 9 | [62, 66) | 4 | 4/96 = 0.04 |
| 10 | [66, 70) | 1 | 1/96 = 0.01 |
| 11 | [70, 75) | 1 | 1/96 = 0.01 |
| Sum | | 96 | 1 |
Step 3. Graphing the Relative Frequency Histogram

Figure 1.11. Relative frequency histogram constructed from data provided in example 1.14
Observe that, unlike the relative frequency histogram of the data presented in example 1.13, the histogram of this data set has two peaks (fig. 1.11). Distributions of this type are described as bimodal. In statistics, we use specific terms to describe the data distribution represented by histograms.
Figure 1.12 below includes examples of more typical histogram shapes.

(a) We describe this shape as skewed right because of the long tail on the right and the short one on the left.

(b) Similarly, this shape is called skewed left.

(c) This type of histogram is called bell-shaped. Some textbooks use the term “mound-shaped.” It is also unimodal and symmetric.

(d) This histogram shows that some data points on the right lie far away from the rest of the data. We call these striking deviations outliers. Later, we will discuss the procedure of detecting outliers in more detail.

(e) In example 1.14, we observed a histogram with two peaks and described it as bimodal.

(f) Some histograms may contain more than two peaks. We call them multimodal.
Figure 1.12. Histograms with various shapes: (a) skewed right; (b) skewed left; (c) mound-shaped, unimodal, and symmetric; (d) histogram with outlier; (e) bimodal; and (f) multimodal
What Does a Relative Frequency Histogram Tell Us?
As one can see from the examples above, the shape of the relative frequency histogram provides much helpful information about the collected data and its distribution. Later, after improving your statistical analysis background, you will be able to obtain more from histograms. Now, we can answer some simple questions using the relative frequency histogram in example 1.14.
1) How would you describe the shape of the data distribution?
Answer: bimodal, skewed to the right
2) What percentage of award winners are 50 years or older?
Answer: 19% + 6% + 5% + 4% + 1% + 1% = 36%
3) What is the chance that a randomly selected award winner is age 42 or older and younger than 46?
Answer: 20 out of 96 or 0.21 or 21%
4) What are the chances that a randomly selected award winner is younger than 42? (This interval excludes 42.)
Answer: 0.03 + 0.10 + 0.16 = 0.29 or 29%
5) What are the chances that a randomly selected award winner is not younger than 38 and younger than 50? (This interval includes 38 and excludes 50.)
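Answer: 0.16 + 0.21 + 0.14 = 0.51 or 51% (the exact value is 48/96 = 0.50; the small difference comes from rounding the relative frequencies)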
Later in this book, we will define concepts such as proportion and probability. You will see that these quantities and the relative frequency are similar. Similar diagrams will be used to analyze the distributions of these quantities.
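The answers to questions 2 to 5 can be checked by adding class frequencies before dividing by the total, which avoids the small errors that arise from adding rounded relative frequencies. The Python sketch below uses the class frequencies from the relative frequency table of example 1.14; the function name `proportion` is an assumption, and the function is meant only for limits that coincide with class boundaries.

```python
# Lower class limits and class frequencies from the table in example 1.14.
lower_limits = [30, 34, 38, 42, 46, 50, 54, 58, 62, 66, 70]
frequencies = [3, 10, 15, 20, 13, 18, 6, 5, 4, 1, 1]
n = sum(frequencies)


def proportion(low=None, high=None):
    """Share of observations in the classes that start at `low` or above
    and end at `high` or below (both must be class boundaries)."""
    total = sum(
        f
        for limit, f in zip(lower_limits, frequencies)
        if (low is None or limit >= low) and (high is None or limit < high)
    )
    return total / n


print(f"50 or older:         {proportion(low=50):.2f}")
print(f"42 to under 46:      {proportion(42, 46):.2f}")
print(f"younger than 42:     {proportion(high=42):.2f}")
print(f"38 to under 50:      {proportion(38, 50):.2f}")
```

For the last question, the exact proportion is 48/96 = 0.50, whereas adding the rounded values 0.16, 0.21, and 0.14 gives 0.51.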
Class Limits
Sometimes, we need to refer to the boundaries of classes to analyze the given data distribution. In other words, we need to deal with class limits. In a frequency distribution, class limits are the boundary values of each class. Each class has a lower class limit and an upper class limit. The lower class limit is the smallest value that can belong to the class; by convention, it is included in the class (inclusive). The upper class limit marks the upper border of the class but does not belong to the class (exclusive). With left-inclusive classes, the upper limit of each class equals the lower limit of the next class. The table below shows the lower and upper class limits based on the relative frequency table created for example 1.14.
| Class number | Class intervals | Lower class limit | Upper class limit |
| 1 | [30, 34) | 30 | 34 |
| 2 | [34, 38) | 34 | 38 |
| 3 | [38, 42) | 38 | 42 |
| 4 | [42, 46) | 42 | 46 |
| 5 | [46, 50) | 46 | 50 |
| 6 | [50, 54) | 50 | 54 |
| 7 | [54, 58) | 54 | 58 |
| 8 | [58, 62) | 58 | 62 |
| 9 | [62, 66) | 62 | 66 |
| 10 | [66, 70) | 66 | 70 |
| 11 | [70, 75) | 70 | 75 |
In this example, we constructed equal-length classes to ease the interpretation. However, one should note that equal length for classes is not necessary, although highly recommended.
Frequency Polygons
Very often, statisticians need to work with more than one data set, and comparing the frequency distributions of the sets becomes necessary. In this case, using frequency polygons instead of relative frequency histograms is recommended. A frequency polygon is a line graph drawn on the x-y coordinate system such that the x-axis represents the values in the data set, while the y-axis shows the frequency of each distinct observation. If it is more convenient to divide the data set into classes, as we did when constructing the histograms, the class frequencies are plotted above the midpoint of each class interval and connected by straight lines.
Below, we present the frequency polygons for the data sets used in examples 1.12 and 1.14 (fig. 1.13).

(a)

(b)
Figure 1.13. Frequency polygons constructed from data provided in (a) example 1.12 and (b) example 1.14
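When the data are grouped into classes, each vertex of the frequency polygon sits above a class midpoint. The short Python sketch below computes these vertices; for brevity, it uses the six classes and frequencies of example 1.13 as an illustration rather than the data sets plotted in figure 1.13, and the name `polygon_points` is hypothetical.

```python
# Classes (lower limit, upper limit) and frequencies from example 1.13.
classes = [(1, 3), (3, 5), (5, 7), (7, 9), (9, 11), (11, 13)]
frequencies = [7, 12, 12, 8, 1, 1]


def polygon_points(classes, frequencies):
    """Vertices (class midpoint, class frequency) of a frequency polygon."""
    return [((lower + upper) / 2, f) for (lower, upper), f in zip(classes, frequencies)]


points = polygon_points(classes, frequencies)
for midpoint, f in points:
    print(f"midpoint {midpoint:4.1f}  frequency {f}")
```

Connecting consecutive points with straight line segments gives the polygon.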
Example 1.15
An administrative assistant recorded the number of students registered in mathematics and statistics classes in the Department of Mathematics and Statistics in fall 2021.
| Mathematics | 67 | 34 | 34 | 70 | 46 | 57 | 70 | 34 | 41 | 22 |
| | 41 | 22 | 34 | 34 | 73 | 78 | 34 | 46 | 34 | 84 |
| | 36 | 74 | 73 | 36 | 26 | | | | | |
| Statistics | 28 | 36 | 72 | 26 | 32 | 56 | 62 | 70 | 56 | 62 |
| | 52 | 72 | 68 | 68 | 70 | 48 | 64 | 70 | 42 | 32 |
| | 68 | 64 | 68 | 70 | 34 | 28 | 72 | 70 | 64 | 70 |
Construct frequency polygons for both data sets in the same coordinate system.
Solution:
First, we will create the frequency tables for subject classes.
| Students registered in mathematics classes | Frequency | Students registered in statistics classes | Frequency |
| 22 | 2 | 26 | 1 |
| 26 | 1 | 28 | 2 |
| 34 | 7 | 32 | 2 |
| 36 | 2 | 34 | 1 |
| 41 | 2 | 36 | 1 |
| 46 | 2 | 42 | 1 |
| 57 | 1 | 48 | 1 |
| 67 | 1 | 52 | 1 |
| 70 | 2 | 56 | 2 |
| 73 | 2 | 62 | 2 |
| 74 | 1 | 64 | 3 |
| 78 | 1 | 68 | 4 |
| 84 | 1 | 70 | 6 |
| | | 72 | 3 |
| Total | 25 | Total | 30 |
Now, we can plot both frequency polygons in the same coordinate system (fig. 1.14).

Figure 1.14. Frequency polygons for data provided in example 1.15
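The two frequency tables of example 1.15 can be produced with the same counting step applied to each data set. The following Python sketch is an illustration only; the names `math_classes`, `stat_classes`, and `polygon_points` are assumptions.

```python
from collections import Counter

# Registrations in mathematics and statistics classes (example 1.15).
math_classes = [
    67, 34, 34, 70, 46, 57, 70, 34, 41, 22,
    41, 22, 34, 34, 73, 78, 34, 46, 34, 84,
    36, 74, 73, 36, 26,
]
stat_classes = [
    28, 36, 72, 26, 32, 56, 62, 70, 56, 62,
    52, 72, 68, 68, 70, 48, 64, 70, 42, 32,
    68, 64, 68, 70, 34, 28, 72, 70, 64, 70,
]


def polygon_points(data):
    """Vertices (value, frequency) of a frequency polygon, in ascending order."""
    counts = Counter(data)
    return [(value, counts[value]) for value in sorted(counts)]


math_points = polygon_points(math_classes)
stat_points = polygon_points(stat_classes)

print("Mathematics:", math_points)
print("Statistics: ", stat_points)
```

Since the two data sets have different sizes (25 and 30 observations), dividing each frequency by the size of its own data set puts both polygons on the same relative scale.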
Sometimes, it is more convenient to refer to relative frequency distributions. For example, the graph below represents the relative frequency distribution for the data set given in example 1.14 (fig. 1.15).

Figure 1.15. Relative frequency distribution for data provided in example 1.15
Later in this textbook, we will need to estimate the proportions of specific observations and their chances of happening. For these purposes, we will need to work with cumulative values of relative frequencies. The cumulative relative frequency of an observation/class is similar to the cumulative frequency. The only difference is that this time, we sum the relative frequencies of all observations/classes up to and including this observation/class.
Mainly we will use the tables of cumulative values. First, however, it is helpful to analyze the graphs of cumulative frequencies to observe and explain the trends of distributions. So, first, we construct the frequency table to plot the cumulative frequency distribution. Let us use the data set provided in example 1.13 and graph the cumulative frequency distributions.
| 10 | 3 | 5 | 1 | 6 | 5 | 6 | 7 | 5 | 2 | 5 | 8 |
| 12 | 5 | 8 | 4 | 3 | 5 | 8 | 3 | 1 | 8 | 1 | 8 |
| 7 | 7 | 2 | 5 | 3 | 6 | 6 | 3 | 4 | 3 | 3 | 4 |
| 2 | 4 | 2 | 4 | 6 |
We can use the same classes defined in example 1.13 and construct the frequency table with cumulative frequencies. By convention, we plot the proportion of observations less than the upper limit of the class.
| Class number | Upper class limits | Frequencies | Cumulative frequencies | Relative frequencies | Cumulative relative frequencies |
| 1 | < 3 | 7 | 7 | 0.171 | 0.171 |
| 2 | < 5 | 12 | 19 | 0.293 | 0.463 |
| 3 | < 7 | 12 | 31 | 0.293 | 0.756 |
| 4 | < 9 | 8 | 39 | 0.195 | 0.951 |
| 5 | < 11 | 1 | 40 | 0.024 | 0.976 |
| 6 | < 13 | 1 | 41 | 0.024 | 1.000 |
| Sum | | 41 | | 1.000 | |
Now, we can plot the cumulative relative frequency distribution. This graph is called an ogive (fig. 1.16).

Figure 1.16. Ogive constructed from data provided in example 1.13
Note that the ogive starts with a cumulative relative frequency of 0 at the lower limit of the first class and reaches a value of 1 at the upper limit of the last class.
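The points of the ogive can be computed from the class frequencies of example 1.13. The following Python sketch is one way to do it, with illustrative variable names. It divides each cumulative frequency by the total, rather than adding rounded relative frequencies, so that rounding errors do not accumulate.

```python
from itertools import accumulate

# Upper class limits and class frequencies for example 1.13.
first_lower_limit = 1
upper_limits = [3, 5, 7, 9, 11, 13]
frequencies = [7, 12, 12, 8, 1, 1]

n = sum(frequencies)
cumulative = list(accumulate(frequencies))           # 7, 7+12, 7+12+12, ...
cumulative_relative = [c / n for c in cumulative]

# The ogive starts at 0 at the lower limit of the first class and then
# has one point above each upper class limit.
ogive = [(first_lower_limit, 0.0)] + list(zip(upper_limits, cumulative_relative))

for x, y in ogive:
    print(f"x = {x:2d}  cumulative relative frequency = {y:.3f}")
```

Each point (x, y) reads as "a proportion y of the observations is less than x", which matches the convention of plotting against the upper class limits.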
1.4. Computer Software with Graphing Tools
The revolutionary growth of computing technologies over the last several decades has encouraged a vigorous development of many useful graphing programs and applications. This section will briefly review some of the computing software with graphing components.
Maple
Maple is a symbolic and numeric computing environment and a multi-paradigm programming language. It covers several areas of technical computing, such as symbolic mathematics, numerical analysis, data processing, visualization, and others. In addition, a toolbox, MapleSim, adds multi-domain physical modelling and code generation functionality.
Maple’s capacity for symbolic computing includes those of a general-purpose computer algebra system. For instance, it can manipulate mathematical expressions and find symbolic solutions to specific problems, such as those arising from ordinary and partial differential equations.
Maple provides many useful tools for constructing pie, bar, and line charts, histograms, and other graphs for statistical analysis.
Maple is developed commercially by the Canadian software company Maplesoft. The name “Maple” refers to the software’s Canadian heritage.
Excel
Microsoft developed Excel, which can be used on major platforms such as Windows, macOS, Android, and iOS. This software offers a wide range of statistical functions that calculate a single value or an array of values in Excel worksheets. In addition, Excel has an add-in called the Excel Analysis ToolPak, with many useful statistical analysis tools. This textbook includes laboratory exercises that will help you access statistical analysis advantages through Excel. The laboratory exercises will be provided at the end of the chapters.
SPSS
SPSS (Statistical Package for the Social Sciences) is a widely used program for statistical analysis in social science. It is also used by market researchers, health researchers, survey companies, governments, education researchers, marketing organizations, data miners, and others. The original SPSS manual has been described as one of “sociology’s most influential books” for allowing ordinary researchers to do their own statistical analysis. In addition to statistical analysis, data management (case selection, file reshaping, creating derived data) and data documentation (a metadata dictionary is stored in the data file) are features of the base software.
Some leading international social science journals accept only those manuscripts whose statistical analysis is performed using the SPSS program.
R
R is a programming language and free software environment for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. It is widely used among statisticians and data miners for developing statistical software and data analysis. Polls, data mining surveys, and studies of scholarly literature databases show substantial increases in R’s popularity. As of August 2021, R ranked 14th on the TIOBE Index, which measures the popularity of programming languages.
The official R software environment is a GNU package. It is written primarily in C, Fortran, and R (and thus is partially self-hosting) and is freely available under the GNU General Public License. In addition, pre-compiled executables are provided for various operating systems. It has a command line interface, but multiple third-party graphical user interfaces are available, such as RStudio, an integrated development environment, and Jupyter, a notebook interface.
SAS
SAS (previously “Statistical Analysis System”) is a statistical software suite developed by the SAS Institute for data management, advanced analytics, multivariate analysis, business intelligence, criminal investigation, and predictive analytics. SAS can mine, alter, manage, and retrieve data from various sources and perform statistical analysis on the data. In addition, SAS provides a graphical point-and-click user interface for non-technical users and more advanced options through the SAS language. One of the advantages of SAS is the possibility of handling big data, which is why this program is primarily used by such institutions as banks, credit unions, and public health organizations.
MATLAB
MATLAB (an abbreviation of “matrix laboratory”) is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages. Although MATLAB is intended primarily for numeric computing, an optional toolbox uses the MuPAD symbolic engine to access symbolic computing abilities. An additional package, Simulink, adds graphical multi-domain simulation and model-based design for dynamic and embedded systems. Unlike R and Excel, MATLAB allows one to solve algebraic equations using mathematical symbols.
Minitab
Minitab is a statistics package developed at Pennsylvania State University by researchers Barbara F. Ryan, Thomas A. Ryan Jr., and Brian L. Joiner in 1972. It began as a light version of OMNITAB 80, a statistical analysis program by NIST. Statistical analysis software such as Minitab automates calculations and the creation of graphs, allowing the user to focus more on the analysis of data and the interpretation of results. It is compatible with other Minitab, LLC software.
As one can see, there are many useful programs for constructing graphs to visualize statistical data. Many of them are free of charge and easy for non-professionals to use. However, we strongly recommend that you construct your first graphs by hand, following the step-by-step instructions used to solve this book’s examples. Constructing graphs by hand will help you better understand the principles of statistical analysis. You can start using these and many other programs once you have a solid understanding of the underlying concepts of statistics and statistical analysis.
Chapter 1 Summary
- Data and variables
  - Quantitative and qualitative variables
  - Discrete and continuous variables
- Graphs
  - Graphs for qualitative data
    - Pie charts
    - Bar charts
  - Graphs for quantitative data
    - Line charts
    - Dot plots
    - Stem and leaf plots
    - Relative frequency histograms
    - Frequency polygon
    - Relative frequency distribution
    - Ogive
- Computer software with graphing tools
You may also view this chapter in presentation format. Just click the link to view.
EXERCISES
1.1 Data and Variables
Qualitative Variables (Homelessness example)
Quantitative Variables (Discrete, Continuous, Indigenous, PA survey)
1. A student took an informal survey among a sample of 12 of his classmates with the question “How many books did you read over the summer?” Their responses were
0, 1, 1, 2, 1, 3, 1, 3, 2, 3, 1, 5
Classify the variable “number of books read” by circling all appropriate terms in the following list:
qualitative, quantitative, discrete, continuous
2. (10 marks) The following data shows the number of phones (including cell phones) per household in a sample of 13 households.
1, 3, 2, 3, 5, 7, 3, 8, 3, 2, 5, 7, 6
Note that
\[
\sum_{i=1}^{13} x_i = 55
\]
\[
\sum_{i=1}^{13} x_i^2 = 293
\]
2.1 What is the variable of this data set?
----------------------------------------------------------------------------
2.2 What is the experimental unit?
----------------------------------------------------------------------------
2.3 Classify the variable by circling any appropriate terms from the following list:
Continuous, Discrete, Qualitative, Quantitative
3. Classify each variable as quantitative or qualitative. For quantitative variables, classify whether they are discrete or continuous.
(a) Colors of automobiles in the factory parking lot
(b) Number of desks in a classroom
(c) Classification of children in a day care center (infant, toddler, preschool)
(d) Weights of fish caught in Wascana Lake
(e) Number of pages in a statistics textbook
(f) Capacity (in liters) of water in selected dams
(g) Number of off-road vehicles sold in Canada
(h) Number of loaves of bread baked each day at a local bakery
(i) Water temperature in the swimming pool at Little Manitou Lake
(j) Lifetimes of batteries in an iPhone 4S
4. The following frequency table lists the speed (in km/h) of a sample of 59 cars in a school zone. [10 marks]
| Speed (in km/h) | Frequency |
| 30 ≤ x < 35 | 14 |
| 35 ≤ x < 40 | 23 |
| 40 ≤ x < 45 | 12 |
| 45 ≤ x < 50 | 8 |
| 50 ≤ x < 55 | 2 |
a) Classify the variable “speed” (qualitative/quantitative, discrete/continuous):
Quantitative, continuous
b) What term best describes the shape of the above distribution?
Positively (or right) skewed
c) Is this data from an observational or experimental study?
Observational
5. (Introductory Statistics, Illowsky, B., Dean, S., OpenStax, 2018) Determine what the key terms refer to in the following study. A study was conducted at a local college to analyze the average cumulative GPAs of students who graduated last year. Fill in the letter of the phrase that best describes each of the items below.
1. Population _____ 2. Statistic _____ 3. Parameter _____ 4. Sample _____ 5. Variable _____ 6. Data _____
a) all students who attended the college last year
b) the cumulative GPA of one student who graduated from the college last year
c) 3.65, 2.80, 1.50, 3.90
d) a group of students who graduated from the college last year, randomly selected
e) the average cumulative GPA of students who graduated from the college last year
f) all students who graduated from the college last year
g) the average cumulative GPA of students in the study who graduated from the college last year
1.2. Graphs of Qualitative Data
Pie Charts
Bar Charts
1. A group of Stat 100 students filled out a questionnaire on their living arrangements while attending university. The following data is part of that study:
| Living Arrangements | Number of Responses |
| University Residence | 52 |
| Rented Apartment/House | 61 |
| Owned Condo/House | 13 |
| Living with Parents/Family | 29 |
| Other | 3 |
Construct a pie chart for this data.
2. (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) There are 800 students in the College of Arts and Sciences. There are four majors in the college: English, History, Biology, and Chemistry. The following table shows the number of students in each major:
| Major | Number of Students |
| English | 240 |
| History | 160 |
| Biology | 320 |
| Chemistry | 80 |
a) Develop a percent frequency distribution.
b) Construct a bar chart.
c) Construct a pie chart.
3. (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) A student has completed 20 courses in the School of Arts and Sciences. Her grades in the 20 courses are shown below:
A B A B C
C C B B B
B A B B B
C B C B A
a) Develop a frequency distribution for her grades.
b) Develop a percent frequency distribution for her grades.
c) Develop a bar graph.
d) Construct a pie chart.
4. (Introductory Statistics, Illowsky, B., Dean, S., OpenStax, 2018) The students in Ms. Ramirez’s math class have birthdays in each of the four seasons. Table 2.40 shows the four seasons, the number of students who have birthdays in each season, and the percentage (%) of students in each group. Construct a bar graph showing the number of students.
| Seasons | Number of students | Proportion of population |
| Spring | 8 | 24% |
| Summer | 9 | 26% |
| Autumn | 11 | 32% |
| Winter | 6 | 18% |
Table 2.40
5. (Introductory Statistics, Illowsky, B., Dean, S., OpenStax, 2018) David County has six high schools. Each school sent students to participate in a county-wide science competition. Table 2.41 shows the percentage breakdown of competitors from each school, and the percentage of the entire student population of the county that goes to each school. Construct a bar graph that shows the population percentage of competitors from each school.
| High School | Science competition population | Overall student population |
| Alabaster | 28.9% | 8.6% |
| Concordia | 7.6% | 23.2% |
| Genoa | 12.1% | 15.0% |
| Mocksville | 18.5% | 14.3% |
| Tynneson | 24.2% | 10.1% |
| West End | 8.7% | 28.8% |
Table 2.41
1.3. Graphs for Quantitative Data
Line Charts
Dotplots
Stem and Leaf Plots
1. Consider the following 20 scores on a Stat 100 midterm exam (out of 20).
8.8, 10.5, 7.8, 6.1, 9.1, 17.2, 9.6, 7.2, 6.6, 7.7, 9.3, 6.8, 7.6, 14.5, 16.9, 8.3, 9.9, 8.7, 9.7, 7.8
Create a stem and leaf plot for the above data.
2. (6 Marks) The following graph represents the yearly number of deaths from tornadoes in the USA over 15 years.
| Stem | Leaf |
| 4 | 0 0 2 3 5 5 6 |
| 5 | 1 1 1 1 |
| 6 | 0 1 |
| 7 | 9 |
| 8 | 2 |
Leaf unit = 1. What is the name of this graph?
3. The Stem and Leaf Display below shows the results (%) on a test.
| Stem | Leaf |
| 3 | 1 2 |
| 4 | 3 6 8 |
| 5 | 0 5 5 9 |
| 6 | 2 6 6 8 9 |
| 7 | 0 1 4 4 3 8 |
| 8 | 3 4 5 5 8 |
| 9 | 0 2 3 |
If the pass mark was 50%, approximately what percent of students failed the test?
4. Twenty-two Stat 100 students have just received their final grades. Grades (out of 100) are listed below.
64, 79, 89, 77, 93, 88, 52, 48, 82, 82, 61, 42, 40, 67, 89, 95, 72, 56, 61, 96, 88, 40
a) Is this data univariate, bivariate, or multivariate?
b) Classify the variable type.
c) Construct a stem and leaf plot for the data. How would you describe the shape of the data?
5. (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) The test scores of 14 individuals on their first physics examination are shown below:
95 87 52 43 77 84 78
75 63 92 81 83 91 88
a) Construct a stem-and-leaf display for these data.
b) What does the above stem-and-leaf show?
6. (Introductory Statistics, Illowsky, B., Dean, S., OpenStax, 2018) Statistics are used to compare and sometimes identify authors. The following lists show a simple random sample that compares the letter counts for three authors.
Terry: 7; 9; 3; 3; 3; 4; 1; 3; 2; 2
Davis: 3; 3; 3; 4; 1; 4; 3; 2; 3; 1
Maris: 2; 3; 4; 4; 4; 6; 6; 6; 8; 3
Make a dot plot for the three authors and compare the shapes.
7. (Introductory Statistics, Illowsky, B., Dean, S., Openstax, 2018) Use the data to construct the line graph for the following problems:
In a survey, 40 people were asked how many times they visited a store before making a major purchase. The results are shown in Table 2.37.
| Number of times in the store | Frequency |
| 1 | 4 |
| 2 | 10 |
| 3 | 16 |
| 4 | 6 |
| 5 | 4 |
Table 2.37
In a survey, several people were asked how many years it has been since they purchased a mattress. The results are shown in Table 2.38.
| Years since last purchase | Frequency |
| 0 | 2 |
| 1 | 8 |
| 2 | 13 |
| 3 | 22 |
| 4 | 16 |
| 5 | 9 |
Table 2.38
1.4. Relative Frequency Histogram
1. A sample of salaries (in $thousands) of 30 employees at a large company yields the following data:
28, 29, 32, 33, 35, 38, 42, 42, 43, 45
46, 49, 49, 50, 52, 55, 60, 60, 60, 62
62, 63, 64, 65, 71, 75, 80, 89, 90, 97
For the data,
∑x = 1666, ∑x² = 102230
(a) Construct a frequency distribution for the data. Use five classes, with a lower limit of 28 for the first class and an upper limit of 97 for the last class.
(b) Draw a histogram for the frequency distribution.
2. Consider the following distribution:
| Cost of Textbooks | Number |
| $25 up to $35 | 2 |
| $35 up to $45 | 5 |
| $45 up to $55 | 7 |
| $55 up to $65 | 20 |
| $65 up to $75 | 16 |
What is the relative class frequency (%) for the $25 up to $35 class?
3. Consider the monthly long-distance charges for a sample of 97 residents of Regina.
| Monthly Phone Bill (in $) | Frequency |
| $0 ≤ x < $30 | 2 |
| $30 ≤ x < $60 | 17 |
| $60 ≤ x < $90 | 22 |
| $90 ≤ x < $120 | 32 |
| $120 ≤ x < $150 | 14 |
| $150.00 or more | 10 |
a) Classify the data (qualitative/quantitative/discrete/continuous)
b) Using class midpoints and a representative value of $180.00 for the final class, estimate the mean monthly phone bill for this sample.
4. Thirty automobiles were tested for fuel efficiency (in miles per gallon). The following frequency distribution was obtained. Construct a graph of the cumulative relative frequency of the data (also called an ogive).
| Class Boundaries (mpg) | Frequency |
| 7.5 < x ≤ 12.5 | 3 |
| 12.5 < x ≤ 17.5 | 5 |
| 17.5 < x ≤ 22.5 | 15 |
| 22.5 < x ≤ 27.5 | 5 |
| 27.5 < x ≤ 32.5 | 2 |
5. (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) Tiger food company bakes quiches and sells their products in the greater Los Angeles area. Their records over the past 60 days are shown below:
| Sales Volume (Number of Quiches) | Number of Days |
| 100-199 | 6 |
| 200-299 | 10 |
| 300-399 | 20 |
| 400-499 | 12 |
| 500-599 | 8 |
| 600-699 | 4 |
| Total | 60 |
a) Develop a cumulative frequency distribution and a percent frequency distribution.
b) What percentage of the days did the company sell at least 400 quiches?
In statistics, a variable is a characteristic or attribute that can take on different values or levels. Variables can be classified into two main types: quantitative and qualitative. Variables are important in statistics because they are used to describe and analyze data. By understanding the type and characteristics of variables, statisticians can choose appropriate statistical methods and techniques to analyze the data and draw meaningful conclusions.
The term "data" refers to information, facts, or figures that are collected, analyzed, and used for reference or analysis. In the context of statistics, data can be quantitative or qualitative and is typically used to draw conclusions, make decisions, or understand patterns and trends.
In statistics, "frequency" refers to the number of times a particular value or category occurs in a data set. It is a count of how often an event or observation happens. Frequency is commonly used to describe the distribution of values within a data set and is a fundamental concept in statistical analysis.
Relative frequency, in the context of statistics, refers to the proportion or percentage of the total number of observations that fall into a particular category or have a specific value. It is a measure of the frequency of a particular event or observation relative to the total number of events or observations in a data set.
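As a minimal sketch of this definition, the following Python snippet counts frequencies and converts them to relative frequencies. It uses the book-count responses from Exercise 1 of the Data and Variables exercises; the variable names are illustrative choices, not part of the chapter.

```python
from collections import Counter

# Responses to "How many books did you read over the summer?" (12 classmates).
books_read = [0, 1, 1, 2, 1, 3, 1, 3, 2, 3, 1, 5]

# Frequency: how many times each distinct value occurs.
frequency = Counter(books_read)

# Relative frequency: each count divided by the total number of observations.
n = len(books_read)
relative_frequency = {
    value: count / n for value, count in sorted(frequency.items())
}

for value, proportion in relative_frequency.items():
    print(f"{value} books: frequency {frequency[value]}, "
          f"relative frequency {proportion:.3f}")
```

The relative frequencies always add up to 1 (or 100%), which is a quick check on the arithmetic.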
Cumulative frequency is a measure used in statistics to describe the total number of observations or events that fall at or below a particular value or category in a data set. It is the sum of the frequencies of all values or categories up to and including a certain point in the data set.
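The running-total idea can be sketched in a few lines of Python. The example below uses the store-visit counts from Table 2.37 in the exercises; the variable names are illustrative assumptions.

```python
from itertools import accumulate

# Table 2.37: number of store visits before a major purchase, with frequencies.
visits = [1, 2, 3, 4, 5]
frequency = [4, 10, 16, 6, 4]

# Cumulative frequency: running total of the frequencies, taken in
# increasing order of the variable's value.
cumulative_frequency = list(accumulate(frequency))

for value, total in zip(visits, cumulative_frequency):
    print(f"{value} visits or fewer: {total} people")
```

The last cumulative frequency equals the total number of observations, here the 40 people surveyed.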
A histogram is a graphical representation of the distribution of numerical data. It consists of a series of contiguous, non-overlapping bars, where the area of each bar represents the frequency (or relative frequency) of the data within the range it represents. Histograms are useful for visualizing the shape, centre, and spread of the data distribution.
A frequency polygon is a graph that displays the frequency distribution of a data set. It is constructed by connecting the midpoints of the tops of the bars in a histogram with straight lines. The resulting graph shows the shape of the distribution and the trend of the data. Frequency polygons are useful for comparing the distribution of two or more data sets and for identifying similarities and differences in the data. They can also be used to estimate probabilities based on the data.
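To illustrate, the points of a frequency polygon can be computed from a grouped frequency table. The sketch below uses the textbook-cost classes from the exercises. Closing the polygon with zero-frequency points one class width beyond each end is a common convention that is assumed here, not something stated in the definition above.

```python
# Textbook-cost classes (lower, upper) in dollars and their frequencies.
classes = [(25, 35), (35, 45), (45, 55), (55, 65), (65, 75)]
frequency = [2, 5, 7, 20, 16]

# Class midpoint: halfway between the lower and upper class limits.
midpoints = [(lower + upper) / 2 for lower, upper in classes]
width = classes[0][1] - classes[0][0]

# Assumed convention: anchor the polygon on the horizontal axis by adding
# a zero-frequency point one class width before and after the data.
polygon_points = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, frequency))
    + [(midpoints[-1] + width, 0)]
)

for x, y in polygon_points:
    print(f"midpoint {x:5.1f}: frequency {y}")
```

Joining consecutive points with straight line segments gives the polygon.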
A relative frequency distribution is a table or graph that shows the proportion or percentage of data values that fall within each class interval. It is constructed by dividing the frequency of each class by the total number of observations and expressing the result as a proportion or percentage.
An ogive, also known as a cumulative frequency polygon, is a graph that represents the cumulative frequency distribution of a set of data. It is constructed by plotting points whose x-coordinates are the upper class limits and whose y-coordinates are the cumulative frequencies. Straight lines then connect the points.
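The plotting points of an ogive can be sketched as follows, using the school-zone speed table from the exercises. Starting the curve at the lowest class boundary with a cumulative frequency of zero is an assumed convention; the variable names are illustrative.

```python
from itertools import accumulate

# School-zone speed table: class boundaries in km/h and class frequencies.
boundaries = [30, 35, 40, 45, 50, 55]
frequency = [14, 23, 12, 8, 2]

cumulative = list(accumulate(frequency))

# One point per upper class limit, paired with the cumulative frequency.
# Assumed convention: begin at the lowest boundary with cumulative frequency 0.
ogive_points = [(boundaries[0], 0)] + list(zip(boundaries[1:], cumulative))

for upper, total in ogive_points:
    print(f"speed below {upper} km/h: {total} cars")
```

Because cumulative frequencies never decrease, the resulting line never slopes downward from left to right.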
Quantitative variables are numerical and can be measured on a continuous or discrete scale. Examples of quantitative variables include height, weight, age, and income.
Qualitative variables are non-numerical and can be classified into categories or groups. Qualitative variables can be further classified into nominal and ordinal variables. Nominal variables have no inherent order or ranking, and examples include gender, race, and eye color. Ordinal variables have a natural order or ranking, and examples include education level (e.g., high school, college, graduate school) and socio-economic status (e.g., low, middle, high).
A relative frequency histogram is a type of histogram that displays the proportion or percentage of data values that fall within each class interval. It is constructed by dividing the frequency of each class by the total number of observations and plotting the resulting relative frequencies as the heights of the bars in the histogram. Relative frequency histograms are useful for comparing the distribution of data sets with different sample sizes or for comparing the distribution of data sets with different units of measurement. They can also be used to estimate probabilities based on the data.
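The construction can be sketched in Python starting from raw data. The example below groups the 20 midterm scores from the stem-and-leaf exercises into classes and prints a rough text histogram. The class layout (six classes of width 2, starting at 6) is an assumption made for illustration, not a choice prescribed by the chapter.

```python
# Midterm scores (out of 20) from the stem-and-leaf exercises.
scores = [8.8, 10.5, 7.8, 6.1, 9.1, 17.2, 9.6, 7.2, 6.6, 7.7,
          9.3, 6.8, 7.6, 14.5, 16.9, 8.3, 9.9, 8.7, 9.7, 7.8]

# Assumed class layout: six classes of width 2, covering 6 up to 18.
start, width, n_classes = 6, 2, 6
classes = [(start + i * width, start + (i + 1) * width)
           for i in range(n_classes)]

# Frequency: number of scores with lower <= x < upper for each class.
frequency = [sum(1 for x in scores if lower <= x < upper)
             for lower, upper in classes]

# Relative frequency: class frequency divided by the number of observations.
relative_frequency = [count / len(scores) for count in frequency]

# Text histogram: bar length is proportional to the relative frequency.
for (lower, upper), rel in zip(classes, relative_frequency):
    bar = "#" * round(rel * 20)
    print(f"{lower:>2} <= x < {upper:>2} | {rel:.2f} {bar}")
```

With equal class widths, bar height and bar area tell the same story, so the heights can be read directly as relative frequencies.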
In the context of statistics and data analysis, a "class" typically refers to a category or interval into which data is grouped for the purpose of constructing frequency distributions, histograms, and other graphical representations of data.
