Chapter 1. Graphing Descriptions of Data

Chapter Objectives

In this chapter, readers will learn to:

1.1. Variables

In general, a variable is a varying quantitative or qualitative characteristic of an object. The word “variable” has a Latin etymology. It comes from the word variābilis, which means “capable of changing.” In mathematics, the variable is a set of values, usually represented as numerals. Variables are used, in particular, in specifying mathematical expressions. Indian mathematician and astronomer Brahmagupta (598–668 CE) used colours to represent variables in mathematical expressions. Later, in the seventeenth century, European mathematicians used letters (x, y, z) to denote variables.

For example, monthly gasoline prices (fig. 1.1a), the daily average temperatures, the weight of fish caught in a lake, and the colour of a rug (fig. 1.1b) can be considered variables.

image

Fig. 1.1. (a) Gasoline prices across Canada (obtained from the National Joint Council website on January 28, 2024)

Fig. 1.1. (b) A digital depiction of the maple leaf tartan, a national symbol of Canada (thread count: 38 dark green, 6 dark red, 6 dark green, 30 dark red, 26 light brown, 30 dark red, 6 dark green, 6 dark red, 38 dark green, 12 gold, 12 light brown, 12 dark brown; obtained from Wikimedia Commons on April 19, 2026)

The concept of a variable is commonly used in mathematics, science, economics, and programming. For example, in statistics, we will specify two types of variables: quantitative and qualitative.

Quantitative Variables

Quantitative variables take on numerical values. If the variable X can take on any value in its range, it is a continuous variable. If X can take on only a finite number of possible distinct values, then X is a discrete variable. However, certain discrete variables can take on an infinite number of values. Consider the following example.

Example 1.1

Let X be the height of a tree in a forest. X can be any positive real number in a specific range. Therefore, X is a continuous variable.

Example 1.2

A university runs classes only if at least 15 students are registered. Let N be the number of students registered in the class STAT 100. N can take only natural (counting) number values greater than 14. Therefore, N is a discrete variable.

Example 1.3

Over the years, the Canadian Challenge has been recognized as a world-class sporting event. It is the nation’s longest sled dog race, which starts, runs, and finishes in Canada. It has attracted teams from across Canada, the United States, Australia, Germany, Serbia, and Belgium. An added highlight is the eight-dog race from Prince Albert to La Ronge, Saskatchewan. Let Y be the number of dogs participating in each eight-dog race. Y can take only values that are multiples of 8: 8, 16, 24, and so on. Therefore, Y is a discrete variable.

Now, we can summarize the classification of variables in statistics, as shown in figure 1.2.

  • image

Figure 1.2. Types of variables

Qualitative Variables

Qualitative variables take values in mutually exclusive categories, which may or may not have an intrinsic natural order. If there is some natural order, then the data is said to have an ordinal level of measurement. Examples of ordinal data are placings in a contest, letter grades, and rankings of performance (excellent, good, poor).

Suppose the categories do not have a natural order. In that case, the data is said to have a nominal level of measurement. Examples of nominal data are colours, political parties, religion, or marital status.

Example 1.4

Telephone interviews (landline and cell) were conducted with a random sample of 1,004 Canadians aged 18 years and over, ending August 13, 2021. The interviewees stated which political party they planned to support in the federal election to be held on September 20, 2021. The records showed that 33.4% of interviewees supported the Liberals, 28.4% the Conservatives, 20.7% the NDP, 7.9% the Green Party of Canada, 6.3% the Bloc Québécois, and 1.9% the People’s Party of Canada. In this example, the preference of interviewees varies over the six parties. Therefore, the preference of interviewees can be considered a qualitative variable.

Example 1.5

A car salesperson decided to determine the preference of customers for vehicle colours and recorded them in order of each sale in the course of a day:

Red, red, blue, yellow, red, silver, silver, silver, red, blue, blue

In this example, the colour of sold cars varies. Therefore, colour is a qualitative variable.

Example 1.6

Amy recorded the mother languages of her classmates and created the following table:

Languages (in alphabetical order) | Students
Cree | 15
Dakota | 8
Dene | 11
English | 9
French | 2
German | 1
Lakota | 7
Russian | 1
Turkish | 1
Ukrainian | 3
Urdu | 2

We leave it to students to determine the variable in this example.

1.2. Graphs

Graphs and, to a lesser extent, tables give a visual summary of a variable. Ideally, there is an indication of the central (or “average”) value of the variable as well as an indication of the amount and pattern of variability (“spread”). The level of measurement of the data restricts the types of graphs and tables that can be used. Many computer software packages exist for visualizing collected data, such as Excel, R, SigmaPlot, and Maple. Below, we classify the graphs used in statistics for presenting qualitative and quantitative variables.

Graphs for Qualitative Data

In statistics, pie charts and bar charts are considered suitable for visualizing qualitative variables.

Pie Charts

A pie chart is made up of a circle divided into several slices representing different categories. The area of each slice is representative of the proportion of the data that corresponds to that category. Pie charts are effective when the aim is to display the relative size of the categories.

We will present the data from example 1.4 using a pie chart. However, first, we must construct a table.

Party Percent Angle
Conservatives 28.4 (28.4 × 360)/100 = 102.2°
Liberals 33.4 120.2°
NDP 20.7 74.5°
Bloc Québécois 6.3 22.7°
Green Party of Canada 7.9 28.4°
People’s Party of Canada 1.9 6.8°
Other 1.4 5.0°
Total 100.0 360.0°
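The angle column of the table can be reproduced with a few lines of code. The following is a minimal sketch in Python (a language chosen here only for illustration); the names `percentages` and `slice_angle` are hypothetical and do not come from the text. It applies the same rule as the first row of the table: angle = (percent × 360)/100.

```python
# Percentages taken from the table above (example 1.4 plus the "Other" row).
percentages = {
    "Conservatives": 28.4,
    "Liberals": 33.4,
    "NDP": 20.7,
    "Bloc Québécois": 6.3,
    "Green Party of Canada": 7.9,
    "People's Party of Canada": 1.9,
    "Other": 1.4,
}


def slice_angle(percent):
    """Central angle, in degrees, of a slice that covers `percent` of the circle."""
    return percent * 360 / 100


angles = {party: slice_angle(p) for party, p in percentages.items()}

# Print one row per party, followed by the totals.
for party, angle in angles.items():
    print(f"{party:<26}{percentages[party]:>6.1f}{angle:>8.1f}°")
print(f"{'Total':<26}{sum(percentages.values()):>6.1f}{sum(angles.values()):>8.1f}°")
```

Because the percentages add up to 100, the angles add up to 360°, which is a quick check that no category was left out.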

Now we are ready to construct the pie chart (fig. 1.3).

Figure 1.3. Pie chart for data provided in example 1.4


Bar Charts

In statistics, we often use bars to represent data. Depending on the data types and purpose of the presentation, various formats of bars can be used for charts.

The data from example 1.4, presented above in a pie chart, can also be visualized using vertical bars (fig. 1.4).

Figure 1.4. Vertical bar chart for the data provided in example 1.4


Example 1.7

Figure 1.5 presents the results of our water quality studies in the Kahkewistahaw First Nation community conducted in 2010 (Sardarli et al., 2010). Within the studies, community members using Indigenous Knowledge evaluated water quality in Calling Lakes (Saskatchewan, Canada) for the period 1978–2008 and made their evaluation and projection for specific years. Thirty community members of Kahkewistahaw First Nation were asked to compare the water quality in 1999 and 2009. They were asked to answer the question “How was the quality of water in Calling Lakes ten years ago in comparison with our days (2009)?” by choosing one of the following options: much better, better, about the same, worse, and much worse. Respondents’ answers were recorded in a table format.

Answers | Numbers of respondents
Much better | 1
Better | 3
About the same | 8
Worse | 15
Much worse | 3

The collected data can be presented using horizontal bars (fig. 1.5).

Figure 1.5. Horizontal bar chart for the data provided in example 1.7


Graphs for Quantitative Data

Line charts, dot plots, and stem and leaf plots are more convenient for presenting data with quantitative variables.

 

Line Charts

A line chart is a graph that presents quantitative variables as a series of data points connected by straight line segments. Line charts are used in many fields, such as statistics, mathematics, science, and economics. For example, a line chart is often used to visualize a trend in data over time.

Example 1.8

Figure 1.6 represents the monthly average temperature in Saskatchewan from January to December 2000.

Figure 1.6. Line chart for the monthly average temperature in Saskatchewan from January to December 2000


 

This data set is an example of so-called time series data. Time series analysis is an interesting branch of statistics. Students can learn more about time series analysis and forecasting in higher statistics courses.

 

Dot Plots

In statistics, dot plots are used for representing quantitative data. On this graph, each data value is shown as a dot placed above its position on a scaled horizontal line. Repeated values are stacked vertically.

Example 1.9

A statistics teacher prepared 12 tests for her class. The number of questions for each test is given below:

5, 5, 2, 4, 4, 4, 7, 10, 3, 3, 4, 4

Figure 1.7 represents this data in the form of a dot plot.

Figure 1.7. Dot plot for the data provided in example 1.9
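To make the construction concrete, here is a rough sketch in Python that prints a text version of a dot plot for the data in example 1.9. The function name `dot_plot` and the three-character column width are assumptions made for this illustration. It only mimics the idea of stacking one dot per observation above a scaled axis and is not a substitute for a properly drawn graph.

```python
from collections import Counter

# Number of questions on each of the 12 tests (example 1.9).
questions = [5, 5, 2, 4, 4, 4, 7, 10, 3, 3, 4, 4]


def dot_plot(data):
    """Return a text dot plot: one dot per observation, stacked above its value."""
    counts = Counter(data)
    axis = range(min(data), max(data) + 1)  # scaled horizontal line
    rows = []
    # Build the plot from the top row down to the row just above the axis.
    for level in range(max(counts.values()), 0, -1):
        marks = ["•" if counts[value] >= level else " " for value in axis]
        rows.append("".join(f"{mark:>3}" for mark in marks))
    rows.append("".join(f"{value:>3}" for value in axis))
    return "\n".join(rows)


plot = dot_plot(questions)
print(plot)
```

The tallest stack sits above 4, the value that occurs most often, and the single dot above 10 stands apart from the rest of the data.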


Example 1.10

The Academy Awards, also known as the Oscars, recognize the achievements and merit of artists in the film industry worldwide. The awards were first presented in 1929. One of the awards was created to honour the best directors. The ages of the 96 winners in this category, in the order of receipt (obtained from the website www.oscars.org), are listed in example 1.14.

The dot plot shown in figure 1.8 represents the ages of 96 Academy Award winners in the Best Director category from 1929 to 2021.

Figure 1.8. Dot plot for the data provided in example 1.10


For instance, we can obtain the following information from the above dot plot:

  • From 1929 to 2021, among directors, 44 was the most “winning” age for the award (seven recipients).
  • Only one winner during these years won the award when they were older than 70.

 

Stem and Leaf Plots

Stem and leaf plots are a valuable way of ordering data so we can study their characteristics. A stem and leaf plot simultaneously organizes the data for further analysis and presents them in table and chart form.

Example 1.11

Jana checked the durations of her phone calls during the week and recorded them in minutes.

19.2 19.8 18 19.2 19.5 17.3 20 20.3 19.6 18.5
18.1 19.7 18.4 17.6 21.2 19.7 22.2 19.1 21.1 19.3
20.8 21.2 21 18.7 19.8 18.7 22.1 17.2 18.4 21.4

We can define the whole-number part of each value as the stem.

Stem
17
18
19
20
21
22

In the next column, we will place the tenths.

Stem | Tenths
17 | 3, 6, 2
18 | 0, 5, 1, 4, 7, 7, 4
19 | 2, 8, 2, 5, 6, 7, 7, 1, 3, 8
20 | 0, 3, 8
21 | 2, 1, 2, 0, 4
22 | 2, 1

Now we can form the leaf by putting the tenths in ascending order.

Stem | Leaf
17 | 236
18 | 0144577
19 | 1223567788
20 | 038
21 | 01224
22 | 12

From this stem and leaf plot, we can conclude that Jana mainly talked for 19–20 minutes by phone during this particular week. She never talked longer than 23 minutes, and never less than 17 minutes.
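The three steps above (choose the stems, attach the tenths, sort the leaves) can be sketched in Python as follows. The function name `stem_and_leaf` is hypothetical, and the sketch assumes that every value has at most one decimal place, as in example 1.11.

```python
from collections import defaultdict

# Call durations in minutes (example 1.11).
durations = [19.2, 19.8, 18, 19.2, 19.5, 17.3, 20, 20.3, 19.6, 18.5,
             18.1, 19.7, 18.4, 17.6, 21.2, 19.7, 22.2, 19.1, 21.1, 19.3,
             20.8, 21.2, 21, 18.7, 19.8, 18.7, 22.1, 17.2, 18.4, 21.4]


def stem_and_leaf(values):
    """Map each stem (whole-number part) to its sorted leaves (tenths)."""
    plot = defaultdict(list)
    for value in values:
        # Work in tenths to avoid floating-point surprises: 19.2 -> 192 -> (19, 2).
        stem, leaf = divmod(round(value * 10), 10)
        plot[stem].append(leaf)
    return {stem: sorted(leaves) for stem, leaves in sorted(plot.items())}


plot = stem_and_leaf(durations)
for stem, leaves in plot.items():
    print(f"{stem} | {''.join(map(str, leaves))}")
```

The longest row belongs to stem 19, which agrees with the conclusion that most calls lasted between 19 and 20 minutes.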

 

1.3. Relative Frequency Histogram

Frequency, Relative Frequency, Cumulative Frequency

In statistics, the frequency is the number of events or data that occurred or were recorded during the experiment or study. In example 1.10, the age 44 is recorded seven times. Therefore, we state that the frequency of the age 44 is 7.

In statistics, we more often use relative frequency. A relative frequency is determined as a ratio of the frequency over the total number of measurements:

[latex]\displaystyle \text{Relative frequency}=\frac{\text{frequency}}{N}[/latex]

Later in this book, we will also use cumulative frequency to analyze data sets. The cumulative frequency of a value is the number of observations at or below that value in a data set sorted in ascending order. The cumulative frequency is calculated using a frequency distribution table.

Example 1.12

An administrative assistant recorded the number of students registered in mathematics classes in her department:

67 34 34 70 46 57 70 34 41 22 41 22 34
34 73 78 34 46 34 84 36 74 73 36 26

Construct the frequency table for the provided data.

Solution:

First, we put the numbers in ascending order:

22, 22, 26, 34, 34, 34, 34, 34, 34, 34, 36, 36, 41, 41, 46, 46, 57, 67, 70, 70, 73, 73, 74, 78, 84

The data set contains 25 observations: n = 25

Number of registered students | Frequency | Cumulative frequency | Relative frequency
22 | 2 | 2 | 2/25 = 0.08
26 | 1 | 2+1 = 3 | 1/25 = 0.04
34 | 7 | 2+1+7 = 10 | 7/25 = 0.28
36 | 2 | 2+1+7+2 = 12 | 2/25 = 0.08
41 | 2 | 2+1+7+2+2 = 14 | 2/25 = 0.08
46 | 2 | 2+1+7+2+2+2 = 16 | 2/25 = 0.08
57 | 1 | 2+1+7+2+2+2+1 = 17 | 1/25 = 0.04
67 | 1 | 2+1+7+2+2+2+1+1 = 18 | 1/25 = 0.04
70 | 2 | 2+1+7+2+2+2+1+1+2 = 20 | 2/25 = 0.08
73 | 2 | 2+1+7+2+2+2+1+1+2+2 = 22 | 2/25 = 0.08
74 | 1 | 2+1+7+2+2+2+1+1+2+2+1 = 23 | 1/25 = 0.04
78 | 1 | 2+1+7+2+2+2+1+1+2+2+1+1 = 24 | 1/25 = 0.04
84 | 1 | 2+1+7+2+2+2+1+1+2+2+1+1+1 = 25 | 1/25 = 0.04
Total | 25 | | 1

Based on the frequency table above, we can make some conclusions that are true for any data set:

  • The sum of frequencies equals the number of observations in the data set.
  • The sum of relative frequencies equals 1.
  • The last cumulative frequency equals the number of observations in the data set.
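The frequency table of example 1.12 and the three conclusions above can be checked with a short program. The sketch below is written in Python; the helper name `frequency_table` is an assumption made for this illustration and is not part of the text.

```python
from collections import Counter
from itertools import accumulate

# Number of students registered in each mathematics class (example 1.12).
registrations = [67, 34, 34, 70, 46, 57, 70, 34, 41, 22, 41, 22, 34,
                 34, 73, 78, 34, 46, 34, 84, 36, 74, 73, 36, 26]


def frequency_table(data):
    """Return rows of (value, frequency, cumulative frequency, relative frequency)."""
    counts = Counter(data)
    values = sorted(counts)                    # distinct values in ascending order
    frequencies = [counts[v] for v in values]
    cumulative = list(accumulate(frequencies))  # running totals of the frequencies
    n = len(data)                              # total number of observations
    relative = [f / n for f in frequencies]
    return list(zip(values, frequencies, cumulative, relative))


table = frequency_table(registrations)
for value, freq, cum, rel in table:
    print(f"{value:>3}{freq:>4}{cum:>4}{rel:>7.2f}")
```

Summing the second column of `table` gives 25, the last cumulative frequency is also 25, and the relative frequencies add up to 1 (up to floating-point rounding).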

The frequencies and relative frequencies can be presented by graphs called histograms and relative frequency histograms. A histogram is a bar chart in which the height of each bar is determined by the number of data values in the corresponding class. To construct a histogram, we first divide the data into classes. By convention, the data are divided into 5 to 12 classes of equal length. Figure 1.9 shows how the dot plot can be transformed into a histogram.

image

(a)

image

(b)

Figure 1.9. (a) Dot plot and (b) histogram constructed from the data provided in example 1.9.

As one can see, in this example, the data are divided into five classes. The number of measurements in each class is counted using the so-called left-inclusion method. Under this method, a value equal to the left boundary of a class is counted in that class, while a value equal to the right boundary is counted in the next class. Hereafter, we will use square brackets for inclusive and parentheses for exclusive boundaries. The class intervals in figure 1.9 can be shown as [1, 3), [3, 5), [5, 7), [7, 9), and [9, 11).

To determine the length of intervals, first, we find the range, R, of the data as the difference between the largest and smallest observations. Then, we divide the range by the chosen number of classes to obtain equally spaced intervals.

Constructing a Relative Frequency Histogram

Example 1.13

From 2010 to 2012, as part of a research project supported by the First Nations University of Canada, we surveyed members of Kahkewistahaw First Nation. The questionnaire contained a question about the number of residents in each household. The research assistant recorded the numbers of residents in 41 households:

10 3 5 1 6 5 6 7 5 2 5 8
12 5 8 4 3 5 8 3 1 8 1 8
7 7 2 5 3 6 6 3 4 3 3 4
2 4 2 4 6

Below, we provide step-by-step instructions for constructing a relative frequency histogram using the data.

Step 1. Construction of Classes

The range of the collected data is R = 12 – 1 = 11. Let us divide the data into six classes. So, the length of each class equals 11/6 ≈ 1.83. For convenience, we round the class width up to 2. Therefore, we get the following classes: [1, 3), [3, 5), [5, 7), [7, 9), [9, 11), and [11, 13).

Step 2. Relative Frequency Table

Now, we must determine each class’s relative frequencies using the left-inclusion method.

Class number | Class intervals | Frequencies of classes | Relative frequencies of classes
1 | [1, 3) | 7 | 7/41 = 0.171
2 | [3, 5) | 12 | 12/41 = 0.293
3 | [5, 7) | 12 | 12/41 = 0.293
4 | [7, 9) | 8 | 8/41 = 0.195
5 | [9, 11) | 1 | 1/41 = 0.024
6 | [11, 13) | 1 | 1/41 = 0.024
Sum | | 41 | 1.000

The sum of relative frequencies in fraction notation is as follows: 7/41 + 12/41 + 12/41 + 8/41 + 1/41 + 1/41 = 1
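Steps 1 and 2 can be reproduced with the following Python sketch. The function name `left_inclusive_counts` and its parameters are hypothetical. The starting point of the first class (1) and the number of classes (6) are the choices made in the text, not values computed by the program.

```python
import math

# Number of residents in each of the 41 surveyed households (example 1.13).
residents = [10, 3, 5, 1, 6, 5, 6, 7, 5, 2, 5, 8,
             12, 5, 8, 4, 3, 5, 8, 3, 1, 8, 1, 8,
             7, 7, 2, 5, 3, 6, 6, 3, 4, 3, 3, 4,
             2, 4, 2, 4, 6]


def left_inclusive_counts(data, start, width, n_classes):
    """Count observations in the classes [start, start + width), [start + width, ...)."""
    counts = [0] * n_classes
    for x in data:
        # Floor division puts a left-boundary value into its own class
        # and a right-boundary value into the next class.
        index = int((x - start) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{x} lies outside the chosen classes")
        counts[index] += 1
    return counts


data_range = max(residents) - min(residents)   # R = 12 - 1 = 11
width = math.ceil(data_range / 6)              # 11/6 is about 1.83, rounded up to 2
counts = left_inclusive_counts(residents, start=1, width=width, n_classes=6)
relative = [c / len(residents) for c in counts]

for i, (c, r) in enumerate(zip(counts, relative)):
    lower = 1 + i * width
    print(f"[{lower}, {lower + width})  {c:>2}  {r:.3f}")
```

The printed rows match the relative frequency table above, with class frequencies 7, 12, 12, 8, 1, and 1.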

Step 3. Graphing the Relative Frequency Histogram

The following relative frequency histogram is plotted using Excel. Later in this book, we will provide detailed instructions for plotting the relative frequency histogram using this software.

Figure 1.10. Relative frequency histogram constructed from data provided in example 1.13


As mentioned above, sometimes, we need to refer to cumulative frequencies. In example 1.12, we determined cumulative frequencies for observations. Similarly, we can evaluate cumulative frequencies for classes if it is more convenient to divide the data set into classes. Below, we present the extended frequency table for the data set given in example 1.13.

Class number | Class intervals | Frequencies of classes | Relative frequencies of classes | Cumulative frequencies of classes
1 | [1, 3) | 7 | 7/41 = 0.171 | 7
2 | [3, 5) | 12 | 12/41 = 0.293 | 7+12 = 19
3 | [5, 7) | 12 | 12/41 = 0.293 | 7+12+12 = 31
4 | [7, 9) | 8 | 8/41 = 0.195 | 7+12+12+8 = 39
5 | [9, 11) | 1 | 1/41 = 0.024 | 7+12+12+8+1 = 40
6 | [11, 13) | 1 | 1/41 = 0.024 | 7+12+12+8+1+1 = 41
Sum | | 41 | 1.000 |

Example 1.14

The more data values are available, the more classes it is advisable to use. For instance, we will arrange 11 classes for the data in example 1.10.

35 33 44 35 32 38 48 37 42 39 39 41
51 47 48 40 57 46 39 44 38 42 41 42
47 59 46 45 36 52 50 56 57 54 43 47
55 35 65 51 59 36 62 44 50 36 45 52
36 44 41 42 40 47 44 44 59 43 53 51
40 48 46 43 36 48 62 47 43 40 43 43
52 34 38 48 69 42 74 51 64 50 53 52
58 38 44 58 52 51 52 32 53 57 50

Step 1. Construction of Classes

The range of the collected data is R = 74 – 32 = 42. Since we chose 11 classes, the length of each class equals 42/11 ≈ 3.818. For convenience, we round the class width up to 4 and start the first class at 30. Therefore, we get the following classes: [30, 34), [34, 38), [38, 42), [42, 46), [46, 50), [50, 54), [54, 58), [58, 62), [62, 66), [66, 70), and [70, 75). The last class is extended to 75 so that it contains the largest observation, 74.

Step 2. Relative Frequency Table

Class number | Class intervals | Frequencies of classes | Relative frequencies of classes
1 | [30, 34) | 3 | 3/96 = 0.03
2 | [34, 38) | 10 | 10/96 = 0.10
3 | [38, 42) | 15 | 15/96 = 0.16
4 | [42, 46) | 20 | 20/96 = 0.21
5 | [46, 50) | 13 | 13/96 = 0.14
6 | [50, 54) | 18 | 18/96 = 0.19
7 | [54, 58) | 6 | 6/96 = 0.06
8 | [58, 62) | 5 | 5/96 = 0.05
9 | [62, 66) | 4 | 4/96 = 0.04
10 | [66, 70) | 1 | 1/96 = 0.01
11 | [70, 75) | 1 | 1/96 = 0.01
Sum | | 96 | 1

Step 3. Graphing the Relative Frequency Histogram

Figure 1.11. Relative frequency histogram constructed from data provided in example 1.14


Observe that, unlike the relative frequency histogram of the data presented in example 1.13, the histogram of this data set has two peaks (fig. 1.11). These types of data distributions are described as bimodal. In statistics, we use specific terms to describe the data distribution represented by histograms.

Figure 1.12 below includes examples of more typical histogram shapes.

(a) We describe this shape as skewed right because of the long tail on the right and the short one on the left.

(b) Similarly, this shape is called skewed left.

(c) This type of histogram is called bell-shaped. Some textbooks use the term “mound-shaped.” It is also unimodal and symmetric.

(d) This histogram shows that some data points on the right lie far from the rest of the data. We call these striking deviations outliers. Later, we will discuss the procedure of detecting outliers in more detail.

(e) In example 1.14, we observed a histogram with two peaks and described it as bimodal.

(f) Some histograms may contain more than two peaks. We call them multimodal.

Figure 1.12. Histograms with various shapes: (a) skewed right; (b) skewed left; (c) mound-shaped, unimodal, and symmetric; (d) histogram with outlier; (e) bimodal; and (f) multimodal

What Does a Relative Frequency Histogram Tell Us?

As one can see from the examples above, the shape of a relative frequency histogram provides much helpful information about the collected data and its distribution. Later, after improving your statistical analysis background, you will be able to obtain more from histograms. Now, we can answer some simple questions using the relative frequency histogram in example 1.14.

1) How would you describe the shape of the data distribution?
Answer: bimodal, skewed to the right

2) What percentage of award winners are 50 years or older?
Answer: 19% + 6% + 5% + 4% + 1% + 1% = 36%

3) What is the chance that a randomly selected award winner is age 42 or older and younger than 46?
Answer: 20 out of 96 or 0.21 or 21%

4) What are the chances that a randomly selected award winner is younger than 42? (This interval excludes 42.)
Answer: 0.03 + 0.10 + 0.16 = 0.29 or 29%

5) What are the chances that a randomly selected award winner is not younger than 38 and younger than 50? (This interval includes 38 and excludes 50.)
Answer: 0.16 + 0.21 + 0.14 = 0.51 or 51%
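Answers of this kind can also be obtained directly from the class frequencies. The Python sketch below uses the frequencies from the relative frequency table of example 1.14; the function name `relative_frequency` and its arguments are hypothetical, and both arguments are assumed to coincide with class limits. Because the sketch divides whole counts by 96 instead of adding rounded relative frequencies, a result may differ from the hand calculation in the last digit. For question 5, it gives 48/96 = 0.50 rather than 0.51.

```python
# Lower class limits and class frequencies from the table in example 1.14.
lower_limits = [30, 34, 38, 42, 46, 50, 54, 58, 62, 66, 70]
frequencies = [3, 10, 15, 20, 13, 18, 6, 5, 4, 1, 1]
total = sum(frequencies)  # 96 award winners


def relative_frequency(from_limit=None, below_limit=None):
    """Share of observations that are >= from_limit and < below_limit.

    Both arguments are assumed to be class limits; None means "no bound".
    """
    selected = [
        f for lower, f in zip(lower_limits, frequencies)
        if (from_limit is None or lower >= from_limit)
        and (below_limit is None or lower < below_limit)
    ]
    return sum(selected) / total


print(f"50 or older:      {relative_frequency(from_limit=50):.2f}")
print(f"42 to under 46:   {relative_frequency(42, 46):.2f}")
print(f"younger than 42:  {relative_frequency(below_limit=42):.2f}")
print(f"38 to under 50:   {relative_frequency(38, 50):.2f}")
```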

Later in this book, we will define concepts such as proportion and probability. You will see that these quantities and the relative frequency are similar. Similar diagrams will be used to analyze the distributions of these quantities.

Class Limits

Sometimes, we need to refer to the boundaries of classes to analyze the given data distribution. In other words, we need to deal with class limits. In a frequency distribution, class limits are the values that bound each class from below and from above. Each class has a lower class limit and an upper class limit. By convention, the lower class limit is the smallest value that belongs to the class (inclusive). The upper class limit marks the upper border of the class but does not belong to the class (exclusive). Under the left-inclusion method, the upper limit of a class equals the lower limit of the next class. The table below shows the lower and upper class limits based on the relative frequency table created for example 1.14.

Class number | Class intervals | Lower class limit | Upper class limit
1 | [30, 34) | 30 | 34
2 | [34, 38) | 34 | 38
3 | [38, 42) | 38 | 42
4 | [42, 46) | 42 | 46
5 | [46, 50) | 46 | 50
6 | [50, 54) | 50 | 54
7 | [54, 58) | 54 | 58
8 | [58, 62) | 58 | 62
9 | [62, 66) | 62 | 66
10 | [66, 70) | 66 | 70
11 | [70, 75) | 70 | 75

In this example, we constructed equal-length classes to ease the interpretation. However, one should note that equal length for classes is not necessary, although highly recommended.

Frequency Polygons

Very often, statisticians need to work with more than one data set, at which time comparing the frequency distributions of the sets becomes necessary. In this case, using frequency polygons instead of relative frequency histograms is recommended. The frequency polygon is a line graph drawn on the x-y coordinate system such that the x-axis represents the values in the data set, while the y-axis shows the frequency of each distinct observation. If it is more convenient to divide the data set into classes, as we did when constructing the histograms, the class frequencies are plotted above the midpoint of each class interval and connected by straight lines.

Below, we present the frequency polygons for the data sets used in examples 1.12 and 1.14 (fig. 1.13).

(a)

(b)

Figure 1.13. Frequency polygons constructed from data provided in (a) example 1.12 and (b) example 1.14
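When the data are grouped into classes, the points of a frequency polygon sit above the class midpoints. As a small illustration, the Python sketch below computes these points for the classes of example 1.13 (chosen here only because that table is short); the function name `polygon_points` is hypothetical.

```python
# Class limits and class frequencies from example 1.13.
class_limits = [1, 3, 5, 7, 9, 11, 13]
frequencies = [7, 12, 12, 8, 1, 1]


def polygon_points(limits, freqs):
    """Pair the midpoint of each class [lower, upper) with the class frequency."""
    midpoints = [(lower + upper) / 2 for lower, upper in zip(limits, limits[1:])]
    return list(zip(midpoints, freqs))


points = polygon_points(class_limits, frequencies)
for midpoint, frequency in points:
    print(f"x = {midpoint:>4.1f}, y = {frequency}")
```

Connecting consecutive points with straight line segments gives the frequency polygon.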

Example 1.15

An administrative assistant recorded the number of students registered in mathematics and statistics classes in the Department of Mathematics and Statistics in fall 2021.

Mathematics

67 34 34 70 46 57 70 34 41 22 41 22 34
34 73 78 34 46 34 84 36 74 73 36 26

Statistics

28 36 72 26 32 56 62 70 56 62
52 72 68 68 70 48 64 70 42 32
68 64 68 70 34 28 72 70 64 70

Construct frequency polygons for both data sets in the same coordinate system.

Solution:

First, we will create the frequency tables for subject classes.

Students registered in mathematics classes | Frequency | Students registered in statistics classes | Frequency
22 | 2 | 26 | 1
26 | 1 | 28 | 2
34 | 7 | 32 | 2
36 | 2 | 34 | 1
41 | 2 | 36 | 1
46 | 2 | 42 | 1
57 | 1 | 48 | 1
67 | 1 | 52 | 1
70 | 2 | 56 | 2
73 | 2 | 62 | 2
74 | 1 | 64 | 3
78 | 1 | 68 | 4
84 | 1 | 70 | 6
 | | 72 | 3
Total | 25 | Total | 30

Now, we can plot both frequency polygons in the same coordinate system (fig. 1.14).

 

Figure 1.14. Frequency polygons for data provided in example 1.15

 

Sometimes, it is more convenient to refer to relative frequency distributions. For example, the graph below represents the relative frequency distribution for the data set given in example 1.14 (fig. 1.15).

Figure 1.15. Relative frequency distribution for data provided in example 1.15

 

Later in this textbook, we will need to estimate the proportions of specific observations and their chances of happening. For these purposes, we will need to work with cumulative values of relative frequencies. The cumulative relative frequency of an observation/class is similar to the cumulative frequency. The only difference is that this time, we evaluate the sum of the relative frequencies of all observations/classes up to and including this observation/class.

Mainly we will use the tables of cumulative values. First, however, it is helpful to analyze the graphs of cumulative frequencies to observe and explain the trends of distributions. So, first, we construct the frequency table to plot the cumulative frequency distribution. Let us use the data set provided in example 1.13 and graph the cumulative frequency distributions.

10 3 5 1 6 5 6 7 5 2 5 8
12 5 8 4 3 5 8 3 1 8 1 8
7 7 2 5 3 6 6 3 4 3 3 4
2 4 2 4 6

We can use the same classes defined in the example and construct the frequency table with cumulative frequencies. By convention, we plot the proportion of observations less than the upper limit of the class.

Class number | Upper class limits | Frequencies | Cumulative frequencies | Relative frequencies | Cumulative relative frequencies
1 | < 3 | 7 | 7 | 0.171 | 0.171
2 | < 5 | 12 | 19 | 0.293 | 0.463
3 | < 7 | 12 | 31 | 0.293 | 0.756
4 | < 9 | 8 | 39 | 0.195 | 0.951
5 | < 11 | 1 | 40 | 0.024 | 0.976
6 | < 13 | 1 | 41 | 0.024 | 1.000
Sum | | 41 | | 1.000 |

Now, we can plot the cumulative relative frequency distribution. This graph is called an ogive (fig. 1.16).

Figure 1.16. Ogive constructed from data provided in example 1.13

Note that the ogive has a cumulative relative frequency of 0 at the lower limit of the first class and a value of 1 at the upper limit of the highest class.
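The points of the ogive can be computed from the class frequencies alone. The Python sketch below does this for the classes of example 1.13; the variable names are hypothetical. Each cumulative relative frequency is obtained by dividing a cumulative count by 41, so a value may differ in the third decimal place from a sum of already rounded relative frequencies.

```python
from itertools import accumulate

# Class limits and class frequencies from example 1.13.
class_limits = [1, 3, 5, 7, 9, 11, 13]
frequencies = [7, 12, 12, 8, 1, 1]

n = sum(frequencies)                        # 41 households
cumulative = list(accumulate(frequencies))  # observations below each upper limit

# The ogive starts at 0 at the lower limit of the first class and then
# rises to the cumulative relative frequency at each upper class limit.
ogive_points = [(class_limits[0], 0.0)] + [
    (upper, count / n) for upper, count in zip(class_limits[1:], cumulative)
]

for x, y in ogive_points:
    print(f"x = {x:>2}, cumulative relative frequency = {y:.3f}")
```

The first point is (1, 0) and the last is (13, 1), in agreement with the note above.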

 

1.4. Computer Software with Graphing Tools

The revolutionary growth of computing technologies over the last several decades has encouraged a vigorous development of many useful graphing programs and applications. This section will briefly review some of the computing software with graphing components.

Maple

Maple is a symbolic and numeric computing environment and a multi-paradigm programming language. It covers several areas of technical computing, such as symbolic mathematics, numerical analysis, data processing, visualization, and others. In addition, a tool box, MapleSim, adds multi-domain physical modelling and code generation functionality.

Maple’s capacity for symbolic computing includes those of a general-purpose computer algebra system. For instance, it can manipulate mathematical expressions and find symbolic solutions to specific problems, such as those arising from ordinary and partial differential equations.

Maple provides many useful tools for constructing pie, bar, and line charts, histograms, and other graphs for statistical analysis.

Maple is developed commercially by the Canadian software company Maplesoft. The name “Maple” refers to the software’s Canadian heritage.

Excel

Microsoft developed Excel, which can be used on major platforms such as Windows, macOS, Android, and iOS. This software offers a wide range of statistical functions that calculate a single value or an array of values in Excel worksheets. In addition, Excel has an add-in called the Excel Analysis ToolPak, with many useful statistical analysis tools. This textbook includes laboratory exercises that will help you access statistical analysis advantages through Excel. The laboratory exercises will be provided at the end of the chapters.

SPSS

SPSS (Statistical Package for the Social Sciences) is a widely used program for statistical analysis in social science. It is also used by market researchers, health researchers, survey companies, governments, education researchers, marketing organizations, data miners, and others. The original SPSS manual has been described as one of “sociology’s most influential books” for allowing ordinary researchers to do their own statistical analysis. In addition to statistical analysis, data management (case selection, file reshaping, creating derived data) and data documentation (a metadata dictionary is stored in the data file) are features of the base software.

Some leading international social science journals accept only those manuscripts whose statistical analysis is performed using the SPSS program.

R

R is a programming language and free software environment for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. It is widely used among statisticians and data miners for developing statistical software and data analysis. Polls, data mining surveys, and studies of scholarly literature databases show substantial increases in R’s popularity. As of August 2021, R ranked 14th on the TIOBE Index, which measures the popularity of programming languages.

The official R software environment is a GNU package. It is written primarily in C, Fortran, and R (and thus is partially self-hosting) and is freely available under the GNU General Public License. In addition, pre-compiled executables are provided for various operating systems. It has a command line interface, but multiple third-party graphical user interfaces are available, such as RStudio, an integrated development environment, and Jupyter, a notebook interface.

SAS

SAS (previously “Statistical Analysis System”) is a statistical software suite developed by the SAS Institute for data management, advanced analytics, multivariate analysis, business intelligence, criminal investigation, and predictive analytics. SAS can mine, alter, manage, and retrieve data from various sources and perform statistical analysis on the data. In addition, SAS provides a graphical point-and-click user interface for non-technical users and more advanced options through the SAS language. One of the advantages of SAS is its ability to handle big data, which is why this program is primarily used by such institutions as banks, credit unions, and public health organizations.

MATLAB

MATLAB (an abbreviation of “matrix laboratory”) is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages. Although MATLAB is intended primarily for numeric computing, an optional toolbox uses the MuPAD symbolic engine to access symbolic computing abilities. An additional package, Simulink, adds graphical multi-domain simulation and model-based design for dynamic and embedded systems. Unlike R and Excel, MATLAB allows one to solve algebraic equations using mathematical symbols.

Minitab

Minitab is a statistics package developed at Pennsylvania State University by researchers Barbara F. Ryan, Thomas A. Ryan Jr., and Brian L. Joiner in 1972. It began as a light version of OMNITAB 80, a statistical analysis program by NIST. Statistical analysis software such as Minitab automates calculations and the creation of graphs, allowing the user to focus more on the analysis of data and the interpretation of results. It is compatible with other Minitab, LLC software.

As one can see, there are many useful programs for constructing graphs to visualize statistical data. Many of them are free of charge and easy for non-professionals to use. However, we strongly recommend that you start constructing your first graphs by hand, following the step-by-step instructions that we use to solve this book’s examples. Constructing graphs by hand will help you better understand the principles of statistical analysis. You can start using these and many other programs after achieving a solid understanding of the underlying concepts of statistics and statistical analysis.

Chapter 1 Summary

  • Data and variables

    • Quantitative and qualitative variables

      • Discrete and continuous variables

  • Graphs

    • Graphs for qualitative data

      • Pie charts

      • Bar charts

    • Graphs for quantitative data

      • Line charts

      • Dot plots

      • Stem and leaf plots

      • Relative frequency histograms

      • Frequency polygon

      • Relative frequency distribution

      • Ogive

    • Computer software with graphing tools

EXERCISES

 

1.1 Data and Variables

Qualitative Variables (Homelessness example)
Quantitative Variables (Discrete, Continuous, Indigenous, PA survey)

1. A student took an informal survey among a sample of 12 of his classmates with the question “How many books did you read over the summer?” Their responses were

0, 1, 1, 2, 1, 3, 1, 3, 2, 3, 1, 5

Classify the variable “number of books read” by circling all appropriate terms in the following list:

qualitative, quantitative, discrete, continuous

2. (10 marks) The following data shows the number of phones (including cell phones) per household in a sample of 13 households.

1, 3, 2, 3, 5, 7, 3, 8, 3, 2, 5, 7, 6

Note that

\[
\sum_{i=1}^{13} x_i = 55
\]

\[
\sum_{i=1}^{13} x_i^2 = 293
\]

2.1 What is the variable of this data set?

----------------------------------------------------------------------------

2.2 What is the experimental unit?

----------------------------------------------------------------------------

2.3 Classify the variable by circling any appropriate terms from the following list:

Continuous, Discrete, Qualitative, Quantitative
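
Once the questions above have been answered by hand, the two sums quoted in the note can be checked with a few lines of Python. This is only a minimal sketch; the variable names (`phones`, `sum_x`, `sum_x_squared`) are our own and are not part of the exercise.

```python
# Number of phones per household for the 13 sampled households (from the exercise).
phones = [1, 3, 2, 3, 5, 7, 3, 8, 3, 2, 5, 7, 6]

n = len(phones)                               # sample size
sum_x = sum(phones)                           # sum of the observations
sum_x_squared = sum(x ** 2 for x in phones)   # sum of the squared observations

print(f"n = {n}, sum of x = {sum_x}, sum of x^2 = {sum_x_squared}")
```

Running the sketch should reproduce the sample size and both sums stated in the note.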

3. Classify each variable as quantitative or qualitative. For quantitative variables, classify whether they are discrete or continuous.

(a) Colors of automobiles in the factory parking lot

(b) Number of desks in a classroom

(c) Classification of children in a day care center (infant, toddler, preschool)

(d) Weights of fish caught in Wascana Lake

(e) Number of pages in a statistics textbook

(f) Capacity (in liters) of water in selected dams

(g) Number of off-road vehicles sold in Canada

(h) Number of loaves of bread baked each day at a local bakery

(i) Water temperature in the swimming pool at Little Manitou Lake

(j) Lifetimes of batteries in an iPhone 4S

4. The following frequency table lists the speed (in km/h) of a sample of 59 cars in a school zone. [10 marks]

Speed (in km/h)    Frequency
30 ≤ x < 35        14
35 ≤ x < 40        23
40 ≤ x < 45        12
45 ≤ x < 50        8
50 ≤ x < 55        2

a) Classify the variable “speed” (qualitative/quantitative, discrete/continuous):

Quantitative, continuous

b) What term best describes the shape of the above distribution?

Positively (or right) skewed

c) Is this data from an observational or experimental study?

Observational

5. (Introductory Statistics, Illowsky, B., Dean, S., Openstax, 2018) Determine what the key terms refer to in the following study. A study was conducted at a local college to analyze the average cumulative GPAs of students who graduated last year. Fill in the letter of the phrase that best describes each of the items below.

1. Population _____ 2. Statistic _____ 3. Parameter _____ 4. Sample _____ 5. Variable _____ 6. Data _____

a) all students who attended the college last year

b) the cumulative GPA of one student who graduated from the college last year

c) 3.65, 2.80, 1.50, 3.90

d) a group of students who graduated from the college last year, randomly selected

e) the average cumulative GPA of students who graduated from the college last year

f) all students who graduated from the college last year

g) the average cumulative GPA of students in the study who graduated from the college last year

1.2. Graphs of Qualitative Data

Pie Charts

Bar Charts

  1. A group of Stat 100 students filled out a questionnaire on their living arrangements while attending University. The following data is part of that study:
    Living Arrangements           Number of Responses
    University Residence          52
    Rented Apartment/House        61
    Owned Condo/House             13
    Living with Parents/Family    29
    Other                         3

    Construct a Pie Chart for this data.

  2. (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) There are 800 students in the College of Arts and Sciences. There are four majors in the college: English, History, Biology, and Chemistry. The following shows the number of students in each major:

Major        Number of Students
English      240
History      160
Biology      320
Chemistry    80

a) Develop a percent frequency distribution.
b) Construct a bar chart.
c) Construct a pie chart.
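
After working part (a) by hand, a short Python sketch such as the one below can be used to check the percent frequencies. It assumes the usual definition (class frequency divided by the total number of observations, times 100); the dictionary and variable names are our own.

```python
# Number of students in each major (from the exercise).
majors = {"English": 240, "History": 160, "Biology": 320, "Chemistry": 80}

total = sum(majors.values())  # total number of students

# Percent frequency = (class frequency / total number of observations) * 100.
percent_frequency = {major: 100 * count / total for major, count in majors.items()}

# Print a small table: major, frequency, percent frequency.
for major, pct in percent_frequency.items():
    print(f"{major:<10}{majors[major]:>6}{pct:>8.1f}%")
```

The same percentages give the bar heights for part (b) and, multiplied by 3.6, the central angles in degrees for the pie chart in part (c).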

3. (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) A student has completed 20 courses in the School of Arts and Sciences. Her grades in the 20 courses are shown below:

A    B    A    B    C
C    C    B    B    B
B    A    B    B    B
C    B    C    B    A

a) Develop a frequency distribution for her grades.
b) Develop a percent frequency distribution for her grades.
c) Develop a bar graph.
d) Construct a pie chart.

4. (Introductory Statistics, Illowsky, B., Dean, S., Openstax, 2018) The students in Ms. Ramirez’s math class have birthdays in each of the four seasons. Table 2.40 shows the four seasons, the number of students who have birthdays in each season, and the percentage (%) of students in each group. Construct a bar graph showing the number of students.

Seasons    Number of students    Proportion of population
Spring     8                     24%
Summer     9                     26%
Autumn     11                    32%
Winter     6                     18%

Table 2.40

5. (Introductory Statistics, Illowsky, B., Dean, S., Openstax, 2018) David County has six high schools. Each school sent students to participate in a county-wide science competition. Table 2.41 shows the percentage breakdown of competitors from each school, and the percentage of the entire student population of the county that goes to each school. Construct a bar graph that shows the population percentage of competitors from each school.

 

High School    Science competition population    Overall student population
Alabaster      28.9%                             8.6%
Concordia      7.6%                              23.2%
Genoa          12.1%                             15.0%
Mocksville     18.5%                             14.3%
Tynneson       24.2%                             10.1%
West End       8.7%                              28.8%

Table 2.41

1.3. Graphs for Quantitative Data

Line Charts

Dotplots

Stem and Leaf Plots

1. Consider the following 20 scores on a Stat 100 midterm exam (out of 20).

8.8, 10.5, 7.8, 6.1, 9.1, 17.2, 9.6, 7.2, 6.6, 7.7, 9.3, 6.8, 7.6, 14.5, 16.9, 8.3, 9.9, 8.7, 9.7, 7.8

Create a stem and leaf plot for the above data.
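
Once the plot has been drawn by hand, it can be compared against a short program. The sketch below is one possible approach, not the only one: it assumes the stem is the whole-number part of each score and the leaf is the tenths digit (leaf unit = 0.1). The function name `stem_and_leaf` is our own.

```python
from collections import defaultdict

# The 20 midterm scores from the exercise.
scores = [8.8, 10.5, 7.8, 6.1, 9.1, 17.2, 9.6, 7.2, 6.6, 7.7,
          9.3, 6.8, 7.6, 14.5, 16.9, 8.3, 9.9, 8.7, 9.7, 7.8]


def stem_and_leaf(values):
    """Group one-decimal values by stem (whole part) and leaf (tenths digit)."""
    plot = defaultdict(list)
    for value in values:
        tenths = round(value * 10)        # work in whole tenths to avoid float error
        stem, leaf = divmod(tenths, 10)   # e.g. 88 -> stem 8, leaf 8
        plot[stem].append(leaf)
    # Keep empty stems so that gaps in the data remain visible in the display.
    return {stem: sorted(plot.get(stem, []))
            for stem in range(min(plot), max(plot) + 1)}


display = stem_and_leaf(scores)
for stem, leaves in display.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
print("Leaf unit = 0.1")
```

Keeping the empty stems is a design choice: dropping them would hide how far the few high scores sit from the rest of the data.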

2. (6 Marks) The following graph represents the yearly number of deaths from tornadoes in the USA over a period of 15 years.

Stem   Leaf

4      0 0 2 3 5 5 6

5      1 1 1 1

6      0 1

7      9

8      2

Leaf unit = 1. What is the name of this graph?

3. The Stem and Leaf Display below shows the results (%) on a test.

Stem   Leaf

3      1 2

4      3 6 8

5      0 5 5 9

6      2 6 6 8 9

7      0 1 3 4 4 8

8      3 4 5 5 8

9      0 2 3

If the pass mark was 50%, approximately what percent of students failed the test?

4. Twenty-two Stat 100 students have just received their final grades. Grades (out of 100) are listed below.
64, 79, 89, 77, 93, 88, 52, 48, 82, 82, 61, 42, 40, 67, 89, 95, 72, 56, 61, 96, 88, 40

a) Is this data univariate, bivariate, or multivariate?
b) Classify the variable type.
c) Construct a stem and leaf plot for the data. How would you describe the shape of the data?

5. (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) The test scores of 14 individuals on their first physics examination are shown below:

95    87    52    43    77    84    78
75    63    92    81    83    91    88

a) Construct a stem-and-leaf display for these data.

b) What does the above stem-and-leaf show?

6. (Introductory Statistics, Illowsky, B., Dean, S., Openstax, 2018) Statistics are used to compare and sometimes identify authors. The following lists show a simple random sample that compares the letter counts for three authors.

Terry: 7; 9; 3; 3; 3; 4; 1; 3; 2; 2
Davis: 3; 3; 3; 4; 1; 4; 3; 2; 3; 1
Maris: 2; 3; 4; 4; 4; 6; 6; 6; 8; 3
Make a dot plot for the three authors and compare the shapes.

7. (Introductory Statistics, Illowsky, B., Dean, S., Openstax, 2018) Use the data to construct the line graph for the following problems:
In a survey, 40 people were asked how many times they visited a store before making a major purchase. The results are shown in Table 2.37.

Number of times in the store    Frequency
1                               4
2                               10
3                               16
4                               6
5                               4

Table 2.37

In a survey, several people were asked how many years it has been since they purchased a mattress. The results are shown in Table 2.38.

Years since last purchase    Frequency
0                            2
1                            8
2                            13
3                            22
4                            16
5                            9

Table 2.38

1.4. Relative Frequency Histogram

1. A sample of salaries (in $thousands) of 30 employees at a large company yields the following data:

28, 29, 32, 33, 35, 38, 42, 42, 43, 45

46, 49, 49, 50, 52, 55, 60, 60, 60, 62

62, 63, 64, 65, 71, 75, 80, 89, 90, 97

For the data,

∑x = 1666, ∑x² = 102230

(a) Construct a frequency distribution for the data. Use five classes, with a lower limit of 28 for the first class and an upper limit of 97 for the last class.

(b) Draw a histogram for the frequency distribution.
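
A hand-built frequency distribution for part (a) can be checked with the sketch below. It rests on one assumption that you should compare with the convention used in your course: because the salaries are whole thousands, the five classes are given inclusive integer limits of equal width running from 28 to 97. All variable names are our own.

```python
import math

# Salaries (in $ thousands) of the 30 employees from the exercise.
salaries = [28, 29, 32, 33, 35, 38, 42, 42, 43, 45,
            46, 49, 49, 50, 52, 55, 60, 60, 60, 62,
            62, 63, 64, 65, 71, 75, 80, 89, 90, 97]

lower, upper, k = 28, 97, 5  # limits and number of classes stated in part (a)

# Assumed convention: inclusive integer class limits, so the 70 integer
# values from 28 to 97 are split into 5 classes of equal width.
width = math.ceil((upper - lower + 1) / k)

classes = [(lower + i * width, lower + (i + 1) * width - 1) for i in range(k)]
frequencies = [sum(lo <= s <= hi for s in salaries) for lo, hi in classes]
relative = [f / len(salaries) for f in frequencies]

# Print each class with its frequency, relative frequency, and a text bar.
for (lo, hi), f, r in zip(classes, frequencies, relative):
    print(f"{lo}-{hi}: {f:>2}  {r:.3f}  {'*' * f}")
```

The row of asterisks is a rough text version of the histogram in part (b); the relative frequencies give the bar heights for a relative frequency histogram.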

2. Consider the following distribution:

Cost of Textbooks    Number
$25 up to $35        2
$35 up to $45        5
$45 up to $55        7
$55 up to $65        20
$65 up to $75        16

What is the relative class frequency (%) for the $25 up to $35 class?

3. Consider the monthly long-distance charges for a sample of 97 residents of Regina.

Monthly Phone Bill (in $)    Frequency
$0 ≤ x < $30                 2
$30 ≤ x < $60                17
$60 ≤ x < $90                22
$90 ≤ x < $120               32
$120 ≤ x < $150              14
$150.00 or more              10

a) Classify the data (qualitative/quantitative/discrete/continuous)

b) Using class midpoints and a representative value of $180.00 for the final class, estimate the mean monthly phone bill for this sample.
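
The estimate asked for in part (b) is the frequency-weighted average of the class midpoints, that is, the sum of (midpoint × frequency) divided by the sample size. The sketch below shows one way to check a hand calculation. The $180.00 value for the open-ended class comes from the question; the variable names are our own.

```python
# (lower boundary, upper boundary, frequency) for each closed class in the table.
closed_classes = [(0, 30, 2), (30, 60, 17), (60, 90, 22), (90, 120, 32), (120, 150, 14)]

# The open-ended class "$150.00 or more" has no midpoint, so the
# representative value given in part (b) is used in its place.
open_class_value, open_class_freq = 180.0, 10

midpoints = [(lo + hi) / 2 for lo, hi, _ in closed_classes] + [open_class_value]
freqs = [f for _, _, f in closed_classes] + [open_class_freq]

n = sum(freqs)  # sample size
grouped_mean = sum(m * f for m, f in zip(midpoints, freqs)) / n

print(f"n = {n}, estimated mean monthly bill = ${grouped_mean:.2f}")
```

Because every bill in a class is replaced by the class midpoint, the result is only an approximation of the mean of the original 97 bills.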

4. Thirty automobiles were tested for fuel efficiency (in miles per gallon). The following frequency distribution was obtained. Construct a histogram for the cumulative relative frequency of the data (also called an ogive).

Class Boundaries (mpg)    Frequency
7.5 < x ≤ 12.5            3
12.5 < x ≤ 17.5           5
17.5 < x ≤ 22.5           15
22.5 < x ≤ 27.5           5
27.5 < x ≤ 32.5           2
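
The heights needed for the ogive are the cumulative relative frequencies at each upper class boundary. The following sketch, with variable names of our own choosing, computes them so that a hand-drawn graph can be checked. It assumes the common convention that the ogive starts at height 0 at the lowest class boundary.

```python
from itertools import accumulate

# Upper class boundaries (mpg) and class frequencies from the table.
upper_boundaries = [12.5, 17.5, 22.5, 27.5, 32.5]
frequencies = [3, 5, 15, 5, 2]

n = sum(frequencies)                               # number of automobiles
cumulative = list(accumulate(frequencies))         # running total of frequencies
cumulative_relative = [c / n for c in cumulative]  # running total as a proportion

# Assumed convention: the ogive starts at height 0 at the lowest boundary (7.5 mpg).
points = [(7.5, 0.0)] + list(zip(upper_boundaries, cumulative_relative))

for boundary, height in points:
    print(f"{boundary:>5.1f} mpg  ->  {height:.3f}")
```

Each pair in `points` is one plotted point; the last height should equal 1 because all thirty automobiles fall at or below the final boundary.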

5. (Introduction to statistics, 2nd Ed, Test Bank, Anderson, D. R., Sweeney, D.J., Williams, T.A, 1991) Tiger food company bakes quiches and sells their products in the greater Los Angeles area. Their records over the past 60 days are shown below:

Sales Volume (Number of Quiches)    Number of Days
100-199                             6
200-299                             10
300-399                             20
400-499                             12
500-599                             8
600-699                             4
Total                               60

a) Develop a cumulative frequency distribution and a percent frequency distribution.

b) What percentage of the days did the company sell at least 400 quiches?

 

License

Introductory Statistics Copyright © 2026 by Arzu Sardarli and Andrei Volodin is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.