14. Correlation and Regression
14.5 Linear Regression
Linear regression gives the equation of the line that best fits the scatter plot data in the least-squares sense. Let's begin with the equation of a line:

$$y = a + bx$$

where $a$ is the intercept and $b$ is the slope.

The data, the collection of $n$ points $(x_i, y_i)$, rarely lie on a perfect straight line in a scatter plot. So we write

$$\hat{y} = a + bx$$

as the equation of the best fit line. The quantity $\hat{y}_i$ is the predicted value of $y$ (predicted from the value of $x_i$) and $y_i$ is the measured value of $y$.

The difference between the measured and predicted value at data point $i$, $(y_i - \hat{y}_i)$, is the deviation. The quantity

$$(y_i - \hat{y}_i)^2$$

is the squared deviation. The sum of the squared deviations is

$$S = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - bx_i)^2$$
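To make the definition concrete, $S$ can be computed directly for any candidate line (a small Python sketch, not part of the text; the function name is our own):

```python
def sum_of_squares(x, y, a, b):
    """Sum of squared deviations S for the candidate line y-hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
```

Evaluating $S$ for two candidate lines shows that a line closer to the least-squares solution produces a smaller sum of squares.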
The least squares solution for $a$ and $b$ is the solution that minimizes $S$, the sum of squares, over all possible choices of $a$ and $b$. Minimization problems are easily handled with differential calculus by setting the partial derivatives to zero:

$$\frac{\partial S}{\partial a} = 0 \qquad \text{and} \qquad \frac{\partial S}{\partial b} = 0$$

The solution to those two equations is

$$b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}$$

and

$$a = \frac{\sum y - b \sum x}{n}$$
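These two formulas translate directly into code (a minimal Python sketch; the function name is our own, not from the text):

```python
def least_squares_fit(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x,
    computed from the sum formulas above."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b
```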
Example 14.3 : Continue with the data from Example 14.1 and find the best fit line. The data again are:

| Subject | $x$ | $y$ | $xy$ | $x^2$ | $y^2$ |
|---------|-----|-----|------|-------|-------|
| A | 6 | 82 | 492 | 36 | 6724 |
| B | 2 | 86 | 172 | 4 | 7396 |
| C | 15 | 43 | 645 | 225 | 1849 |
| D | 9 | 74 | 666 | 81 | 5476 |
| E | 12 | 58 | 696 | 144 | 3364 |
| F | 5 | 90 | 450 | 25 | 8100 |
| G | 8 | 78 | 624 | 64 | 6084 |
| Sum | 57 | 511 | 3745 | 579 | 38993 |
Using the sums of the columns, compute:

$$b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2} = \frac{7(3745) - (57)(511)}{7(579) - (57)^2} = \frac{-2912}{804} = -3.62$$

and

$$a = \frac{\sum y - b \sum x}{n} = \frac{511 - (-3.62)(57)}{7} = 102.49$$

So the best fit line is

$$\hat{y} = 102.49 - 3.62x$$
▢
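As a check on the arithmetic, the column sums and the fitted coefficients can be reproduced in a few lines of Python (an illustration, not part of the text):

```python
# Example 14.3 data (x, y) for subjects A-G
x = [6, 2, 15, 9, 12, 5, 8]
y = [82, 86, 43, 74, 58, 90, 78]
n = len(x)

sum_x = sum(x)                                  # 57
sum_y = sum(y)                                  # 511
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # 3745
sum_x2 = sum(xi ** 2 for xi in x)               # 579

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n
print(f"y-hat = {a:.2f} + ({b:.2f})x")          # prints: y-hat = 102.49 + (-3.62)x
```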
14.5.1: Relationship between correlation and slope
The relationship is

$$b = r \, \frac{s_y}{s_x}$$

where $s_x$ and $s_y$ are the standard deviations of the $x$ and $y$ datasets considered separately.
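This identity can be verified numerically with the data of Example 14.3 (a Python sketch; `statistics.stdev` gives the sample standard deviation assumed here):

```python
import statistics

x = [6, 2, 15, 9, 12, 5, 8]
y = [82, 86, 43, 74, 58, 90, 78]
n = len(x)

# slope b from the sum formula
b = (n * sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y)) / \
    (n * sum(xi ** 2 for xi in x) - sum(x) ** 2)

# correlation coefficient r and the two standard deviations
sx = statistics.stdev(x)
sy = statistics.stdev(y)
xbar = statistics.mean(x)
ybar = statistics.mean(y)
r = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ((n - 1) * sx * sy)

# b and r * sy / sx agree (up to floating-point error)
print(round(b, 4), round(r * sy / sx, 4))
```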