Linear Regression and Correlation

NOTE:  If the applet below does not appear, you may need to download the latest version of Java or enable Java applets in your browser (in IE: Tools/Internet Options/Advanced/Java)

Linear Regression Theory

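Consider the problem of fitting a straight line through a set of data points: $(x_1, y_1)$, $(x_2, y_2)$, ..., $(x_n, y_n)$.  The straight line can be parameterized as:

$$y = a_0 + a_1 x + e$$

where $a_0$ (the intercept) and $a_1$ (the slope) are the parameters and $e$ is the error or residual.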

Note:  this problem is called linear regression because the equation above is linear in the coefficients $a_0$ and $a_1$, and not because the equation happens to represent a straight line.  Fitting the quadratic $y = a_0 + a_1 x + a_2 x^2 + e$ would also be a linear regression problem.
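To make the note concrete, the following is a minimal Python sketch (not part of the original course material) that fits a quadratic by ordinary least squares. Because the model is linear in the coefficients, the fit reduces to solving a small linear system, the normal equations. The function name `fit_quadratic`, the fixed singularity tolerance, and the sample data are illustrative assumptions.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a0 + a1*x + a2*x**2; returns [a0, a1, a2]."""
    # Each data point contributes one row [1, x, x^2] of the design matrix Z.
    rows = [[1.0, x, x * x] for x in xs]
    m = 3
    # Normal equations (Z^T Z) a = Z^T y, stored as an augmented 3x4 matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(m)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(m)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:  # illustrative tolerance
            raise ValueError("need at least three distinct x values")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(col + 1, m):
            factor = aug[k][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[k][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        s = sum(aug[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (aug[i][m] - s) / aug[i][i]
    return coeffs


if __name__ == "__main__":
    # Hypothetical data lying exactly on y = 1 + x^2.
    print(fit_quadratic([0, 1, 2, 3], [1, 2, 5, 10]))
```

The unknowns enter only through sums and products with known quantities, which is what "linear in the coefficients" means in practice: no iterative optimization is needed even though the fitted curve is not a straight line.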

To find the straight line that best fits the data set, we use a least-squares formulation:

Minimize $S_r$ where $S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$
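As a sketch of the intermediate step, assume the line is written as $y = a_0 + a_1 x + e$ and that $S_r$ denotes the sum of the squared residuals (the notation used by Chapra and Canale). Setting the partial derivatives of $S_r$ with respect to each parameter to zero gives two linear equations, usually called the normal equations:

```latex
\begin{align}
\frac{\partial S_r}{\partial a_0} &= -2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i) = 0 \\
\frac{\partial S_r}{\partial a_1} &= -2 \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)\, x_i = 0 \\
\Rightarrow \quad n\, a_0 + \Big(\sum x_i\Big) a_1 &= \sum y_i \\
\Big(\sum x_i\Big) a_0 + \Big(\sum x_i^2\Big) a_1 &= \sum x_i y_i
\end{align}
```

Solving this pair of equations for $a_0$ and $a_1$ yields the parameters of the best-fit line.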

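For a straight line curve fit, the sum of squares of the residuals, $S_r$, reaches a minimum where:

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$  and  $$a_0 = \bar{y} - a_1 \bar{x}$$

Here $\bar{x}$ and $\bar{y}$ are the means of the $x_i$ and $y_i$, and all sums run over $i = 1, \ldots, n$.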
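The closed-form solution for the best-fit line can be sketched in a few lines of Python. This is an illustrative implementation rather than the code behind the applet; the function name `fit_line` and the sample data are assumptions, and the symbols `a0` (intercept) and `a1` (slope) follow the notation of Chapra and Canale.

```python
def fit_line(xs, ys):
    """Return (a0, a1) for the least-squares line y = a0 + a1*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a1 = (n * sum_xy - sum_x * sum_y) / denom  # slope
    a0 = sum_y / n - a1 * sum_x / n            # intercept: y_bar - a1 * x_bar
    return a0, a1


if __name__ == "__main__":
    # Hypothetical data scattered around y = 2x.
    xs = [1, 2, 3, 4, 5]
    ys = [2.2, 4.1, 6.2, 7.9, 10.1]
    a0, a1 = fit_line(xs, ys)
    print(f"y = {a0:.3f} + {a1:.3f} x")
```

The denominator vanishes only when every data point has the same x value, in which case no unique line of the form y = f(x) exists; the sketch raises an error in that case.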

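To express how well this best-fit line fits the data, one can consider the coefficient of determination:

$$r^2 = \frac{S_t - S_r}{S_t}$$

where $S_t$ is the total sum of squares of the residuals between the data points and the mean:

$$S_t = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

The correlation, $r$, is the square root of $r^2$, taken with the same sign as the slope of the best-fit line, and can also be expressed as:

$$r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\, \sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}}$$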
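The two routes to the correlation can be compared numerically. The Python sketch below is illustrative only: `correlation` evaluates the direct sum formula, while `r_squared` computes the coefficient of determination from the residuals of the best-fit line. Both function names and the sample data are assumptions, and the slope is computed in the algebraically equivalent mean-centered form.

```python
import math


def correlation(xs, ys):
    """Correlation r computed directly from sums over the data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    num = n * sum_xy - sum_x * sum_y
    den = math.sqrt(n * sum_xx - sum_x ** 2) * math.sqrt(n * sum_yy - sum_y ** 2)
    return num / den


def r_squared(xs, ys):
    """Coefficient of determination (S_t - S_r) / S_t for the best-fit line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope and intercept of the least-squares line (mean-centered form).
    a1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a0 = y_bar - a1 * x_bar
    s_r = sum((y - a0 - a1 * x) ** 2 for x, y in zip(xs, ys))  # about the line
    s_t = sum((y - y_bar) ** 2 for y in ys)                    # about the mean
    return (s_t - s_r) / s_t


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]   # hypothetical data
    ys = [2, 4, 5, 4, 5]
    r = correlation(xs, ys)
    print(f"r = {r:.4f}, r^2 = {r * r:.4f}, (S_t - S_r)/S_t = {r_squared(xs, ys):.4f}")
```

Both functions divide by zero if all x values (or all y values) are identical; a more defensive version would check for that before dividing.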

The correlation is an indication of how well the data points fit the straight line.  A perfect fit with a positive slope corresponds to a correlation of 1, and a perfect fit with a negative slope corresponds to a correlation of -1; in both cases the line explains 100% of the variability in the data.  A correlation of 0 indicates that the line explains 0% of the variability; that is, the explanation is no better than characterizing the data set by its mean.

For additional details about linear regression, review "Chapter 17: Least Squares Regression" in Numerical Methods for Engineers by Chapra and Canale. A succinct online overview can be found at Wikipedia.

You can also develop a better understanding of these concepts by exploring them interactively with the applet below.  

Start Exploring!

In the interactive window below you can perform the following operations:

  • Move a data point: click-and-drag a data point to a different position
  • Add a data point: ctrl-click in the location where you want to add it
  • Remove a data point: ctrl-click on the data point to be removed
  • Toggle regression curves: click on the blue or red button for y=f(x) and x=f(y), respectively
  • Retrieve a pre-programmed data set: click on one of the buttons S0 through S3
  • Print this entire page: click the printer button at the top right of this web-page

Learn by Exploring

    1. Create a data set of at least 10 points with a correlation of 0.9.  How would you characterize the "shape" of your data set?
    2. Repeat the previous task with correlation values of -0.5 and 0.0.
    3. How does the shape change with different values of the correlation?
    4. What is the value of correlation for a data set consisting of only two points?  Explain.
    5. How is the slope of the regression line related to the correlation value?
    6. Now, click on the red button to turn on the display of the linear regression for x = f(y). Is the regression x=f(y) [in red] the same as the regression y=f(x) [in blue]? Explain.
    7. Create two data sets, one for which the difference between y=f(x) and x=f(y) is as small as possible, and one for which the difference is as large as possible. Describe your solutions.
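To check observations from questions 6 and 7 outside the applet, the two regressions can be computed side by side. The Python sketch below is an illustrative aid, not the applet's own code; the function names and the sample data are assumptions. It fits y = f(x) and x = f(y) with the same least-squares routine by swapping the roles of the two coordinates, then expresses both lines as slopes in the x-y plane so they can be compared.

```python
def fit_line(us, vs):
    """Least-squares line v = b0 + b1*u; returns (b0, b1)."""
    n = len(us)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    s_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    s_uu = sum((u - u_bar) ** 2 for u in us)
    b1 = s_uv / s_uu
    return v_bar - b1 * u_bar, b1


def both_regressions(xs, ys):
    """Return (a0, a1, c0, c1) with y = a0 + a1*x and x = c0 + c1*y."""
    a0, a1 = fit_line(xs, ys)  # minimizes vertical residuals (blue in the applet)
    c0, c1 = fit_line(ys, xs)  # minimizes horizontal residuals (red in the applet)
    return a0, a1, c0, c1


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]   # hypothetical data
    ys = [2, 4, 5, 4, 5]
    a0, a1, c0, c1 = both_regressions(xs, ys)
    # Rewriting x = c0 + c1*y as y = (x - c0) / c1 gives a slope of 1/c1
    # in the same x-y axes (this requires c1 != 0).
    print(f"y = f(x): slope {a1:.3f}")
    print(f"x = f(y): slope {1 / c1:.3f} when drawn in the x-y plane")
```

Try the script with your own data sets from the applet and compare the two printed slopes as the correlation changes.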
Created by cparedis
Contributors : Ivan Lee and Chris Paredis
(c) Ivan Lee and Chris Paredis 2006
Last modified 08/13/2006 02:48 PM