An introduction to regression and regression residuals

Getting Started

On the left side of the page below is an empty scatterplot with scales that run from 0 to 100.  The scatterplot is interactive and has two modes, selected with the boxes at the top left.  In the insert mode, each click on the scatterplot inserts a data point at the location of your cursor.  The X and Y values of that location are shown in the upper right-hand corner.  You can add up to 50 data points.  As you add more data, marks along the margins of the scatterplot show you the "marginal distributions".  The page starts in the insert mode.  You can erase all data by clicking on the "Reset" button.

The second mode is the play mode.  Enter the play mode by clicking on the "play" button.  In the play mode, moving your cursor over an existing data point in the scatterplot highlights it.  By holding the mouse button down, you can drag that data point to any location in the scatterplot.

On the right side of the page below is a second plot that shows the residuals from the linear regression on the data you placed in the window on the left.  Initially it shows only the points.  If you click on the "residuals" button in the upper right corner, vertical lines appear in both windows showing the size of each residual.  The correlation and regression formulas are also shown.  The plots are linked, so moving a data point on the left changes the data points on the right, and all the summary statistics are updated.  Notice that the scale on the residual plot is different from the scale on the scatterplot.  Why is this so?
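For readers who want to see the arithmetic behind the right-hand plot, here is a minimal Python sketch of a least-squares fit and its residuals.  It is not the module's own code: the function names (fit_line, residuals) and the six sample points are made up for illustration.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation
    return slope, intercept, r

def residuals(xs, ys):
    """Residual = observed Y minus the Y predicted by the fitted line."""
    slope, intercept, _ = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical points on the 0 to 100 scale used by the scatterplot
xs = [10, 25, 40, 55, 70, 85]
ys = [20, 28, 45, 50, 68, 80]

slope, intercept, r = fit_line(xs, ys)
res = residuals(xs, ys)

print(f"Y = {intercept:.2f} + {slope:.2f} X, r = {r:.3f}")
for x, e in zip(xs, res):
    print(f"X = {x:3d}  residual = {e:+.2f}")
```

The residuals from a least-squares line with an intercept always sum to zero, so they are spread around 0 rather than around the values of Y.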

Try to work with the features of the module below. Then look at the "Things to do" section below.



Things to do

Appropriate activities with this module depend on what you are most interested in learning at this point. Here are some ideas.

If you are just getting used to working with residuals, add and move data to answer the following questions.  Spend some time with the "residuals" button on and some time with the "points only" button on.

  • Organize the data in a straight line around the regression line. Describe the correlation and the size of the residuals.

  • Organize the data like a ball around the regression line. Describe the correlation and the size of the residuals.

  • Describe how the shape of the data on the left is reflected in the residuals on the right, and how this relates to the correlation.

  • Try to make a curve in the residuals.  Try to keep a curve in the residuals while making the regression line more or less steep.

  • Try to make the largest residual you can.
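The link between the correlation and the size of the residuals in the items above can also be checked numerically.  The sketch below is only an illustration with made-up data: one set lies close to a straight line and the other forms a square "ball".  It relies on the standard identity that the sum of squared residuals equals (1 - r²) times the sum of squared deviations of Y from its mean; the helper name summarize is hypothetical.

```python
from math import sqrt

def summarize(xs, ys):
    """Return (r, sse, syy): correlation, sum of squared residuals, total sum of squares of Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return r, sse, syy

# Made-up data close to a straight line
line_xs = [10, 20, 30, 40, 50, 60, 70, 80, 90]
line_ys = [12, 19, 31, 42, 48, 61, 69, 82, 88]

# Made-up data arranged as a 3-by-3 "ball" with no linear trend
ball_xs = [40, 50, 60, 40, 50, 60, 40, 50, 60]
ball_ys = [40, 40, 40, 50, 50, 50, 60, 60, 60]

r_line, sse_line, syy_line = summarize(line_xs, line_ys)
r_ball, sse_ball, syy_ball = summarize(ball_xs, ball_ys)

for name, r, sse, syy in [("line", r_line, sse_line, syy_line),
                          ("ball", r_ball, sse_ball, syy_ball)]:
    print(f"{name}: r = {r:.3f}, SSE = {sse:.1f}, (1 - r^2) * Syy = {(1 - r ** 2) * syy:.1f}")
```

When r is close to +1 or -1 the residuals take up only a small share of the variation in Y; when r is 0 the regression line is flat and the residuals are as large as the deviations of Y from its mean.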

If you have been working with residuals, try the following:

  • Create data that violates the normality assumption (normally distributed residuals) while not violating the homogeneity assumption.

  • Create data that violates the homogeneity assumption (equal spread of the residuals across X) while not violating the normality assumption.

  • Create data that has no correlation, but can be changed to a very large (positive or negative) correlation by adding a single data point.

  • Create a pattern in the data in which a single outlier has a lot of control over the location of the regression line, but has a very small residual.
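The last two items above can be explored outside the module as well.  The following sketch uses made-up data, a 3-by-3 cluster with zero correlation plus one hypothetical far-away point at (95, 95), and is only one of many configurations that behave this way; the helper name fit is an assumption for this illustration.

```python
from math import sqrt

def fit(xs, ys):
    """Return (slope, intercept, r) for the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / sqrt(sxx * syy)

# Made-up cluster: a 3-by-3 grid, which has zero correlation by symmetry
cluster_xs = [20, 25, 30, 20, 25, 30, 20, 25, 30]
cluster_ys = [20, 20, 20, 25, 25, 25, 30, 30, 30]

# Hypothetical single point placed far from the cluster
outlier_x, outlier_y = 95, 95

_, _, r_before = fit(cluster_xs, cluster_ys)

all_xs = cluster_xs + [outlier_x]
all_ys = cluster_ys + [outlier_y]
slope, intercept, r_after = fit(all_xs, all_ys)

# Residuals from the line fitted to all ten points
cluster_res = [y - (intercept + slope * x) for x, y in zip(cluster_xs, cluster_ys)]
outlier_res = outlier_y - (intercept + slope * outlier_x)

print(f"r without the extra point: {r_before:.3f}")
print(f"r with the extra point:    {r_after:.3f}")
print(f"residual of the extra point: {outlier_res:+.2f}")
print(f"largest cluster residual:    {max(abs(e) for e in cluster_res):.2f}")
```

The far-away point pulls the regression line toward itself, so its own residual ends up smaller than the largest residuals in the cluster even though it determines most of the slope.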

Try to summarize your conclusions.