Meaning of r²

The number of irrelevant answers on a test is graphed versus the age of the child being tested (n = 10).
How "good" is age at predicting/estimating/accounting for the number of irrelevant answers? (Assuming the straight "regression" line is doing the predicting.)
Don't know age?  Measure the variability of the irrelevant answers with the variance:
Find the distance of each child's irrelevant answers from the mean of all the irrelevant answers,
square the distances, sum them, and divide by n-1 (= variance).
(Take the square root to get the standard deviation.)
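The steps above can be sketched in a few lines of Python. The ten counts below are made-up illustration values, not the data from the graph, and the function name `sample_variance` is just a label chosen here.

```python
import math

# Hypothetical irrelevant-answer counts for n = 10 children
# (illustration only; not the data plotted on this page).
irrelevant = [12, 10, 11, 8, 9, 7, 6, 7, 4, 5]

def sample_variance(values):
    """Sum of squared distances from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_distances = [(v - mean) ** 2 for v in values]
    return sum(squared_distances) / (n - 1)

variance = sample_variance(irrelevant)
std_dev = math.sqrt(variance)  # square root of the variance

print("variance:", variance)
print("standard deviation:", std_dev)
```

This is the "don't know age" measure: it uses only the irrelevant-answer counts and ignores age entirely.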

If you use age to predict, get the predicted number of irrelevant answers for each age-point, and find the variance of those predicted values.
Take the ratio of the two variances (or of the two sums of squares): predicted over observed. That's r².
                                                          See also Rsquared Excel file.
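As a rough sketch of the ratio described above, the Python below fits a least-squares line, computes the predicted values, and divides the variance of the predictions by the variance of the observations. The ages and counts are made-up illustration values, and the helper names (`mean`, `sample_variance`, `fit_line`) are hypothetical; the Excel file may organize the calculation differently.

```python
# Hypothetical data for n = 10 children (illustration only).
ages = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
irrelevant = [12, 10, 11, 8, 9, 7, 6, 7, 4, 5]

def mean(values):
    return sum(values) / len(values)

def sample_variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

slope, intercept = fit_line(ages, irrelevant)

# Predicted number of irrelevant answers at each child's age.
predicted = [intercept + slope * a for a in ages]

# Ratio of variances: predicted over observed.
r_squared = sample_variance(predicted) / sample_variance(irrelevant)
print("r squared:", r_squared)
```

Because both variances divide by the same n-1, the ratio of the sums of squares gives the same number.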

Addendum: How do these ideas relate to the RESIDUALS?
For each point,
Residual = Observed - Predicted
Predicted + Residual = Observed
These relationships hold whether we measure the Predicted and Observed from "0," or from the mean line for Y, as shown in the pictures.
So the greater the Predicted distances from the Y-mean, as a proportion of the Observed distances, the smaller the Residuals, and vice versa.
Our intuitive requirement that the vertical scatter from the line (shown in the Residuals) should be measured in some sense by r² is satisfied.
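To make the link to the residuals concrete, here is a small Python sketch using made-up illustration data (not the data in the pictures). It assumes the line is the least-squares line with an intercept; in that case the sum of squared Observed distances from the Y-mean splits exactly into the Predicted part plus the Residual part, so r² can also be written as 1 minus the residual share.

```python
# Hypothetical data for n = 10 children (illustration only).
ages = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
irrelevant = [12, 10, 11, 8, 9, 7, 6, 7, 4, 5]

n = len(ages)
x_mean = sum(ages) / n
y_mean = sum(irrelevant) / n

# Least-squares line (assumed to be the "regression" line doing the predicting).
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(ages, irrelevant))
sxx = sum((x - x_mean) ** 2 for x in ages)
slope = sxy / sxx
intercept = y_mean - slope * x_mean

predicted = [intercept + slope * x for x in ages]

# Residual = Observed - Predicted, for each point.
residuals = [obs - pred for obs, pred in zip(irrelevant, predicted)]

# Sums of squared distances, measured from the Y-mean line.
ss_observed = sum((obs - y_mean) ** 2 for obs in irrelevant)
ss_predicted = sum((pred - y_mean) ** 2 for pred in predicted)
ss_residual = sum(res ** 2 for res in residuals)

r_squared = ss_predicted / ss_observed
r_squared_from_residuals = 1 - ss_residual / ss_observed

print("SS observed:", ss_observed)
print("SS predicted + SS residual:", ss_predicted + ss_residual)
print("r squared (two ways):", r_squared, r_squared_from_residuals)
```

The two versions of r² agree up to rounding, which is the sense in which small residuals and a large r² go together.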


3pm  3/8/04
Math151-Sp04/Rsquaredmeaning.htm
 

This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.