Simple Linear Regression

Start with some bi-variate data; that is, for each measurement you have an x-value and a y-value. For example:

using System.Collections.Generic;

List<double> x = new List<double>() {-1.1, 2.2, 1.4, 0.5, 3.7, 2.8};
List<double> y = new List<double>() {-2.9, 3.4, 0.9, 0.1, 6.8, 5.7}; 

Note that the measurements are paired. That is, the y = 0.9 value goes with the x = 1.4 value, even though they are in different collections.

If your data is stored in the Meta.Numerics.Data framework, you can pass columns into the APIs illustrated here. In fact, any collection that exposes x and y as IReadOnlyList<double> will do.

How do I find the best-fit line?

This is called simple linear regression. Here is the code to do it:

using System;
using Meta.Numerics.Statistics;

LinearRegressionResult result = y.LinearRegression(x);
Console.WriteLine($"y = ({result.Intercept}) + ({result.Slope}) x");

Notice that we have used the LinearRegression method as an extension method on the y data collection. This has the nice effect of giving our code the same visual layout as the mathematical description of our model y ~ linear regression of x.

How good a fit is my line?

The most common measurement is r-squared, which gives the fraction of the variance in the y values that is explained by the fit.

Console.WriteLine($"Fit explains {result.RSquared * 100.0}% of the variance");

Is there some null hypothesis I can test against?

Yes. You can test the null hypothesis that there is, in fact, no x-dependence in the y values.

Console.WriteLine($"Probability of no dependence {result.R.Probability}.");

Sometimes this is presented as a Pearson R test, as illustrated here, and sometimes as an F-test as part of an ANOVA analysis. These are actually the same test, and they give identical P-values.
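
Since both formulations are available from the same result object, you can verify this for yourself; a quick check using only the properties shown on this page:

Console.WriteLine($"Pearson R test: P = {result.R.Probability}");
Console.WriteLine($"ANOVA F test:   P = {result.Anova.Result.Probability}");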

Can I get an ANOVA table for the fit?

Yes!

OneWayAnovaResult anova = result.Anova;
Console.WriteLine("Fit        dof = {0} SS = {1}", anova.Factor.DegreesOfFreedom, anova.Factor.SumOfSquares);
Console.WriteLine("Residual   dof = {0} SS = {1}", anova.Residual.DegreesOfFreedom, anova.Residual.SumOfSquares);
Console.WriteLine("Total      dof = {0} SS = {1}", anova.Total.DegreesOfFreedom, anova.Total.SumOfSquares);
Console.WriteLine($"Probability of no dependence {anova.Result.Probability}.");

Why, by the way, do you want an ANOVA table? Statistics programs print them out, and some people appear to feel duty-bound to copy them. But have you ever used anything besides the P-value of the corresponding F-test to draw some conclusion?

Can I get confidence intervals on the slope and intercept?

Yes! Here is some sample code:

// Print a 95% confidence interval on the slope
Console.WriteLine($"slope is in {result.Slope.ConfidenceInterval(0.95)} with 95% confidence");

Can I get the covariances between the fit parameters?

Yes! You can get the full covariance matrix of the fit parameters, and the best-fit parameters as a vector in parameter space. Here is some sample code:

using Meta.Numerics.Matrices;

ColumnVector parameters = result.Parameters.ValuesVector;
SymmetricMatrix covariance = result.Parameters.CovarianceMatrix;
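
As a usage sketch, individual entries of the covariance matrix can be read by index. The parameter ordering here is an assumption (intercept as parameter 0, slope as parameter 1); inspect result.Parameters to confirm it for your version.

// Covariance between the intercept and slope estimates
// (assumes the intercept is parameter 0 and the slope is parameter 1)
double c01 = covariance[0, 1];
Console.WriteLine($"cov(intercept, slope) = {c01}");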

Can I get residuals?

Yes!

IReadOnlyList<double> residuals = result.Residuals;
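
As a sanity check, the sum of the squared residuals should reproduce the residual sum of squares in the ANOVA table above; here is a quick sketch using LINQ:

using System.Linq;

double ssr = residuals.Sum(r => r * r);
Console.WriteLine($"Sum of squared residuals = {ssr}");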

Can I get a predicted value for a new x?

Yes!

using Meta.Numerics;

double x1 = 3.0;
UncertainValue y1 = result.Predict(x1);
Console.WriteLine($"Predicted y({x1}) = {y1}.");

Can I do more complicated regressions?

Yes! On bi-variate data, you can also do polynomial regression, non-linear regression to an arbitrary parameterized function, and linear logistic regression. If you have multi-variate data, you can do multi-variate linear regression and multi-variate linear logistic regression. If you have error bars on your y-data, you can use the UncertainSample class to fit to constants, lines, polynomials, linear combinations of arbitrary functions, and arbitrary parameterized non-linear functions.
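
As one example, here is a minimal sketch of a polynomial fit to the same data. It assumes a PolynomialRegression extension method analogous to LinearRegression above; check the Meta.Numerics.Statistics documentation for the exact signature in your version.

// Fit a quadratic (degree 2) to the same bi-variate data
PolynomialRegressionResult poly = y.PolynomialRegression(x, 2);
Console.WriteLine($"Quadratic fit explains {poly.RSquared * 100.0}% of the variance");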
