Simple Linear Regression
Start with some bi-variate data; that is, for each measurement you have an x-value and a y-value. For example:
using System.Collections.Generic;
List<double> x = new List<double>() {-1.1, 2.2, 1.4, 0.5, 3.7, 2.8};
List<double> y = new List<double>() {-2.9, 3.4, 0.9, 0.1, 6.8, 5.7};
Note that the measurements are paired. That is, the y = 0.9 value goes with the x = 1.4 value, even though they are in different collections.
If your data is stored in the Meta.Numerics.Data framework, you can pass columns into the APIs illustrated here. In fact, any collection that exposes x and y as IReadOnlyList<double> will do.
This is called simple linear regression. Here is the code to do it:
using System;
using Meta.Numerics.Statistics;
LinearRegressionResult result = y.LinearRegression(x);
Console.WriteLine($"y = ({result.Intercept}) + ({result.Slope}) x");
Notice that we have used the LinearRegression method as an extension method on the y data collection. This has the nice effect of giving our code the same visual layout as the mathematical description of our model y ~ linear regression of x.
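For reference, in the textbook formulation of simple linear regression the fitted line is the ordinary least-squares line. The sketch below assumes the model y = α + β x + ε; the symbols α, β, S_xx, and S_xy are notation introduced here for illustration, not names from the library.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \\
S_{xx} &= \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \\
\hat{\beta} &= \frac{S_{xy}}{S_{xx}}, \qquad
\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}
\end{aligned}
```

Here the estimate of β corresponds to the slope and the estimate of α to the intercept printed above.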
The most common measurement is r-squared, which gives the fraction of the variance in the y values that is explained by the fit.
Console.WriteLine($"Fit explains {result.RSquared * 100.0}% of the variance");
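As a reminder of what this number measures, the standard definition of r-squared can be written as follows. The fitted values and sums of squares are notation introduced here, not library identifiers.

```latex
\begin{aligned}
\hat{y}_i &= \hat{\alpha} + \hat{\beta}\,x_i \\
SS_{\mathrm{res}} &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
SS_{\mathrm{tot}} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \\
R^2 &= 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}
\end{aligned}
```

For a straight-line fit with an intercept, this equals the square of the Pearson correlation coefficient between x and y.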
Yes. You can test the null hypothesis that there is, in fact, no x-dependence in the y values.
Console.WriteLine($"Probability of no dependence {result.R.Probability}.");
Sometimes this is seen as a Pearson R test, as illustrated here, and sometimes as an F-test as part of an ANOVA analysis. These are actually the same tests, and will give identical P-values.
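The equivalence can be seen from the textbook test statistics, sketched here with r the Pearson correlation coefficient and n the number of data points. How the library evaluates the P-value internally is not stated on this page; the relations below are the standard ones.

```latex
\begin{aligned}
t &= r\,\sqrt{\frac{n-2}{1-r^2}}, \qquad
F = \frac{r^2\,(n-2)}{1-r^2} = t^2
\end{aligned}
```

Under the null hypothesis with normally distributed errors, t follows a Student t distribution with n - 2 degrees of freedom and F follows an F distribution with (1, n - 2) degrees of freedom. Because F = t², the two-sided t-test and the F-test reject in exactly the same cases.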
Yes!
OneWayAnovaResult anova = result.Anova;
Console.WriteLine("Fit dof = {0} SS = {1}", anova.Factor.DegreesOfFreedom, anova.Factor.SumOfSquares);
Console.WriteLine("Residual dof = {0} SS = {1}", anova.Residual.DegreesOfFreedom, anova.Residual.SumOfSquares);
Console.WriteLine("Total dof = {0} SS = {1}", anova.Total.DegreesOfFreedom, anova.Total.SumOfSquares);
Console.WriteLine($"Probability of no dependence {anova.Result.Probability}.");
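The three rows printed above are related by the usual ANOVA decomposition, sketched here in textbook notation (the symbols are introduced for illustration; the fitted values are the points on the regression line):

```latex
\begin{aligned}
\underbrace{\sum_{i=1}^{n} (y_i - \bar{y})^2}_{SS_{\mathrm{total}}}
&= \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{SS_{\mathrm{fit}}}
 + \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{SS_{\mathrm{residual}}} \\
n - 1 &= 1 + (n - 2) \\
F &= \frac{SS_{\mathrm{fit}} / 1}{SS_{\mathrm{residual}} / (n - 2)}
\end{aligned}
```

For the six data points used in this tutorial, the degrees of freedom should therefore come out as 5 = 1 + 4.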
Why, by the way, do you want an ANOVA table? Statistics programs print them out, and some people appear to feel duty-bound to copy them. But have you ever used anything besides the P-value of the corresponding F-test to draw some conclusion?
Yes! Here is some sample code:
// Print a 95% confidence interval on the slope
Console.WriteLine($"slope is in {result.Slope.ConfidenceInterval(0.95)} with 95% confidence");
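The width of such an interval is driven by the standard error of the slope. A sketch of the textbook expression, with notation introduced here for illustration:

```latex
\begin{aligned}
s^2 &= \frac{1}{n-2}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 \\
\sigma_{\hat{\beta}} &= \frac{s}{\sqrt{S_{xx}}}, \qquad
\hat{\beta} - q\,\sigma_{\hat{\beta}} \;\le\; \beta \;\le\; \hat{\beta} + q\,\sigma_{\hat{\beta}}
\end{aligned}
```

Here q is the critical value for the requested confidence level. The textbook choice is a Student t quantile with n - 2 degrees of freedom; whether ConfidenceInterval uses that or a normal quantile is not stated on this page, so check the library documentation if the difference matters for a small sample.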
Yes! You can get the full covariance matrix of the fit parameters, and the best-fit parameters as a vector in parameter space. Here is some sample code:
using Meta.Numerics.Matrices;
ColumnVector parameters = result.Parameters.ValuesVector;
SymmetricMatrix covariance = result.Parameters.CovarianceMatrix;
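For a straight-line fit, the textbook form of this covariance matrix is small enough to write out. The sketch below assumes the parameters are ordered (intercept, slope); that ordering, and the symbols used, are assumptions made here for illustration and are not confirmed by this page.

```latex
\begin{aligned}
s^2 &= \frac{1}{n-2}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 \\
C &= s^2
\begin{pmatrix}
\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}} & -\dfrac{\bar{x}}{S_{xx}} \\[2ex]
-\dfrac{\bar{x}}{S_{xx}} & \dfrac{1}{S_{xx}}
\end{pmatrix}
\end{aligned}
```

The off-diagonal term shows that the intercept and slope estimates are correlated whenever the mean of the x values is non-zero.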
Yes!
IReadOnlyList<double> residuals = result.Residuals;
Yes!
using Meta.Numerics;
double x1 = 3.0;
UncertainValue y1 = result.Predict(x1);
Console.WriteLine($"Predicted y({x1}) = {y1}.");
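The uncertainty attached to a prediction can mean two different things in the textbook treatment, sketched below. This page does not say which of the two the returned UncertainValue reports, so treat the formulas as background rather than as a description of the library. The symbol x_* stands for the point at which you predict (x1 in the code above).

```latex
\begin{aligned}
\hat{y}(x_*) &= \hat{\alpha} + \hat{\beta}\,x_* \\
\sigma_{\mathrm{line}}^2(x_*) &= s^2\left(\frac{1}{n} + \frac{(x_* - \bar{x})^2}{S_{xx}}\right) \\
\sigma_{\mathrm{new}}^2(x_*) &= s^2\left(1 + \frac{1}{n} + \frac{(x_* - \bar{x})^2}{S_{xx}}\right)
\end{aligned}
```

The first variance is the uncertainty of the fitted line itself at x_*; the second is the uncertainty of a new measurement taken at x_*, which also includes the scatter s² of individual points around the line. Here s² is the residual sum of squares divided by n - 2, and S_xx is the sum of squared deviations of the x values from their mean. Both grow as x_* moves away from the mean of the x data.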
Yes! On bi-variate data, you can also do polynomial regression, non-linear regression to an arbitrary, parameterized function, and linear logistic regression. If you have multi-variate data, you can do multi-variate linear regression and multi-variate linear logistic regression. If you have error bars on your y-data, you can use the UncertainSample class to fit to constants, lines, polynomials, linear combinations of arbitrary functions, and arbitrary, parameterized non-linear functions.