Both the training MSE and RSS decrease monotonically as more features are considered

https://github.com/Gurobi/modeling-examples/blob/1abb8700611e45bb34a760eebe2f6dcd1ff85875/linear_regression/l0_regression.html#L13183


## RSS vs MSE

The linked paragraph says that RSS should not be used as the performance metric, and that MSE estimated via cross-validation should be used instead.

**I think the emphasis on MSE over RSS is misleading.** Note that, given an estimate $\hat\beta$ fitted on $n$ observations,

$$
\mathrm{RSS} = (y-X\hat\beta)^T(y-X\hat\beta) = \sum (y_i - \hat{y}_i)^2 = n \cdot \mathrm{MSE}
$$

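**So both the training MSE and the training RSS decrease monotonically as more features are considered,** not only the RSS. The two differ only by the constant factor $n$, so the relevant contrast is between training error and cross-validated error, not between RSS and MSE.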
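A quick numerical check of this point, as a minimal sketch: the synthetic data, the helper name `best_subset_training_error`, and the use of exhaustive enumeration (instead of the notebook's MIQP model) are all assumptions made for illustration.

```python
import itertools
import numpy as np

# Synthetic data (assumption): only the first two features carry signal.
rng = np.random.default_rng(0)
n, p = 50, 6
X = rng.normal(size=(n, p))
y = X[:, :2] @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=n)

def best_subset_training_error(X, y, s):
    """Return (RSS, MSE) on the training data for the best subset of exactly s features.

    Hypothetical helper: brute-force enumeration, feasible only for small p.
    """
    best_rss, best_mse = np.inf, np.inf
    for cols in itertools.combinations(range(X.shape[1]), s):
        idx = np.array(cols, dtype=int)
        A = np.column_stack([np.ones(len(y)), X[:, idx]])  # intercept + chosen features
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        rss = float(resid @ resid)
        if rss < best_rss:
            best_rss, best_mse = rss, float(np.mean(resid ** 2))
    return best_rss, best_mse

results = [best_subset_training_error(X, y, s) for s in range(p + 1)]
rss = np.array([r for r, _ in results])
mse = np.array([m for _, m in results])

for s in range(p + 1):
    print(f"s={s}: RSS={rss[s]:.4f}  MSE={mse[s]:.4f}  n*MSE={n * mse[s]:.4f}")
```

On the training set the two columns differ only by the factor $n$, and neither increases as $s$ grows, so neither can be used on its own to choose $s$.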

## Cross-validation

The cross-validation part should be correct. That is, we use grid search to find the best $s$.
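For reference, the grid search over $s$ could look roughly like the sketch below. It is not the notebook's implementation: `fit_best_subset` is a hypothetical brute-force stand-in for the Gurobi model, and the synthetic data, fold count, and function names are assumptions.

```python
import itertools
import numpy as np

def fit_best_subset(X, y, s):
    """Brute-force least squares over all subsets of exactly s features (stand-in for the MIQP)."""
    best = None
    for cols in itertools.combinations(range(X.shape[1]), s):
        idx = np.array(cols, dtype=int)
        A = np.column_stack([np.ones(len(y)), X[:, idx]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = float(np.sum((y - A @ beta) ** 2))
        if best is None or rss < best[0]:
            best = (rss, idx, beta)
    return best[1], best[2]

def cv_mse(X, y, s, n_folds=5, seed=0):
    """Mean held-out MSE over n_folds folds for sparsity level s."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        idx, beta = fit_best_subset(X[train], y[train], s)
        A_val = np.column_stack([np.ones(len(held_out)), X[held_out][:, idx]])
        errors.append(np.mean((y[held_out] - A_val @ beta) ** 2))
    return float(np.mean(errors))

# Synthetic data (assumption): only the first two features carry signal.
rng = np.random.default_rng(1)
n, p = 60, 6
X = rng.normal(size=(n, p))
y = X[:, :2] @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=n)

# Grid search: evaluate every candidate s and keep the one with the lowest held-out MSE.
cv_scores = {s: cv_mse(X, y, s) for s in range(p + 1)}
best_s = min(cv_scores, key=cv_scores.get)
print(cv_scores, best_s)
```

Unlike the training error, the held-out MSE is not guaranteed to keep falling as $s$ increases, which is what makes it usable for selecting $s$.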

Both the training MSE and RSS decrease monotonically as more features are considered #14
