Normalize input for Choice Models #208
This sounds like a promising feature! I am cautiously enthusiastic.

Some points in favor: I believe that widely divergent coefficients are a problem not just for interpretation, but also for the speed and accuracy of parameter estimation. (The search for optimal values is harder if some are far from the starting point and if the sensitivities vary.) The best-practice advice I've heard is to manually scale the input data so that the fitted coefficients are of similar magnitude. But this is not convenient, especially in a semi-automated context like building an UrbanSim model. Automatically normalizing the input data would help.

Some points of caution: we'd need to be very clear in the documentation and in the output that the fitted parameters apply to transformed data. I don't think this is a common approach, and it should be an optional setting.

Our roadmap is to move the statistics logic out of the UrbanSim repository and into ChoiceModels, but it seems fine to implement this feature here and include it in a point release. ChoiceModels is a ways off from being ready, and the shift will be disruptive enough that we should save it for a major version bump of UrbanSim.
I know I have heard Andrew Gelman express that viewpoint in the past, but a quick Google search turned up a more nuanced blog post of his.
OK, so if we make it an optional setting, one way to make it clearer may be to have the section in the yaml be called …
A number of times we have accidentally compared the magnitude of coefficients in the yaml files that represent `MNLDiscreteChoiceModel` instances. This is of course a mistake, as 0.001 is a large coefficient for `nonres_sqft` and a small coefficient for `frac_developed`.

There is also the "magic 3s" problem: the code puts a hard cutoff on coefficients at -3 and 3. This is a great default for normalized variables, i.e. ones with std ≈ 1 and mean ≈ 0, but far too small or large for other columns. If coefficients are made comparable, then we can also consider adding L1 or L2 regularization.

My proposal is that when fitting a model we subtract the mean and divide by the std for each column. In the yaml file we store the training mean, training std, and the coefficients of the transformed columns. Then when predicting with a model we transform with the stored mean and std. Use of the models will be unchanged, but the stored coefficients will be comparable with each other.
Thoughts?