[0.2.dev1] Better sampling support for MergedChoiceTable utility #37

Merged: 28 commits, merged Sep 6, 2018

smmaurer (Member) commented Aug 16, 2018

This PR adds substantial functionality to the MergedChoiceTable utility.

It's related to issues #4, #5, and #11, and to MNL support in UDST/urbansim_templates.

Features and usage

MergedChoiceTable now supports:

  • sampling with or without replacement
  • alternative-specific weights
  • interaction weights that apply to combinations of choosers and alternatives
  • automatic joining of interaction terms onto the merged table
  • non-sampling (every alternative available to each chooser)
  • estimation/simulation support for all combinations
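As a rough illustration of what a merged choice table contains, here is a minimal pandas/NumPy sketch. It is not the choicemodels implementation: `merge_choice_table`, its parameters, and the `chooser_id`/`alt_id`/`chosen` column names are hypothetical. It samples alternatives for each chooser with or without replacement, optionally weighted by an alternative-specific weight; in estimation it always adds the chosen alternative (drawing the others from a pool that excludes it, which is a design choice of this sketch); and it falls back to the full choice set when `sample_size` is None.

```python
import numpy as np
import pandas as pd


def merge_choice_table(choosers, alternatives, chosen=None, sample_size=None,
                       replace=True, weights=None, seed=None):
    """Hypothetical sketch: build a long-format chooser x alternative table.

    choosers, alternatives: DataFrames indexed by id, with distinct column names.
    chosen: optional Series mapping chooser id -> chosen alternative id
            (estimation); None means simulation.
    sample_size: alternatives per chooser, including the chosen one in
                 estimation; None means no sampling (all alternatives).
    weights: optional Series of alternative-specific weights.
    """
    rng = np.random.default_rng(seed)
    alt_ids = alternatives.index.to_numpy()
    rows = []
    for obs_id in choosers.index:
        if sample_size is None:
            sampled = alt_ids  # non-sampling: every alternative is available
        else:
            pool, p = alt_ids, None
            if chosen is not None:
                # draw only non-chosen alternatives; the chosen one is added below
                pool = alt_ids[alt_ids != chosen[obs_id]]
            n = sample_size - (1 if chosen is not None else 0)
            if weights is not None:
                w = weights.reindex(pool).to_numpy(dtype=float)
                p = w / w.sum()  # normalize weights into draw probabilities
            sampled = rng.choice(pool, size=n, replace=replace, p=p)
            if chosen is not None:
                sampled = np.append(sampled, chosen[obs_id])
        rows.append(pd.DataFrame({'chooser_id': obs_id, 'alt_id': sampled}))
    table = pd.concat(rows, ignore_index=True)
    if chosen is not None:
        # 1 for the row holding each chooser's observed choice, else 0
        is_chosen = table['alt_id'] == table['chooser_id'].map(chosen)
        table['chosen'] = is_chosen.astype(int)
    # join chooser and alternative attributes onto each row
    return table.join(choosers, on='chooser_id').join(alternatives, on='alt_id')
```

For example, with two choosers and four alternatives, `sample_size=3, replace=False` yields six rows (three distinct alternatives per chooser, one flagged as chosen), while omitting `chosen` and `sample_size` yields all eight chooser-alternative pairs.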

All of this should work automatically in MNL models. Note that with non-uniform (for example, weighted) sampling of alternatives and small sample sizes, estimated coefficients can be biased unless a correction term is added (see issue #38).
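To make the bias point concrete: one standard correction, assuming the sampled alternatives are drawn with replacement using known per-draw probabilities q_j (which differ across alternatives under weighted sampling) and the chosen alternative is then added to the set, adds ln(k_j / q_j) to each alternative's utility, where k_j counts how often j appears in the final set. The probabilities are then computed over the distinct alternatives in that set. This is a sketch of that textbook form, not necessarily the term discussed in issue #38; `corrected_probabilities` and its inputs are hypothetical.

```python
import numpy as np


def corrected_probabilities(utilities, counts, q):
    """Hypothetical sketch: conditional logit probabilities over a sampled set.

    utilities: V_j for each distinct alternative in the sampled set D
    counts:    k_j, times alternative j appears in D (the chosen
               alternative counts once more for being added)
    q:         per-draw sampling probability of alternative j
    Returns P(j | D) = exp(V_j + ln(k_j / q_j)) / sum over distinct l in D.
    """
    v = np.asarray(utilities, dtype=float)
    adj = v + np.log(np.asarray(counts, dtype=float) / np.asarray(q, dtype=float))
    adj -= adj.max()  # shift for numerical stability before exponentiating
    expv = np.exp(adj)
    return expv / expv.sum()
```

When every q_j is equal and every k_j is equal, the correction is the same constant for all alternatives and cancels; otherwise, oversampled alternatives (large q_j) are down-weighted to compensate.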

The intention of this PR is to provide general-purpose functionality that can serve as a back end for more specialized tools that automate distance-based sampling, bands, buckets, etc.
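As an example of the kind of specialized tool this back end could support, distance-based sampling can be expressed as interaction weights: one weight per chooser-alternative pair. The sketch below assumes planar coordinates and a hypothetical exponential decay, exp(-beta * distance); `sample_by_distance` and `beta` are illustrative names, not part of the PR. Because the weights differ by chooser, each chooser gets its own draw.

```python
import numpy as np


def sample_by_distance(chooser_xy, alt_xy, sample_size, beta=1.0, seed=None):
    """Hypothetical sketch: per-chooser sampling with interaction weights.

    chooser_xy: (n_choosers, 2) coordinates; alt_xy: (n_alts, 2).
    The weight for each (chooser, alternative) pair is exp(-beta * distance),
    so nearby alternatives are more likely to be sampled.
    Returns an (n_choosers, sample_size) array of alternative positions.
    """
    rng = np.random.default_rng(seed)
    chooser_xy = np.asarray(chooser_xy, dtype=float)
    alt_xy = np.asarray(alt_xy, dtype=float)
    diff = chooser_xy[:, None, :] - alt_xy[None, :, :]  # broadcast all pairs
    dist = np.sqrt((diff ** 2).sum(axis=-1))             # (n_choosers, n_alts)
    w = np.exp(-beta * dist)
    p = w / w.sum(axis=1, keepdims=True)                 # one distribution per chooser
    # interaction weights differ by chooser, so each row needs its own draw
    return np.vstack([
        rng.choice(len(alt_xy), size=sample_size, replace=False, p=row)
        for row in p
    ])
```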

I've also done groundwork for the following features that will come later:

  • availability of alternatives
  • accepting callables for on-the-fly calculation of weights, availability, and interaction terms
  • representation of random state, for replicability
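A minimal sketch of how these planned features could fit together, assuming a hypothetical `sample_alternatives` helper: weights may be an array or a callable evaluated at sampling time, an availability mask zeroes out alternatives that cannot be drawn, and a seed passed to NumPy's `default_rng` makes the sample reproducible. None of these names come from the PR.

```python
import numpy as np


def sample_alternatives(alt_ids, sample_size, weights=None, available=None,
                        seed=None):
    """Hypothetical sketch of the planned features.

    weights: array of alternative-specific weights, or a callable that takes
             alt_ids and returns them (computed on the fly at sampling time).
    available: optional boolean mask; unavailable alternatives are never drawn.
    seed: anything accepted by np.random.default_rng, for replicability.
    """
    rng = np.random.default_rng(seed)
    alt_ids = np.asarray(alt_ids)
    if callable(weights):
        weights = weights(alt_ids)  # on-the-fly weight calculation
    w = np.ones(len(alt_ids)) if weights is None else np.asarray(weights, float)
    if available is not None:
        w = w * np.asarray(available, dtype=bool)  # zero weight if unavailable
    return rng.choice(alt_ids, size=sample_size, replace=False, p=w / w.sum())
```

Calling it twice with the same seed returns the same alternatives, which is the replicability property the random-state work is aiming for.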

Implementation

This required deep enough surgery that the easiest approach was to start fresh rather than build on the existing code in urbansim.urbanchoice, which did not support weights, availability, sampling without replacement, or non-sampling use cases.

I've done some basic optimization, such as choosing the most efficient underlying sampling library for each use case (mostly NumPy, but sometimes core Python) and drawing a single sample for all choosers rather than repeated per-chooser samples whenever possible.
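The two optimizations can be sketched as follows; the function names are hypothetical and this is not the PR's code. When weights are alternative-specific (shared by every chooser) and sampling is with replacement, one vectorized NumPy call can fill a whole choosers-by-sample-size array. For uniform sampling without replacement, Python's `random.sample` avoids the full permutation that NumPy's legacy `np.random.choice(..., replace=False)` performs, which matters when the sample is much smaller than the set of alternatives.

```python
import random

import numpy as np


def sample_with_replacement(alt_ids, n_choosers, sample_size, p=None, seed=None):
    """One vectorized draw for all choosers (weights shared across choosers)."""
    rng = np.random.default_rng(seed)
    return rng.choice(alt_ids, size=(n_choosers, sample_size), replace=True, p=p)


def sample_without_replacement_uniform(alt_ids, n_choosers, sample_size, seed=None):
    """Per-chooser uniform draws without replacement using core Python.

    random.sample selects k items without shuffling the whole population.
    """
    r = random.Random(seed)
    alt_ids = list(alt_ids)
    return np.array([r.sample(alt_ids, sample_size) for _ in range(n_choosers)])
```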

Issue #39 discusses the code's current performance and optimizations we might want to look into.

Other changes

  • LargeMultinomialLogit class now optionally accepts a MergedChoiceTable as input
  • PR includes unit tests for each table combination, but they could be improved
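For context on what an MNL estimator consumes from a merged table, here is a sketch that computes the log-likelihood of a linear-in-parameters MNL over a long-format table: one row per chooser-alternative pair, a 0/1 `chosen` column, and feature columns. The column names and `mnl_loglik` are hypothetical; this is not the LargeMultinomialLogit API.

```python
import numpy as np
import pandas as pd


def mnl_loglik(table, coefs, chooser_col='chooser_id', chosen_col='chosen'):
    """Hypothetical sketch: MNL log-likelihood over a long-format table.

    coefs: dict mapping feature column -> coefficient (utility is linear).
    """
    cols = list(coefs)
    v = table[cols].to_numpy(dtype=float) @ np.array([coefs[c] for c in cols])
    v = pd.Series(v, index=table.index)
    groups = table[chooser_col]
    # log-sum-exp within each chooser's choice set, stabilized by the max
    vmax = v.groupby(groups).transform('max')
    logsum = vmax + np.log(np.exp(v - vmax).groupby(groups).transform('sum'))
    log_p = v - logsum
    return float(log_p[table[chosen_col] == 1].sum())
```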

Versioning

  • increments version number to 0.2.dev1

smmaurer requested a review from Arezoo-bz on August 17, 2018 20:47
coveralls commented Sep 4, 2018

Coverage increased (+6.6%) to 59.194% when pulling 0b8a2b9 on sampling-weights into b3cb2b9 on master.

smmaurer merged commit c673198 into master on Sep 6, 2018
smmaurer deleted the sampling-weights branch on September 6, 2018 23:32
smmaurer changed the title from "Better sampling support for MergedChoiceTable utility" to "[0.2.dev1] Better sampling support for MergedChoiceTable utility" on Nov 10, 2018