Swap out patsy for formulaic #463
Cool. Thanks @ksolarski, just a quick reply from my phone... Don't do this for the synthetic control, because I have an in-progress PR that will change it: it won't have a formula input. But can I just get some clarification: does this change the API? Can we get the exact same functionality? If not, let's think again. Will try to look at the code properly when I can 👍🏻
I can't find where I saw it in the I'm not 100% sure that this is a problem, and apologies I can't find the relevant part in the docs. But does my concern make sense? |
You're right, Patsy has the power of preserving the transformation / encoding of variables through However, the Patsy repo suggests migrating to https://github.com/matthewwardrop/formulaic instead, which is capable of "reusing the encoding choices made during conversion of one data-set on other datasets" (see https://matthewwardrop.github.io/formulaic/latest/). There's also a migration guide from Patsy to Formulaic, so the switch would be easy. It also supports many operators: https://matthewwardrop.github.io/formulaic/latest/guides/grammar/ Have you checked out this library before? What do you think about using this instead of formulae?
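To make the concern concrete, here is a minimal, library-free sketch of what "reusing the encoding choices" means for a categorical column. The helper names `fit_levels` and `encode` are hypothetical and are not part of Patsy or Formulaic. Mapping an unseen level to an all-zero row is only one possible policy, assumed here for illustration.

```python
def fit_levels(values):
    """Record the category levels seen in the training data, in sorted order."""
    return sorted(set(values))


def encode(values, levels):
    """One-hot encode `values` against a fixed list of `levels`.

    Levels not seen during fitting become an all-zero row. This is an
    assumption made for this sketch; real libraries may warn or raise instead.
    """
    return [[1 if value == level else 0 for level in levels] for value in values]


train = ["A", "B", "C", "D"]
test = ["A", "D", "E"]  # "E" never appears in the training data

levels = fit_levels(train)          # state learned once, from the training data
train_matrix = encode(train, levels)
test_matrix = encode(test, levels)  # same columns, same order as training

# Re-deriving the levels from the test data alone gives different columns,
# which is the misalignment that the stored state avoids.
naive_levels = fit_levels(test)
```

The point of the sketch is that the column layout is decided once, on the training data, and then applied unchanged to any later dataset.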
@drbenvincent any strong opinions about using |
Sorry for the delayed response @ksolarski. So as far as I understand, Right now there are no use-cases for hierarchical modelling. That might change in the future, though I don't have any specific use cases in mind. So I guess the only choice at the moment is |
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #463      +/-   ##
==========================================
- Coverage   94.66%   94.66%   -0.01%
==========================================
  Files          32       32
  Lines        2195     2194       -1
==========================================
- Hits         2078     2077       -1
  Misses        117      117

☔ View full report in Codecov by Sentry.
@drbenvincent Yes, from the docs it seems that no hierarchical models are allowed in the

import pandas as pd
from formulaic import model_matrix

# Create a training dataset
train_data = pd.DataFrame(
    {
        "feature1": ["A", "B", "C", "D"],
        "target": [0, 1, 0, 1],
    }
)

# Create a test dataset
test_data = pd.DataFrame(
    {
        "feature1": [
            "A",  # In training
            "D",  # In training
            "E",  # Not in training
        ],
        "target": [0, 1, 0],
    }
)

# Generate the model matrix for the training data
train_matrix = model_matrix("target ~ 0 + feature1", train_data)

# Print the training matrix
print("Training Matrix:")
print(train_matrix)

# Use the same spec to transform the test data
test_matrix = model_matrix(spec=train_matrix.model_spec, data=test_data)

# Print the test matrix - see that columns are properly aligned from the training data transformation
print("\nTest Matrix:")
print(test_matrix)

Is that the problem you had in mind or something else?
@ksolarski Yes, that's pretty much it. Turns out the phrase I was looking for was "stateful transforms", which you pretty much said here. I'm actually wondering: if we don't get the additional functionality of hierarchical modeling, is there much benefit in moving from patsy to formulaic? Not saying it's not worth it, but we should be clear about the rationale. Is it because patsy is no longer maintained and formulaic might see more features, or does it have a richer formula API? Sorry for the extended conversation on this by the way, but given the formula aspect is a core part of the API it's worth thinking it through :)
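For readers unfamiliar with the term, a stateful transform is one whose output depends on something learned from the training data, which must then be reused on new data. The following is a conceptual sketch only: the `Center` class is hypothetical and does not reproduce how patsy or formulaic implement their transforms.

```python
from statistics import fmean


class Center:
    """Toy stateful transform: remembers the training mean and reuses it."""

    def __init__(self):
        self.mean = None  # empty until the transform has seen training data

    def fit_transform(self, values):
        """Learn the mean from `values`, store it, and return centered values."""
        self.mean = fmean(values)
        return [v - self.mean for v in values]

    def transform(self, values):
        """Center new `values` using the stored training mean."""
        if self.mean is None:
            raise ValueError("fit_transform must be called before transform")
        return [v - self.mean for v in values]


center = Center()
train_centered = center.fit_transform([1.0, 2.0, 3.0])  # stores mean 2.0
test_centered = center.transform([4.0, 5.0, 6.0])       # reuses 2.0, not 5.0
```

If the test data were centered on its own mean instead, predictions on it would be made on a different scale from the one the model was fitted on, which is why the stored state matters.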
Hi @ksolarski and @drbenvincent, sorry for the delay in my response. I can add a few pieces of information.
With that said, unless you need hierarchical models supported via the |
23221c2 to 0734c8b
@drbenvincent I went with |
Solving issue #386
Starting with DiD; will continue with other methods if you agree with the general design, @drbenvincent
Seems like the key practical difference between formulae and patsy is the lack of a build_design_matrices method in formulae. The user then has to provide the formula again.

Edit: After discussion, it was decided that formulaic suits us best.

📚 Documentation preview 📚: https://causalpy--463.org.readthedocs.build/en/463/