-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make InterceptTerm
s generate Bool
s to avoid unnecessary promotion of other columns
#294
base: master
Are you sure you want to change the base?
Conversation
I'm generally onboard with this, but we should probably do something like a pkgeval to see if we would actually break anything. But we're also not so near the bottom of the dependency tree as StatsBase so a formally breaking release wouldn't be as bad as watching e.g. the StatsBase 0.34 changes propagate. |
test/modelmatrix.jl
Outdated
contr = DummyCoding(; base=4, levels=1:4) | ||
d = (; y=rand(Float32, 4), x=rand(Float32, 4), z=[1:4; ]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't yet used below, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is used but it's redundant with the definition above...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh wait the contr part? yeah that's right.
+1 We could tag 1.0, as it was the plan already (removing |
modelcols(t::InterceptTerm{true}, d::NamedTuple) = ones(Bool, size(first(d))) | ||
modelcols(t::InterceptTerm{false}, d) = Matrix{Bool}(undef, size(first(d),1), 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to generate BitArray
s here? I guess it doesn't matter much, but it would save some memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd kinda hoped to be able to just use a scalar but hcat doesn't work with that
This is one possible fix for #293. As described in that thread, the bottleneck here is actually that
InterceptTerm
s generateArray{Float64}
s, which then gethcat
with other columns and promote them to Float64. With this change, normal julia promotion rules apply, and values will only be promoted as needed.First adding tests to confirm that this behavior is currently broken, then will push teh fix and open for review when CI is green.
I will add that this is potentially breaking: users may end up seeing eltypes that are not even floats come out of
modelcols
if none of the columns in their input tables are floats. I think a lot of this can be remediated by having downstream dependents make sure they, x
arrays returned bymodelcols(::FormulaTerm)
are actually float arrays with the appropriate eltype...