`categorical` with levels and recoding at once #389
Comments
I've been looking for the same functionality. You have two cases in mind. The one with integer codes can already be handled with

```julia
CategoricalArray{String,1}(
    arr,
    CategoricalPool(Dict("female" => 2, "male" => 1))
)
```

(I think it's undocumented, though.) @nalimilan, shouldn't it be possible to define a function like

```julia
function categorical(refarray::AbstractArray{R, N},
                     invleveldict::Dict{V,R},
                     ordered=false
                    ) where {N, V, R <: Integer}
    CategoricalArray{V,N}(refarray, CategoricalPool(invleveldict, ordered))
end
```

Probably one could also allow …
Yeah, this definitely makes sense. I haven't implemented these yet because I concentrated on getting the basics right, without working too much on convenience. But feel free to make a PR. There are a few subtle issues to address, though:
Original issue description
I looked through the issues but didn't see anything comparable; excuse me if I missed something and am duplicating an old discussion.
Whenever I work with categorical data, it's usually something simple like "male"/"female", but the original dataset often codes it with placeholders such as `1` and `2` or `'m'` and `'f'`. So if I want a categorical array with the levels `"male"` and `"female"`, I have to take two steps: create the array and then recode. I feel it would be more straightforward to allow recoding when the data are created, which could also be faster if there's a lot of data. I'm thinking about an API with a vector of pairs like this:

So you can see that this both lets me set the category values I want and, at the same time, set a level ordering that differs from the natural 1, 2 sequence.
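A minimal sketch of how such a pairs-based construction could be emulated today, written as a hypothetical helper `categorical_from_pairs` (not a CategoricalArrays.jl function). It assumes CategoricalArrays.jl 0.10 or later, where the `undef` constructor accepts `levels` and `ordered` keywords. It fills a single categorical array in one pass, and the order of the pairs fixes the order of the levels:

```julia
using CategoricalArrays

# Hypothetical helper, not part of CategoricalArrays.jl.
# Builds a categorical vector from raw codes and a vector of `code => label`
# pairs; the order of the pairs determines the order of the levels.
function categorical_from_pairs(x::AbstractVector, pairs::AbstractVector{<:Pair};
                                ordered::Bool=false)
    mapping = Dict(pairs)   # code => label lookup
    labels = last.(pairs)   # level order follows the order of the pairs
    # Uninitialized array whose levels are fixed up front
    out = CategoricalArray{eltype(labels)}(undef, length(x);
                                           levels=labels, ordered=ordered)
    for (i, code) in enumerate(x)
        haskey(mapping, code) ||
            throw(ArgumentError("no label given for code $(repr(code))"))
        out[i] = mapping[code]  # store the label directly, no intermediate array
    end
    return out
end

sex = categorical_from_pairs([1, 2, 2, 1], [2 => "female", 1 => "male"]; ordered=true)
```

Here `levels(sex)` should be `["female", "male"]`, so `"female" < "male"` holds even though the underlying codes are 2 and 1. Looking up each code in a `Dict` avoids materializing an intermediate `Vector{String}`, though every assignment still goes through the array's level lookup rather than writing reference codes directly.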
I think one would usually need to do something like this:
This gets more cumbersome the more levels there are, and two full arrays need to be created.
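For comparison, the two-step workflow described in the issue (create the array, then recode) can be sketched with functions that exist in CategoricalArrays.jl: `categorical`, `recode`, `levels!` and `ordered!`. This is an illustration assuming a recent (0.10-era) release, not the code from the original post:

```julia
using CategoricalArrays

raw = [1, 2, 2, 1]                              # codes as stored in the dataset

sex = categorical(raw)                          # step 1: categorical array of codes
sex = recode(sex, 1 => "male", 2 => "female")   # step 2: a second array with labels
levels!(sex, ["female", "male"])                # pick an order other than 1, 2
ordered!(sex, true)                             # comparisons follow that order
```

Each step allocates a full new array, and every additional level means another pair plus an updated `levels!` call, which is the overhead the issue describes.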