[BUG] Categorify can't process vocabs correctly when num_buckets>1 #1857

Labels: bug (Something isn't working)
Describe the bug

nvt.ops.Categorify doesn't process vocabs correctly when num_buckets>1 is given at the same time.

Steps/Code to reproduce bug
I tried to use the categorify transform with pre-defined vocabs. I also have to handle multiple OOV buckets, so I pass num_buckets>1 as a parameter as well. For the above code, the expected index for each value is shown below. But I get the following result, with a wrong category dictionary:
df_out
pd.read_parquet("./categories/meta.Authors.parquet")
pd.read_parquet("./categories/unique.Authors.parquet")
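To make the expectation concrete, here is a minimal pure-Python sketch of the index layout being described: low indices reserved for nulls and the OOV hash buckets, with the user-supplied vocab starting after them. The helper names and the exact slot order are illustrative assumptions, not NVTabular's internals, and the vocab values are made up.

```python
# Hypothetical sketch of the layout Categorify is expected to produce when a
# pre-defined vocab is combined with num_buckets OOV hash buckets.
# Assumed layout (illustrative only):
#   index 0                      -> nulls
#   indices 1 .. num_buckets     -> OOV hash buckets
#   indices num_buckets+1 .. N   -> the user-supplied vocab, in order

def build_expected_mapping(vocab, num_buckets):
    """Map each vocab entry to its final encoded index."""
    offset = 1 + num_buckets  # null slot + OOV buckets come first
    return {value: offset + i for i, value in enumerate(vocab)}

def encode(value, mapping, num_buckets):
    """Encode one value: null -> 0, in-vocab -> its index, OOV -> a hash bucket."""
    if value is None:
        return 0
    if value in mapping:
        return mapping[value]
    return 1 + (hash(value) % num_buckets)  # OOV lands in one of the buckets

vocab = ["User_A", "User_B", "User_C"]  # stand-in for the Authors vocab
mapping = build_expected_mapping(vocab, num_buckets=2)
# mapping == {"User_A": 3, "User_B": 4, "User_C": 5};
# unseen values encode to bucket 1 or 2, nulls to 0
```

With num_buckets=2, every in-vocab value should therefore sit at least two slots above the OOV buckets; the buggy output reported above does not respect that shift.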
I checked inside the Categorify.process_vocabs function, and oov_count picks up num_buckets correctly. But when process_vocabs calls Categorify._save_encodings(), it doesn't build the vocabulary dictionary correctly.

Expected behavior
From
NVTabular/nvtabular/ops/categorify.py
Lines 432 to 438 in 77b94a4
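The suspected defect can be illustrated with a toy version of the encoding-table write. The function names below are hypothetical stand-ins, not NVTabular code: if the saved table does not shift vocab entries by oov_count, every category collides with the reserved OOV buckets and is off by that amount.

```python
# Toy illustration of the suspected off-by-oov_count bug when saving encodings.

def save_encodings_buggy(vocab):
    # Ignores the reserved OOV buckets, so vocab indices overlap them.
    return {v: 1 + i for i, v in enumerate(vocab)}

def save_encodings_fixed(vocab, oov_count):
    # Start the vocab after the null slot and all oov_count buckets.
    return {v: 1 + oov_count + i for i, v in enumerate(vocab)}

vocab = ["User_A", "User_B", "User_C"]
save_encodings_buggy(vocab)               # {"User_A": 1, ...} -- overlaps OOV buckets
save_encodings_fixed(vocab, oov_count=2)  # {"User_A": 3, ...} -- as expected
```

This matches the fix described next: passing oov_count through to the save step so the dictionary is written with the correct offset.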
I fixed the code so that process_vocabs calls Categorify._save_encodings with oov_count, and I got the result of df_out I expected.

Environment details (please complete the following information):
pip
Additional context
None