1. Fix #119 for OOV token update
2. Fix #118 for hyperpara tuning
xpai committed Oct 23, 2024
1 parent 9758a89 commit b026ce9
Showing 3 changed files with 11 additions and 3 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -151,7 +151,7 @@ Please install other required packages via `pip install -r requirements.txt`.

 ```
 cd experiment
-python run_param_tuner.py --config config/DCN_tiny_h5_tuner_config.yaml --gpu 0 1 2 3 0 1 2 3
+python run_param_tuner.py --config config/DCN_tiny_parquet_tuner_config.yaml --gpu 0 1 2 3 0 1 2 3
 ```

 ## 🔥 Citation
@@ -1,6 +1,13 @@
 base_config: ../model_zoo/DCN/DCN_torch/config/
 base_expid: DCN_default
-dataset_id: tiny_npz
+dataset_id: tiny_parquet

+tiny_parquet:
+    data_root: ../data/
+    data_format: npz
+    train_data: ../data/tiny_parquet/train.parquet
+    valid_data: ../data/tiny_parquet/valid.parquet
+    test_data: ../data/tiny_parquet/test.parquet
+
 tuner_space:
     model_root: './checkpoints/'
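As recorded, the new `tiny_parquet` block declares `data_format: npz` while all three data paths end in `.parquet`. The commit does not say whether that pairing is intentional. The following is a minimal sketch of a consistency check a reviewer could run over such a block; the helper name `mismatched_paths` is hypothetical and is not part of FuxiCTR, and the dictionary simply mirrors the values in the hunk above.

```python
from pathlib import PurePosixPath


def mismatched_paths(dataset_cfg):
    """Return {key: file_extension} for data paths whose extension
    differs from the declared data_format (hypothetical helper)."""
    declared = dataset_cfg["data_format"]
    mismatches = {}
    for key in ("train_data", "valid_data", "test_data"):
        extension = PurePosixPath(dataset_cfg[key]).suffix.lstrip(".")
        if extension != declared:
            mismatches[key] = extension
    return mismatches


# Values copied from the hunk above.
tiny_parquet = {
    "data_root": "../data/",
    "data_format": "npz",
    "train_data": "../data/tiny_parquet/train.parquet",
    "valid_data": "../data/tiny_parquet/valid.parquet",
    "test_data": "../data/tiny_parquet/test.parquet",
}

# All three paths are reported, since each ends in .parquet rather than .npz.
print(mismatched_paths(tiny_parquet))
```

An empty result would mean the declared format and the file extensions agree. A non-empty result is only a prompt to double-check the config and is not proof of a bug.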
3 changes: 2 additions & 1 deletion fuxictr/preprocess/tokenizer.py
@@ -86,7 +86,8 @@ def merge_vocab(self, shared_tokenizer):
         else:
             shared_tokenizer.vocab.update(self.vocab)
         vocab_size = shared_tokenizer.vocab_size()
-        if shared_tokenizer.vocab["__OOV__"] != vocab_size - 1:
+        if (shared_tokenizer.vocab["__OOV__"] != vocab_size - 1 or
+            shared_tokenizer.vocab["__OOV__"] != len(shared_tokenizer.vocab) - 1):
             shared_tokenizer.vocab["__OOV__"] = vocab_size
         self.vocab = shared_tokenizer.vocab
         return shared_tokenizer
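The effect of the added clause can be illustrated with a small standalone sketch. It is not the FuxiCTR implementation: `vocab_size` is assumed here to mean "largest index in use plus one" (the hunk does not show its definition), `fix_oov_index` is a hypothetical helper, and the example vocabulary is made up so that one token shares an index with `__OOV__`.

```python
def vocab_size(vocab):
    # Assumption: size is the largest index in use plus one.
    return max(vocab.values()) + 1


def fix_oov_index(vocab, use_length_check=True):
    """Move __OOV__ to a fresh index when it looks stale (hypothetical helper).

    use_length_check=False mimics the condition before the commit,
    use_length_check=True mimics the condition after it.
    """
    size = vocab_size(vocab)
    oov = vocab["__OOV__"]
    stale = oov != size - 1
    if use_length_check:
        stale = stale or oov != len(vocab) - 1
    if stale:
        vocab["__OOV__"] = size
    return vocab


# Made-up merged vocabulary: token "c" has landed on the index held by __OOV__.
merged = {"__PAD__": 0, "a": 1, "b": 2, "__OOV__": 3, "c": 3}

before = fix_oov_index(dict(merged), use_length_check=False)
after = fix_oov_index(dict(merged), use_length_check=True)

print(before["__OOV__"], after["__OOV__"])  # 3 4
```

In this example the old condition sees `__OOV__` at the largest index and leaves it alone, so it keeps colliding with `c`. The extra comparison against `len(vocab) - 1` notices that there are more entries than distinct indices and moves `__OOV__` to index 4.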
