Fix default value of --tokenizer
In the `prepare_tulu_data.py` script.

Closes #425.
epwalsh committed Feb 5, 2024
1 parent 6765317 commit 80db5e3
Showing 2 changed files with 5 additions and 1 deletion.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## Unreleased

### Fixed

- Fixed the default value of the `--tokenizer` argument to `scripts/prepare_tulu_data.py` to be an absolute path rather than a relative path, so that the script can be run from other directories.

## [v0.2.4](https://github.com/allenai/OLMo/releases/tag/v0.2.4) - 2024-02-02

### Fixed
2 changes: 1 addition & 1 deletion scripts/prepare_tulu_data.py
@@ -116,7 +116,7 @@ def get_parser() -> ArgumentParser:
"--tokenizer",
type=str,
help="""Tokenizer path or identifier.""",
- default="tokenizers/allenai_eleuther-ai-gpt-neox-20b-pii-special.json",
+ default=Path(__file__).parent / "tokenizers" / "allenai_eleuther-ai-gpt-neox-20b-pii-special.json",
)
parser.add_argument("-s", "--seq-len", type=int, help="""Max sequence length.""", default=2048)
parser.add_argument("--eos", type=int, help="""EOS token ID.""", default=50279)
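
The change above anchors the default tokenizer path to the script's own location instead of the current working directory. A minimal, standalone sketch of that general technique follows; it is not the repository's code. The function name `build_parser`, its `script_file` parameter, and the file name `example-tokenizer.json` are hypothetical, chosen only for illustration, and the directory layout is an assumption. The sketch also calls `resolve()` and converts the result to `str`, two choices that go beyond what the diff shows: `resolve()` guarantees an absolute path, and argparse does not apply `type` to a default that is not already a string.

```python
from argparse import ArgumentParser
from pathlib import Path


def build_parser(script_file: str) -> ArgumentParser:
    """Build a parser whose --tokenizer default is anchored to the script's directory.

    `script_file` would normally be `__file__`; it is a parameter here so the
    sketch can be exercised without depending on where it is run from.
    """
    # Resolve to an absolute path so the default does not depend on the
    # current working directory.
    script_dir = Path(script_file).resolve().parent

    # Hypothetical file name, used only for illustration.
    default_tokenizer = script_dir / "tokenizers" / "example-tokenizer.json"

    parser = ArgumentParser()
    parser.add_argument(
        "--tokenizer",
        type=str,
        help="Tokenizer path or identifier.",
        # argparse applies `type` only to string defaults, so convert explicitly
        # to keep the attribute a str whether or not the flag is given.
        default=str(default_tokenizer),
    )
    return parser
```

Inside a real script this would be called as `build_parser(__file__).parse_args()`. A value passed explicitly with `--tokenizer` is used unchanged, so relative paths supplied by the user are still interpreted relative to the working directory.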