Merge branch 'main' of github.com:aisingapore/sealion into sft_refactor
yongxb committed Mar 12, 2024
2 parents f4fa5be + 00855fd commit cdb7ae4
Showing 1 changed file with 1 addition and 1 deletion.

README.md (2 changes: 1 addition, 1 deletion)
@@ -76,7 +76,7 @@ SEA-LION has been trained on a diverse dataset of 980B tokens spanning 11 natura
 - Khmer
 - Lao
 
-The dataset is available here [SEA-LION-PILE](https://huggingface.co/aisingapore/sea-lion-pile).
+The dataset is available here [SEA-LION-PILE](https://huggingface.co/datasets/aisingapore/sea-lion-pile).
 
 The models use a vocabulary of 256,000 tokens and a context length of 2048 tokens. For tokenization, the model employs a custom SEA byte-pair encoding (BPE) tokenizer which is specially tailored for SEA languages, ensuring optimal model performance.
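
Why the corrected link differs: on the Hugging Face Hub, model repositories are served at `https://huggingface.co/<org>/<name>`, whereas dataset repositories carry an extra `datasets/` path segment, so the old README link addressed SEA-LION-PILE as if it were a model repository. The sketch below illustrates only that URL convention; `hub_url` and its `repo_type` values are hypothetical names chosen for this example, not part of the `huggingface_hub` library or the SEA-LION codebase.

```python
# Hypothetical helper illustrating the Hugging Face Hub URL convention:
# model repos live at https://huggingface.co/<org>/<name>,
# dataset repos at https://huggingface.co/datasets/<org>/<name>.
HUB_ROOT = "https://huggingface.co"

# Path prefix for each repository type (Spaces also use a prefix).
_PREFIX = {"model": "", "dataset": "datasets/", "space": "spaces/"}


def hub_url(repo_id: str, repo_type: str = "model") -> str:
    """Return the browser URL for a Hub repository of the given type."""
    if repo_type not in _PREFIX:
        raise ValueError(f"unknown repo_type: {repo_type!r}")
    return f"{HUB_ROOT}/{_PREFIX[repo_type]}{repo_id}"


if __name__ == "__main__":
    # Old README link: built as if SEA-LION-PILE were a model repo.
    print(hub_url("aisingapore/sea-lion-pile"))
    # Corrected link from this commit: dataset repo path.
    print(hub_url("aisingapore/sea-lion-pile", repo_type="dataset"))
```

Running it prints the old link followed by the corrected one; only the `datasets/` segment differs.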
