From 00855fd1f47c63cdd7099cee8fa14bb32749b984 Mon Sep 17 00:00:00 2001
From: weiqipedia <56439386+weiqipedia@users.noreply.github.com>
Date: Thu, 7 Mar 2024 00:29:40 +0800
Subject: [PATCH] Fix link for SEA-LION-Pile in README

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 0c1b635..29326e3 100644
--- a/README.md
+++ b/README.md
@@ -76,7 +76,7 @@ SEA-LION has been trained on a diverse dataset of 980B tokens spanning 11 natura
 - Khmer
 - Lao
 
-The dataset is available here [SEA-LION-PILE](https://huggingface.co/aisingapore/sea-lion-pile).
+The dataset is available here [SEA-LION-PILE](https://huggingface.co/datasets/aisingapore/sea-lion-pile).
 
 The models use a vocabulary of 256,000 tokens and a context length of 2048 tokens.
 For tokenization, the model employs a custom SEA byte-pair encoding (BPE) tokenizer which is specially tailored for SEA languages, ensuring optimal model performance.
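As context for the link fix above: on the Hugging Face Hub, dataset pages live under the `datasets/` URL prefix, while the bare `org/name` path is reserved for model repos. A minimal sketch of the distinction, assuming the corrected URL from this patch (the helper function is hypothetical, and the commented-out `datasets` call would require the library and network access):

```python
# Sketch: the repo id used by the `datasets` library is the part of the
# dataset URL after the "datasets/" prefix. `repo_id_from_url` is a
# hypothetical helper, not part of any library.

DATASET_URL = "https://huggingface.co/datasets/aisingapore/sea-lion-pile"

def repo_id_from_url(url: str) -> str:
    """Extract the `org/name` repo id from a Hugging Face dataset URL."""
    prefix = "https://huggingface.co/datasets/"
    if not url.startswith(prefix):
        raise ValueError("not a Hugging Face dataset URL")
    return url[len(prefix):]

repo_id = repo_id_from_url(DATASET_URL)
print(repo_id)  # aisingapore/sea-lion-pile

# With the `datasets` library installed, the corpus could then be loaded
# (streaming avoids downloading the full corpus up front):
# from datasets import load_dataset
# ds = load_dataset(repo_id, streaming=True)
```

Note that `load_dataset` takes the bare repo id, not the full URL, which is why the old link (missing the `datasets/` segment) pointed at a nonexistent model page.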