
Data and models for sentencepiece experiments with texts from the Kanseki Repository


cwittern/krp_sp


Experiments with Kanseki Repository and sentencepiece

Install

  • The Python scripts use Python version 3.5.1
  • Install the dependencies
pip install -r requirements.txt
  • Make the corpus files
bash scripts/makecorpus.sh

UPDATE: GitHub rejected the corpus files as too large, so they are not included here. Instructions for creating the corpus are in scripts/mkcorpus.py.
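
Once a corpus file exists, training a sentencepiece model on it could look roughly like the sketch below. This is not taken from the repository's scripts. The corpus path, the model prefix krp_sp, the vocabulary size and the model type are all hypothetical placeholders. The only library call used is SentencePieceTrainer.train from the sentencepiece Python package.

```python
def build_train_args(corpus_path, model_prefix="krp_sp", vocab_size=8000):
    """Collect keyword arguments for SentencePieceTrainer.train().

    All default values are illustrative assumptions, not settings
    taken from this repository.
    """
    return {
        "input": str(corpus_path),        # plain-text corpus, one sentence per line
        "model_prefix": model_prefix,     # output: <prefix>.model and <prefix>.vocab
        "vocab_size": vocab_size,         # hypothetical value, tune per experiment
        "character_coverage": 0.9995,     # commonly used for large character sets
        "model_type": "unigram",          # sentencepiece's default model type
    }


def train(corpus_path, **overrides):
    """Train a model and return the path of the resulting .model file."""
    # Imported here so the argument helper works without sentencepiece installed.
    import sentencepiece as spm

    args = build_train_args(corpus_path, **overrides)
    spm.SentencePieceTrainer.train(**args)
    return args["model_prefix"] + ".model"


# Example call (the path is a placeholder for a file produced by the corpus step):
# model_file = train("path/to/corpus.txt", vocab_size=16000)
```

Keeping the argument construction separate from the training call makes it easy to log or compare the settings used in each experiment.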
