Project on automatic sound law derivation

j-luo93/ASLI

This branch is up to date with djwyen/sound-law-benchmark:master.

Benchmark Sound Law LSTM

Part of a project that tries to automatically derive sound laws from a list of cognates.

This project uses the ielex dataset as provided in Jäger et al. 2017, "Using support vector machines and state-of-the-art algorithms for phonetic alignment to identify cognates in multi-lingual wordlists".

Prepare data

  • Download the NorthEuraLex dataset: wget http://www.sfs.uni-tuebingen.de/~jdellert/northeuralex/0.9/northeuralex-0.9-forms.tsv
  • Obtain the cognate set dataset and merge it with NorthEuraLex using the wikt_reader library. This produces a family file.
  • Prepare the input data by running
python scripts/process_data_wikt.py --data_path <path_to_family_file> --source <src> --targets <tgt_langs> --no_need_transcriber

For instance, for the Germanic language family, run

python scripts/process_data_wikt.py --data_path data/Germanic.tsv --source gem-pro --targets eng deu isl nor swe dan nld --no_need_transcriber
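
When the preprocessing step has to be repeated for several language families, it may help to assemble the command programmatically. The sketch below is only an illustration: it uses the flags shown above (--data_path, --source, --targets, --no_need_transcriber), while the helper names build_command and run are hypothetical and not part of this repository. It assumes the script is invoked from the repository root.

```python
import subprocess
import sys
from typing import List, Sequence


def build_command(data_path: str, source: str, targets: Sequence[str]) -> List[str]:
    """Assemble the argument list for scripts/process_data_wikt.py.

    Hypothetical helper: only the flags documented in this README are used.
    """
    if not targets:
        raise ValueError("at least one target language is required")
    return [
        sys.executable,  # the current Python interpreter
        "scripts/process_data_wikt.py",
        "--data_path", data_path,
        "--source", source,
        "--targets", *targets,
        "--no_need_transcriber",
    ]


def run(cmd: Sequence[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(list(cmd), check=True)


if __name__ == "__main__":
    germanic = build_command(
        "data/Germanic.tsv",
        "gem-pro",
        ["eng", "deu", "isl", "nor", "swe", "dan", "nld"],
    )
    run(germanic)  # dry run: prints the command without executing it
```

Calling run(germanic, dry_run=False) would execute the script, provided the family file exists at the given path.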

Dependencies

  • Python packages listed in requirements.txt. Install them with pip install -r requirements.txt.
  • Boost libraries. On Ubuntu, run sudo apt-get install libboost-all-dev.
  • spdlog, installed as the static library version.
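
To see which Python dependencies are still missing before running the scripts, a quick check along the following lines may be useful. This is a rough sketch, not part of the repository: the function names are hypothetical, the parsing handles only plain "name" or "name==version" style lines (option lines such as -r or -e are skipped, and URL-style requirements are not handled), and it does not cover the C++ dependencies (Boost, spdlog).

```python
import re
from importlib import metadata
from pathlib import Path
from typing import Iterable, List


def parse_requirement_names(lines: Iterable[str]) -> List[str]:
    """Extract bare distribution names from simple requirements lines."""
    names = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the names for which no installed distribution is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req_file = Path("requirements.txt")
    if req_file.exists():
        wanted = parse_requirement_names(req_file.read_text(encoding="utf-8").splitlines())
        print("Missing packages:", missing_packages(wanted))
    else:
        print("requirements.txt not found in the current directory")
```

The check only confirms that a distribution is installed; it does not verify that the installed version satisfies the pinned constraint.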
