
Is the word-level version closed? #2

Open
YuhuYang opened this issue Nov 14, 2023 · 2 comments

Comments

@YuhuYang

YuhuYang commented Nov 14, 2023

I cannot find the page for SUD_Chinese_Beginner-Word. I would also like to know whether this version was converted automatically or manually.

@kirianguiller
Collaborator

kirianguiller commented Nov 14, 2023

SUD_Chinese_Beginner-Word (whose real name is SUD_Chinese_Beginner, as opposed to mSUD_Chinese_Beginner) is converted automatically from the mSUD version by merging all tokens linked by a morphological "m" relation (我 -m-> 们 => 我们).
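The merging step described above can be sketched roughly as follows. This is a minimal Python illustration, not the conversion script mentioned in this thread. It assumes the sentence has already been parsed from CoNLL-U into hypothetical `Token` records, that the morphological relation is labelled "m" (optionally with a subtype), and that every merged group is contiguous. Tokens linked by "m" are grouped with union-find; each group becomes one word whose form is the concatenation of its members, and heads are renumbered to point at the merged words.

```python
from dataclasses import dataclass


@dataclass
class Token:
    id: int      # 1-based position in the sentence (CoNLL-U ID column)
    form: str    # surface form (FORM column)
    head: int    # ID of the governor, 0 for the sentence root (HEAD column)
    deprel: str  # relation to the governor (DEPREL column)


def is_morph(deprel: str) -> bool:
    # Assumption: the morphological relation is "m", possibly with a subtype.
    return deprel == "m" or deprel.startswith("m:")


def merge_m_tokens(tokens: list[Token]) -> list[Token]:
    """Hypothetical helper: fuse tokens connected by m relations into words.

    Assumes tokens are sorted by id and each merged group is contiguous.
    """
    # Union-find over token ids: tokens linked by an m relation share a group.
    parent = {t.id: t.id for t in tokens}

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for t in tokens:
        if is_morph(t.deprel) and t.head != 0:
            parent[find(t.id)] = find(t.head)

    # Collect groups in sentence order (dicts preserve insertion order).
    groups: dict[int, list[Token]] = {}
    for t in tokens:
        groups.setdefault(find(t.id), []).append(t)

    # Every old token id maps to the id of the merged word containing it.
    new_id = {}
    for i, members in enumerate(groups.values(), start=1):
        for t in members:
            new_id[t.id] = i

    merged = []
    for i, members in enumerate(groups.values(), start=1):
        ids = {t.id for t in members}
        # The group's root is the member whose governor lies outside the group.
        root = next(t for t in members if t.head not in ids)
        head = 0 if root.head == 0 else new_id[root.head]
        form = "".join(t.form for t in members)
        merged.append(Token(i, form, head, root.deprel))
    return merged
```

For example, a sentence with 我 (subj of token 3), 们 (m dependent of 我) and 学习 (root) comes out as two words, 我们 and 学习, with 我们 now depending on word 2.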

There is no automatic mSUD repo at the moment, but I just created this script to help convert from mSUD to SUD.

We need to discuss with the other SUD maintainers to decide the best way to store the SUD version (@bguil).

Meanwhile, here is the converted SUD version in a single file:
zh_beginner-sud-test.conllu.txt
(just remove the .txt extension).

Thank you for your interest!

@YuhuYang
Author


Thanks for your kindness! mSUD is a really interesting project, since some scholars have argued that a Chinese character is equivalent to a word. This corpus is good material for testing that claim. I look forward to the finished corpus. Thank you again!
