Add French and Chinese Wiktionary JSON schema #356

Merged 6 commits on Oct 10, 2023

Conversation

xxyzz
Collaborator

@xxyzz xxyzz commented Oct 9, 2023

I'm running the validation code, but it hasn't finished yet, so the schema might not match the JSON file.

@xxyzz
Collaborator Author

xxyzz commented Oct 9, 2023

The script finished successfully, but it took 30 minutes, almost as long as creating the JSON file itself.

@xxyzz xxyzz changed the title Add French Wiktionary JSON schema Add French and Chinese Wiktionary JSON schema Oct 10, 2023
@kristian-clausal
Collaborator

Tatu wants all of the JSON data produced by wiktextract to be as consistent as possible, so any schemas should probably also be applied to all other extraction output. We will have to consolidate at some point (when we create kaikki.org dictionaries for the French and Chinese Wiktionaries, etc.), which probably means learning from these newer schemas, applying changes to the en.wiktionary output, and then checking for divergences that might have slipped in by accident.
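
One simple way to check for divergences between schemas is to compare the property names they declare. This is only a rough sketch, assuming each schema is a JSON Schema object with a top-level `properties` mapping; the two schema fragments and the helper name `property_divergence` are made up for illustration and are not taken from wiktextract.

```python
# Sketch: report property names that appear in only one of two schemas.
# The schema fragments below are hypothetical, not wiktextract's real schemas.


def property_divergence(schema_a: dict, schema_b: dict) -> tuple[set[str], set[str]]:
    """Return (names only in schema_a, names only in schema_b)."""
    props_a = set(schema_a.get("properties", {}))
    props_b = set(schema_b.get("properties", {}))
    return props_a - props_b, props_b - props_a


en_schema = {"properties": {"word": {}, "senses": {}, "translations": {}}}
fr_schema = {"properties": {"word": {}, "senses": {}, "paronyms": {}}}

only_en, only_fr = property_divergence(en_schema, fr_schema)
print(sorted(only_en))  # ['translations']
print(sorted(only_fr))  # ['paronyms']
```

A real check would probably also need to recurse into nested objects and compare types, which this sketch does not do.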

@xxyzz
Collaborator Author

xxyzz commented Oct 10, 2023

The main structures are the same as the English JSON data, but the French and Chinese Wiktionaries each have their own extra data: for example, French Wiktionary has a traditional writing form in translation lists, and Chinese Wiktionary has two forms of example sentences. Some parts of the schemas might be incompatible, but the validation script would still pass because I didn't mark any property as required.
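
To illustrate why the check still passes: in JSON Schema, an entry under `properties` constrains a key only when that key is present, and a missing key is an error only if it is listed under `required`. The following is a hand-rolled sketch of that rule using only the standard library. It is not the validation script from this PR, it covers only a tiny part of JSON Schema, and the schemas and field names (`word`, `roman`) are hypothetical.

```python
# Hand-rolled illustration of JSON Schema's "properties"/"required" semantics.
# Not the PR's validation script; schemas and field names are hypothetical.

TYPE_MAP = {"string": str, "array": list, "object": dict}


def validate(instance: dict, schema: dict) -> list[str]:
    """Return a list of error messages; an empty list means the instance passes."""
    errors = []
    # Keys listed under "required" must be present.
    for key in schema.get("required", []):
        if key not in instance:
            errors.append(f"missing required property: {key}")
    # Keys listed under "properties" are checked only when present.
    for key, subschema in schema.get("properties", {}).items():
        if key in instance:
            expected = TYPE_MAP[subschema["type"]]
            if not isinstance(instance[key], expected):
                errors.append(f"wrong type for property: {key}")
    return errors


lenient_schema = {
    "type": "object",
    "properties": {"word": {"type": "string"}, "roman": {"type": "string"}},
}
strict_schema = {**lenient_schema, "required": ["word", "roman"]}

entry = {"word": "bonjour"}  # has no "roman" key

print(validate(entry, lenient_schema))  # []
print(validate(entry, strict_schema))   # ['missing required property: roman']
```

So a schema with no `required` list accepts entries that omit any of its properties, and only rejects a property that is present with the wrong type.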

@xxyzz
Collaborator Author

xxyzz commented Oct 10, 2023

Chinese Wiktionary JSON schema validation also passed.

@xxyzz xxyzz merged commit b4a54f7 into tatuylonen:master Oct 10, 2023
@xxyzz xxyzz deleted the json_schema branch October 10, 2023 06:20