Update and add DISTAM-Calfa non-Latin scripts datasets: two updates (Arabic) and two new additions (Arabic, Chinese) #172

Open
wants to merge 10 commits into
base: master


CVidalG
Contributor

@CVidalG CVidalG commented Jan 9, 2025

Hi,

Updates two Arabic datasets: RASAM (moved to RASAM 1) and Tarima
Adds two new datasets: RASAM 2 (Arabic) and Chi Know Po (Chinese)

Thanks!

@alix-tz
Member

alix-tz commented Jan 13, 2025

Hello Chahan,
Thank you for the updates and additions.

For RASAM 1 and RASAM 2: are they two versions of the "same" dataset (with different metadata for authorities)? If so, it might open up a larger discussion about versions.

@CVidalG
Contributor Author

CVidalG commented Jan 14, 2025

Hi Alix,

Thanks for your feedback. Indeed, RASAM 2 can be considered an extension of RASAM 1 (they now share the same git repository). As part of a new hackathon, we added 250 pages from 15 different manuscripts (RASAM 2) to the 300 existing pages (from 3 manuscripts) of RASAM 1. Both solutions suit me well. In our most recent paper, we distinguished RASAM 1 from RASAM 2.


2 participants