Creating Arabic Datasets with respect to Arabic Dialects #103

Open
AliAlsalkhadi opened this issue Nov 15, 2024 · 3 comments

@AliAlsalkhadi

@Mahmoud-s-programs and I went through the articles recommended by @Sepideh-Ahmadian. After a long discussion, we concluded that the best way to gather Arabic datasets with respect to dialect is to create a separate dataset for each region (Gulf, Levantine, Egyptian, Maghrebi). This will cover all Arabic dialects, and the model will be able to recognize them.

We have already added more reviews to the SemEval-2016 dataset, as it uses the Gulf dialect exclusively.
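
A minimal sketch of the per-region split described above, in case it helps the discussion. The record layout (a dict with `text` and `region` keys), the `split_by_region` helper, and the placeholder reviews are all assumptions for illustration, not the project's actual data format.

```python
# Region names follow the four groups listed in this comment.
REGIONS = ("Gulf", "Levantine", "Egyptian", "Maghrebi")


def split_by_region(reviews):
    """Group review records into one list per dialect region.

    Each record is assumed to be a dict with a "region" key (hypothetical layout).
    """
    datasets = {region: [] for region in REGIONS}
    for review in reviews:
        region = review["region"]
        if region not in datasets:
            # Fail loudly instead of silently dropping unlabeled dialects.
            raise ValueError(f"unknown dialect region: {region!r}")
        datasets[region].append(review)
    return datasets


# Placeholder records only; real reviews would carry Arabic text.
sample = [
    {"text": "review 1", "region": "Gulf"},
    {"text": "review 2", "region": "Egyptian"},
    {"text": "review 3", "region": "Gulf"},
]

datasets = split_by_region(sample)
for region, items in datasets.items():
    print(region, len(items))
```

Keeping one list per region means each dialect can be saved, balanced, or evaluated separately, and an unexpected label surfaces as an error rather than a missing row.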

[Image: Screenshot 2024-11-14 234904]

@Sepideh-Ahmadian
Member

Thank you @AliAlsalkhadi and @Mahmoud-s-programs! We can discuss it in today's meeting.

@Sepideh-Ahmadian
Member

@AliAlsalkhadi and @Mahmoud-s-programs
The article we discussed in the LADy meeting is "Datasheets for Datasets". Please share your Gmail addresses so I can send you our English draft. In addition to the questions mentioned in the article, feel free to suggest any others that you think are important for our work.
Also, the full SemEval 2016 dataset is available through this link; it contains a total of 4,802 sentences (train set only).
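
One way to sanity-check the sentence count after downloading is sketched below. It assumes the file uses the SemEval-2016 ABSA XML layout with `<sentence>` elements nested under `<Review>`; the `count_sentences` helper and the inline two-sentence sample are hypothetical stand-ins for the real file.

```python
import xml.etree.ElementTree as ET


def count_sentences(xml_text):
    """Count <sentence> elements in a SemEval-style XML string (assumed layout)."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter("sentence"))


# Tiny placeholder document; the real train file would be read from disk instead.
sample_xml = """
<Reviews>
  <Review rid="1">
    <sentences>
      <sentence id="1:0"><text>placeholder sentence one</text></sentence>
      <sentence id="1:1"><text>placeholder sentence two</text></sentence>
    </sentences>
  </Review>
</Reviews>
"""

n_sentences = count_sentences(sample_xml)
print(n_sentences)
```

Run against the actual train file, the result can be compared with the 4,802 sentences mentioned above.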

@AliAlsalkhadi
Author

@Sepideh-Ahmadian Thanks for sharing this. Here is my Gmail: [email protected]
I will check out the full dataset.
