In the Computational Approaches to Modeling Language (CAMeL) Lab, we develop a wide range of resources (tools, corpora, and lexicons) for Arabic and its dialects. One goal we hold high is to follow consistent standards across all of our resources. Working with Arabic dialects comes with many challenges, as they are resource-poor and have no official standards. Our overall approach to annotation guidelines for Arabic and its dialects is to create common standards that are compatible with Modern Standard Arabic but can be easily and naturally extended to the various dialects.
On this site, we provide our guidelines for representing:
- Phonology (how words are pronounced)
- Orthography (how words are written)
- Morphology (how words are put together)
- Syntax (how words come together to form sentences)
The guidelines are versioned and backed up on GitHub. We invite you to check them out and give us your feedback. Each guideline section includes a discussion of the high-level philosophy as well as specific details, along with links to publications on the guidelines and to publications and projects that use them.