The Arabic humor dataset was collected using Twint and Sketch Engine and consists of 10k tweets.

iwan-rg/Arabic-Humor

A Dataset for Detecting Humor in Arabic Text

Humor detection is a complex and ambiguous task in natural language processing, which makes automatic humor detection challenging, particularly for low-resource languages such as Arabic. In this paper, we address the task by collecting and annotating humorous Arabic tweets (dialectal) and Modern Standard Arabic (MSA) text, then performing automatic humor detection on the collected data. To establish a baseline classification system, we fine-tuned seven Arabic pre-trained language models on the dataset: AraBERTv02, AraBERTv02-twitter, QARIB, MARBERT, MARBERTv2, CAMeLBERT-DA, and CAMeLBERT-MIX. CAMeLBERT-DA performed best, achieving an F1-score and accuracy of 72.11%.
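The two reported metrics can be illustrated with a small, dependency-free sketch. It computes accuracy and a macro-averaged F1-score over the two labels. The README does not say which F1 averaging the paper used, so macro averaging here is an assumption, and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_label(y_true, y_pred, label):
    """F1-score for one label, treating it as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denominator = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the label never occurs.
    return 2 * tp / denominator if denominator else 0.0


def macro_f1(y_true, y_pred, labels=("humor", "non-humor")):
    """Unweighted mean of the per-label F1-scores (assumed averaging)."""
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


if __name__ == "__main__":
    gold = ["humor", "humor", "non-humor", "non-humor"]
    pred = ["humor", "non-humor", "non-humor", "non-humor"]
    print(f"accuracy = {accuracy(gold, pred):.4f}")
    print(f"macro F1 = {macro_f1(gold, pred):.4f}")
```

These helpers only score predictions that already exist; they do not reproduce the fine-tuning experiments described above.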

File Specifications

  • humor.tsv: contains the tweets, each labeled either "humor" or "non-humor"
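A minimal sketch of reading the file with Python's standard library follows. The README does not document the column layout, so the column positions (`text_col`, `label_col`), the `has_header` flag, and the function names are all assumptions to adjust after inspecting the file.

```python
import csv
from collections import Counter


def load_labeled_tweets(path, text_col=0, label_col=1, has_header=False):
    """Read (text, label) pairs from a tab-separated file.

    Column positions are assumptions; the README does not specify them.
    """
    rows = []
    with open(path, encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quote characters inside tweets as literal text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        if has_header:
            next(reader, None)  # skip the header row if the file has one
        for fields in reader:
            if len(fields) <= max(text_col, label_col):
                continue  # skip blank or incomplete lines
            rows.append((fields[text_col], fields[label_col].strip()))
    return rows


def label_counts(rows):
    """Count how many tweets carry each label."""
    return Counter(label for _, label in rows)
```

Calling `label_counts(load_labeled_tweets("humor.tsv"))` would then show how the tweets are split between the two labels, provided the assumed column positions match the file.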

Citation

If you use this dataset, please cite it as:

@inproceedings{alkhalifa2022humor,
  title={A Dataset for Detecting Humor in Arabic Text},
  author={Hend Al-Khalifa and Fetoun AlZahrani and Hala Qawara and Reema AlRowais and Sawsan Alowa and Luluh AlDhubayi},
  booktitle={The 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022)},
  year={2022}
}

License

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
