Skip to content

IDRISI is the largest-scale publicly-available Twitter Location Mention Prediction (LMP) datasets, in both English and Arabic languages. It contains 41 disaster events of different types (e.g., floods, fires). Annotations include tagged LMs in posts, location types (e.g., cities, streets), links to OSM toponyms, & usefulness of features for LMD.

Notifications You must be signed in to change notification settings

rsuwaileh/IDRISI

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 

Repository files navigation

IDRISI

IDRISI is the largest-scale publicly-available Twitter Location Mention Prediction (LMP) dataset, in both English and Arabic languages. Named after Muhammad Al-Idrisi👳🏻‍♂️, who is one of the pioneers and founders of the advanced geography.

All datasets are licensed under Creative Commons Attribution 4.0 International License.

  • The Location Mention Recognition (LMR) datasets are under LMR directory.
  • The Location Mention Disambiguation (LMD) datasets will be available soon under LMD directory.

We processed the data to de-identify it as follows:

  • We do not release the the user identifiers, i.e., user_id.
  • We replace the user mentions (i.e.,@) in the tweet text by @0 of the same length as the mention length. For example, if the mention is @someuser, we replace it with @00000000.
  • We keep the tweet ids, i.e., id, to allow recrawling tweets for extracting more information, e.g., meta data, social network properties, etc. This allows developing LMP models that utilize different features beyond the textual content.

For any inqueries, please create a new issue in the repository or contact us via email:

Publications

  @article{rsuwaileh2023idrisire,
    title = {IDRISI-RE: A generalizable dataset with benchmarks for location mention recognition on disaster tweets},
    author = {Reem Suwaileh and Tamer Elsayed and Muhammad Imran},
    journal = {Information Processing & Management},
    volume = {60},
    number = {3},
    pages = {103340},
    year = {2023},
    issn = {0306-4573},
    doi = {https://doi.org/10.1016/j.ipm.2023.103340},
    url = {https://www.sciencedirect.com/science/article/pii/S0306457323000778},
    publisher={Elsevier}
  }

  
  @inprocessdings{rsuwaileh2023idrisira,
    title = {{IDRISI}-{RA}: The First {A}rabic Location Mention Recognition Dataset of Disaster Tweets},
    author = {Suwaileh, Reem  and Imran, Muhammad and Elsayed, Tamer},
    booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
    month = jul,
    year = {2023},
    address = {Toronto, Canada},
    publisher = {Association for Computational Linguistics},
    url = {https://aclanthology.org/2023.acl-long.901},
    doi = {10.18653/v1/2023.acl-long.901},
    pages = {16298--16317}
  }
  
    

Acknowledgments

This work was made possible by the Graduate Sponsorship Research Award (GSRA) #GSRA5-1-0527-18082 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.

About

IDRISI is the largest-scale publicly-available Twitter Location Mention Prediction (LMP) datasets, in both English and Arabic languages. It contains 41 disaster events of different types (e.g., floods, fires). Annotations include tagged LMs in posts, location types (e.g., cities, streets), links to OSM toponyms, & usefulness of features for LMD.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published