VarDial2024


This repository contains code and data to run the experiments discussed in the paper by Verkijk, Sommerauer and Vossen presented at the VarDial workshop at NAACL 2024 (June). It also contains the collection of annotated data presented in the same paper (Studying Language Variation Considering the Re-Usability of Modern Theories, Tools and Resources for Annotating Explicit and Implicit Events in Centuries Old Text). This work is part of the GLOBALISE project.

annotated_data

This folder contains all annotated data collected so far for event detection and classification within GLOBALISE.

  • train
    • train_2: documents annotated by trained annotators in Round 2, as described in the paper (54 pages)
    • train_3: documents annotated in Round 3, as described in the paper (57 pages)
  • test
    • curated: one document annotated and subsequently curated by four historians and a linguist (5 pages)
    • non-curated: two documents, one annotated in Round 2 and one in Round 3, by two and four annotator teams respectively; they are to be curated and then added to the test set (13 pages)

The documents in non-curated are also the ones used to calculate inter-annotator agreement (IAA). The documents in train_2 are also used in the LLM fine-tuning experiment, where this data is split into train and test sets.
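As a minimal sketch of how the folder layout above might be traversed, the Python snippet below indexes the files in each subset. It assumes the subsets are nested as annotated_data/<split>/<subset> (e.g. annotated_data/train/train_2), which is inferred from the list above rather than checked against the repository, and it makes no assumption about the annotation file format. The function name list_documents is hypothetical and not part of this repository.

```python
from pathlib import Path

# Hypothetical layout inferred from the folder list in this README:
# annotated_data/<split>/<subset>. Not verified against the actual repo.
SPLITS = {
    "train": ["train_2", "train_3"],
    "test": ["curated", "non-curated"],
}


def list_documents(root="annotated_data"):
    """Return {(split, subset): [paths]} for every regular file in each subset.

    No file format is assumed; every file found (recursively) is listed.
    Missing subset folders map to an empty list instead of raising.
    """
    root = Path(root)
    index = {}
    for split, subsets in SPLITS.items():
        for subset in subsets:
            folder = root / split / subset
            if folder.is_dir():
                files = sorted(p for p in folder.rglob("*") if p.is_file())
            else:
                files = []
            index[(split, subset)] = files
    return index


if __name__ == "__main__":
    # Print how many files each subset contains under the assumed layout.
    for (split, subset), files in list_documents().items():
        print(f"{split}/{subset}: {len(files)} file(s)")
```

Returning an empty list for a missing folder keeps the sketch usable on a partial checkout, for example one where the non-curated documents have not been downloaded.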

Overview with metadata: [image]

zero-shot_experiments

This folder contains code and data to reproduce the zero-shot experiments presented in the paper.

LLM-finetuning

This folder contains code and data to fine-tune (L)LMs on the task of event detection, as described in the paper. The code was written to run on an HPC cluster. The original code, written for NER, is by Sophie Arnoult.
