WIP. Additional information and code will be added soon.
Automatically Scraped Hard News Event Extraction dataset.
The dataset contains the following number of documents per event type:
| Event Type | #Documents | Event Type | #Documents | 
|---|---|---|---|
| Air crash | 55 | Mass Poisoning | 7 | 
| Armed Conflict | 76 | Military Exercise | 70 | 
| Bank Robbery | 7 | Mine Collapses | 4 | 
| Disease Outbreaks | 59 | Mudslides | 21 | 
| Droughts | 18 | Other | 1229 | 
| Earthquakes | 56 | Protest_Online Condemnation | 68 | 
| Environment Pollution | 39 | Regime Change | 2 | 
| Famine | 12 | Riot | 16 | 
| Financial Crisis | 27 | Road Crash | 86 | 
| Fire | 77 | Shipwreck | 37 | 
| Floods | 84 | Strike | 65 | 
| Gas explosion | 23 | Train collisions | 6 | 
| Hurricanes_Tornado_Storm_Blizzard | 98 | Tsunamis | 0 | 
| Insect Disaster | 24 | Volcano Eruption | 13 | 
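
The counts above are heavily skewed towards the `Other` class. The sketch below sums the table and derives inverse-frequency class weights, a common heuristic for imbalanced classification. It is only an illustration: the helper name `inverse_frequency_weights` is hypothetical, and nothing here is claimed to be the weighting used in our fine-tuning.

```python
# Document counts copied from the table above.
DOC_COUNTS = {
    "Air crash": 55, "Mass Poisoning": 7,
    "Armed Conflict": 76, "Military Exercise": 70,
    "Bank Robbery": 7, "Mine Collapses": 4,
    "Disease Outbreaks": 59, "Mudslides": 21,
    "Droughts": 18, "Other": 1229,
    "Earthquakes": 56, "Protest_Online Condemnation": 68,
    "Environment Pollution": 39, "Regime Change": 2,
    "Famine": 12, "Riot": 16,
    "Financial Crisis": 27, "Road Crash": 86,
    "Fire": 77, "Shipwreck": 37,
    "Floods": 84, "Strike": 65,
    "Gas explosion": 23, "Train collisions": 6,
    "Hurricanes_Tornado_Storm_Blizzard": 98, "Tsunamis": 0,
    "Insect Disaster": 24, "Volcano Eruption": 13,
}


def inverse_frequency_weights(counts):
    """Hypothetical helper: weight = total / (n_nonempty_classes * count).

    Classes without any documents get weight 0.0, since they cannot
    contribute to a training loss anyway.
    """
    nonempty = [n for n in counts.values() if n > 0]
    total = sum(nonempty)
    n_classes = len(nonempty)
    return {
        label: (total / (n_classes * n) if n > 0 else 0.0)
        for label, n in counts.items()
    }


total_docs = sum(DOC_COUNTS.values())
other_share = DOC_COUNTS["Other"] / total_docs
weights = inverse_frequency_weights(DOC_COUNTS)

print(f"total documents: {total_docs}")
print(f"share of 'Other': {other_share:.1%}")
print(f"weight for 'Other': {weights['Other']:.3f}")
print(f"weight for 'Regime Change': {weights['Regime Change']:.3f}")
```

Rare classes such as `Regime Change` receive large weights under this scheme, so capping the weights may be worth considering in practice.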

For the majority of articles, the URL can be found in the `ashnee_url.csv` file.
Articles were mainly scraped from the following portals/domains: dailymail.co.uk, thewest.com.au, bbc.com, *allafrica.com, thetimes.co.uk, nzherald.co.nz, indiatimes.com, sputniknews.com, independent.co.uk, 9news.com.au, inquirer.net, theguardian.com, mb.com.ph, punchng.com, thestar.com.my, sott.net, and news.com.au.
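
To check how the URLs in `ashnee_url.csv` are distributed across domains, something like the following sketch could be used. The column name `url` is an assumption, since the file layout is not documented here, and the helper names `domain_of` and `count_domains` are hypothetical.

```python
import csv
from collections import Counter
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the host part of a URL, lowercased and without a leading 'www.'."""
    host = urlparse(url.strip()).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def count_domains(csv_path: str, url_column: str = "url") -> Counter:
    """Count articles per domain.

    `url_column` is an assumed column name; adjust it to the actual
    header of the CSV file.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        return Counter(domain_of(row[url_column]) for row in csv.DictReader(f))
```

Calling `count_domains("ashnee_url.csv").most_common(10)` would then list the ten most frequent domains, provided the column name matches.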
Most articles were published between 2019 and 2022.
List of models we fine-tuned for event detection: roberta-base, roberta-large, deberta-v3-base, deberta-large, distilroberta-base, and albert-base-v2.
List of models we fine-tuned for argument extraction: roberta-base, roberta-large, deberta-v3-base, deberta-v3-large, distilroberta-base, and albert-base-v2.