andrecianflone/seq2seq_drr

About

Discourse relation recognition (DRR) with an encoder/decoder model

CoNLL

PDTB

Dataset breakdown

The Pitler et al 2009 breakdown:

| Set | WSJ sections | Temporal | Contingency | Comparison | Expansion | EntRel |
|---|---|---|---|---|---|---|
| Training | 2-20 | | | | | |
| Development | 0-1 (optionally 23-24) | | | | | |
| Test | 21-22 | | | | | |

Followed by, for example, Zhang et al., 2015; Chen et al., 2016; Ji and Eisenstein, 2015.

The CoNLL breakdown, recommended by the original PDTB 2.0 corpus:

| Set | WSJ sections |
|---|---|
| Training | 2-21 |
| Development | 22 |
| Test | 23 |

Followed by the CoNLL shared task and by Wang and Lan, 2016.
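The two breakdowns differ only in which WSJ sections go to which set, so they can be written down as a small lookup. The following is a minimal sketch, not code from this repository: the names `PITLER_SPLIT`, `CONLL_SPLIT` and `assign_split` are hypothetical, and the optional Pitler development sections 23-24 are left out.

```python
# WSJ section ranges for each split, as listed in the two breakdowns.
# Python ranges exclude the upper bound, hence the +1 on each end.
PITLER_SPLIT = {
    "train": range(2, 21),   # sections 2-20
    "dev": range(0, 2),      # sections 0-1 (optional 23-24 not included here)
    "test": range(21, 23),   # sections 21-22
}
CONLL_SPLIT = {
    "train": range(2, 22),   # sections 2-21
    "dev": range(22, 23),    # section 22
    "test": range(23, 24),   # section 23
}


def assign_split(section, scheme):
    """Return the split name for a WSJ section, or None if the scheme does not use it."""
    for name, sections in scheme.items():
        if section in sections:
            return name
    return None
```

For example, section 21 is test data under the Pitler breakdown but training data under the CoNLL breakdown, which is one reason results from the two setups are not directly comparable.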

Types

According to the official PDTB summary:

| PDTB Relations | No. of tokens |
|---|---|
| Explicit | 18459 |
| Implicit | 16224 |
| AltLex | 624 |
| EntRel | 5210 |
| NoRel | 254 |
| Total | 40600 |

Relations

The CoNLL version classifies relations into the 16 lower-level senses and includes EntRel.

Top-level breakdown:

| Top Level | Explicit (18459) | Implicit (16224) | AltLex (624) | Total |
|---|---|---|---|---|
| TEMPORAL | 3612 | 950 | 88 | 4650 |
| CONTINGENCY | 3581 | 4185 | 276 | 8042 |
| COMPARISON | 5516 | 2832 | 46 | 8394 |
| EXPANSION | 6424 | 8861 | 221 | 15506 |
| Total | 19133 | 16828 | 634 | 36592 |

1st level, one-v-all

For top-level classification, Chen et al., 2016 train one-vs-all classifiers, with negative examples sampled from sections 2-20. They use the Pitler breakdown and merge EntRel into Expansion.
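As a rough illustration of this setup, the sketch below builds a binary dataset for one target class after folding EntRel into Expansion. It is not the procedure from Chen et al., 2016, whose exact sampling scheme is not described here: the `(features, label)` input format, the 1:1 negative-to-positive ratio, and the function names are all assumptions.

```python
import random


def merge_entrel(label):
    """Map EntRel to Expansion; every other label is returned unchanged."""
    return "Expansion" if label == "EntRel" else label


def one_vs_all(examples, target, neg_per_pos=1, seed=0):
    """Build a shuffled binary dataset for `target`.

    examples: list of (features, label) pairs (hypothetical format).
    neg_per_pos: negatives sampled per positive; the default 1:1 ratio is an assumption.
    """
    rng = random.Random(seed)
    merged = [(x, merge_entrel(y)) for x, y in examples]
    positives = [(x, 1) for x, y in merged if y == target]
    negatives = [(x, 0) for x, y in merged if y != target]
    # Never ask for more negatives than are available.
    k = min(len(negatives), neg_per_pos * len(positives))
    data = positives + rng.sample(negatives, k)
    rng.shuffle(data)
    return data
```

In the setup described above, `examples` would hold only training instances from sections 2-20, and one such dataset would be built per top-level class.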

GRN

Gated Relevance Network. Summary:

  • BiLSTM + GRN + Pooling + MLP
  • Embeddings: 50-dimensional, from Turian et al. (2010) (not available online)
  • Embeddings are kept fixed during training
  • Only the 10k most frequent words are used
  • All texts are set to a length of 50 words
  • Parameters are initialized in [-0.1, 0.1]
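The preprocessing side of this summary (vocabulary cut-off, fixed length, initialization range) can be sketched in plain Python. This is an illustrative sketch rather than the authors' code: the `<pad>` and `<unk>` tokens, right-padding, and uniform sampling are assumptions not stated in the summary, and all function names are hypothetical.

```python
from collections import Counter
import random

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens


def build_vocab(token_lists, max_size=10000):
    """Keep the `max_size` most frequent words; all other words map to UNK."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    words = [w for w, _ in counts.most_common(max_size)]
    return {w: i for i, w in enumerate([PAD, UNK] + words)}


def encode(tokens, vocab, length=50):
    """Map tokens to ids, then truncate or right-pad (assumed) to exactly `length`."""
    ids = [vocab.get(t, vocab[UNK]) for t in tokens[:length]]
    return ids + [vocab[PAD]] * (length - len(ids))


def init_uniform(rows, cols, low=-0.1, high=0.1, seed=0):
    """Sample a rows x cols parameter matrix from [low, high] (uniform is an assumption)."""
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(cols)] for _ in range(rows)]
```

Since the embeddings are fixed during training, only the non-embedding parameters would be drawn with something like `init_uniform`; the embedding table itself would be loaded from the pretrained vectors.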

Results:

PDTB, top-level, Implicit, EntRel as Expansion

| Type | Author | Comparison | Contingency | Expansion | Temporal |
|---|---|---|---|---|---|
| | Pitler et al., 2009 | 21.96% | 47.13% | 76.42% | 16.76% |
| | Zhou et al., 2010 | 31.79% | 47.16% | 70.11% | 20.30% |
| | Park and Cardie, 2012 | 31.32% | 49.82% | 79.22% | 26.57% |
| | Rutherford and Xue, 2014 | 39.70% | 54.42% | 80.44% | 28.69% |
| | Ji and Eisenstein, 2015 | 35.93% | 52.78% | 80.02% | 27.63% |
| LSTM | Chen et al., 2016 | 31.78% | 45.39% | 75.10% | 19.65% |
| Bi-LSTM + GRN | Chen et al., 2016 | 40.17% | 54.76% | 80.62% | 31.32% |

PDTB, top-level, Implicit, no EntRel

| Type | Author | Comparison | Contingency | Expansion | Temporal |
|---|---|---|---|---|---|
| Shallow CNN | Zhang et al., 2015 | 33.22% | 52.04% | 69.59% | 30.54% |

CoNLL English dataset (PDTB), low-level, Implicit F1 score

| ID | Blind | Test | Dev |
|---|---|---|---|
| aarjay | 9.95 | 15.6 | 36.85 |
| BIT | 19.3 | 16.5 | 17.36 |
| clac | 27.7 | 28.1 | 37.12 |
| ecnucs | 34.1 | 40.9 | 46.42 |
| goethe | 31.8 | 37.6 | 45.42 |
| gtnlp | 36.7 | 34.9 | 40.72 |
| gw0 | 33.0 | 30.2 | 34.58 |
| gw0 | 21.2 | 18.5 | 35.11 |
| nguyenlab | 31.4 | 28.8 | 34.31 |
| oslopots | 33.8 | 33.7 | 43.12 |
| PurdueNLP | 29.1 | 34.4 | 38.05 |
| steven | 23.5 | 20.5 | 26.68 |
| tao0920 | 35.3 | 38.2 | 46.33 |
| tbmihaylov | 34.5 | 39.1 | 40.32 |
| ykido | 32.3 | 22.6 | 29.11 |
| ttr | 37.6 | 36.1 | 40.32 |

Other refs
