"Let Your Characters Tell Their Story": A Dataset for Character-Centric Narrative Understanding

Dataset: LiSCU

Please follow the instructions here to recreate the LiSCU dataset.

Code

Before you start, make sure you have installed all the requirements listed in requiremenets.txt.

Note that the code is still being cleaned.

Generative: character description generation

To train a character description generator that takes the summary and character name as input, using one of the supported LMs (bartlong, gpt2), run:

sh finetune_[lm].sh
# display arguments
python -m encoder_decoder_long --help

For gpt2, replace encoder_decoder_long with generative. Note that for gpt2, the lengths must be limited to max_input_length=945 and max_output_length=75.
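The README does not say why these two limits were chosen. One plausible reading is that gpt2, as a decoder-only model, has to fit the input and the generated description into a single 1024-token context window. The sketch below only checks that arithmetic. The helper names and the `reserved_tokens` parameter are hypothetical and are not part of this repository.

```python
# Sketch: check that an input/output length pair fits a decoder-only context window.
# Assumption: the limits exist because GPT-2 supports at most 1024 positions.
GPT2_CONTEXT_WINDOW = 1024


def spare_tokens(max_input_length: int, max_output_length: int) -> int:
    """Return how many positions remain after input and output are allocated."""
    return GPT2_CONTEXT_WINDOW - (max_input_length + max_output_length)


def fits_gpt2_context(max_input_length: int, max_output_length: int,
                      reserved_tokens: int = 0) -> bool:
    """True if input, output, and any reserved (e.g. separator) tokens fit."""
    return spare_tokens(max_input_length, max_output_length) >= reserved_tokens


print(fits_gpt2_context(945, 75))  # True
print(spare_tokens(945, 75))       # 4
```

With the values from this README, 945 + 75 = 1020, which leaves four positions free, presumably for special or separator tokens.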

To generate descriptions with a trained model (e.g. bartlong), run:

python generate_texts.py \
	--in_file ../data/new/test.jsonl \
	--out_file char_checkpoints/bart-large-xsum_long/generation/test_prediction_beams5_maxlen_1024.jsonl \
	--model_name_or_path char_checkpoints/bart-large-xsum_long \
	--beams 5 \
	--device 0 \
	--max_length 1024

For gpt2, use --max_length 75.

Discriminative: character name identification

Here, we first take a generative approach to character name identification, i.e., given a summary and an anonymized character description, we generate the character name.
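To make "anonymized character description" concrete, here is a minimal sketch of one way a description could be anonymized: every mention of the character's name is replaced with a placeholder. This is an illustration only. The README does not describe how LiSCU anonymizes descriptions, and the function name, the `[MASK]` placeholder, and the matching rule are all assumptions.

```python
import re


def anonymize_description(description: str, character_name: str,
                          placeholder: str = "[MASK]") -> str:
    """Replace mentions of the character's name (full name or any single
    part of it) with a placeholder. Hypothetical helper, not the dataset's
    actual procedure."""
    if not character_name.strip():
        return description
    # Match the full name first, then its parts, longest alternative first.
    parts = sorted({character_name, *character_name.split()},
                   key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in parts) + r")\b",
        flags=re.IGNORECASE,
    )
    return pattern.sub(placeholder, description)


text = "Elizabeth Bennet is witty. Elizabeth rejects Darcy."
print(anonymize_description(text, "Elizabeth Bennet"))
# [MASK] is witty. [MASK] rejects Darcy.
```

The model is then asked to generate the name hidden behind the placeholder, given the summary and the masked description.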

To train the model using bartlong, run:

sh finetune_disc_bartlong.sh

NOTE: Add --char_length 50 when needed (for partial descriptions).

To generate character names, run:

python generate_texts.py \
	--in_file ../data/new/discriminative_data/data/v03/test.jsonl \
	--out_file char_checkpoints/bart-large-xsum_long_disc/generation/test_prediction_greedy_maxlen_20.jsonl \
	--model_name_or_path char_checkpoints/bart-large-xsum_long_disc \
	--beams 1 \
	--device 0 \
	--max_length 20 \
	--char_name_last \
	--task discriminative

To compute character identification accuracy by picking the most probable character name from a set of candidate choices, run:

python -m eval.multiple_choice_char_name_gen_new \
	--model_name_or_path char_checkpoints/bart-large-xsum_long_disc \
	--dataset_file ../data/new/test.jsonl \
	--out_dir char_checkpoints/bart-large-xsum_long_disc \
	--device 0 \
	--format with-choices
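
The scoring rule described above can be sketched as follows: each candidate name gets a score (for instance, a log-probability under the model), the highest-scoring candidate is taken as the prediction, and accuracy is the fraction of examples where that prediction matches the gold name. This is a simplified illustration, not the code in eval.multiple_choice_char_name_gen_new. The record layout (`gold`, `choice_scores`) and the scores are made up for the example.

```python
from typing import Dict, List


def pick_most_probable(choice_scores: Dict[str, float]) -> str:
    """Return the candidate name with the highest score."""
    return max(choice_scores, key=choice_scores.get)


def multiple_choice_accuracy(examples: List[dict]) -> float:
    """Fraction of examples whose top-scoring candidate equals the gold name."""
    if not examples:
        return 0.0
    correct = sum(
        pick_most_probable(ex["choice_scores"]) == ex["gold"]
        for ex in examples
    )
    return correct / len(examples)


# Hypothetical scores standing in for model log-probabilities.
examples = [
    {"gold": "Alice", "choice_scores": {"Alice": -1.2, "Bob": -3.4}},
    {"gold": "Carol", "choice_scores": {"Carol": -2.5, "Dave": -0.9}},
]
print(multiple_choice_accuracy(examples))  # 0.5
```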

NOTE: The long2 environment was used for longformer and the py37 environment for the bart baselines.