Code for the paper "Get To The Point: Summarization with Pointer-Generator Networks", implemented with FastNLP. Original source code: https://github.com/abisee/pointer-generator
The input data must be in jsonl format (one JSON object per line). Each object contains two keys:
- text: original text
- summary: abstract of the text
The value of each key must be a list of sentences (strings).
E.g.
- "text": ["london -lrb- cnn -rrb- a 19-year-old man was charged wednesday with terror offenses after he was arrested as he returned to britain from turkey , london 's metropolitan police said .", "yahya rashid , a uk national from northwest london , was detained at luton airport on tuesday after he arrived on a flight from istanbul , police said .", "he 's been charged with engaging in conduct in preparation of acts of terrorism , and with engaging in conduct with the intention of assisting others to commit acts of terrorism . both charges relate to the period between november 1 and march 31 .", "rashid is due to appear in westminster magistrates ' court on wednesday , police said .", "cnn 's lindsay isaac contributed to this report ."]
- "summary": ["london 's metropolitan police say the man was arrested at luton airport after landing on a flight from istanbul .", "he 's been charged with terror offenses allegedly committed since the start of november ."]
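The format above can be produced and checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper names `write_jsonl` and `validate_jsonl` are hypothetical and are not part of this repository or of FastNLP.

```python
import json


def write_jsonl(records, path):
    """Write one JSON object per line; each record holds 'text' and 'summary' lists."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def validate_jsonl(path):
    """Return the number of valid records; raise ValueError on a malformed one."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            rec = json.loads(line)
            if not isinstance(rec, dict):
                raise ValueError(f"line {line_no}: expected a JSON object")
            for key in ("text", "summary"):
                value = rec.get(key)
                is_str_list = isinstance(value, list) and all(
                    isinstance(s, str) for s in value
                )
                if not is_str_list:
                    raise ValueError(
                        f"line {line_no}: '{key}' must be a list of strings"
                    )
            count += 1
    return count
```

Running `validate_jsonl` on each data file before training catches records where a value was stored as a single string instead of a list.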
Command line for training:
python train.py -train_data_path TRAIN_DATA_PATH -eval_data_path VALID_DATA_PATH -log_root LOG_ROOT_NAME -is_pointer_gen -is_coverage -n_epochs 33 -visible_gpu 0 -lr_coverage 0.025 -batch_size 16
- TRAIN_DATA_PATH: path of the train set
- VALID_DATA_PATH: path of the validation set
- LOG_ROOT_NAME: path to save the trained model
- is_pointer_gen: whether to use the pointer mechanism
- is_coverage: whether to use the coverage mechanism
Command line for testing:
python decode.py -decode_data_path TEST_DATA_PATH -train_data_path TRAIN_DATA_PATH -test_model CHECKPOINT -log_root LOG_ROOT_NAME -is_pointer_gen -is_coverage -test_data_name TEST_DATA_NAME -visible_gpu 0
- TEST_DATA_PATH: path of the test set
- TRAIN_DATA_PATH: path of the train set
- CHECKPOINT: path of the checkpoint
- LOG_ROOT_NAME: path to save the decoded result
- TEST_DATA_NAME: name of the test set
- is_pointer_gen and is_coverage must be set the same way as they were for training.
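One way to keep the two flags consistent between training and testing is to build both command lines from the same settings. The sketch below only assembles and prints the commands shown above; it does not run them. The function names and the placeholder paths are hypothetical, and the numeric values are copied from the training command in this README.

```python
import shlex


def shared_flags(is_pointer_gen, is_coverage):
    """Return the model flags that must be identical for train.py and decode.py."""
    flags = []
    if is_pointer_gen:
        flags.append("-is_pointer_gen")
    if is_coverage:
        flags.append("-is_coverage")
    return flags


def train_command(train_path, valid_path, log_root, flags):
    """Assemble the training command as an argument list."""
    return [
        "python", "train.py",
        "-train_data_path", train_path,
        "-eval_data_path", valid_path,
        "-log_root", log_root,
        *flags,
        "-n_epochs", "33",
        "-visible_gpu", "0",
        "-lr_coverage", "0.025",
        "-batch_size", "16",
    ]


def decode_command(test_path, train_path, checkpoint, log_root, test_name, flags):
    """Assemble the testing command as an argument list."""
    return [
        "python", "decode.py",
        "-decode_data_path", test_path,
        "-train_data_path", train_path,
        "-test_model", checkpoint,
        "-log_root", log_root,
        *flags,
        "-test_data_name", test_name,
        "-visible_gpu", "0",
    ]


if __name__ == "__main__":
    # Placeholder values, to be replaced with real paths.
    flags = shared_flags(is_pointer_gen=True, is_coverage=True)
    print(shlex.join(train_command("TRAIN_DATA_PATH", "VALID_DATA_PATH",
                                   "LOG_ROOT_NAME", flags)))
    print(shlex.join(decode_command("TEST_DATA_PATH", "TRAIN_DATA_PATH",
                                    "CHECKPOINT", "LOG_ROOT_NAME",
                                    "TEST_DATA_NAME", flags)))
```

The argument lists can also be passed directly to `subprocess.run` if the commands should be launched from Python. `shlex.join` requires Python 3.8 or later.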