This is the repository for the paper Bipartite Graph Pre-training for Unsupervised Extractive Summarization with Graph Convolutional Auto-Encoders.
BiGAE is a novel graph pre-training auto-encoder that explicitly models intra-sentential distinctive features and inter-sentential cohesive features through sentence-word bipartite graphs. The summary-worthy sentence representations it produces outperform heavy BERT- or RoBERTa-based embeddings in downstream unsupervised summarization frameworks.
Run the command below to install all required dependencies (using Python 3):
pip install -r requirements.txt
The pyrouge package requires additional installation steps. If you need to run the extractive summarization task, please refer to this site to install pyrouge.
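For reference, a typical pyrouge setup looks roughly like the sketch below. The mirror URL and local paths are assumptions, not part of this repository, so follow the official installation instructions for your environment.

```bash
# A common pyrouge setup (paths and the ROUGE-1.5.5 mirror are placeholders).
pip install pyrouge
git clone https://github.com/andersjo/pyrouge.git rouge      # a widely used mirror containing ROUGE-1.5.5
pyrouge_set_rouge_path $(pwd)/rouge/tools/ROUGE-1.5.5        # tell pyrouge where the ROUGE perl scripts live
python -m pyrouge.test                                       # sanity-check the installation
```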
We provide all datasets used in our experiments:
- The datasets used for the upstream and downstream tasks are CNN/DailyMail and Multi-News. Please unzip the downloaded files and replace the empty ./data folder.
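After unzipping, the ./data folder should contain both corpora. The subdirectory names below are illustrative assumptions and may differ from the actual archive layout.

```
data/
├── cnndm/        # CNN/DailyMail (directory name is illustrative)
└── multinews/    # Multi-News (directory name is illustrative)
```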
Generate the pre-trained model by executing the following shell script. (Before running, change the "data_path" argument in the script as needed.)
bash pretraining/scripts/run.sh
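As a sketch, adjusting "data_path" amounts to editing a line like the following inside the script before launching it; the exact variable syntax and the path are assumptions.

```bash
# Hypothetical excerpt of pretraining/scripts/run.sh; only data_path needs editing.
data_path=./data/cnndm    # placeholder: point this at your unzipped upstream dataset
```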
After training finishes, execute the following script to test the generated pre-trained model.
bash pretraining/scripts/run.sh
Using the pre-trained model generated in the upstream task, we conduct text summarization by running the following shell script.
bash summarization/script/test.sh
During the experiments, we employed four different text summarization methods (DASG is the default). The method can be changed in summarization/src_gae/trainer.py. For tasks using different datasets, adjust the "data_path" and "model_path" arguments in test.sh accordingly (see the sketch below).
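A minimal sketch of what those arguments might look like inside the script; the argument names follow the description above, but the paths are placeholders for your own setup.

```bash
# Hypothetical excerpt of summarization/script/test.sh; edit these before running.
data_path=./data/multinews                       # placeholder: downstream dataset to summarize
model_path=./pretraining/checkpoints/bigae.pt    # placeholder: pre-trained BiGAE checkpoint from the upstream task
```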
All the example scripts can be found in ./script.
@inproceedings{DBLP:conf/emnlp/MaoZLGHLL23,
author = {Qianren Mao and
Shaobo Zhao and
Jiarui Li and
Xiaolei Gu and
Shizhu He and
Bo Li and
Jianxin Li},
editor = {Houda Bouamor and
Juan Pino and
Kalika Bali},
title = {Bipartite Graph Pre-training for Unsupervised Extractive Summarization
with Graph Convolutional Auto-Encoders},
booktitle = {Findings of the Association for Computational Linguistics: {EMNLP}
2023, Singapore, December 6-10, 2023},
pages = {4929--4941},
publisher = {Association for Computational Linguistics},
year = {2023},
url = {https://aclanthology.org/2023.findings-emnlp.328},
timestamp = {Wed, 13 Dec 2023 17:20:20 +0100},
biburl = {https://dblp.org/rec/conf/emnlp/MaoZLGHLL23.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}