This repository implements an online demo of the paper "Unsupervised Document Summarization using pre-Trained Sentence Embeddings and Graph Centrality", published at the Second Scholarly Document Processing Workshop (SDProc 2021) at NAACL-HLT 2021.
This project is based on the Flask framework. Detailed explanations of how to use Flask and of the configuration files in this project can be found in the excellent Flask Mega-Tutorial. To start the server, run the following command in the root folder of this project:
flask run
Assuming you have your whole document in a single string, using
from auto_summ.engine.core.engine_summarization import algorithm
centralities = algorithm(text)
will split it into sentences, compute the centrality of each sentence according to the algorithm described in the paper, and return a pandas DataFrame with the following columns:
- sentence, which contains the sentences found in your document.
- centrality, which contains the relevance score (essentially the degree centrality) of each sentence.
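As an illustration of how the returned dataframe might be used, the sketch below keeps the k highest-scoring sentences and puts them back in document order. The helper name `top_sentences` and the toy dataframe are hypothetical and not part of this repository; the sketch only assumes the two columns listed above and that the rows arrive in document order.

```python
import pandas as pd


def top_sentences(centralities: pd.DataFrame, k: int = 3) -> str:
    """Join the k most central sentences, kept in document order."""
    # nlargest keeps the k rows with the highest centrality score.
    best = centralities.nlargest(k, "centrality")
    # Assumption: the index reflects each sentence's position in the document.
    best = best.sort_index()
    return " ".join(best["sentence"])


# Toy stand-in for the output of algorithm(text); the scores are made up.
toy = pd.DataFrame(
    {
        "sentence": ["First.", "Second.", "Third.", "Fourth."],
        "centrality": [0.9, 0.1, 0.5, 0.7],
    }
)
summary = top_sentences(toy, k=2)
```

With the real output, `top_sentences(centralities, k=5)` would give a five-sentence extractive summary under the same assumptions.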
- A detailed sentence tokenization process based on regular expressions that can accurately handle most cases found in scientific literature.
- Unlike the implementation used in the paper, this online implementation runs on TF-IDF embeddings for the sake of speed and efficiency. You can easily swap these for any of the pre-trained language models available at https://www.sbert.net/.
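To make the two points above more concrete, here is a minimal, standard-library-only sketch of the general idea: split text into sentences with a regular expression, embed each sentence as a TF-IDF vector, and score it by the sum of its cosine similarities to the other sentences. This is not the code of this repository. All function names are hypothetical, the splitting rule is far simpler than the tokenizer shipped here, the TF-IDF weighting is one common smoothed variant, and the exact centrality formula in the paper may differ.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after . ! or ? when an uppercase letter follows."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [p for p in parts if p]


def tfidf_vectors(sentences):
    """Return one sparse {term: weight} dict per sentence."""
    docs = [Counter(re.findall(r"[a-z0-9]+", s.lower())) for s in sentences]
    n = len(docs)
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for d in docs for term in d)
    # Smoothed inverse document frequency (an assumption, one of several variants).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in d.items()} for d in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def degree_centrality(sentences):
    """Score each sentence by its total similarity to all other sentences."""
    vecs = tfidf_vectors(sentences)
    return [
        sum(cosine(u, v) for j, v in enumerate(vecs) if j != i)
        for i, u in enumerate(vecs)
    ]


text = (
    "Graphs model sentence similarity. "
    "Central sentences in graphs are similar to many sentences. "
    "Bananas are yellow."
)
sentences = split_sentences(text)
scores = degree_centrality(sentences)
```

In this toy text the second sentence shares a word with each of the other two, so it receives the highest score. Replacing `tfidf_vectors` with dense embeddings from a pre-trained sentence encoder would leave the centrality step unchanged, as long as `cosine` is adapted to dense vectors.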
git clone https://github.com/jarobyte91/auto_summ.git
cd auto_summ
pip install -r requirements.txt
Feel free to send an email to [email protected] or contact me through any of my social media.
Feel free to use the Issue Tracker or Pull Requests of this repository.
This project is licensed under the MIT License.