Unsupervised Document Summarization using pre-Trained Sentence Embeddings and Graph Centrality

This repository implements an online demo of the paper Unsupervised Document Summarization using pre-Trained Sentence Embeddings and Graph Centrality published in the Second Scholarly Document Processing Workshop (SDProc 2021) at NAACL-HLT 2021.

Usage

Starting the server

This project is based on the Flask framework. Detailed explanations of how to use Flask and of this project's configuration files can be found in the excellent Flask Mega-Tutorial. To start the server, run the following command in the root folder of this project:

flask run

Using as a library

Assuming your whole document is stored in a single string text, running

from auto_summ.engine.core.engine_summarization import algorithm

centralities = algorithm(text)

will split it into sentences, compute the centrality of each sentence according to the algorithm described in the paper, and return a pandas DataFrame with the following columns:

sentence, which contains the sentences found in your document.
centrality, which contains the relevance score (essentially the degree centrality) of each one of the sentences.
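A common next step is to keep only the most central sentences as an extractive summary. The sketch below assumes the returned DataFrame keeps the sentences in document order with a default integer index; the helper top_k_summary and the toy DataFrame (with made-up scores) are hypothetical stand-ins, not part of this project.

```python
import pandas as pd


def top_k_summary(centralities: pd.DataFrame, k: int = 3) -> str:
    """Join the k highest-scoring sentences, kept in document order (hypothetical helper)."""
    # nlargest picks the k rows with the highest centrality;
    # sort_index restores the order in which they appeared in the document
    # (assumes the index reflects document order).
    top = centralities.nlargest(k, "centrality").sort_index()
    return " ".join(top["sentence"])


# Stand-in for the DataFrame returned by algorithm(text); the scores are made up.
centralities = pd.DataFrame({
    "sentence": ["First point.", "Background detail.", "Key result.", "Aside."],
    "centrality": [0.9, 0.2, 1.4, 0.1],
})
print(top_k_summary(centralities, k=2))
```

Re-sorting by index after selecting by score keeps the summary readable, since sentences appear in the same order as in the source document.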

Features

A detailed sentence tokenization process based on regular expressions that can accurately handle most cases found in scientific literature.
Unlike the implementation used in the paper, this online implementation uses TF-IDF embeddings for the sake of speed and efficiency. You can easily replace them with any of the pre-trained language models available at https://www.sbert.net/.
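The two features above can be illustrated with a minimal, self-contained sketch: a naive regex sentence splitter, TF-IDF vectors, and degree centrality computed as the sum of each sentence's cosine similarities to every other sentence. This is an assumption-based approximation, not this project's code: the abbreviation list, the splitting regex, and the names split_sentences, degree_centrality and summarize_scores are hypothetical, and the paper's exact graph construction may differ.

```python
import re

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical abbreviations whose periods should not end a sentence.
ABBREVIATIONS = ("e.g.", "i.e.", "et al.", "Fig.", "Eq.", "etc.")


def split_sentences(text):
    """Naive regex sentence splitter (a stand-in for the project's tokenizer)."""
    # Hide periods inside known abbreviations behind a placeholder.
    protected = text
    for abbr in ABBREVIATIONS:
        protected = protected.replace(abbr, abbr.replace(".", "<DOT>"))
    # Split after ., ! or ? when followed by whitespace and an uppercase letter or digit.
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z0-9])", protected)
    # Restore the hidden periods and drop empty fragments.
    return [p.replace("<DOT>", ".").strip() for p in parts if p.strip()]


def degree_centrality(sentences, embed=None):
    """Sum of cosine similarities from each sentence to all the others."""
    if embed is None:
        # Default: TF-IDF vectors (sparse matrix), one row per sentence.
        embeddings = TfidfVectorizer().fit_transform(sentences)
    else:
        # Any callable mapping a list of sentences to a 2-D array.
        embeddings = embed(sentences)
    sim = cosine_similarity(embeddings)
    np.fill_diagonal(sim, 0.0)  # ignore self-similarity
    return sim.sum(axis=1)


def summarize_scores(text, embed=None):
    """Return a DataFrame with the same two columns as described above."""
    sentences = split_sentences(text)
    scores = degree_centrality(sentences, embed)
    return pd.DataFrame({"sentence": sentences, "centrality": scores})


if __name__ == "__main__":
    doc = ("Graphs model documents. Sentences are nodes in graphs. "
           "Edges link similar sentences. Cats sleep.")
    print(summarize_scores(doc))
```

Because embed accepts any callable that maps a list of sentences to a matrix, a pre-trained model from sentence-transformers could be plugged in instead of TF-IDF, for example embed=SentenceTransformer("all-MiniLM-L6-v2").encode; that model name is only an illustrative choice.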

Installation

git clone https://github.com/jarobyte91/auto_summ.git
cd auto_summ
pip install -r requirements.txt

Support

Feel free to send an email to [email protected] or contact me through any of my social media.

Contribute

Feel free to use the Issue Tracker or Pull Requests of this repository.

License

This project is licensed under the MIT License.
