Novel Abstractive Summarization Techniques

on a variety of corpora using T5, GPT-[?] and BERT

by Brayton Hall

Motivation

The goal is to use state-of-the-art transformers in novel ways, and quite possibly on novels themselves. Specifically, this project tests how transformers with many parameters can be fine-tuned with very little labeled data (i.e. few-shot learning), so that a large language model can learn to approximate the task of summarizing the entirety of War and Peace (or any long-form document), with output options along different embeddings to capture desired themes. Combined with active learning, this type of few-shot learning has enormous potential.

Current basic output:

Zero-shot summarization of the entirety of War and Peace, produced chunk by chunk.

pic1

The following summary corresponds to approximately the first 5,120 words of War and Peace, or about 1/100 of the novel. The chunk size could be adjusted, and the model could be fine-tuned on certain 'themes' or embeddings produced with a bit of manual supervision and active learning.

pic2
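
The chunked, zero-shot approach described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `chunk_words`, `summarize_in_chunks`, and `first_sentence` are hypothetical, and `first_sentence` is only a stand-in so the sketch runs without downloading a model. In practice the `summarize` callable would wrap a pretrained model such as T5, and the chunk size may need to be smaller than 5,120 words depending on the model's input limit.

```python
from typing import Callable, Iterator, List


def chunk_words(text: str, words_per_chunk: int = 5120) -> Iterator[str]:
    """Yield consecutive chunks of at most `words_per_chunk` whitespace-separated words."""
    if words_per_chunk <= 0:
        raise ValueError("words_per_chunk must be positive")
    words = text.split()
    for start in range(0, len(words), words_per_chunk):
        yield " ".join(words[start:start + words_per_chunk])


def summarize_in_chunks(
    text: str,
    summarize: Callable[[str], str],
    words_per_chunk: int = 5120,
) -> List[str]:
    """Apply `summarize` to each chunk independently and collect one summary per chunk."""
    return [summarize(chunk) for chunk in chunk_words(text, words_per_chunk)]


def first_sentence(chunk: str) -> str:
    """Placeholder summarizer (NOT a transformer): keep the text up to the first period."""
    head, sep, _ = chunk.partition(".")
    return head + sep


if __name__ == "__main__":
    # Tiny demo text; a real run would pass the full text of the novel
    # and a model-backed summarizer instead of `first_sentence`.
    sample = "Well, Prince. So Genoa and Lucca are now just family estates."
    for summary in summarize_in_chunks(sample, first_sentence, words_per_chunk=6):
        print(summary)
```

Passing the summarizer in as a plain callable keeps the chunking logic independent of any particular model, so the same loop could later be pointed at a fine-tuned or theme-conditioned summarizer.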