This project implements a transformer model for text summarization using TensorFlow, inspired by Google's seminal 2017 paper "Attention Is All You Need". The model uses an encoder-decoder architecture with self-attention mechanisms to generate abstractive summaries of input text.
- Model Architecture
- Key Components
- Installation
- Data Preprocessing
- Training
- Inference
- Key Functions
- Custom Learning Rate Scheduler
- Results
The transformer model consists of:
- Encoder: Processes the input sequence
- Decoder: Generates the output summary
- Multi-head attention layers: Allow the model to focus on different parts of the input
- Feed-forward neural networks: Process the attention output
- Layer normalization: Stabilizes the learning process
- Positional encoding: Maintains sequence-order information
- Masked self-attention in the decoder: Prevents attending to future tokens
- Scaled dot-product attention: The core attention mechanism (see the sketch after this list)
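At the heart of every attention layer is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal TensorFlow sketch of the idea; the mask convention (1.0 keeps a position, 0.0 hides it) is an assumption, and the repository's own implementation may differ in details.

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    # Similarity scores between queries and keys, scaled by sqrt(d_k)
    matmul_qk = tf.matmul(q, k, transpose_b=True)   # (..., seq_len_q, seq_len_k)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_logits = matmul_qk / tf.math.sqrt(dk)

    # Assumed mask convention: 1.0 = attend, 0.0 = ignore (padding or future tokens)
    if mask is not None:
        scaled_logits += (1.0 - mask) * -1e9

    # Softmax over the key dimension, then weight the values
    attention_weights = tf.nn.softmax(scaled_logits, axis=-1)
    return tf.matmul(attention_weights, v), attention_weights
```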
# Split the raw data into (document, summary) pairs
document, summary = utils.preprocess(train_data)
document_test, summary_test = utils.preprocess(test_data)

# Fit one tokenizer on the combined corpus (`documents_and_summary`) so documents and
# summaries share a vocabulary; `filters` and `oov_token` are assumed to be defined elsewhere
tokenizer = tf.keras.preprocessing.text.Tokenizer(filters=filters, oov_token=oov_token, lower=False)
tokenizer.fit_on_texts(documents_and_summary)

inputs = tokenizer.texts_to_sequences(document)
targets = tokenizer.texts_to_sequences(summary)
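The tokenized sequences then need to be padded to fixed lengths and batched before training. A minimal sketch of that step, assuming `encoder_maxlen`, `decoder_maxlen`, a shuffle buffer, and a batch size (the numbers below are illustrative, not the repository's settings):

```python
# Illustrative lengths and batch size; the real values depend on the dataset
encoder_maxlen = 150
decoder_maxlen = 50

inputs = tf.keras.preprocessing.sequence.pad_sequences(
    inputs, maxlen=encoder_maxlen, padding='post', truncating='post')
targets = tf.keras.preprocessing.sequence.pad_sequences(
    targets, maxlen=decoder_maxlen, padding='post', truncating='post')

# Shuffle and batch with tf.data so the training loop can iterate over (inp, tar) pairs
dataset = (tf.data.Dataset.from_tensor_slices((inputs, targets))
           .shuffle(10000)
           .batch(64))
```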
# Initialize the model; num_layers, embedding_dim, num_heads, fully_connected_dim,
# vocab_size, and max_pos_encoding are hyperparameters chosen before training
transformer = Transformer(num_layers, embedding_dim, num_heads, fully_connected_dim,
                          vocab_size, vocab_size, max_pos_encoding, max_pos_encoding)

# Training loop
for epoch in range(epochs):
    for batch, (inp, tar) in enumerate(dataset):
        train_step(transformer, inp, tar)
    print(f'Epoch {epoch+1}, Loss {train_loss.result():.4f}')
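`train_step` and the `train_loss` metric are defined in the training code. The following is a rough sketch of what such a step typically looks like for this setup, assuming teacher forcing, a padding-masked sparse cross-entropy loss, and a `model(inp, tar_inp, training=...)` call signature; all of these are assumptions, not the repository's exact code.

```python
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(reduction='none')  # set from_logits=True if the model outputs raw logits
train_loss = tf.keras.metrics.Mean(name='train_loss')
optimizer = tf.keras.optimizers.Adam(1e-4)  # placeholder; see the custom learning rate schedule below

def masked_loss(real, pred):
    # Average the per-token loss over non-padding positions only (padding id == 0)
    mask = tf.cast(tf.math.not_equal(real, 0), tf.float32)
    loss = loss_object(real, pred) * mask
    return tf.reduce_sum(loss) / tf.reduce_sum(mask)

@tf.function
def train_step(model, inp, tar):
    # Teacher forcing: the decoder sees the target shifted right by one position
    tar_inp, tar_real = tar[:, :-1], tar[:, 1:]
    with tf.GradientTape() as tape:
        predictions, _ = model(inp, tar_inp, training=True)  # assumed call signature
        loss = masked_loss(tar_real, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss(loss)
```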
def summarize(model, input_document):
    # Tokenize and pad the input document to the encoder length
    input_document = tokenizer.texts_to_sequences([input_document])
    input_document = tf.keras.preprocessing.sequence.pad_sequences(
        input_document, maxlen=encoder_maxlen, padding='post', truncating='post')
    encoder_input = tf.expand_dims(input_document[0], 0)

    # Start from [SOS] and append one predicted token per step, stopping at [EOS]
    output = tf.expand_dims([tokenizer.word_index["[SOS]"]], 0)
    for i in range(decoder_maxlen):
        predicted_id = next_word(model, encoder_input, output)
        output = tf.concat([output, predicted_id], axis=-1)
        if predicted_id == tokenizer.word_index["[EOS]"]:
            break

    return tokenizer.sequences_to_texts(output.numpy())[0]

# Generate a summary for a single document
summary = summarize(transformer, input_document)
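`next_word` performs a single greedy decoding step. A minimal sketch, again assuming the model returns a per-position distribution over the vocabulary and the same assumed call signature as in the training sketch:

```python
def next_word(model, encoder_input, output):
    # Forward pass with the tokens generated so far
    predictions, _ = model(encoder_input, output, training=False)  # (1, cur_len, vocab_size), assumed signature
    # Greedy decoding: take the most likely token at the last position
    predicted_id = tf.argmax(predictions[:, -1:, :], axis=-1)
    return tf.cast(predicted_id, tf.int32)  # shape (1, 1), ready to concatenate to `output`
```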
- positional_encoding(positions, d_model): Generates positional encodings
- create_padding_mask(decoder_token_ids): Creates a mask for padding tokens
- create_look_ahead_mask(sequence_length): Creates a mask that prevents attending to future positions (both mask helpers are sketched after this list)
- scaled_dot_product_attention(q, k, v, mask): Computes the attention output and attention weights
- EncoderLayer and DecoderLayer: Main components of the transformer
- Transformer: Combines encoder and decoder into full model
- train_step(model, inp, tar): Performs one training step
- next_word(model, encoder_input, output): Predicts the next word in the summary
- summarize(model, input_document): Generates a complete summary
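The two mask helpers are short; the following sketch shows one common implementation, using the same convention as the attention sketch above (1.0 marks positions to keep, 0.0 marks positions to hide). The repository's exact code may differ.

```python
def create_padding_mask(decoder_token_ids):
    # 1.0 for real tokens, 0.0 for padding (token id 0); the extra axis lets the
    # mask broadcast across attention heads and query positions
    seq = tf.cast(tf.math.not_equal(decoder_token_ids, 0), tf.float32)
    return seq[:, tf.newaxis, :]

def create_look_ahead_mask(sequence_length):
    # Lower-triangular matrix of ones: position i may attend to positions <= i only
    return tf.linalg.band_part(tf.ones((1, sequence_length, sequence_length)), -1, 0)
```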
class CustomSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, d_model, warmup_steps=4000):
        super(CustomSchedule, self).__init__()
        self.d_model = tf.cast(d_model, dtype=tf.float32)
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        step = tf.cast(step, dtype=tf.float32)
        arg1 = tf.math.rsqrt(step)                  # inverse-square-root decay: step^-0.5
        arg2 = step * (self.warmup_steps ** -1.5)   # linear warmup for the first warmup_steps
        return tf.math.rsqrt(self.d_model) * tf.math.minimum(arg1, arg2)
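This implements the warmup rule from "Attention Is All You Need": lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5), i.e. the learning rate grows linearly for the first warmup_steps steps and then decays with the inverse square root of the step number. A typical way to use it is to pair it with Adam and the paper's beta/epsilon values; the repository's optimizer settings may differ.

```python
# Drive Adam with the custom schedule (d_model is the model's embedding dimension)
learning_rate = CustomSchedule(embedding_dim)
optimizer = tf.keras.optimizers.Adam(learning_rate,
                                     beta_1=0.9, beta_2=0.98, epsilon=1e-9)
```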