Transformer-Based Text Summarization

Overview

This project implements a transformer model for text summarization using TensorFlow, inspired by Google's seminal 2017 paper "Attention Is All You Need". The model uses an encoder-decoder architecture with self-attention mechanisms to generate abstractive summaries of input text.

Table of Contents

  1. Model Architecture
  2. Key Components
  3. Installation
  4. Data Preprocessing
  5. Training
  6. Inference
  7. Key Functions
  8. Custom Learning Rate Scheduler
  9. Results

Model Architecture

The transformer model consists of:

  • Encoder: Processes the input sequence
  • Decoder: Generates the output summary
  • Multi-head attention layers: Allow the model to focus on different parts of the input
  • Feed-forward neural networks: Process the attention output
  • Layer normalization: Stabilizes the learning process (see the encoder-layer sketch after this list)
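
To make the composition concrete, here is a minimal sketch of a single encoder layer built from Keras' MultiHeadAttention, a feed-forward block, residual connections, and layer normalization; the repository's own EncoderLayer may differ in detail.

import tensorflow as tf

class EncoderLayer(tf.keras.layers.Layer):
    def __init__(self, embedding_dim, num_heads, fully_connected_dim, dropout_rate=0.1):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=embedding_dim)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(fully_connected_dim, activation='relu'),
            tf.keras.layers.Dense(embedding_dim),
        ])
        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.dropout = tf.keras.layers.Dropout(dropout_rate)

    def call(self, x, training=False, mask=None):
        # Self-attention block with residual connection and layer normalization
        attn_output = self.mha(x, x, x, attention_mask=mask)
        out1 = self.layernorm1(x + attn_output)
        # Feed-forward block with residual connection and layer normalization
        ffn_output = self.dropout(self.ffn(out1), training=training)
        return self.layernorm2(out1 + ffn_output)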

Key Components

  • Positional encoding to maintain sequence order information (see the sketch after this list)
  • Masked self-attention in the decoder to prevent looking ahead
  • Scaled dot-product attention as the core attention mechanism
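
For illustration, a minimal sketch of the sinusoidal positional encoding listed under Key Functions, following the formula from the paper; the exact implementation in this repository may differ.

import numpy as np
import tensorflow as tf

# PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
def positional_encoding(positions, d_model):
    pos = np.arange(positions)[:, np.newaxis]                  # (positions, 1)
    i = np.arange(d_model)[np.newaxis, :]                      # (1, d_model)
    angle_rates = 1 / np.power(10000, (2 * (i // 2)) / np.float32(d_model))
    angle_rads = pos * angle_rates                             # (positions, d_model)
    angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])          # even indices: sine
    angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])          # odd indices: cosine
    return tf.cast(angle_rads[np.newaxis, ...], dtype=tf.float32)  # (1, positions, d_model)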

Data Preprocessing

import tensorflow as tf
import utils

# Clean and split the raw data into (document, summary) pairs
document, summary = utils.preprocess(train_data)
document_test, summary_test = utils.preprocess(test_data)

# Fit one tokenizer on documents and summaries so both share a single vocabulary;
# documents_and_summary is assumed here to be the concatenation of the two lists
documents_and_summary = document + summary
tokenizer = tf.keras.preprocessing.text.Tokenizer(filters=filters, oov_token=oov_token, lower=False)
tokenizer.fit_on_texts(documents_and_summary)

# Convert text to integer token sequences
inputs = tokenizer.texts_to_sequences(document)
targets = tokenizer.texts_to_sequences(summary)
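
Before training, the token sequences are padded to fixed lengths and wrapped in a tf.data.Dataset. A minimal sketch, assuming the encoder_maxlen and decoder_maxlen values used in the inference code below and illustrative buffer/batch sizes:

# Pad/truncate to fixed lengths so batches have a uniform shape
inputs = tf.keras.preprocessing.sequence.pad_sequences(inputs, maxlen=encoder_maxlen, padding='post', truncating='post')
targets = tf.keras.preprocessing.sequence.pad_sequences(targets, maxlen=decoder_maxlen, padding='post', truncating='post')

inputs = tf.cast(inputs, dtype=tf.int32)
targets = tf.cast(targets, dtype=tf.int32)

# Buffer and batch sizes are illustrative assumptions
BUFFER_SIZE = 10000
BATCH_SIZE = 64
dataset = tf.data.Dataset.from_tensor_slices((inputs, targets)).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)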

Training

# Initialize model
transformer = Transformer(num_layers, embedding_dim, num_heads, fully_connected_dim,
                          vocab_size, vocab_size, max_pos_encoding, max_pos_encoding)

# Training loop
for epoch in range(epochs):
    train_loss.reset_states()  # reset the running loss metric at the start of each epoch
    for batch, (inp, tar) in enumerate(dataset):
        train_step(transformer, inp, tar)

    print(f'Epoch {epoch+1}, Loss {train_loss.result():.4f}')
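
The train_step function itself is not shown above. The following is a minimal sketch of one common way to implement it, assuming teacher forcing, a padding-masked SparseCategoricalCrossentropy loss, an optimizer defined elsewhere (see the Custom Learning Rate Scheduler section), and a Transformer call/return signature like the original TensorFlow tutorial's; the repository's actual implementation may differ.

# Set from_logits=True instead if the model outputs raw logits
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False, reduction='none')
train_loss = tf.keras.metrics.Mean(name='train_loss')

def masked_loss(real, pred):
    # Average the loss only over non-padding positions (token id 0 is padding)
    mask = tf.math.logical_not(tf.math.equal(real, 0))
    loss_ = loss_object(real, pred)
    mask = tf.cast(mask, dtype=loss_.dtype)
    loss_ *= mask
    return tf.reduce_sum(loss_) / tf.reduce_sum(mask)

@tf.function
def train_step(model, inp, tar):
    # Teacher forcing: the decoder input is the summary shifted right by one token
    tar_inp = tar[:, :-1]
    tar_real = tar[:, 1:]

    with tf.GradientTape() as tape:
        # Call/return signature assumed; masks may be required by the actual model
        predictions, _ = model(inp, tar_inp, training=True)
        loss = masked_loss(tar_real, predictions)

    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss(loss)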

Inference

def summarize(model, input_document):
    # Tokenize and pad the input document to the encoder's fixed length
    input_document = tokenizer.texts_to_sequences([input_document])
    input_document = tf.keras.preprocessing.sequence.pad_sequences(input_document, maxlen=encoder_maxlen, padding='post', truncating='post')
    encoder_input = tf.expand_dims(input_document[0], 0)

    # Start the summary with the start-of-sequence token
    output = tf.expand_dims([tokenizer.word_index["[SOS]"]], 0)

    # Generate one token at a time until [EOS] or the length limit is reached
    for i in range(decoder_maxlen):
        predicted_id = next_word(model, encoder_input, output)
        output = tf.concat([output, predicted_id], axis=-1)

        if predicted_id == tokenizer.word_index["[EOS]"]:
            break

    return tokenizer.sequences_to_texts(output.numpy())[0]

# Generate summary
summary = summarize(transformer, input_document)
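
next_word, used by summarize above, is not shown in full here. A minimal greedy-decoding sketch follows, assuming the model returns (probabilities, attention_weights) as in the original TensorFlow tutorial; the actual call signature and mask handling may differ in this repository.

def next_word(model, encoder_input, output):
    # Run one decoding step over the tokens generated so far
    predictions, _ = model(encoder_input, output, training=False)
    # Keep only the distribution for the last generated position
    predictions = predictions[:, -1:, :]                                   # (batch, 1, vocab_size)
    predicted_id = tf.argmax(predictions, axis=-1, output_type=tf.int32)   # (batch, 1)
    return predicted_id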

Key Functions

  • positional_encoding(positions, d_model): Generates positional encodings
  • create_padding_mask(decoder_token_ids): Creates mask for padding tokens
  • create_look_ahead_mask(sequence_length): Creates mask to prevent looking ahead
  • scaled_dot_product_attention(q, k, v, mask): Computes attention weights (see the sketches after this list)
  • EncoderLayer and DecoderLayer: Main components of the transformer
  • Transformer: Combines encoder and decoder into full model
  • train_step(model, inp, tar): Performs one training step
  • next_word(model, encoder_input, output): Predicts the next word in the summary
  • summarize(model, input_document): Generates a complete summary
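
Minimal sketches of the mask helpers and scaled dot-product attention, following Attention(Q, K, V) = softmax(QK^T / sqrt(d_k) + mask) V; the actual implementations in the repository may differ in detail.

import tensorflow as tf

def create_padding_mask(decoder_token_ids):
    # 1.0 where the token is padding (id 0), broadcastable over the attention logits
    seq = tf.cast(tf.math.equal(decoder_token_ids, 0), tf.float32)
    return seq[:, tf.newaxis, tf.newaxis, :]  # (batch, 1, 1, seq_len)

def create_look_ahead_mask(sequence_length):
    # Upper-triangular mask so position i cannot attend to positions after i
    return 1 - tf.linalg.band_part(tf.ones((sequence_length, sequence_length)), -1, 0)

def scaled_dot_product_attention(q, k, v, mask=None):
    matmul_qk = tf.matmul(q, k, transpose_b=True)            # (..., seq_len_q, seq_len_k)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)   # scale by sqrt(d_k)
    if mask is not None:
        scaled_attention_logits += (mask * -1e9)             # push masked positions toward zero weight
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)
    output = tf.matmul(attention_weights, v)                 # (..., seq_len_q, depth_v)
    return output, attention_weights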

Custom Learning Rate Scheduler

class CustomSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, d_model, warmup_steps=4000):
        super(CustomSchedule, self).__init__()
        self.d_model = tf.cast(d_model, dtype=tf.float32)
        self.warmup_steps = warmup_steps
    
    def __call__(self, step):
        # Learning rate from the paper: d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
        step = tf.cast(step, dtype=tf.float32)
        arg1 = tf.math.rsqrt(step)
        arg2 = step * (self.warmup_steps ** -1.5)

        return tf.math.rsqrt(self.d_model) * tf.math.minimum(arg1, arg2)
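
The schedule increases the learning rate linearly for the first warmup_steps steps and then decays it proportionally to the inverse square root of the step number. A minimal sketch of wiring it into an optimizer; the Adam hyperparameters follow the original paper and are assumptions here.

learning_rate = CustomSchedule(embedding_dim)
optimizer = tf.keras.optimizers.Adam(learning_rate, beta_1=0.9, beta_2=0.98, epsilon=1e-9)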

Results

[Results figure]
