This repository contains a Final Year Project (FYP) on restoring punctuation in speech transcripts using Large Language Models (LLMs) and pre-LLMs (smaller-scale language models). The project is divided into two main sections:
- LLM: Fine-tunes the Llama-2 model with LoRA on the LibriHeavy small dataset and compares its performance with Gemini-pro, accessed through its API.
- pre-LLM: Uses a modified XLM-RoBERTa model for punctuation restoration.
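
To make the task concrete, here is a minimal sketch of how punctuation restoration can be framed as word-level labeling, which is a common setup for encoder models such as XLM-RoBERTa. It is not taken from this repository: the label set `PUNCT_LABELS` and the helpers `to_word_labels` and `restore` are hypothetical names chosen for illustration, and the actual preprocessing in LLM and pre-LLM may differ.

```python
import re

# Hypothetical label set (an assumption); the repository's labels may differ.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}


def to_word_labels(text):
    """Convert punctuated text into (lowercased word, label) pairs.

    Each word is labeled with the tracked punctuation mark that
    directly follows it, or "O" if there is none.
    """
    pairs = []
    for token in text.split():
        label = "O"
        # A trailing tracked mark becomes the label for this word.
        if token[-1] in PUNCT_LABELS:
            label = PUNCT_LABELS[token[-1]]
            token = token[:-1]
        # Strip remaining symbols and lowercase, mimicking a raw transcript.
        word = re.sub(r"[^\w']", "", token).lower()
        if word:
            pairs.append((word, label))
    return pairs


def restore(pairs):
    """Rebuild punctuated (still lowercased) text from (word, label) pairs."""
    inverse = {name: mark for mark, name in PUNCT_LABELS.items()}
    return " ".join(word + inverse.get(label, "") for word, label in pairs)


if __name__ == "__main__":
    pairs = to_word_labels("Hello, how are you? Fine.")
    print(pairs)
    print(restore(pairs))
```

In this framing, a model receives only the unpunctuated words and predicts one label per word; `restore` then maps the predicted labels back to text. Capitalization is not recovered in this sketch.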
The repository is organized into the following directories:
- LLM: Contains scripts and resources for fine-tuning and testing the Llama-2 model.
- pre-LLM: Contains scripts and resources for using the XLM-RoBERTa model for punctuation restoration.
Please refer to the README files in the LLM and pre-LLM directories for more detailed instructions.