ntuspeechlab/LiuChangsong_FYP2024_SUD

 
 

FYP-Project

This repository contains a Final Year Project (FYP) on restoring punctuation in speech transcripts using Large Language Models (LLMs) and pre-LLMs (small-scale LLMs). The project is divided into two main parts: LLM and pre-LLM.

Introduction

The project focuses on restoring punctuation in text using different language models. It has two main sections:

  • LLM: This section fine-tunes the Llama-2 model with LoRA on the LibriHeavy small dataset and compares its performance with that of Gemini-pro, accessed through its API.
  • pre-LLM: This section uses a modified XLM-RoBERTa model for punctuation restoration.
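
To make the task itself concrete, here is a minimal sketch of how punctuated text might be turned into unpunctuated words plus per-word labels, and how predicted labels could be mapped back to punctuated text. It assumes a token-labelling view of the problem, which is common for encoder models but is not confirmed to be this repository's approach. The label set `PUNCT_LABELS` and the helpers `to_words_and_labels` and `restore` are hypothetical names used only for illustration.

```python
import re

# Hypothetical label set; the punctuation classes used in this repository may differ.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}
LABEL_TO_MARK = {label: mark for mark, label in PUNCT_LABELS.items()}


def to_words_and_labels(text):
    """Split punctuated text into lowercase words and the label of the mark after each word."""
    words, labels = [], []
    for token in text.split():
        # Separate an optional trailing comma, period, or question mark from the word.
        match = re.match(r"^(.*?)([,.?]?)$", token)
        word, mark = match.group(1), match.group(2)
        if not word:
            continue  # skip stray punctuation with no attached word
        words.append(word.lower())
        labels.append(PUNCT_LABELS.get(mark, "O"))  # "O" means no punctuation
    return words, labels


def restore(words, labels):
    """Rebuild punctuated text from words and (predicted) labels."""
    out = []
    capitalize_next = True  # capitalize the first word and any word after a sentence end
    for word, label in zip(words, labels):
        if capitalize_next:
            word = word.capitalize()
        out.append(word + LABEL_TO_MARK.get(label, ""))
        capitalize_next = label in ("PERIOD", "QUESTION")
    return " ".join(out)


text = "Hello, how are you? I am fine."
words, labels = to_words_and_labels(text)
restored = restore(words, labels)
```

In this framing, `words` plays the role of the raw speech transcript and `labels` is what a model would be asked to predict. For a generative model, the same pair could instead be presented as an unpunctuated prompt and a punctuated target string.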

Project Structure

The repository is organized into the following directories:

  • LLM: Contains scripts and resources for fine-tuning and testing the Llama-2 model.
  • pre-LLM: Contains scripts and resources for using the XLM-RoBERTa model for punctuation restoration.

Please refer to the README files in the LLM and pre-LLM directories for more detailed instructions.
