LLM Dataset Generator

Description

This project generates datasets from a corpus for training and evaluating LLMs. It uses OpenAI's GPT models.

Installation

Follow the instructions below to get the project up and running on your local machine:

Clone the repository: git clone https://github.com/Perseus14/llm-dataset-generator.git
Navigate to the project directory: cd llm-dataset-generator
Create a virtual environment: virtualenv -p /usr/bin/python3 venv
Activate the virtual environment: source venv/bin/activate
Install dependencies: pip install -r requirements.txt

Usage

To use this project, follow these steps:

Create a folder and add the corpus to it (txt and pdf files are supported)
Create a .env file and add your OpenAI API key (use .env_local as a template)
Update the required paths in main.py
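
Before editing main.py, it can help to check that the corpus folder and the .env file are set up as expected. The following is a minimal sketch using only the Python standard library; it is not the project's own loading code. The helper names find_corpus_files and read_env_file are hypothetical, searching subfolders is an assumption, and the variable name inside .env should be taken from .env_local (OPENAI_API_KEY in the usage note below is only a guess).

```python
from pathlib import Path

# File types the README lists as supported.
SUPPORTED_SUFFIXES = {".txt", ".pdf"}


def find_corpus_files(corpus_dir):
    """Return the supported files under corpus_dir, in sorted order.

    Assumption: subfolders are searched too; the project itself may not do this.
    """
    root = Path(corpus_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Corpus folder not found: {root}")
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def read_env_file(env_path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in Path(env_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values
```

For example, find_corpus_files("examples/corpus_folder") lists the files that would be picked up from the sample corpus in this repository, and "OPENAI_API_KEY" in read_env_file(".env") checks for a key under that assumed name.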

Pending Tasks

Generate conversational dataset
Add other LLM models
Modify to provide more control to users

Contributing

Contributions are welcome! Follow the steps below to contribute to this project:

Fork the repository
Create a new branch: git checkout -b new-feature
Make your changes and commit them: git commit -m 'Add new feature'
Push the changes to your forked repository: git push origin new-feature
Open a pull request on the original repository

Please ensure that your contributions align with the project's coding style and guidelines.

License

Apache 2.0
