agi-dude/pretraining-generator

This is a simple script that creates a pretraining dataset from a folder of input files (txt, md, pdf, docx, epub, html).
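This README does not describe the script's extraction logic or output format. The following is a minimal, stdlib-only sketch of the same idea, not the code in `main.py`. It assumes the output is a JSON Lines file with one `{"text": ...}` object per document. The names `build_dataset`, `read_text` and `_TextExtractor` are hypothetical. The sketch only handles txt, md and html, because pdf, docx and epub need third-party parsers.

```python
import json
from html.parser import HTMLParser
from pathlib import Path
from typing import Optional

# Formats this sketch can read with the standard library alone.
# pdf, docx and epub would need third-party parsers, so they are skipped here.
PLAIN_TEXT_EXTS = {".txt", ".md"}
HTML_EXTS = {".html", ".htm"}


class _TextExtractor(HTMLParser):
    """Collect visible text from HTML, ignoring <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def read_text(path: Path) -> Optional[str]:
    """Return the plain text of a supported file, or None for other formats."""
    ext = path.suffix.lower()
    if ext in PLAIN_TEXT_EXTS:
        return path.read_text(encoding="utf-8", errors="replace")
    if ext in HTML_EXTS:
        parser = _TextExtractor()
        parser.feed(path.read_text(encoding="utf-8", errors="replace"))
        parser.close()
        return "\n".join(parser.parts)
    return None


def build_dataset(input_dir: str, output_file: str) -> int:
    """Write one {"text": ...} JSON object per non-empty document (JSON Lines).

    Walks input_dir recursively and returns the number of documents written.
    """
    written = 0
    with open(output_file, "w", encoding="utf-8") as out:
        for path in sorted(Path(input_dir).rglob("*")):
            if not path.is_file():
                continue
            text = read_text(path)
            if text is None or not text.strip():
                continue  # unsupported format or empty file
            out.write(json.dumps({"text": text.strip()}, ensure_ascii=False) + "\n")
            written += 1
    return written
```

For example, `build_dataset("my_docs", "pretraining.jsonl")` would walk `my_docs` recursively and return the number of documents it wrote. JSON Lines is used here only because it is easy to stream. The real script may use a different layout.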

Usage Instructions

  1. Clone the repository and navigate to the project directory:

    git clone https://github.com/agi-dude/pretraining-generator
    cd pretraining-generator
  2. Install the required dependencies:

    pip install -r requirements.txt
  3. Run the script:

    python main.py
  4. Follow the GUI prompts to select the input folder and output file.

About

Stolen from augmentoolkit :)