Skip to content

convert github repos or local folders into a single text document that can be used for LLM training, RAG or copy pasting into context while developing

Notifications You must be signed in to change notification settings

SelfishGene/repo_to_text

Repository files navigation

Repo to Text

Repo to Text is a Python tool that converts GitHub repositories and local folders into text format to be used for either training LLMs or using copy pasting into LLM context. this repo provids AI-powered analysis of images, CSV files, and Jupyter notebooks.

Features

  • Process GitHub repositories and local folders and create a single text file for each repository
  • Convert images to text by describing them using Google's Gemini Flash model (fast and free for up to 15 requests per minute)
  • Covert Jupyter notebooks to text where the input (code/markdown) & output of each cell is clearly delimitered (if output contains images, they are also described using Flash)
  • Describe large CSV files also using Flash instead of just dumping their entire content
  • Handle API rate limits with built-in retry logic

Installation

For local development, clone the repository and install it in editable mode:

git clone https://github.com/SelfishGene/repo_to_text.git
cd repo_to_text
pip install -e .

Usage

To use Repo to Text, you need to set up your API keys first.
The following code will create a .env file in your project root and load it into your environment variables:

from repo_to_text.api_key_manager import initialize_api_keys

API_KEYS = {
    'GEMINI_API_KEY': 'abcdefghijklmnopqrstuvwxyz1234567890abc',
}

initialize_api_keys(api_keys_dict=API_KEYS)

Then, you can use the tool as follows from your Python scripts:

from repo_to_text import convert_repos_to_text

# For GitHub repositories
github_repos = [
    "https://github.com/username/repo1",
    "https://github.com/username/repo2"
]
convert_repos_to_text(github_repos, "output_dir", is_github=True)

# For local folders
local_folders = [
    "/path/to/folder1",
    "/path/to/folder2"
]
convert_repos_to_text(local_folders, "output_dir", is_github=False)

License

This project is licensed under the MIT License.

About

convert github repos or local folders into a single text document that can be used for LLM training, RAG or copy pasting into context while developing

Resources

Stars

Watchers

Forks

Languages