RepoToTextForLLMs automates the analysis of GitHub repositories for use with large-context LLMs. This Python script fetches README files, the repository structure, and the contents of non-binary files. It also produces structured output with pre-formatted prompts to guide further analysis of the repository's content.
This project is a fork of Doriandarko/RepoToTextForLLMs.
- Web Interface: Transformed from a CLI tool to a real-time Streamlit web application
- AI Analysis: Integrated OpenAI's GPT-4 to provide intelligent insights about repositories
- README Retrieval: Automatically extracts the content of README.md to provide an initial insight into the repository.
- Structured Repository Traversal: Maps out the repository's structure through an iterative traversal method, ensuring thorough coverage without the limitations of recursion.
- Selective Content Extraction: Retrieves text contents from files, intelligently skipping over binary files to streamline the analysis process.
- AI-Powered Analysis: Utilizes GPT-4 to provide detailed analysis of the repository's structure and content.
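The traversal and binary-skipping features above can be illustrated with a minimal sketch. It is not the project's actual code: `walk_repo`, `looks_binary`, and the `list_dir` callback are hypothetical names, and the NUL-byte check is only one common heuristic for detecting binary content.

```python
from typing import Callable, Iterable, List, Tuple


def looks_binary(data: bytes, sample_size: int = 1024) -> bool:
    """Heuristic (assumption): data is binary if its first bytes contain a NUL byte."""
    return b"\x00" in data[:sample_size]


def walk_repo(
    list_dir: Callable[[str], Iterable[Tuple[str, str]]], root: str = ""
) -> List[str]:
    """Collect every file path under root with an explicit stack instead of recursion.

    list_dir(path) is a hypothetical callback that returns (entry_path, entry_type)
    pairs, where entry_type is "dir" or "file".
    """
    files: List[str] = []
    stack = [root]
    while stack:
        current = stack.pop()
        for path, kind in list_dir(current):
            if kind == "dir":
                stack.append(path)  # visit this directory on a later iteration
            else:
                files.append(path)
    return sorted(files)
```

With PyGithub, `list_dir` could plausibly wrap `repo.get_contents(path)`, whose entries expose `path` and `type` attributes. Because the stack lives on the heap, deeply nested repositories do not run into Python's recursion limit.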
To use RepoToTextForLLMs, you'll need:
- Python installed on your system.
- A virtual environment manager (e.g., venv, virtualenv).
- The following Python packages:
  - streamlit
  - PyGithub
  - openai
  - re (part of the Python standard library; no separate installation needed)
- Clone the repository:

      git clone https://github.com/yourusername/RepoToTextForLLMs.git
      cd RepoToTextForLLMs

- Set up a virtual environment:

      python -m venv myenv
      source myenv/bin/activate  # On Windows: myenv\Scripts\activate

- Install the required packages:

      pip install -r requirements.txt

  Note: The requirements.txt file contains all necessary dependencies, updated from the original project.
- Configure environment variables:
  - Create a .env file in the project root (if not already present) and add your GitHub and OpenAI tokens:

        GITHUB_TOKEN = 'YOUR_GITHUB_TOKEN'
        OPENAI_API_KEY = 'YOUR_OPENAI_API_KEY'

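As a rough illustration of how the two settings in the .env file might be read, here is a minimal sketch. It assumes the `KEY = 'value'` layout shown above; `parse_env` and `require_tokens` are hypothetical helpers, not functions from this project, and the application itself may load its configuration differently.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY = 'value' lines, ignoring blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional single or double quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def require_tokens(values: Dict[str, str]) -> Dict[str, str]:
    """Return both tokens, raising KeyError if either is missing or empty."""
    required = ("GITHUB_TOKEN", "OPENAI_API_KEY")
    missing = [name for name in required if not values.get(name)]
    if missing:
        raise KeyError("Missing settings: " + ", ".join(missing))
    return {name: values[name] for name in required}
```

Failing early on a missing token gives a clearer error than a later authentication failure from the GitHub or OpenAI API.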
- Ensure your virtual environment is activated:

      source myenv/bin/activate  # On Windows: myenv\Scripts\activate

- Run the application:

      streamlit run app.py

- Interact with the app:
  - Open your browser and navigate to the URL provided by Streamlit.
  - Enter the GitHub repository URL you wish to analyze.
  - Click on "Analyze Repository" to initiate the analysis.
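The repository URL entered in the app has to be turned into an owner and repository name before the GitHub API can be queried. The sketch below shows one way this could be done with the standard `re` module; the pattern and the `parse_repo_url` helper are assumptions for illustration, not the project's actual parsing logic.

```python
import re
from typing import Tuple

# Accepts http(s)://github.com/<owner>/<repo>, with an optional ".git" suffix
# and an optional trailing slash. Deeper paths (e.g. /tree/main) are rejected.
_REPO_URL = re.compile(r"^https?://github\.com/([^/\s]+)/([^/\s]+?)(?:\.git)?/?$")


def parse_repo_url(url: str) -> Tuple[str, str]:
    """Split a GitHub repository URL into (owner, repository name)."""
    match = _REPO_URL.match(url.strip())
    if match is None:
        raise ValueError(f"Not a GitHub repository URL: {url!r}")
    return match.group(1), match.group(2)
```

The resulting pair can be joined as `owner/repo`, which is the full-name form PyGithub's `Github.get_repo` accepts.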
- Clone the repository and set up the environment as described in the Installation section.
- Run the Streamlit application.
- Input the GitHub repository URL into the provided field.
- The application will process the repository and display:
  - README content
  - LICENSE information
  - Repository structure
  - File contents (excluding binary files)
  - AI-generated analysis based on the repository's content
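To make the idea of structured output with a pre-formatted prompt more concrete, the following sketch joins the collected pieces into a single text block. The section labels, the prompt wording, and the `build_report` function are all illustrative assumptions; the application's real output format may differ.

```python
from typing import Dict


def build_report(readme: str, structure: str, files: Dict[str, str]) -> str:
    """Join README, structure, and file contents into one labeled text block."""
    parts = [
        "README:\n" + readme.strip(),
        "Repository structure:\n" + structure.strip(),
    ]
    # Sort paths so the output is stable between runs.
    for path in sorted(files):
        parts.append(f"File: {path}\n{files[path].strip()}")
    # Hypothetical closing prompt that asks the model for an analysis.
    parts.append("Prompt: Summarize what this repository does and how it is organized.")
    return "\n\n".join(parts)
```

Keeping each section under its own label lets the model tell documentation, layout, and source code apart within one long context.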
Contributions to RepoToTextForLLMs are welcome. Whether you submit pull requests, report issues, or suggest improvements, your input helps make this tool better for everyone. Please see the Contributing Guidelines for more details.
This project is licensed under the MIT License. See the LICENSE file for details.