Code Archaeologist is an advanced Git repository analysis tool that combines the power of AI (GPT-4) with data visualization to provide deep insights into your codebase's evolution, team dynamics, and development patterns.
- Real-time analysis of Git repositories (local or remote)
- Option to analyze all commits or limit to K most recent commits
- Automatic code complexity calculation
- Technical debt tracking
- Breaking changes detection
- Security impact assessment
-
Overview Tab
- Key repository metrics
- Commit activity heatmap
- Development velocity trends
- Active contributor statistics
-
Code Insights Tab
- Technical debt metrics
- Code complexity evolution
- File impact analysis
- Breaking changes tracking
-
Team Analysis Tab
- Collaboration network visualization
- Knowledge distribution maps
- Developer impact levels
- File ownership patterns
-
Interactive Q&A Tab
- Natural language queries about your repository
- AI-powered analysis with visualizations
- Example questions provided for guidance
-
Custom Analysis Tab
- Customizable visualizations
- Flexible data filtering
- Multiple chart types
- Export capabilities
- Clone the repository:
git clone https://github.com/yourusername/code-archaeologist.git
cd code-archaeologist
- Install required packages:
pip install -r requirements.txt
- Set up OpenAI API key:
export OPENAI_API_KEY='your-api-key'
- Python 3.8+
- OpenAI API key (for commit analysis)
- Required Python packages:
- streamlit
- pydriller
- openai
- plotly
- networkx
- pandas
- numpy
- matplotlib
- wordcloud
- Start the Streamlit app:
streamlit run code-archaeologist.py
-
Enter your repository path (local or remote Git URL)
-
Optional: Set commit limit for analysis
-
Explore the different tabs:
- Navigate through Overview, Code Insights, Team Analysis
- Ask questions in the Interactive Q&A tab
- Create custom visualizations in the Custom Analysis tab
The tool uses GPT-4 to analyze:
- Architectural patterns and decisions
- Technical debt introduction/resolution
- Breaking changes and API modifications
- Security implications
- Code quality trends
- Team dynamics
- Development velocity
- Knowledge distribution
- Timeline charts
- Bar charts
- Pie charts
- Scatter plots
- Network graphs
- Heatmaps
- Sunburst diagrams
- Treemaps
- Radar charts
- Sankey diagrams
- Word clouds
- Bubble charts
- Violin plots
- Funnel charts
- Results are cached for performance
- Clear cache button available for fresh analysis
- Automatic cache invalidation on parameter changes
- "Show me the complexity trend for critical files"
- "What's the knowledge distribution across the team?"
- "Analyze the correlation between technical debt and breaking changes"
- "Which files have the most frequent security-related commits?"
- "Generate a development velocity report for Q1"
- Select visualization type
- Choose data columns for axes
- Apply color coding
- Set data filters
- Configure aggregations
- Export results
- Date range selection
- Author filtering
- Impact level filtering
- File type filtering
- Use commit limits for large repositories
- Clear cache when needed
- Consider rate limits of OpenAI API
- Be mindful of memory usage with large datasets
-
If analysis fails:
- Check repository path
- Verify OpenAI API key
- Clear cache and retry
- Check error messages in console
-
If visualizations don't load:
- Verify data availability
- Check selected columns match data types
- Reduce data size if browser becomes slow
Feel free to submit issues, fork the repository, and create pull requests for any improvements.
MIT License - feel free to use this tool for any purpose, commercial or non-commercial.
- Built with Streamlit
- Powered by OpenAI's GPT-4
- Uses PyDriller for Git analysis
- Visualization by Plotly
For questions and support, please open an issue in the GitHub repository.