Estimate how frequently Python packages are imported across public GitHub repositories.
We determine package popularity by:
- Randomly sampling GitHub repositories with Python as the main language
- Analyzing Python import statements in these repositories
- Extrapolating findings based on the total Python repository count (~18M repositories)
The system continually improves its accuracy by sampling additional repositories every 6 hours via GitHub Actions.
Note: Standard-library modules are no longer counted, but some previously collected data for them has not yet been removed.
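The extrapolation step described above can be sketched as simple proportional scaling. This is a minimal illustration, not necessarily what `count_libs.py` does: the function name `extrapolate` and the sample size in the usage note are hypothetical, and it assumes each count is the number of sampled repositories that import the package.

```python
# Approximate total number of Python repositories quoted in this README.
TOTAL_PYTHON_REPOS = 18_000_000


def extrapolate(sample_count: int, sample_size: int,
                total_repos: int = TOTAL_PYTHON_REPOS) -> float:
    """Scale a count observed in a random sample up to the full population.

    Assumption: the sample is uniformly random, so the share of sampled
    repositories using a package approximates the share across all of them.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= sample_count <= sample_size:
        raise ValueError("sample_count must be between 0 and sample_size")
    # Multiply before dividing to keep the arithmetic as exact as possible.
    return sample_count * total_repos / sample_size
```

For example, `extrapolate(50, 1_000)` scales a package seen in 50 of 1,000 hypothetical sampled repositories to the full population. The estimate becomes more stable as the sample grows, which is why repeated sampling helps.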
Script | Purpose |
---|---|
find_repos.py | Queries GitHub API for random Python repositories |
analyze_imports.py | Extracts import statements from repository files |
count_libs.py | Aggregates and calculates package usage statistics |
update_readme.py | Refreshes this README with latest data |
total_python_repos.ipynb | Estimates total Python repository count on GitHub |
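As an illustration of the import-extraction step, the sketch below uses Python's built-in `ast` module to collect top-level package names from a source string. It is one possible approach and is not taken from `analyze_imports.py`; the function name `top_level_imports` and its `exclude` parameter are hypothetical.

```python
import ast


def top_level_imports(source: str, exclude: frozenset = frozenset()) -> set:
    """Return the top-level package names imported by a piece of Python code.

    `exclude` is a hypothetical hook for dropping names that should not be
    counted, for example standard-library modules.
    """
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b.c" counts as package "a".
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports ("from . import x"); they refer to the
            # repository's own code, not to an installed package.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return names - set(exclude)
```

Parsing with `ast` ignores imports that appear only inside strings or comments, but it raises `SyntaxError` on files that the running interpreter cannot parse, so a real pipeline would need to handle that case. On Python 3.10 and later, `sys.stdlib_module_names` could be passed as `exclude`.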
File | Description | Format |
---|---|---|
repos.jsonl | Details of processed repositories | JSONL |
imports.jsonl | Raw import statements extracted from repos | JSONL |
library_counts.csv | Aggregated package usage statistics | CSV |
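The aggregation from JSONL records to a CSV of counts could look roughly like the sketch below. The record schema (a `"repo"` key and an `"imports"` list) and the CSV column names are assumptions made for illustration; the actual fields in `imports.jsonl` and `library_counts.csv` may differ.

```python
import csv
import io
import json
from collections import Counter


def count_libraries(jsonl_lines) -> Counter:
    """Count how many records (repositories) import each library.

    Assumed schema: one JSON object per line with an "imports" list.
    """
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # Use a set so a library counts once per repository.
        counts.update(set(record["imports"]))
    return counts


def to_csv(counts: Counter) -> str:
    """Render counts as CSV text, highest count first, ties sorted by name."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["library", "count"])  # hypothetical column names
    for name, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([name, n])
    return buf.getvalue()
```

Counting each library once per repository keeps a single large project from dominating the totals; counting every import statement instead would be an equally plausible design.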
Our GitHub Actions workflow orchestrates the entire process:
Find Random Repos → Analyze Imports → Count Package Usage → Update Statistics → Refresh README
Rank | Library | Count |
---|---|---|
1 | numpy | 2923 |
2 | matplotlib | 978 |
3 | torch | 921 |
4 | pandas | 908 |
5 | requests | 685 |
6 | django | 632 |
7 | cv2 | 585 |
8 | sklearn | 508 |
9 | utils | 457 |
10 | scipy | 457 |
Last updated: 2025-04-08 12:54:10 UTC