GitHub - noderabbit-team/repohound: Utilities to select Django repos from github.


Create a virtualenv, activate it, and install the requirements into it::

    pip install -r requirements.txt

overview
========

data directory
==============
Holds a few canned data snapshots so far.

search.py 
=========

Runs the search, fetching results in sets of 4 and
working around Google's throttling.
Output goes into data.txt.
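
A minimal sketch of the paging-and-backoff idea described above, not the actual search.py code. The names ``fetch_all``, ``fetch_page`` and ``Throttled`` are hypothetical, and the exponential backoff is an assumed way of working around throttling. The caller supplies ``fetch_page(start)``, which returns up to 4 results starting at that offset or raises ``Throttled``.

```python
import time


class Throttled(Exception):
    """Hypothetical: raised by the page fetcher when the server rate-limits us."""


def fetch_all(fetch_page, page_size=4, max_results=64,
              max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Collect results page by page, backing off when throttled.

    fetch_page(start) must return a list of up to page_size results
    beginning at offset start, or raise Throttled.
    """
    results = []
    start = 0
    while start < max_results:
        for attempt in range(max_retries):
            try:
                page = fetch_page(start)
                break
            except Throttled:
                # Wait 1s, 2s, 4s, ... before retrying the same page.
                sleep(base_delay * 2 ** attempt)
        else:
            # Every retry for this page was throttled.
            raise RuntimeError("still throttled after %d retries" % max_retries)
        if not page:
            break  # an empty page means there are no more results
        results.extend(page)
        start += page_size
    return results
```

Passing ``sleep`` as a parameter keeps the loop testable without real waiting. The default ``max_results=64`` mirrors the API limit mentioned in the TODO below.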

TODO: replace this with a real screen scraper, since the API returns
at most 64 results. Candidates:

    - BeautifulSoup
    - lxml
    - Scrapy


select.py 
=========

Parses the repo names out of data.txt and fetches the data for each
repo from GitHub.
The result goes into a dict, which is pickled into pickled.pic.
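
A rough sketch of that parse-fetch-pickle flow, under stated assumptions: it assumes data.txt contains URLs of the form ``github.com/<owner>/<repo>``, and it takes the GitHub lookup as a caller-supplied function so no particular API call is implied. ``parse_repo_names``, ``select`` and ``fetch_repo`` are illustrative names, not taken from select.py.

```python
import pickle
import re

# Assumption: repos appear in the text as github.com/<owner>/<repo> URLs.
REPO_RE = re.compile(r"github\.com/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)")


def parse_repo_names(text):
    """Return unique 'owner/repo' names in order of first appearance."""
    seen = []
    for owner, repo in REPO_RE.findall(text):
        name = "%s/%s" % (owner, repo)
        if name not in seen:
            seen.append(name)
    return seen


def select(text, fetch_repo, out_path="pickled.pic"):
    """Fetch data for each repo name and pickle the resulting dict.

    fetch_repo(name) is supplied by the caller (hypothetical); it should
    return whatever per-repo data is worth keeping.
    """
    repos = {name: fetch_repo(name) for name in parse_repo_names(text)}
    with open(out_path, "wb") as f:
        pickle.dump(repos, f)
    return repos
```

Keying the dict by ``owner/repo`` keeps forks of the same project distinct, which helps if the results are later arranged by fork.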

TODO: put the results in a hierarchy, respecting forks



upload.py
=========

TODO: start this, based on some Google Spreadsheet code I have


show.py
=======

Simply shows what's in the pickle file produced by select.py
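
Roughly, that amounts to something like the sketch below. It assumes the pickle holds a dict keyed by repo name; ``load_repos`` and ``format_repos`` are hypothetical helpers, not the names used in show.py.

```python
import pickle
from pprint import pformat


def load_repos(path="pickled.pic"):
    """Load the pickled dict. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def format_repos(repos):
    """Return one readable block per repo, sorted by name."""
    blocks = []
    for name in sorted(repos):
        blocks.append("%s\n%s" % (name, pformat(repos[name])))
    return "\n\n".join(blocks)
```

Calling ``print(format_repos(load_repos()))`` would then dump the whole file.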


gather.py
=========

Gathers items into a tree and does some rudimentary scoring.
Currently takes its input from the pickle file.
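
One way the tree and scoring could look, as a sketch only. It assumes each repo's data is a dict with a ``parent`` key (the name of the repo it was forked from, or None) plus ``watchers`` and ``forks`` counts. Those field names and the scoring formula are made up for illustration and are not taken from gather.py.

```python
from collections import defaultdict


def build_tree(repos):
    """Map each parent name to its direct forks; roots are listed under None."""
    children = defaultdict(list)
    for name, info in sorted(repos.items()):
        parent = info.get("parent")
        # A fork whose parent was never fetched (or that points at itself)
        # is treated as a root.
        if parent not in repos or parent == name:
            parent = None
        children[parent].append(name)
    return children


def score(info):
    """Hypothetical score: watchers plus a bonus of 2 per fork."""
    return info.get("watchers", 0) + 2 * info.get("forks", 0)


def render(repos, children, parent=None, depth=0):
    """Return indented lines, highest-scoring repos first at each level."""
    lines = []
    ranked = sorted(children.get(parent, []),
                    key=lambda n: score(repos[n]), reverse=True)
    for name in ranked:
        lines.append("%s%s (%d)" % ("  " * depth, name, score(repos[name])))
        lines.extend(render(repos, children, name, depth + 1))
    return lines
```

For example, ``"\n".join(render(repos, build_tree(repos)))`` gives an indented listing with each fork nested under the repo it came from.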

TODO
====
- Get the 8.5k Django search data.
- Put the results into Google Docs.

Resources
=========

- Read the Docs API: http://read-the-docs.readthedocs.org/en/latest/index.html#developer-documentation
- PyPI API: http://wiki.python.org/moin/CheeseShopDev (ask Richard Jones if we want a dump of the entire PyPI database)
- GitHub API: http://developer.github.com/