These scripts use the LegiScan API to pull bills from Arizona's legislature and save them as plain text. To do the same for another state, you only need to modify the API call in the code.
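As a rough illustration of what "modifying the API call" could look like, here is a minimal sketch that builds a LegiScan request URL for a given state. It assumes the LegiScan pull API's `getMasterList` operation with `key`, `op`, and `state` query parameters; the helper name `master_list_url` is hypothetical, and the way `leg-text-generator.py` actually builds its request may differ.

```python
from urllib.parse import urlencode

# Base endpoint of the LegiScan pull API (assumption: requests are plain GETs
# with the operation and its arguments passed as query parameters).
LEGISCAN_BASE = "https://api.legiscan.com/"


def master_list_url(api_key, state="AZ"):
    """Build a getMasterList request URL for one state (hypothetical helper)."""
    query = urlencode({
        "key": api_key,          # your personal LegiScan API key
        "op": "getMasterList",   # list the bills in the state's current session
        "state": state.upper(),  # two-letter state abbreviation
    })
    return f"{LEGISCAN_BASE}?{query}"


# Switching states is then a one-argument change.
print(master_list_url("YOUR_API_KEY", "NM"))
```

Under these assumptions, pointing the scripts at another state comes down to changing the two-letter abbreviation.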
You can then analyze the text for similarities to other legislation using the Legislative Influence Detector (LID), Data Science for Social Good's awesome tool for tracking legislative plagiarism.
Here's the GitHub repo for that: LID
These scripts are compatible with Python 3. If you're using Python 2, install Python 3 and follow Kenneth Reitz's excellent tutorial to set up a virtual environment pointing to your Python 3 install.
Once your Python 3 virtual environment is activated and you're inside the LegiScanApiScripts directory,
run pip install -r requirements.txt
### Setup
-
Clone the repository to your machine:
git clone https://github.com/qstin/LegiScanApiScripts.git
-
cd into the cloned directory.
-
Set up a virtual Python environment if needed.
-
Check that everything is working by running:
python leg-text-generator.py
This program writes the bills to text files in ~/path/to/LegiScanApiScripts/bills. Be prepared for this to run for a while.
-
You'll need to set up your environment variables for the LID project's algorithm. To do that, type each of these commands into your terminal:
unset PYTHONPATH
export POLICY_DIFFUSION=/path/to/policy_diffusion
export PYTHONPATH=${POLICY_DIFFUSION}/lid:${PYTHONPATH}
export PYTHONPATH=${POLICY_DIFFUSION}/lid/etl:${PYTHONPATH}
export PYTHONPATH=${POLICY_DIFFUSION}/lid/utils:${PYTHONPATH}
export PYTHONPATH=${POLICY_DIFFUSION}/lid/evaluation:${PYTHONPATH}
export PYTHONPATH=${POLICY_DIFFUSION}/scripts:${PYTHONPATH}
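Because each export prepends to PYTHONPATH, the directories end up in the reverse of the order they were typed. The sketch below reproduces that ordering and checks that the directories exist, which can help catch a wrong POLICY_DIFFUSION path before a long run. The helper names `lid_pythonpath` and `missing_dirs` are hypothetical and not part of LID.

```python
import os

# Subdirectories added by the export commands, in the order they are typed.
LID_SUBDIRS = ["lid", "lid/etl", "lid/utils", "lid/evaluation", "scripts"]


def lid_pythonpath(policy_diffusion):
    """Return the PYTHONPATH entries in their final lookup order.

    Each export prepends its directory, so the last one typed comes first.
    """
    return [f"{policy_diffusion}/{sub}" for sub in reversed(LID_SUBDIRS)]


def missing_dirs(policy_diffusion):
    """List expected entries that do not exist on disk (hypothetical check)."""
    return [p for p in lid_pythonpath(policy_diffusion) if not os.path.isdir(p)]


# Placeholder path from the commands above; substitute your own checkout.
root = os.environ.get("POLICY_DIFFUSION", "/path/to/policy_diffusion")
print(os.pathsep.join(lid_pythonpath(root)))
print("missing:", missing_dirs(root))
```

An empty "missing" list suggests the variable points at a real policy_diffusion checkout.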
-
Once that's done and your bills directory is filled with text files, you can run the LID script.
- This process is slow: running all of your bills through at once may take up to six hours.
-
To run one bill at a time, enter this into your command line from the LID directory:
cat sampleFile.txt | xargs -0 python LID/lid_script.py -text
-
To run all bills in the bill directory at once, run this:
parallel "cat {} | xargs -0 python LID/lid_script.py -text > {.}.json" ::: data/bills/*.txt
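If GNU parallel isn't available, the batch run above can be approximated in Python. This is a minimal sketch, not part of LID: it assumes the same `python LID/lid_script.py -text` invocation and `data/bills` layout shown in the commands above, and the function names and the `workers` parameter are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

LID_SCRIPT = "LID/lid_script.py"  # path taken from the commands above


def lid_command(text):
    """Build the argument list that passes one bill's full text to LID."""
    return ["python", LID_SCRIPT, "-text", text]


def output_path(bill_path):
    """Mirror parallel's {.}.json: same file name with a .json extension."""
    return Path(bill_path).with_suffix(".json")


def run_bill(bill_path):
    """Run LID on one bill and write its stdout to the matching .json file."""
    text = Path(bill_path).read_text(encoding="utf-8", errors="replace")
    out = output_path(bill_path)
    with open(out, "w", encoding="utf-8") as fh:
        subprocess.run(lid_command(text), stdout=fh, check=True)
    return out


def run_all(bills_dir="data/bills", workers=4):
    """Process every .txt file in bills_dir, a few at a time."""
    bills = sorted(Path(bills_dir).glob("*.txt"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_bill, bills))
```

Calling `run_all()` from the LID directory would process every bill; `run_bill("data/bills/sampleFile.txt")` handles a single file. As with the xargs version, the whole bill is passed as one command-line argument, so very long bills may hit the operating system's argument-length limit.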