-
Notifications
You must be signed in to change notification settings - Fork 0
viggyfresh/DiscoShit
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
File: README This file. File: demographic.py This file contains helper methods to obtain and manage demographic data from the demographic files (ACS_1YR_...csv). average_data(file, dem): Averages the county data for a given demographic. Useful for situations when we really want to use a feature but there isn't sufficient data for the census to output an actual prediction - in these cases, we average the data for that demographic across all counties and use it instead. construct_submatrix(year, demographics, county, prop, type, polarity): Builds a county-specific design matrix and test matrix for a given year, a given proposition and polarity, and given demographic data. Called in a loop by compose_design_matrix over all counties for an issue. Returns design and target matrices. File: features.txt This file contains all feature identifiers as ordained by the US Census Bureau. File: Proposition Labels.csv This file puts each proposition into an issue bucket (i.e. "politics"). Used by the classifier to determine issue tag groupings. File: read2014.py 2014 election results were unavailable from the CA Secretary of State website. We had to use a gnarly JSON file from the Contra Costa Times website; this file basically converts that JSON into the standard election data format we used. File: voting.py This file contains helper methods to obtain and manage voting data from the election files ({YEAR}BallotMod.csv). get_voting_data(): This method reads in election data for each year we have data for, and returns a mapping that allows one to look up voting results based on year, proposition, polarity, county, and vote type (absolute or percent). File: classifier.py This file contains our classifier and helper functions for it. multithreaded_feature_finder(): Creates a thread to find the best features for each tag. Runs them in parallel. tag_feature_selector(): Selects features for each tag by finding the best features and the testing error find_best_features(tag, baseline_testing_error, features): Chooses 10 or fewer features which are better than the baseline of a given tag. get_testing_error(tag, features, numIters): Averages the testingError over numIters runs of test_features. test_features(issue_tag, features): Creates a training set and test set (from propositions contained in issue_tag), runs the classifier with given features and returns the train and test errors. test_classifier_model(model, design, target, test_design, test_target, train): Generates a percent error by comparing the model's predictions to the results of the test set. Called by test_features to test accuracy. build_classifier_model(issues, tag): Generates a model using SVC with an RBF kernel (with features specified by tag) which trains on the training set provided by test_classifier_model. get_all_training_issues(): Hashes the issues into buckets according to our labels, as specified by the Proposition Labels csv file. is_number(s): Confirms that s is a number. Useful for identifying features for which no data is available (the letter "N" is used as a placeholder) convert_to_binary_target(targetMatrix): Converts the target matrix into a binary matrix using basic zero-one thresholding. This matrix can be used to train and test an SVC model. zero_or_one(number): If the result is greater than 50, returns 1, else returns 0. Used by convert_to_binary_target to make a binary target matrix. combine_design_matrices(issues, tag): Compiles the design matrices into a single design matrix. Compiles the target matrices into a single target matrix. Calls compose_design_matrix on each issue in the given issue tag. compose_design_matrix(issue, tag): Generates a design matrix for a single issue and tag (the tag specifies what features should be put into the design matrix) Commands to Run: A good place to start is to pick a set of features and test them out on an issue tag: features = [some_features_here] print test_features("politics", features) To inspect issues you can do the following: issues_hash = get_all_training_issues() print issues_hash["crime"] To see what the design and target matrices look like for an issue tag, you can do the following: issues = issues_hash["environment"] features = [some_features_here] tag = { "name": "Irrelevant", "type": "Percent", "demographics": features } print combine_design_matrices(issues, tag) You can also run multithreaded_feature_finder() (very very very slow) to try to find optimal features for each issue tag: multithreaded_feature_finder() Or, for more granular control over which issue tag you're working with: all_features = [all available features] #HC02_EST_VC02 is 100% for all counties - "uniform" demographic baseline_error = get_testing_error("education", ["HC02_EST_VC02"], 10) tag_feature_selector("education", baseline_error, all_features) If you want to look specifically at voting data: print voting.get_voting_data()["2006"]["Santa Clara_Percent"]["2006_1B_Yes"] Or demographic data: print demographic.construct_submatrix(2008, ["HC02_EST_VC03"], "San Francisco", "someProp", "Percent", "Yes")
About
Election predictor for California
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published