Created unit testing for analysis and bigquery2pandas #54
base: master
HOW TO RUN ANALYSIS TESTS

Whenever a change is made, the tests report the average percentage change for a sample of columns across five test courses. Here is driver code to run the tests:

    from edx2bigquery.edx2bigquery.bigquery2pandas import analysis_unit_tests
    test_course_ids = analysis_unit_tests.fetch_test_course_ids()
    update_msg = "Whatever the most recent update to the code is - keep it short, this will be added to the table"

WILSON'S INTERVAL FOR RANKING CAMEO CHEATING AND COLLABORATION

The Wilson's Interval Score provides a single value which ranks (master, harvester) pairs. The interpretability of the ranking is based on the features used to compute the Wilson's Interval.

IMPORTANT: The positive and negative scores are generated by first normalizing the features, …

The Wilson's Interval is used to sort these analysis tables. The top row in the table therefore …
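As a rough illustration of how a Wilson interval can turn a positive and a negative score into one sortable value, here is a minimal sketch. It uses the standard lower bound of the Wilson score interval; the PR's actual features, normalization, and score formula are not shown in this excerpt, so the function name `wilson_lower_bound`, the `z=1.96` default, and the example pairs are all assumptions made for illustration.

```python
import math


def wilson_lower_bound(positive, negative, z=1.96):
    """Lower bound of the Wilson score interval for a positive/negative tally.

    Assumption: `positive` and `negative` are non-negative scores; z=1.96
    corresponds to a 95% confidence level.
    """
    n = positive + negative
    if n == 0:
        return 0.0  # no evidence either way
    p_hat = positive / float(n)
    denom = 1 + z * z / n
    centre = p_hat + z * z / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (centre - margin) / denom


# Hypothetical (pair label, positive score, negative score) rows.
pairs = [
    ("pair_a", 9.0, 1.0),
    ("pair_b", 90.0, 10.0),
    ("pair_c", 5.0, 5.0),
]

# Sort so that the best-supported pair comes first.
ranked = sorted(pairs, key=lambda r: wilson_lower_bound(r[1], r[2]), reverse=True)
```

In this sketch `pair_a` and `pair_b` have the same positive fraction, but `pair_b` ranks higher because it is backed by more evidence, which is the usual reason to sort by a Wilson bound rather than by a raw ratio.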
…e_show_ans_before
… version of gcloud. Updated to work with the latest version of gcloud for authentication.
…st 1k tracking logs for cameo analysis.
@@ -57,7 +57,9 @@ def get_creds(verbose=False):
            print "service_acct=%s, key_file=%s" % (SERVICE_ACCT, KEY_FILE)
        return get_service_acct_creds(SERVICE_ACCT, KEY_FILE)
    elif KEY_FILE=='USE_GCLOUD_AUTH':
        return get_gcloud_oauth2_creds()
Instead of overwriting what is done for USE_GCLOUD_AUTH, could you please make this a different option, e.g. USE_GOOGLE_CREDENTIALS?
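One way the requested change could look is sketched below: the existing USE_GCLOUD_AUTH branch is left untouched and the new behaviour gets its own KEY_FILE value. This is only a sketch under assumptions. `get_google_default_creds` is a hypothetical helper name, the three credential functions are stubs returning tuples so the example runs on its own, and `get_creds` takes `key_file` as a parameter here purely to make the dispatch easy to exercise.

```python
SERVICE_ACCT = "example@developer.gserviceaccount.com"  # placeholder value
KEY_FILE = "USE_GOOGLE_CREDENTIALS"  # would normally come from local config


def get_service_acct_creds(service_acct, key_file):
    # Stub standing in for the existing service-account path.
    return ("service_account", service_acct, key_file)


def get_gcloud_oauth2_creds():
    # Stub standing in for the existing gcloud OAuth2 path (unchanged).
    return ("gcloud_oauth2",)


def get_google_default_creds():
    # Hypothetical helper for the new authentication path added by the PR.
    return ("google_default",)


def get_creds(key_file=KEY_FILE, verbose=False):
    if key_file == "USE_GCLOUD_AUTH":
        return get_gcloud_oauth2_creds()
    if key_file == "USE_GOOGLE_CREDENTIALS":
        return get_google_default_creds()
    if verbose:
        print("service_acct=%s, key_file=%s" % (SERVICE_ACCT, key_file))
    return get_service_acct_creds(SERVICE_ACCT, key_file)
```

Keeping the two options separate means existing configurations that set USE_GCLOUD_AUTH keep their current behaviour.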
Add ABS for HASH and remove INTEGER for sa_ca_dt_corr_ordered and sa_ca_dt_correlation. 1) HASH in the SQL query might return a negative integer. Add ABS to avoid it. 2) sa_ca_dt_corr_ordered and sa_ca_dt_correlation should be real numbers between -1 and 1, e.g. 0.99993. INTEGER(sa_ca_dt_corr_ordered) will return only 0, 1, or -1.
add ABS for HASH and remove INTEGER for corr
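Both problems named in this commit can be reproduced outside BigQuery. The sketch below is a Python analogue, not the SQL from the PR: `struct` is used to show how a 64-bit value with its top bit set reads as a negative signed integer, and Python's `int()`, which truncates toward zero, stands in for the integer cast that the commit message says collapses correlations to 0, 1, or -1. The sample values are made up.

```python
import struct

# 1) A 64-bit hash with the top bit set is negative when read as signed.
unsigned_value = 2 ** 63 + 5
signed_value = struct.unpack(">q", struct.pack(">Q", unsigned_value))[0]
non_negative = abs(signed_value)  # analogous to wrapping the hash in ABS

# 2) Casting a correlation in [-1, 1] to an integer discards the fraction.
correlations = [0.99993, -0.42, 1.0, -1.0, 0.0]
as_integer = [int(c) for c in correlations]  # truncation toward zero
distinct_after_cast = sorted(set(as_integer))
```

After the cast, 0.99993 and -0.42 are both indistinguishable from 0, which is why the correlation columns need to stay as real numbers.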
Unit testing works by comparing the previous run of a given analysis with the current run in a single BigQuery query: the last analysis run is appended, the two runs are compared, and the difference is appended to the final unit test table. The test courses are kept private.
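The comparison itself happens in BigQuery, but the idea can be sketched in plain Python: for each sampled column, compute the percent change between the previous and current run of each test course, then average across courses. The function name `mean_percent_change` and the column names below are hypothetical, and the numbers are invented for illustration.

```python
def mean_percent_change(previous_rows, current_rows, columns):
    """Average per-course percent change for each column.

    Assumption: rows are aligned, i.e. previous_rows[i] and current_rows[i]
    describe the same test course.
    """
    changes = {}
    for col in columns:
        per_course = []
        for prev, cur in zip(previous_rows, current_rows):
            if prev[col] == 0:
                continue  # percent change is undefined for a zero baseline
            per_course.append(100.0 * (cur[col] - prev[col]) / abs(prev[col]))
        changes[col] = sum(per_course) / len(per_course) if per_course else None
    return changes


# Made-up results for two test courses, before and after a code change.
previous = [{"n_show_answer": 100, "n_pairs": 10}, {"n_show_answer": 200, "n_pairs": 20}]
current = [{"n_show_answer": 110, "n_pairs": 10}, {"n_show_answer": 180, "n_pairs": 25}]

result = mean_percent_change(previous, current, ["n_show_answer", "n_pairs"])
```

Note that averaging signed changes can hide movement in opposite directions (here `n_show_answer` rises 10% in one course and falls 10% in the other), so a real check might also track the absolute change.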
bigquery2pandas is a library for interacting with BigQuery using pandas. SQL2df is the most frequently used function; it creates a correctly typed, correctly ordered pandas DataFrame from a SQL query. Estimated time to completion and other useful features are also supported.