-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update Main to Py3 #171
Merged
Update Main to Py3 #171
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Implement unit tests to validate Safe Browsing v2 blocklist generation. Include a test where the versioned test domain is not included in the blocklist and another test where both test domains are omitted, in order to ensure full branch coverage. Closes #137
Implement a test to check that write_safebrowsing_blocklist works as expected even when no output file is provided. Add a comment to explain when this happens.
Implement unit tests for process_entity_whitelist and process_plugin_blocklist. Include tests that check that these functions work as expected even when no log file is provided. Additionally, modify one of the canonicalize test cases so that it contains a valid utf-8 byte sequence.
Implement unit tests for get_entity_lists. Include a testcase where the version is smaller than the version in which large entities separation started, as well as testcases where large entities are separated.
Move the plugin list creation logic from main to a separate get_plugin_lists function, in order to be consistent with get_tracker_lists and get_entity_lists and to facilitate testing.
Raise a ValueError if the 'blocklist' key of a plugin configuration section is empty. Previously, this would lead to the creation of an empty plugin blocklist.
Add unit tests for blocklist and entity list generation
Add a unit test that validates the implementation of get_domains_from_category_filters and another one that confirms that it raises a ValueError when receiving invalid input. Also, add a sample blocklist JSON file to be used in the tests.
Add domains with tags to sample_blocklist.json in order to use them in tests. Implement a simple test just with category filters, a test with category exclusion filters and a test with the "performance" tag.
Implement unit tests for get_tracker_lists. Include testcases to cover the following configuration section scenarios: * It does not specify categories and the default are used * It specifies categories * It specifies excluded categories * It specifies tags * It contains an invalid tag and an error is raised * It contains a version key Finally, add one more domain to the Cryptomining category in tests/sample_blocklist.json to facilitate tag testing.
Add unit tests for domain filtering and get_tracker_lists
Implement a test that validates loading the JSON entity list from a URL. Add another test to cover the case where opening the URL fails with an exception. Additionally, make load_json_from_url include the type and message of this exception in the error message it prints to stderr.
Add more unit tests
…emoteSettings Handle requests.exceptions.ReadTimeout
According to changes on mozilla-services/shavar-prod-lists#284 we are now using "FingerprintInvasive" instead of "Fingerprint" category
Rename the Fingerprinting category to FingerprintingInvasive in the tests and in sample_blocklist.json. Also, update an example in the docstring of get_domains_from_filters with the new category name.
Return the checksum of the contents of the list file starting right after the chunk header. Previously, we were always starting from the 26th byte, regardless of the size of the header.
Move the code that uses the configuration file to determine if a list should be uploaded to S3 outside of `new_data_to_publish_to_s3` and into `publish_to_cloud`. Rename `check_upload_remote_settings_config` to `check_upload_config` and adjust it to work both for S3 and remote settings. In `publish_to_cloud` make sure that `new_data_to_publish_to_s3` is only called when `upload_to_s3` is True, to avoid the cost of unnecessarily accessing S3. In addition, when the key corresponding to a list does not yet exist on S3, return True right away instead of creating a key with dummy list contents and comparing against its checksum. Also, fix a minor bug in getting the name of the S3 key from the configuration file. Previously, if the section did not have the `s3_key` option, `new_data_to_publish_to_s3` would raise a `NoOptionError` instead of falling back to using the `output` option as the S3 key name. Raise a ValueError when the `s3_key` option exists but is empty. Finally, add a docstring to describe the functionality of `new_data_to_publish_to_s3`.
Add moto as a requirement and implement unit tests for new_data_to_publish_to_s3.
Add unit tests to check that the expected S3 key permissions are set in new_data_to_publish_to_s3 and publish_to_s3. These tests are currently expected to fail because setting key permissions with add_user_grant() does not work in moto.
…ategory-on-sample-config Update sample configuration with new fingerprinting category name
Sort the canonicalized domains alphanumerically before writing their hashes to the safebrowsing list file. This ensures that any changes in the order of domains in the input JSON lists will not cause unnecessary list updates. Additionally, this is a prerequisite for porting the list creation script to Python 3. Hash randomization is enabled by default starting from Python 3.3 and this would result in a differently ordered list being created every time the script is run. Also, replace the previous_domains set with a previous_domain string. Now that the domains are ordered, if there are duplicates they will be consecutive. Finally, adjust the tests to the changes and make sure that write_safebrowsing_blocklist tests check both the order of hashes and the existence of duplicates. Note: the dummytracker domains are exempt from ordering and remain at the top of the list file.
Sort the canonicalized domains alphanumerically before writing their hashes to the safebrowsing list file. Replace the previous domains set with a previous_domain string. Finally, adjust the tests to the changes and make sure that process_plugin_blocklist tests check both the order of hashes and the existence of duplicates.
Sort the canonicalized URLs of each entity alphanumerically before writing their hashes to the safebrowsing list file. URLs are still sorted based on the entity name and this commit introduces a secondary ordering within each entity. Use a set to store canonicalized URLs before sorting them, so that duplicates are eliminated. Finally, make sure that process_entitylist tests check both the order of hashes and the existence of duplicates.
Add unit tests for publish2cloud.py
Order domains alphanumerically in safebrowsing lists
Make sure to encode string objects to bytes and decode byte objects to strings wherever necessary, as a preparatory step for transitioning to Python 3. More specifically: * Encode URLs before hashing them, as hashlib.sha256() expects a bytes object in Python 3 * Convert chunk headers to bytes before writing them to a binary file * Decode data right after receiving it from urlopen * Decode the content of HTTP responses before printing it * Decode data right after reading it from a binary file * Encode the data used to mock urlopen in tests Also, fix a minor bug by using the 'b' prefix at the start of each line of a multi-line bytes literal instead of only using it for the first line. Finally, omit the "utf-8" argument when encoding, as UTF-8 is the default encoding.
Apply the following changes to make the list creation script compatible with Python 3. After these changes it will no longer support Python 2. * Import configparser instead of ConfigParser, as this module was renamed in Python 3 * Import quote and unquote from urllib.parse and urlopen from urllib.request instead of importing urllib2 * Remove mock from the dependencies as it is part of the standard library in Python 3 * Pin dependencies to the latest version * Use %d instead of %s when formatting a binary chunk header to avoid getting a TypeError. Also, use %d instead of %u, which is deprecated * Set the "version" key equal to the empty string instead of None when reverting configuration to avoid getting a "TypeError: option values must be strings". * Update CircleCI configuration to use a docker image with Python 3.8 * Update Python version requirement in README * Update README instructions to use virtualenv with Python 3 Closes #108.
Use config.read_file() to read configuration from a file instead of config.readfp(), which is deprecated. Also, use ConfigParser instead of SafeConfigParser, which is an alias that will be removed in future Python versions.
Port script to Python 3
Separate test requirements
Update the instructions in the README to use the new test requirements file. Also, update the dependency caching key in the configuration of CircleCI, so that it includes the checksum of the test requirements file as well.
Update README and CircleCI config to use requirements-test.txt
groovecoder
approved these changes
Nov 6, 2020
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code & logic LGTM.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
About this PR
Related to #108, this is the most chunk of work that was needed to update list creation script as Python 3 compatible. Some important changes were: