The discussion that took place while writing the unit tests for `canonicalize` highlighted several improvements that can be made. Specifically:
Use `urlparse` to extract the hostname and the path from the URL instead of using a regular expression. This is expected to simplify the code significantly.
Add a docstring that mentions the differences between our implementation and the one suggested in the Safe Browsing v2 docs.
Add a couple of extra test cases:
A test case that checks that escape sequences for the tab, CR and LF characters (i.e., `%09`, `%0d` and `%0a`) are not removed (this is one of the canonicalization rules mentioned in the Safe Browsing v2 docs but it's not covered by the suggested test cases).
A test case that checks that a URL is canonicalized correctly when it contains a username and password.
Remove a few TODO comments that no longer seem important (the ones that suggest using `d, _subs_made = re.subn(...)` instead of `d = re.subn(...)[0]`: 1, 2, 3).
I would also suggest renaming the `d` parameter of `canonicalize` to `domain` or `url`. Avoiding single-letter variable names makes for more readable code.
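A minimal sketch of the `urlparse`-based extraction, using `url` as the parameter name, might look like the following. The helper name `split_host_and_path` and the default-scheme handling are assumptions for illustration, not part of the existing code. The import is written for Python 3; on Python 2 the same function lives in the `urlparse` module.

```python
from urllib.parse import urlsplit  # Python 2: from urlparse import urlsplit


def split_host_and_path(url):
    """Hypothetical helper: return (hostname, path) for a URL string."""
    # urlsplit only recognizes the host when a scheme (or a leading "//")
    # is present, so assume "http://" when no scheme is given.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    # .hostname drops any "user:password@" prefix and the port,
    # and lowercases the host.
    hostname = parts.hostname or ""
    # Percent-escapes such as %09, %0d and %0a are left untouched here.
    path = parts.path or "/"
    return hostname, path
```

For example, `split_host_and_path("http://user:pass@Example.com:8080/a/b?q=1")` returns `("example.com", "/a/b")`, which covers the username-and-password case without any regular expression. The remaining canonicalization steps would still have to be applied to the two parts.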