
canonicalize_url isn't handling some crucial cases #107

Open
@sibiryakov

Description

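  • removal of userinfo (user:password@) from the authority
  • dots and slashes in the path and hostname (e.g. resolving "." and ".." segments, collapsing repeated slashes, stripping leading/trailing dots from the host)
  • leading and trailing spaces around the URL
  • common session ID parameters and their values
  • IPv6 address canonicalization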
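A minimal sketch of how the five cases could be handled, using only the Python standard library. It is not the project's canonicalize_url or any existing API. The function canonicalize_extra, its helpers, and the SESSION_ID_PARAMS list are hypothetical. The host and path rules loosely follow the Safe Browsing hashing guide linked below.

```python
import ipaddress
import posixpath
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical, non-exhaustive set of session ID query parameter names.
SESSION_ID_PARAMS = frozenset({"jsessionid", "phpsessid", "sessionid", "sid"})


def _canonical_host(host: str) -> str:
    """Lower-case the host, compress IP literals, and clean up dots."""
    host = host.lower()
    try:
        # For IPv6, .compressed yields the shortest form, e.g. '2001:db8::1'.
        return ipaddress.ip_address(host).compressed
    except ValueError:
        pass
    # Strip leading/trailing dots, then collapse runs of dots into one.
    return re.sub(r"\.{2,}", ".", host.strip("."))


def _canonical_path(path: str) -> str:
    """Collapse repeated slashes and resolve '.' and '..' segments."""
    if not path:
        return "/"
    path = re.sub(r"/{2,}", "/", path)
    resolved = posixpath.normpath(path)
    # normpath drops a trailing slash; keep it, since /a/ and /a may differ.
    if path.endswith("/") and resolved != "/":
        resolved += "/"
    return resolved


def _strip_session_ids(query: str) -> str:
    """Drop query parameters whose name looks like a session ID."""
    pairs = parse_qsl(query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k.lower() not in SESSION_ID_PARAMS]
    # Note: urlencode re-encodes the parameters that are kept.
    return urlencode(kept)


def canonicalize_extra(url: str) -> str:
    # Trim whitespace preceding and succeeding the URL.
    parts = urlsplit(url.strip())
    host = _canonical_host(parts.hostname or "")
    if ":" in host:  # IPv6 literals must be bracketed in the netloc
        host = f"[{host}]"
    # Rebuild the netloc from host and port only, dropping userinfo.
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Remove a ;jsessionid=... path parameter, as used by Java servlets.
    path = re.sub(r";jsessionid=[^/]*", "", parts.path, flags=re.IGNORECASE)
    return urlunsplit((
        parts.scheme.lower(),
        netloc,
        _canonical_path(path),
        _strip_session_ids(parts.query),
        parts.fragment,
    ))
```

For example, "  http://user:pass@WWW.Example..com./a//b/./../c/?x=1&PHPSESSID=abc  " becomes "http://www.example.com/a/c/?x=1". Because urlencode re-encodes the surviving parameters, a real fix would more likely reuse canonicalize_url's existing query handling than this helper.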

Useful links:
https://developers.google.com/safe-browsing/v4/urls-hashing
https://github.com/iipc/urlcanon/blob/master/python/urlcanon/canon.py#L530

Metadata


Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
