Skip to content

add_or_replace_parameter lacks control of what is encoded #106

Open
@rennerocha

Description

@rennerocha

add_or_replace_parameter uses urlencode function to convert a string to percent-encoded ASCII string:
https://github.com/scrapy/w3lib/blob/master/w3lib/url.py#L229

If the value of the parameter is already percent-encoded (in my scenario, this value came from a JSON response of an API), it is not possible to specify safe characters that should not be encoded. This argument is available in urlencode signature:
https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlencode

My suggestion is to include a safe argument in add_or_replace_parameter to improve the control over the encoding of your values:
Actual: w3lib.url.add_or_replace_parameter (url, name, new_value)
New: w3lib.url.add_or_replace_parameter(url, name, new_value, safe='')

In [1]: from urllib.parse import urlencode
   ...: from w3lib.url import add_or_replace_parameter
   ...: # This came from an API response already encoded
   ...: full_hierarchy = 'Appliances_Air+Purifiers+%26+Dehumidifiers_Air+Purifiers'
   ...: url = 'http://example.com'

In [2]: urlencode({'hierarchy': full_hierarchy})
Out[2]: 'hierarchy=Appliances_Air%2BPurifiers%2B%2526%2BDehumidifiers_Air%2BPurifiers'

In [3]: urlencode({'hierarchy': full_hierarchy}, safe='+%')
Out[3]: 'hierarchy=Appliances_Air+Purifiers+%26+Dehumidifiers_Air+Purifiers'

In [4]: add_or_replace_parameter(url, 'hierarchy', full_hierarchy)
Out[4]: 'http://example.com?hierarchy=Appliances_Air%2BPurifiers%2B%2526%2BDehumidifiers_Air%2BPurifiers'

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions