Prevent double-encoding of path components in queries. #252

ccma14 · 2021-01-13T22:39:22Z

What do these changes do?

This change addresses an issue with double-encoding URL components when performing a request against elastic search. For example, this causes issues when retrieving an elastic search document by its id, and the ID contains a character that needs to be encoded:

    async with Elasticsearch("myserver.example") as es:
        result = await es.get(index="myindex", doc_type="_doc", id="1234+5678)

Without this patch a 404 error (document not found) is returned even when a document with the specified id exists in the specified index. -- Note that the equivalent query above works correctly when performing a synchronous request via elasticsearch.Elasticsearch.

The reason for this is:

Looking at elasticsearch.client.ElasticSearch, you'll see that every time before perform_request is invoked on the transport object, elasticsearch.client.utils._make_path is already used to url-encode the path argument (which comes down to using urllib.parse.quote).
Using aioelasticsearch these requests end up at aioelasticsearch.connection.AIOHTTPConnection.perform_request, where the full URL is built by adding the url argument (representing the aforementioned already encoded path component) to self.base_url (which is a yarl.URL) using the / operator.
This causes yarl.URL to url-encode the path component a second time.

This PR avoids the issue by

first constructing a relative yarl.URL instance for the path component to be appended, specifying encoded=True to avoid double url-encoding.
Then using URL.join to build the final URL rather than the / operator.

Are there changes in behavior for the user?

No (bugfix).

Related issue number

None

Checklist

I think the code is well written
Unit tests for the changes exist
Documentation reflects the changes N/A
Add a new news fragment into the CHANGES folder
- name it <issue_id>.<type> (e.g. 588.bugfix)
- if you don't have an issue_id change it to the pr id after creating the PR
- ensure type is one of the following:
  - .feature: Signifying a new feature.
  - .bugfix: Signifying a bug fix.
  - .doc: Signifying a documentation improvement.
  - .removal: Signifying a deprecation or removal of public API.
  - .misc: A ticket has been closed, but it is not of interest to users.
- Make sure to use full sentences with correct case and punctuation, for example: Fix issue with non-ascii contents in doctest text files.

Note
The checklist item above regarding the CHANGES folder seems out of date, since I can't find such a folder. -- If you want me to add a blurb about this PR to CHANGES.rst, please let me know.

ccma14 added 2 commits January 13, 2021 13:48

Prevent double-encoding of path components in queries.

f57397f

Added unit test.

555b4fb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prevent double-encoding of path components in queries. #252

Prevent double-encoding of path components in queries. #252

ccma14 commented Jan 13, 2021 •

edited

Loading

Prevent double-encoding of path components in queries. #252

Are you sure you want to change the base?

Prevent double-encoding of path components in queries. #252

Conversation

ccma14 commented Jan 13, 2021 • edited Loading

What do these changes do?

Are there changes in behavior for the user?

Related issue number

Checklist

ccma14 commented Jan 13, 2021 •

edited

Loading