Prevent double-encoding of path components in queries. #252

Draft: wants to merge 2 commits into base: master

@ccma14 ccma14 commented Jan 13, 2021

What do these changes do?

This change fixes double-encoding of URL path components when performing a request against Elasticsearch. For example, retrieving a document by its ID fails when the ID contains a character that needs to be percent-encoded:

    from aioelasticsearch import Elasticsearch

    async with Elasticsearch("myserver.example") as es:
        result = await es.get(index="myindex", doc_type="_doc", id="1234+5678")

Without this patch, a 404 error (document not found) is returned even though a document with the specified ID exists in the specified index. Note that the equivalent synchronous request via elasticsearch.Elasticsearch works correctly.

The reason for this is:

  • In elasticsearch.client.Elasticsearch, every call to perform_request on the transport object is preceded by a call to elasticsearch.client.utils._make_path, which URL-encodes the path argument (this comes down to using urllib.parse.quote).
  • With aioelasticsearch, these requests end up at aioelasticsearch.connection.AIOHTTPConnection.perform_request, where the full URL is built by appending the url argument (the already encoded path component mentioned above) to self.base_url (which is a yarl.URL) using the / operator.
  • This causes yarl.URL to URL-encode the path component a second time, so an already encoded %2B becomes %252B.
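The effect of encoding a path twice can be reproduced with the standard library alone. The sketch below is only an illustration: make_path is a hypothetical, simplified stand-in for the client's path helper, and the second quote call merely simulates a second encoding pass; it is not the code of either library.

```python
from urllib.parse import quote, unquote


def make_path(*parts):
    """Hypothetical stand-in for the client's path helper: quote each part once."""
    return "/" + "/".join(quote(str(p), safe=",*") for p in parts)


# First pass: "+" is percent-encoded to "%2B".
once = make_path("myindex", "_doc", "1234+5678")

# Simulated second pass: the "%" itself is encoded to "%25".
twice = quote(once, safe="/")

print(once)            # /myindex/_doc/1234%2B5678
print(twice)           # /myindex/_doc/1234%252B5678

# Decoding once, as a server would, no longer yields the original ID.
print(unquote(twice))  # /myindex/_doc/1234%2B5678
```

After a single decode the server looks up the ID "1234%2B5678" instead of "1234+5678", which is consistent with the 404 described above.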

This PR avoids the issue by:

  • first constructing a relative yarl.URL instance for the path component to be appended, specifying encoded=True to avoid double URL-encoding;
  • then using URL.join to build the final URL rather than the / operator.
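The two steps above can be sketched roughly as follows. This is a minimal illustration rather than the actual diff: the base URL, port, and variable names are assumptions made for the example, and only yarl's URL(..., encoded=True) constructor and URL.join are used.

```python
from yarl import URL

# Hypothetical base URL; in the connection class this would be self.base_url.
base_url = URL("http://myserver.example:9200")

# Path as handed over by the Elasticsearch client: already percent-encoded.
encoded_path = "/myindex/_doc/1234%2B5678"

# encoded=True tells yarl to take the string as-is instead of quoting it again.
relative = URL(encoded_path, encoded=True)

# join() resolves the relative URL against the base URL.
full_url = base_url.join(relative)

print(full_url.raw_path)  # /myindex/_doc/1234%2B5678
```

One thing to keep in mind with this approach: join follows the usual relative-reference resolution rules, so a relative URL whose path starts with "/" replaces any path prefix on the base URL rather than being appended to it.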

Are there changes in behavior for the user?

No (bugfix).

Related issue number

None

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes N/A
  • Add a new news fragment into the CHANGES folder
    • name it <issue_id>.<type> (e.g. 588.bugfix)
    • if you don't have an issue_id change it to the pr id after creating the PR
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: Fix issue with non-ascii contents in doctest text files.

Note
The checklist item above regarding the CHANGES folder seems out of date, since I can't find such a folder. If you want me to add a blurb about this PR to CHANGES.rst, please let me know.
