Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization: Response.text is recomputed every time it's accessed #265

Open
pederhan opened this issue Jun 12, 2024 · 0 comments
Open

Optimization: Response.text is recomputed every time it's accessed #265

pederhan opened this issue Jun 12, 2024 · 0 comments
Assignees

Comments

@pederhan
Copy link
Member

pederhan commented Jun 12, 2024

This issue is related to the rewritten API code in #214.

We access the property Response.text multiple times in get_list_generic() (directly and through validate_*_response()):

response = get(path, params)
# Non-paginated results, return them directly
if "count" not in response.text:
return validate_list_response(response)
resp = validate_paginated_response(response)
if limit and resp.count > abs(limit):
raise TooManyResults(f"Too many hits ({resp.count}), please refine your search criteria.")
# Iterate over all pages and collect the results
ret: list[Json] = resp.results
while resp.next:
response = get(resp.next, params=params, ok404=ok404)
if response is None:
break
resp = validate_paginated_response(response)
ret.extend(resp.results)
if expect_one_result:
if len(ret) == 0:
return {}
if len(ret) != 1:
raise MultipleEntititesFound(f"Expected exactly one result, got {len(ret)}.")
return ret[0]
return ret

def validate_list_response(response: Response) -> list[Json]:
"""Parse and validate that a response contains a JSON array.
:param response: The response to validate.
:raises ValidationError: If the response does not contain a valid JSON array.
:returns: Parsed response data as a list of Python objects.
"""
try:
return ListResponse.validate_json(response.text)
# NOTE: ValueError catches custom Pydantic errors too
except ValueError as e:
raise ValidationError(f"{response.url} did not return a valid JSON array") from e
def validate_paginated_response(response: Response) -> PaginatedResponse:
"""Validate and parse that a response contains paginated JSON data.
:param response: The response to validate.
:raises ValidationError: If the response does not contain valid paginated JSON.
:returns: Parsed response data as a PaginatedResponse object.
"""
try:
return PaginatedResponse.from_response(response)
except ValueError as e:
raise ValidationError(f"{response.url} did not return valid paginated JSON") from e

This is inefficient, because the text is recomputed each time we access the property:

https://github.com/psf/requests/blob/0e322af87745eff34caffe4df68456ebc20d9068/src/requests/models.py#L909-L945

Furthermore, when not specifying encoding, it also has to run chardet.detect() on the text to determine the encoding:

https://github.com/psf/requests/blob/0e322af87745eff34caffe4df68456ebc20d9068/src/requests/models.py#L789-L797

HTTPX does not do this, which is why this came as a surprise to me.

https://github.com/encode/httpx/blob/92e9dfb3997ab07353bbe98f9b5a6e69b7701c70/httpx/_models.py#L575-L584


Possible solution

We can make sure we only access Response.text once, and bind the result to some variable, so that we don't re-compute it each time we want to access it. This will require some changes to the validate_*_response() functions, and will make their signatures messier, as we still want to pass the Response object to the functions in order to produce good exception messages:

-def validate_list_response(response: Response) -> list[Json]:
+def validate_list_response(response: Response, text: str) -> list[Json]:
    """Parse and validate that a response contains a JSON array.

    :param response: The response to validate.
+   :param text: The response content as text.
    :raises ValidationError: If the response does not contain a valid JSON array.
    :returns: Parsed response data as a list of Python objects.
    """
    try:
-       return ListResponse.validate_json(text)
+       return ListResponse.validate_json(text)
    except ValueError as e:
        raise ValidationError(f"{response.url} did not return a valid JSON array") from e
@pederhan pederhan self-assigned this Jun 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant