[WIP] Add ujson as alternative JSON encoder #130

Open
wants to merge 15 commits into
base: dev

Conversation

tonybaloney
Contributor

@tonybaloney tonybaloney commented May 25, 2022

The standard library json module is the slowest of the json encoders.

ujson is 10-20x faster at encoding and decoding, especially for large datasets.

This PR moves the json imports into a shim module, which picks the standard library implementation or ujson depending on whether:

  • The user has installed ujson
  • The user hasn't disabled it via an environment variable

@tonybaloney tonybaloney changed the title [WIP] Add orjson as alternative JSON encoder [WIP] Add ujson as alternative JSON encoder May 25, 2022
@vrdmr
Member

vrdmr commented May 25, 2022

Any specific reason to choose ujson over orjson?

@tonybaloney
Contributor Author

> Any specific reason to choose ujson over orjson?

Supporting StringifyEnum was impossible without using a fork of orjson. I tried one, but it used outdated Python bindings.

ujson supports custom type serialisation via a __json__ method on the class, which should be more performant. It's also more compatible with json.
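
A hedged sketch of the `__json__` hook mentioned above. The `StringifyEnum` here is a simplified stand-in, not the library's class, and `AuthLevel` and `encode` are hypothetical names used only for illustration. ujson is treated as optional, with a standard-library fallback so the example runs either way.

```python
import json
from enum import Enum

try:
    import ujson  # optional third-party dependency
except ImportError:
    ujson = None


class StringifyEnum(Enum):
    """Simplified stand-in for the library's StringifyEnum."""

    def __str__(self):
        return str(self.name)

    def __json__(self):
        # The hook returns an already-encoded JSON value, here a quoted string.
        return json.dumps(self.name)


class AuthLevel(StringifyEnum):  # hypothetical enum for illustration
    ANONYMOUS = "anonymous"
    ADMIN = "admin"


def encode(obj):
    if ujson is not None:
        # ujson calls __json__ itself and embeds the returned text as-is.
        return ujson.dumps(obj)
    # The stdlib encoder has no __json__ hook, so route it through default=.
    return json.dumps(obj, default=lambda o: json.loads(o.__json__()))


print(encode({"level": AuthLevel.ADMIN}))
```

The two encoders differ in whitespace, so compare decoded values rather than raw strings when testing.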

@codecov

codecov bot commented May 25, 2022

Codecov Report

Merging #130 (c090330) into dev (284c15d) will decrease coverage by 0.25%.
The diff coverage is 81.81%.

@@            Coverage Diff             @@
##              dev     #130      +/-   ##
==========================================
- Coverage   86.04%   85.79%   -0.26%     
==========================================
  Files          50       51       +1     
  Lines        2903     2922      +19     
  Branches      391      396       +5     
==========================================
+ Hits         2498     2507       +9     
- Misses        329      336       +7     
- Partials       76       79       +3     
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 85.79% <81.81%> (-0.22%) | ⬇️ |


| Impacted Files | Coverage Δ | |
|---|---|---|
| azure/functions/_durable_functions.py | 68.29% <ø> (ø) | |
| azure/functions/_json.py | 72.97% <72.97%> (ø) | |
| azure/functions/_cosmosdb.py | 88.88% <100.00%> (ø) | |
| azure/functions/_http.py | 91.30% <100.00%> (ø) | |
| azure/functions/_queue.py | 84.61% <100.00%> (ø) | |
| azure/functions/_sql.py | 100.00% <100.00%> (ø) | |
| azure/functions/cosmosdb.py | 74.35% <100.00%> (ø) | |
| azure/functions/decorators/utils.py | 100.00% <100.00%> (+2.53%) | ⬆️ |
| azure/functions/durable_functions.py | 83.33% <100.00%> (ø) | |
| azure/functions/eventgrid.py | 90.90% <100.00%> (ø) | |

... and 11 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Another variation

Move expression

remove braces

Change to underscore
@tonybaloney
Contributor Author

[Image: benchmark]

This is the benchmark between ujson (left) and json (right) for HttpRequest.get_json()
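
A rough way to reproduce this kind of comparison locally, as a sketch only. It times the decoders directly rather than `HttpRequest.get_json()`, and the payload is hypothetical because the benchmark input is not shown in the comment. ujson is timed only if it is installed.

```python
import json
import timeit

try:
    import ujson  # optional third-party dependency
except ImportError:
    ujson = None

# Hypothetical payload; the real benchmark input is not shown.
PAYLOAD = json.dumps(
    {"items": [{"id": i, "type": "donut"} for i in range(1000)]}
).encode("utf-8")


def time_decoder(loads, number=200):
    """Return the mean number of seconds per decode call."""
    body = PAYLOAD.decode("utf-8")
    return timeit.timeit(lambda: loads(body), number=number) / number


results = {"json": time_decoder(json.loads)}
if ujson is not None:
    results["ujson"] = time_decoder(ujson.loads)

for name, seconds in results.items():
    print(f"{name}: {seconds * 1e6:.1f} microseconds per decode")
```

Absolute numbers depend on the machine and payload shape, so only the ratio between the two rows is meaningful.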

@tonybaloney
Contributor Author

I've deployed two Azure Functions in Australia East: one with this patch applied and one without.

The sample POST request is:

{
	"id": "0001",
	"type": "donut",
	"name": "Cake",
	"ppu": 0.55,
	"batters":
		{
			"batter":
				[
					{ "id": "1001", "type": "Regular" },
					{ "id": "1002", "type": "Chocolate" },
					{ "id": "1003", "type": "Blueberry" },
					{ "id": "1004", "type": "Devil's Food" }
				]
		},
	"topping":
		[
			{ "id": "5001", "type": "None" },
			{ "id": "5002", "type": "Glazed" },
			{ "id": "5005", "type": "Sugar" },
			{ "id": "5007", "type": "Powdered Sugar" },
			{ "id": "5006", "type": "Chocolate with Sprinkles" },
			{ "id": "5003", "type": "Chocolate" },
			{ "id": "5004", "type": "Maple" }
		]
}

The function source code is:

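import json

import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    try:
        # Decode the body with the library's JSON decoder (the code path under test).
        req_body = req.get_json()
    except ValueError:
        # Reject invalid JSON instead of continuing with req_body unbound.
        return func.HttpResponse("Invalid JSON body", status_code=400)

    # Echo the parsed payload back to the caller.
    return func.HttpResponse(
        json.dumps(req_body),
        status_code=200
    )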

The script to test the two deployments (1,000 POST requests each, 10 concurrent):

$ ab -p test_data.json -T application/json -n 1000 -c 10 https://ant-functions-load-testing.azurewebsites.net/api/httptriggertest
$ ab -p test_data.json -T application/json -n 1000 -c 10 https://ant-functions-load-testing-og.azurewebsites.net/api/httptriggertest

The results are:

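| Percentile | 50 | 66 | 75 | 80 | 90 | 95 | 98 | 99 |
|---|---|---|---|---|---|---|---|---|
| JSON (ms) | 114 | 119 | 125 | 128 | 150 | 217 | 345 | 2113 |
| UJSON (ms) | 111 | 116 | 118 | 121 | 126 | 131 | 145 | 175 |
| Normalised JSON (ms) | 44 | 49 | 55 | 58 | 80 | 147 | 275 | 2043 |
| Normalised UJSON (ms) | 41 | 46 | 48 | 51 | 56 | 61 | 75 | 105 |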

For the normalised rows I've subtracted 70 ms, the mean connect time, so you can see the difference between the two branches more clearly.

About 7% faster at the 50th percentile (44 ms vs 41 ms normalised), but importantly about 2.4x faster at the 95th percentile (147 ms vs 61 ms normalised).
(Ignore the 99th percentile, as it will include cold-start times.)
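
To check the ratios against the results, the normalisation and speedup arithmetic can be sketched as follows. The figures are copied from the results above; the helper names `normalise` and `speedups` are made up for this example.

```python
PERCENTILES = [50, 66, 75, 80, 90, 95, 98, 99]
JSON_MS = [114, 119, 125, 128, 150, 217, 345, 2113]
UJSON_MS = [111, 116, 118, 121, 126, 131, 145, 175]
CONNECT_MS = 70  # mean connect time reported in the comment


def normalise(samples, offset=CONNECT_MS):
    """Subtract the fixed connect time from every percentile value."""
    return [value - offset for value in samples]


def speedups(baseline, candidate):
    """Ratio of baseline latency to candidate latency, per percentile."""
    return [round(b / c, 2) for b, c in zip(baseline, candidate)]


norm_json = normalise(JSON_MS)
norm_ujson = normalise(UJSON_MS)

for pct, ratio in zip(PERCENTILES, speedups(norm_json, norm_ujson)):
    print(f"p{pct}: json takes {ratio}x as long as ujson")
```

Computing the same ratios on the raw rows (`speedups(JSON_MS, UJSON_MS)`) shows how much the fixed connect time dilutes the apparent difference.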


@YunchuWang YunchuWang requested a review from pdthummar as a code owner October 18, 2022 15:18