-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!: orjson instead of simplejson to load and save JSON objects #134
base: main
Are you sure you want to change the base?
Conversation
Great, thanks! I'll test when I find the time. Maybe you can try fixing the tests in the meantime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR and sorry for taking so long to review it!
Could you run a small test that measures the runtime of loading a large properties file with the old vs the new code (making sure that the filesystem cache is hot, by ignoring the first measured time)?
Ah, I see now that you already measured a 4x speedup. |
Thanks! Code looks good now. I tested it locally: for a 500 MiB properties file, simplejson takes 2s to read it, while orjson takes 1.9s. Do you have the logs for a properties file where the switch to orjson pays off more? How much more memory is used in that case? |
I don't have any logs, but the https://zenodo.org/records/13378665/files/experiments_scripts_and_results.zip?download=1 I experienced 4x time speedup with 2x RAM consumption. These properties files have many lists of numbers, maybe orjson is much better in this type of data. Also, disabling sorting and formatting the speedup is higher, but properties files are not human-readable. Maybe, a parameter to disable sorting and formatting in fetchers would be a good idea for large experiments where the properties files will not be manually revised anyway. |
Thanks! I'll look into this after the break. |
Hello.
This pull request replaces
simplejson
byorjson
.The previous JSON files generated by
simplejson
are compatible withorjson
and, indeed, they seem equal to me, so no breaking changes in this aspect.However, the API or
orjson
is not compatible with the Python nativejson
library, soorjson
is required to use Lab after this change meanwhilesimplejson
was optional.In a performance perspective,
orjson
is around 4x faster (without SWAP usage) but it uses more RAM (around the double of RAM, not a problem in the most use cases). More benchmarks would be needed in your side before merging this pull request however.Thanks.