feat!: orjson instead of simplejson to load and save JSON objects #134

Open
wants to merge 3 commits into
base: main

Martin1887
Contributor

Hello.

This pull request replaces simplejson with orjson.

The JSON files previously generated by simplejson are compatible with orjson and, as far as I can tell, the output is identical, so there are no breaking changes in this respect.

However, the API of orjson is not compatible with Python's native json library, so orjson becomes a hard requirement for Lab after this change, whereas simplejson was optional.
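To illustrate the incompatibility mentioned above, here is a minimal sketch (not the code from this pull request). It relies on two differences: `orjson.dumps` returns `bytes` rather than `str`, and orjson has no file-based `dump`/`load` functions, so files are opened in binary mode. The helper names `write_json` and `read_json` are hypothetical, and the fallback to the standard library is only there so the snippet runs without orjson installed.

```python
import json

try:
    import orjson  # third-party package; assumed installed for the orjson path
except ImportError:
    orjson = None


def write_json(obj, path):
    """Write obj as JSON to path with whichever library is available."""
    if orjson is not None:
        # orjson.dumps returns bytes, so the file is opened in binary mode.
        with open(path, "wb") as f:
            f.write(orjson.dumps(obj))
    else:
        # json.dump writes str, so the file is opened in text mode.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(obj, f)


def read_json(path):
    """Read a JSON file; both libraries accept UTF-8 encoded bytes."""
    with open(path, "rb") as f:
        data = f.read()
    return orjson.loads(data) if orjson is not None else json.loads(data)
```

Code written against `json.dump(obj, f)` with a text-mode file therefore cannot simply swap in the orjson module name, which is why a drop-in optional import no longer works.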

In terms of performance, orjson is around 4x faster (without swap usage), but it uses more RAM (around twice as much, which is not a problem in most use cases). More benchmarks would be needed on your side before merging this pull request, however.

Thanks.

@jendrikseipp
Collaborator

Great, thanks! I'll test when I find the time. Maybe you can try fixing the tests in the meantime.

Collaborator

@jendrikseipp jendrikseipp left a comment

Thanks for the PR and sorry for taking so long to review it!

Could you run a small test that measures the runtime of loading a large properties file with the old vs the new code (making sure that the filesystem cache is hot, by ignoring the first measured time)?
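A measurement along these lines could be sketched as follows. This is only an illustration under stated assumptions: `time_load` and `compare` are hypothetical helper names, and the loader is passed in so the same harness can time the old and the new code. The first run is discarded so that the remaining timings reflect a hot filesystem cache.

```python
import time
from pathlib import Path


def time_load(path, loads, repeats=5):
    """Return wall-clock load times in seconds, excluding one warm-up run."""
    timings = []
    for _ in range(repeats + 1):
        start = time.perf_counter()
        loads(Path(path).read_bytes())
        timings.append(time.perf_counter() - start)
    # The first run may hit a cold filesystem cache, so drop it.
    return timings[1:]


def compare(path, loaders, repeats=5):
    """Map each loader name to its best (minimum) warm-cache time."""
    return {
        name: min(time_load(path, loads, repeats))
        for name, loads in loaders.items()
    }
```

With both libraries installed, a call such as `compare("properties", {"simplejson": simplejson.loads, "orjson": orjson.loads})` would give one number per library; the path is a placeholder for a large properties file.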

lab/tools.py Outdated Show resolved Hide resolved
lab/tools.py Outdated Show resolved Hide resolved
lab/tools.py Outdated Show resolved Hide resolved
lab/tools.py Outdated Show resolved Hide resolved
@jendrikseipp
Collaborator

Ah, I see now that you already measured a 4x speedup.

@jendrikseipp
Collaborator

Thanks! Code looks good now. I tested it locally: for a 500 MiB properties file, simplejson takes 2s to read it, while orjson takes 1.9s. Do you have the logs for a properties file where the switch to orjson pays off more? How much more memory is used in that case?

@Martin1887
Contributor Author

I don't have any logs, but the properties file inside additive/reports.tar.gz from the following link should reproduce this behaviour:

https://zenodo.org/records/13378665/files/experiments_scripts_and_results.zip?download=1

I observed a 4x speedup with 2x the RAM consumption.

These properties files contain many lists of numbers; maybe orjson is much better at handling this type of data.

Also, with sorting and formatting disabled the speedup is even higher, but then the properties files are not human-readable. Perhaps a parameter to disable sorting and formatting in fetchers would be a good idea for large experiments where the properties files will not be inspected manually anyway.
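Such a parameter might look roughly like the sketch below. The function name `serialize_properties` and the `human_readable` flag are hypothetical and not part of Lab; `OPT_INDENT_2` and `OPT_SORT_KEYS` are orjson's option flags for indentation and key sorting. The standard-library branch is included only so the snippet also runs without orjson.

```python
import json

try:
    import orjson  # third-party package; assumed installed for the orjson path
except ImportError:
    orjson = None


def serialize_properties(props, human_readable=True) -> bytes:
    """Serialize props to UTF-8 JSON bytes, optionally sorted and indented."""
    if orjson is not None:
        # orjson takes bit flags; None keeps the compact, unsorted default.
        option = (
            (orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS) if human_readable else None
        )
        return orjson.dumps(props, option=option)
    if human_readable:
        text = json.dumps(props, indent=2, sort_keys=True)
    else:
        # Compact separators avoid the spaces json.dumps adds by default.
        text = json.dumps(props, separators=(",", ":"))
    return text.encode("utf-8")
```

Keeping `human_readable=True` as the default would preserve the current file layout, so only large experiments that opt out would get the compact form.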

@jendrikseipp
Collaborator

Thanks! I'll look into this after the break.

2 participants