Optimize code paths #23

Merged
merged 9 commits into from
Aug 6, 2024

Conversation

kurtmckee
Contributor

@kurtmckee commented Aug 3, 2024

This PR significantly increases encoding speed by optimizing blocklist checking, which was very expensive. It also adds a simple performance testing script.

The performance improvements are made possible by pre-filtering the blocklist into groups:

  • Words that must be matched exactly (3 characters long)
  • Words that must be matched at the start or end of the ID (those containing numbers)
  • Words that can be matched anywhere in the ID

This pre-filtering allows blocklist checks to eliminate almost all looping in Python.
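Concretely, the pre-filtering described above might look something like this minimal sketch (the function names and exact matching rules are illustrative, not the library's actual internals):

```python
def filter_blocklist(alphabet: str, blocklist: set[str]) -> tuple[set, set, set]:
    """Split the blocklist once, at instantiation, into the three groups
    described above so the per-ID check barely loops in Python."""
    lowered_alphabet = alphabet.lower()
    exact, start_end, anywhere = set(), set(), set()
    for word in blocklist:
        word = word.lower()
        if any(c not in lowered_alphabet for c in word):
            continue  # word can never appear in a generated ID
        if len(word) == 3:
            exact.add(word)
        elif any(c.isdigit() for c in word):
            start_end.add(word)
        else:
            anywhere.add(word)
    return exact, start_end, anywhere

def is_blocked(id_: str, exact: set, start_end: set, anywhere: set) -> bool:
    """Fast check against the pre-filtered groups."""
    lowered = id_.lower()
    if len(lowered) <= 3:
        # Short IDs can only collide with the short, exact-match words.
        return lowered in exact
    if any(lowered.startswith(w) or lowered.endswith(w) for w in start_end):
        return True
    return any(w in lowered for w in anywhere)
```

The exact-match set can use an O(1) hash lookup, so only the two smaller groups are ever iterated.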

Here is the output of the performance testing for the main branch before this PR, and for this PR branch:

Before changes:                     After changes:
$ python assets/performance.py      $ python assets/performance.py 
Iterations: 100,000                 Iterations: 100,000
Instantiate:          23.493        Instantiate:           1.447
Encode [0]:            2.245        Encode [0]:            0.211
Encode [0, 1, 2]:     18.278        Encode [0, 1, 2]:      2.798
Decode 'bM':           0.350        Decode 'bM':           0.366
Decode 'rSCtlB':       2.490        Decode 'rSCtlB':       2.621

As you can see, IDs can be encoded ~85% faster. Although it's not reflected in these performance tests, instantiation with a non-default alphabet or blocklist will require more up-front computation to filter the blocklist, but encoding will still be faster.

Decode times are not affected by these changes.

Converting the alphabet to a list is very costly at scale.
Getting the length of the alphabet repeatedly is a little costly.
Using `result == 0` rather than `not result` is measurably costly.
These have all been eliminated.
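As a rough illustration of the commit notes above, a quick `timeit` comparison (a sketch, not the project's benchmark script; absolute numbers will vary by machine):

```python
import timeit

alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

# Rebuilding a list from the alphabet on every call (the old hot path):
list_each_time = timeit.timeit(lambda: list(alphabet)[3], number=100_000)

# Indexing the string directly, with the length computed once up front:
length = len(alphabet)
index_directly = timeit.timeit(lambda: alphabet[3 % length], number=100_000)

# Explicit comparison vs. truthiness check:
result = 0
explicit_compare = timeit.timeit(lambda: result == 0, number=100_000)
truthiness = timeit.timeit(lambda: not result, number=100_000)

print(f"list() each call: {list_each_time:.4f}s")
print(f"direct indexing:  {index_directly:.4f}s")
print(f"result == 0:      {explicit_compare:.4f}s")
print(f"not result:       {truthiness:.4f}s")
```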

Python's timeit module suggests a performance improvement of ~300%.
Previous behavior required checking the entire list
even if the first number is invalid.
By filtering the blocklist once during instantiation,
a significant amount of computation can be eliminated
when the same instance is reused over and over.

This additionally updates the hypothesis testing;
generated IDs are now confirmed to be blockable.
@4kimov
Member

4kimov commented Aug 5, 2024

Hi @kurtmckee,

Thank you for the recent changes and this PR as well 💪
That's impressive optimization for the encoding function.

Questions:

  1. To play devil's advocate: if an average user does one encoding (say, per API request), then pre-changes (one instantiation + one encoding) is 23.493 + 18.278 = 41.7, and post-changes it would be 54.7. If I understood the numbers correctly, encoding once would be slower than right now, but encoding multiple times would be faster. I wonder which use case is more common in the real world. Is there any way to speed up instantiation even more to have a win-win?
  2. The Ruby version recently had a performance PR. It takes a different approach than yours, but nevertheless, have you seen that one?

@kurtmckee
Contributor Author

kurtmckee commented Aug 5, 2024

I think that the average user isn't going to notice this.

Iterations: 1
Instantiate:           0.001
Encode [0]:            0.000
Encode [0, 1, 2]:      0.000
Decode 'bM':           0.000
Decode 'rSCtlB':       0.000

This only pays off at scale, or for large bulk operations, but prior to this change there was no route to being speedy at scale at all.

If you're open to a transformation in how the blocklist is stored in the project, then it's possible to skip filtering of the default blocklist at instantiation for all installations everywhere, similar to the Ruby PR (at least as I'm reading it). That's an important metric, too, but I'd need the go-ahead to make that change, since there's an administrative burden when updating the global blocklist.

@4kimov
Member

4kimov commented Aug 5, 2024

I think that the average user isn't going to notice this

Fair enough: using it once is a minor disadvantage, but at scale it's a big advantage. I agree.

If you're open to a transformation in how the blocklist is stored in the project

I am - to how it's stored in this library. The spec's blocklist is an unordered list, and individual implementations are free to optimize the order or chunk it as needed. Perhaps it's worth creating a small script here to transform it, so that the next time it changes on the spec level it will be easy to update?

Other than that, I like the optimizations and will be happy to merge. Thank you for adjusting the tests as well!

Edit: Maybe a script is overkill? Some LLM can probably adjust the list as needed.

@kurtmckee
Contributor Author

@4kimov We were thinking along the same lines! I wrote a script to update the constants.py file, and updated the test suite to do a sanity check of constants.py when it runs.

I think this will reduce the administrative burden of maintaining the blocklist and keeping instantiation fast for the default blocklist.
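For illustration, a regeneration script along those lines might look like this minimal sketch (the file layout, constant names, and grouping rules are my assumptions, not the actual script added in the PR):

```python
from pathlib import Path

def render_constants(blocklist: set[str]) -> str:
    """Render a constants module with the blocklist pre-filtered into the
    three groups used by the fast check. Sets are written as sorted lists
    so regeneration is deterministic and diffs stay readable."""
    exact = sorted(w for w in blocklist if len(w) == 3)
    start_end = sorted(w for w in blocklist
                       if len(w) > 3 and any(c.isdigit() for c in w))
    anywhere = sorted(w for w in blocklist
                      if len(w) > 3 and not any(c.isdigit() for c in w))
    return (
        "# Generated file; edit the source blocklist and regenerate instead.\n"
        f"BLOCKLIST_EXACT = frozenset({exact!r})\n"
        f"BLOCKLIST_START_END = frozenset({start_end!r})\n"
        f"BLOCKLIST_ANYWHERE = frozenset({anywhere!r})\n"
    )

if __name__ == "__main__":
    # Hypothetical toy blocklist; the real script would read the spec's list.
    words = {"bad", "d4te", "curse"}
    Path("constants_generated.py").write_text(render_constants(words))
```

A test-suite sanity check can then re-run the filtering at test time and assert that the generated module still matches, which catches a stale `constants.py` after a spec-level blocklist update.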

Here are the new performance results, which reflect using the default blocklist during instantiation:

$ python assets/performance.py 
Iterations: 100,000
Instantiate:           1.447
Encode [0]:            0.211
Encode [0, 1, 2]:      2.798
Decode 'bM':           0.366
Decode 'rSCtlB':       2.621

Thanks for pointing out the Ruby PR's take on performance improvements! That was insightful.

@kurtmckee force-pushed the optimize-code-paths branch from 4419b95 to 7c4ef18 on August 5, 2024 23:19
@4kimov merged commit 70388b8 into sqids:main on Aug 6, 2024
8 checks passed
@kurtmckee deleted the optimize-code-paths branch on August 6, 2024 20:33
@4kimov
Member

4kimov commented Aug 6, 2024

@kurtmckee Thank you for all the work and numerous PRs. Very cool optimizations indeed! 💪
I've pushed it out as v0.5.0.


And @Pevtrick thanks for the quick merges when I wasn't around!

@kurtmckee
Contributor Author

You're welcome! This has been a lot of fun!

@kurtmckee
Contributor Author

@4kimov and @Pevtrick I'm not seeing the 0.5.0 tag on GitHub. Does that need to get pushed to the repo?

@4kimov
Member

4kimov commented Aug 8, 2024

That's because I forgot about it :) I've pushed it now.
