Optimize code paths #23

Merged
merged 9 commits into from
Aug 6, 2024

Conversation

kurtmckee
Contributor

@kurtmckee commented Aug 3, 2024

This PR significantly increases encoding speed by optimizing blocklist checking, which was very expensive. It also adds a simple performance testing script.

The performance improvements are made possible by pre-filtering the blocklist into groups:

  • Words that must be matched exactly (3 characters long)
  • Words that must be matched at the start or end of the ID (those containing numbers)
  • Words that can be matched anywhere in the ID

This pre-filtering allows blocklist checks to eliminate almost all looping in Python.
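Concretely, the pre-filtering described above might look something like this minimal sketch (the function names and exact matching rules are illustrative, not the library's actual internals):

```python
def filter_blocklist(alphabet: str, blocklist: set[str]) -> tuple[set, set, set]:
    """Split the blocklist once, at instantiation, into the three groups
    described above so the per-ID check barely loops in Python."""
    lowered_alphabet = alphabet.lower()
    exact, start_end, anywhere = set(), set(), set()
    for word in blocklist:
        word = word.lower()
        if any(c not in lowered_alphabet for c in word):
            continue  # word can never appear in a generated ID
        if len(word) == 3:
            exact.add(word)
        elif any(c.isdigit() for c in word):
            start_end.add(word)
        else:
            anywhere.add(word)
    return exact, start_end, anywhere

def is_blocked(id_: str, exact: set, start_end: set, anywhere: set) -> bool:
    """Fast check against the pre-filtered groups."""
    lowered = id_.lower()
    if len(lowered) <= 3:
        # Short IDs can only collide with the short, exact-match words.
        return lowered in exact
    if any(lowered.startswith(w) or lowered.endswith(w) for w in start_end):
        return True
    return any(w in lowered for w in anywhere)
```

The exact-match set can use an O(1) hash lookup, so only the two smaller groups are ever iterated.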

Here is the output of the performance testing for the main branch before this PR, and for this PR branch:

Before changes:                     After changes:
$ python assets/performance.py      $ python assets/performance.py 
Iterations: 100,000                 Iterations: 100,000
Instantiate:          23.493        Instantiate:           1.447
Encode [0]:            2.245        Encode [0]:            0.211
Encode [0, 1, 2]:     18.278        Encode [0, 1, 2]:      2.798
Decode 'bM':           0.350        Decode 'bM':           0.366
Decode 'rSCtlB':       2.490        Decode 'rSCtlB':       2.621

As you can see, IDs can be encoded ~85% faster. Although it's not reflected in these performance tests, instantiation with a non-default alphabet or blocklist will require more up-front computation to filter the blocklist, but encoding will still be faster.

Decode times are not affected by these changes.

Converting the alphabet to a list is very costly at scale.
Getting the length of the alphabet repeatedly is a little costly.
Using `result == 0` rather than `not result` is measurably costly.
These have all been eliminated.
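As a rough illustration of the commit notes above, a quick `timeit` comparison (a sketch, not the project's benchmark script; absolute numbers will vary by machine):

```python
import timeit

alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

# Rebuilding a list from the alphabet on every call (the old hot path):
list_each_time = timeit.timeit(lambda: list(alphabet)[3], number=100_000)

# Indexing the string directly, with the length computed once up front:
length = len(alphabet)
index_directly = timeit.timeit(lambda: alphabet[3 % length], number=100_000)

# Explicit comparison vs. truthiness check:
result = 0
explicit_compare = timeit.timeit(lambda: result == 0, number=100_000)
truthiness = timeit.timeit(lambda: not result, number=100_000)

print(f"list() each call: {list_each_time:.4f}s")
print(f"direct indexing:  {index_directly:.4f}s")
print(f"result == 0:      {explicit_compare:.4f}s")
print(f"not result:       {truthiness:.4f}s")
```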

Python's timeit module suggests a performance improvement of ~300%.
Previous behavior required checking the entire list
even if the first number is invalid.
By filtering the blocklist once during instantiation,
a significant amount of computation can be eliminated
when the same instance is reused over and over.

This additionally updates the hypothesis testing;
generated IDs are now confirmed to be blockable.
@4kimov
Member

4kimov commented Aug 5, 2024

Hi @kurtmckee,

Thank you for the recent changes and this PR as well 💪
That's impressive optimization for the encoding function.

Questions:

  1. To play devil's advocate: if an average user does one encoding (say, per API request), then pre-changes (one instantiation + one encoding) is 23.493 + 18.278 = 41.7, and post-changes it would be 54.7. If I understood the numbers correctly, encoding once would be slower than right now, but encoding multiple times would be faster. I wonder which use case is more common in the real world. Is there any way to speed up instantiation even more to have a win-win?
  2. The Ruby version recently had a performance PR. It takes a different approach than yours, but nevertheless, have you seen that one?

@kurtmckee
Contributor Author

kurtmckee commented Aug 5, 2024

I think that the average user isn't going to notice this.

Iterations: 1
Instantiate:           0.001
Encode [0]:            0.000
Encode [0, 1, 2]:      0.000
Decode 'bM':           0.000
Decode 'rSCtlB':       0.000

This only pays off at scale, or for large bulk operations, but prior to this change there was no route to being speedy at scale at all.

If you're open to a transformation in how the blocklist is stored in the project, then it's possible to skip filtering of the default blocklist at instantiation for all installations everywhere, similar to the Ruby PR (at least as I'm reading it). That's an important metric, too, but I'd need the go-ahead to make that change, since there's an administrative burden when updating the global blocklist.

@4kimov
Member

4kimov commented Aug 5, 2024

I think that the average user isn't going to notice this

Fair enough: using it once is a minor disadvantage, but at scale it's a big advantage. I agree.

If you're open to a transformation in how the blocklist is stored in the project

I am - to how it's stored in this library. The spec's blocklist is an unordered list, and individual implementations are free to optimize the order or chunk it as needed. Perhaps it's worth creating a small script here to transform it, so that the next time it changes on the spec level it will be easy to update?

Other than that, I like the optimizations and will be happy to merge. Thank you for adjusting the tests as well!

Edit: Maybe a script is overkill? Some LLM can probably adjust the list as needed.

@kurtmckee
Contributor Author

@4kimov We were thinking along the same lines! I wrote a script to update the constants.py file, and updated the test suite to do a sanity check of constants.py when it runs.

I think this will reduce the administrative burden of maintaining the blocklist and keeping instantiation fast for the default blocklist.
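For illustration, a regeneration script along those lines might look like this minimal sketch (the file layout, constant names, and grouping rules are my assumptions, not the actual script added in the PR):

```python
from pathlib import Path

def render_constants(blocklist: set[str]) -> str:
    """Render a constants module with the blocklist pre-filtered into the
    three groups used by the fast check. Sets are written as sorted lists
    so regeneration is deterministic and diffs stay readable."""
    exact = sorted(w for w in blocklist if len(w) == 3)
    start_end = sorted(w for w in blocklist
                       if len(w) > 3 and any(c.isdigit() for c in w))
    anywhere = sorted(w for w in blocklist
                      if len(w) > 3 and not any(c.isdigit() for c in w))
    return (
        "# Generated file; edit the source blocklist and regenerate instead.\n"
        f"BLOCKLIST_EXACT = frozenset({exact!r})\n"
        f"BLOCKLIST_START_END = frozenset({start_end!r})\n"
        f"BLOCKLIST_ANYWHERE = frozenset({anywhere!r})\n"
    )

if __name__ == "__main__":
    # Hypothetical toy blocklist; the real script would read the spec's list.
    words = {"bad", "d4te", "curse"}
    Path("constants_generated.py").write_text(render_constants(words))
```

A test-suite sanity check can then re-run the filtering at test time and assert that the generated module still matches, which catches a stale `constants.py` after a spec-level blocklist update.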

Here are the new performance results, which reflect using the default blocklist during instantiation:

$ python assets/performance.py 
Iterations: 100,000
Instantiate:           1.447
Encode [0]:            0.211
Encode [0, 1, 2]:      2.798
Decode 'bM':           0.366
Decode 'rSCtlB':       2.621

Thanks for pointing out the Ruby PR's take on performance improvements! That was insightful.

@kurtmckee force-pushed the optimize-code-paths branch from 4419b95 to 7c4ef18 on August 5, 2024 23:19
@4kimov merged commit 70388b8 into sqids:main on Aug 6, 2024
8 checks passed
@kurtmckee deleted the optimize-code-paths branch on August 6, 2024 20:33
@4kimov
Member

4kimov commented Aug 6, 2024

@kurtmckee Thank you for all the work and numerous PRs. Very cool optimizations indeed! 💪
I've pushed it out as v0.5.0.


And @Pevtrick thanks for the quick merges when I wasn't around!

@kurtmckee
Contributor Author

You're welcome! This has been a lot of fun!

@kurtmckee
Contributor Author

@4kimov and @Pevtrick I'm not seeing the 0.5.0 tag on GitHub. Does that need to get pushed to the repo?

@4kimov
Member

4kimov commented Aug 8, 2024

That's because I forgot about it :) I've pushed it now.
