Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize StandaloneAhcWSRequest.add* methods. #301

Merged
merged 1 commit into from
Feb 14, 2019

Conversation

htmldoug
Copy link
Contributor

@htmldoug htmldoug commented Jan 22, 2019

Pull Request Checklist

  • Have you read through the contributor guidelines?
  • Have you signed the Typesafe CLA?
  • Have you squashed your commits?
  • Have you added copyright headers to new files?
  • Have you checked that both Scala and Java APIs are updated?
  • Have you updated the documentation for both Scala and Java sections?
  • Have you added tests for any changed functionality?

Purpose

Optimizes StandaloneAhcWSRequest.add* methods from O(n) down to O(eC) by avoiding reconstructing maps from scratch.

Background Context

I spotted this while benching #288.

StandaloneAhcWSRequest uses immutable maps that support structural sharing, yet the implementations of addQueryStringParameters((String, String)*) and addHttpHeaders(hdrs: (String, String)*) rebuild the entire map on each call. For a request with n elements, appending a new one is an O(n) operation over an O(eC) data structure. Inserting n elements is O().

Play 2.5 WSRequest has an O(eC) implementation, so this seems like new behavior in the pre-release of play-ahc-ws-standalone v2.

Benchmarks

AddQueryParamsBench measures adding a single query param to a request that already has size query params.

Before

[info] Benchmark                      (size)  Mode  Cnt        Score       Error  Units
[info] AddQueryParamsBench.addParams       1  avgt   10      417.620 ±    17.237  ns/op
[info] AddQueryParamsBench.addParams      10  avgt   10     2257.053 ±    76.440  ns/op
[info] AddQueryParamsBench.addParams     100  avgt   10    18839.600 ±   801.003  ns/op
[info] AddQueryParamsBench.addParams    1000  avgt   10   224491.116 ± 12133.870  ns/op
[info] AddQueryParamsBench.addParams   10000  avgt   10  3065837.189 ± 82797.559  ns/op

We cross the 1s/call threshold somewhere between the 1,000 and 10,000th param. At 10k query params, constructing the whole request takes 16 seconds.

Admittedly 10k query params is on the high end of what we're likely to see in real apps, but we can also bring it down to nanoseconds pretty trivially.

After

[info] Benchmark                      (size)  Mode  Cnt    Score   Error  Units
[info] AddQueryParamsBench.addParams       1  avgt   10   47.978 ± 0.388  ns/op
[info] AddQueryParamsBench.addParams      10  avgt   10   76.495 ± 0.663  ns/op
[info] AddQueryParamsBench.addParams     100  avgt   10  118.205 ± 1.072  ns/op
[info] AddQueryParamsBench.addParams    1000  avgt   10  141.506 ± 1.681  ns/op
[info] AddQueryParamsBench.addParams   10000  avgt   10  151.452 ± 2.560  ns/op

Caveats

There is a default implementation of these methods in the trait StandaloneWSRequest. I don't see any harm in leaving them there.

References

No other issues or PRs that I'm aware of.

Copy link
Member

@marcospereira marcospereira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good to me, thanks for adding the jmh tests, @htmldoug.

See the other review comments.

@@ -480,6 +478,24 @@ lazy val `integration-tests` = project.in(file("integration-tests"))
)
.disablePlugins(sbtassembly.AssemblyPlugin)

//---------------------------------------------------------------
// Benchmarks (run manually)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Thanks for setting this up!

@@ -496,13 +512,15 @@ lazy val root = project
.settings(formattingSettings)
.settings(disableDocs)
.settings(disablePublishing)
.settings(crossScalaVersions := Seq(scala212))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this here?

Copy link
Contributor Author

@htmldoug htmldoug Jan 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pulled it up from disablePublishing. integration-tests overrode it to include all 3 versions (the ThisBuild default), and bench needs to do the same things or has build failures from the dependencies on the other projects. Values of the crossScalaVersions settingKey should be equivalent to what was there before.

git blame points to #244. No idea what's happening there, although I assume Eugene knows what he's doing.

@BenchmarkMode(Array(Mode.AverageTime))
@Fork(jvmArgsAppend = Array("-Xmx350m", "-XX:+HeapDumpOnOutOfMemoryError"), value = 1)
@State(Scope.Benchmark)
class AddQueryParamsBench {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also have a test for the header changes? It would be good if we have tests for both with* and add* methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course.

Copy link
Contributor Author

@htmldoug htmldoug Jan 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added benchmarks for addHttpHeaders but stopped short of adding them for with* since it doesn't do anything interesting.

Since these are run manually, they're not going to catch perf regressions. For that, you'd need a bigger investment to hook them into the build and have consistent (ideally dedicated) hardware. Personally, I like micro-benchmarks for validating before/after changes to interesting parts.

I'll defer and add them to with* if you really want to, but I'm not convinced there's much to be gained from it. An E2E benchmark for typical request construction might be interesting to further micro-optimize the library, but I'd suggest it's beyond the scope of this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, we are not running them in dedicated hardware yet. But we have this for playframework, running nightly. We can later add a nightly script to run the benchmarks here, but I agree with you that we can add them to with* methods then.

Summary
-------
`StandaloneAhcWSRequest` stores query params and headers as immutable maps that support structural sharing, yet the implementation of `addQueryStringParameters((String, String)*)` and `addHttpHeaders(hdrs: (String, String)*)` methods rebuild the entire map from scratch on each call. That's an `O(n)` implementation over an `O(eC)` data structure. Inserting *n* elements is *O(n²)* complexity.

Benchmarks
----------
I put together a benchmark to add *size* query params, then measure adding the next one.

Before
```
[info] Benchmark                      (size)  Mode  Cnt        Score       Error  Units
[info] AddQueryParamsBench.addParams       1  avgt   10      417.620 ±    17.237  ns/op
[info] AddQueryParamsBench.addParams      10  avgt   10     2257.053 ±    76.440  ns/op
[info] AddQueryParamsBench.addParams     100  avgt   10    18839.600 ±   801.003  ns/op
[info] AddQueryParamsBench.addParams    1000  avgt   10   224491.116 ± 12133.870  ns/op
[info] AddQueryParamsBench.addParams   10000  avgt   10  3065837.189 ± 82797.559  ns/op

```

After
```
[info] Benchmark                      (size)  Mode  Cnt    Score   Error  Units
[info] AddQueryParamsBench.addParams       1  avgt   10   47.978 ± 0.388  ns/op
[info] AddQueryParamsBench.addParams      10  avgt   10   76.495 ± 0.663  ns/op
[info] AddQueryParamsBench.addParams     100  avgt   10  118.205 ± 1.072  ns/op
[info] AddQueryParamsBench.addParams    1000  avgt   10  141.506 ± 1.681  ns/op
[info] AddQueryParamsBench.addParams   10000  avgt   10  151.452 ± 2.560  ns/op
```
@htmldoug
Copy link
Contributor Author

@marcospereira @wsargent do you guys need anything else from me for this to be mergeable?

Copy link
Member

@marcospereira marcospereira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Thank you, @htmldoug. And sorry for the long time to give you a proper response. We were too focused on other tasks.

@marcospereira marcospereira merged commit 71b87d6 into playframework:master Feb 14, 2019
marcospereira pushed a commit that referenced this pull request Feb 14, 2019
Summary
-------
`StandaloneAhcWSRequest` stores query params and headers as immutable maps that support structural sharing, yet the implementation of `addQueryStringParameters((String, String)*)` and `addHttpHeaders(hdrs: (String, String)*)` methods rebuild the entire map from scratch on each call. That's an `O(n)` implementation over an `O(eC)` data structure. Inserting *n* elements is *O(n²)* complexity.

Benchmarks
----------
I put together a benchmark to add *size* query params, then measure adding the next one.

Before
```
[info] Benchmark                      (size)  Mode  Cnt        Score       Error  Units
[info] AddQueryParamsBench.addParams       1  avgt   10      417.620 ±    17.237  ns/op
[info] AddQueryParamsBench.addParams      10  avgt   10     2257.053 ±    76.440  ns/op
[info] AddQueryParamsBench.addParams     100  avgt   10    18839.600 ±   801.003  ns/op
[info] AddQueryParamsBench.addParams    1000  avgt   10   224491.116 ± 12133.870  ns/op
[info] AddQueryParamsBench.addParams   10000  avgt   10  3065837.189 ± 82797.559  ns/op

```

After
```
[info] Benchmark                      (size)  Mode  Cnt    Score   Error  Units
[info] AddQueryParamsBench.addParams       1  avgt   10   47.978 ± 0.388  ns/op
[info] AddQueryParamsBench.addParams      10  avgt   10   76.495 ± 0.663  ns/op
[info] AddQueryParamsBench.addParams     100  avgt   10  118.205 ± 1.072  ns/op
[info] AddQueryParamsBench.addParams    1000  avgt   10  141.506 ± 1.681  ns/op
[info] AddQueryParamsBench.addParams   10000  avgt   10  151.452 ± 2.560  ns/op
```
@marcospereira
Copy link
Member

Backport to 2.0.x: 5bd1434

@htmldoug htmldoug deleted the optimize-maps branch February 15, 2019 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants