Istio and Kuma are not being compared appropriately in the benchmark #2

Open · howardjohn opened this issue Apr 11, 2024 · 15 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

@howardjohn (Contributor) commented:

I was reading through https://dev.to/pragmagic/testing-service-mesh-performance-in-multi-cluster-scenario-istio-vs-kuma-vs-nsm-4agj and was surprised by the results.

Upon deeper inspection, I don't think the test is making an accurate comparison. While the article calls out that NSM and Istio/Kuma are fundamentally different, it is actually testing 3 fundamentally different things:

  • NSM, operating at L3
  • Kuma, operating at L4
  • Istio, operating at L7

Per Kuma docs, you need to set appProtocol: http to enable HTTP. Istio allows selection by port name or appProtocol.

Nginx is configured like so:

```yaml
ports:
  - port: 80
    name: http
```

Which ultimately enables HTTP processing for Istio but not Kuma.
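
For illustration, a minimal sketch of a Service that both meshes would treat as HTTP (the name and selector here are placeholders, not the benchmark's actual manifest); the only functional change versus the benchmark is the added appProtocol field:

```yaml
# Hypothetical Service sketch; name/selector are placeholders, not the benchmark's manifest.
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: nginx
  ports:
    - port: 80
      name: http          # Istio selects HTTP from the "http" port-name prefix
      appProtocol: http   # Kuma (and Istio) select HTTP via appProtocol
```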

I sent an email to the address listed in the article as the point of contact, but it bounced, so I figured I would post here.

@howardjohn (Contributor, Author) commented Apr 11, 2024:

A very quick test making them both do HTTP:

Istio HTTP:

P50     P90     P99
0.23ms  0.32ms  0.51ms

Kuma HTTP:

P50     P90     P99
0.22ms  0.36ms  0.55ms

Istio TCP:

P50     P90     P99
0.08ms  0.11ms  0.19ms

Kuma TCP:

P50     P90     P99
0.08ms  0.11ms  0.20ms

Istio Ambient ztunnel only (TCP)

P50     P90     P99
0.08ms  0.12ms  0.19ms

Istio ambient w/ waypoint (HTTP)

P50     P90     P99
0.14ms  0.19ms  0.32ms

I didn't run for long enough/with enough samples for that to be reliable, just showing they are basically identical to each other when tested apples to apples.

denis-tingaikin added the "bug" and "help wanted" labels and self-assigned this on Apr 24, 2024

@denis-tingaikin (Contributor) commented:

Hello @howardjohn, we're glad that you're interested in the article's results.

> Per Kuma docs, you need to set appProtocol: http to enable HTTP. Istio allows selection by port name or appProtocol.
>
> Nginx is configured like so:
>
>     ports:
>       - port: 80
>         name: http
>
> Which ultimately enables HTTP processing for Istio but not Kuma.
>
> I sent an email to the address listed in the article as the point of contact, but it bounced, so I figured I would post here.

As far as I can see, the meshes worked correctly throughout testing. Have you looked into the results for Istio (https://github.com/pragmagic/service-mesh-performance-testing/blob/main/scripts/re[…]proper/test-2023-11-14-T17-12-imp-q6000-c1-d60s-05-01-8080.json) and Kuma (https://github.com/pragmagic/service-mesh-performance-testing/blob/main/scripts/re[…]proper/test-2023-11-14-T18-10-kmz-q6000-c1-d60s-02-01-8080.json)?

Note: the client and the requested workload (NGINX server) are located on different clusters, and we wouldn't get these results with status code 200 if the meshes did not work.

We used the default multi-cluster configuration for both Kuma and Istio (see the Istio section and the Kuma section).

If you know how to correct or improve our multi-cluster setup for Kuma and Istio, could you please share how we should change our configuration? We're really interested in having objective results.

> A very quick test making them both do HTTP:
>
> [Istio/Kuma HTTP and TCP latency results quoted above]
>
> I didn't run for long enough/with enough samples for that to be reliable, just showing they are basically identical to each other when tested apples to apples.

Just curious, have you tested this with a single-cluster or a multi-cluster configuration?

denis-tingaikin removed their assignment on Apr 24, 2024

@howardjohn (Contributor, Author) commented:

@denis-tingaikin here you set Istio to use HTTP, and here Kuma is configured to use TCP. This could be fixed by adding appProtocol: http to the Kuma service.

> As far as I can see, the meshes worked correctly throughout testing.

Yes, they both work, but they are doing totally different things. That is somewhat fine -- the article points out that NSM and Kuma/Istio are fundamentally different, so there is some caveat to those numbers. No similar caveat is mentioned for Kuma vs. Istio, nor would one make sense: there is no use case for running the same workload over TCP for Kuma but HTTP for Istio; both service meshes can operate in either mode, and a user would make that choice the same way regardless of their mesh implementation.

> Have you looked into the results?

Yes, I have, and they confirm my issue above. You can see Kuma uses 55 sockets, while Istio uses 1. Why? nginx closes the connection every 1k requests by default. In Istio's case, because there is an HTTP proxy (Envoy) in front, fortio is unaware of this since Envoy abstracts it behind its own connection pool; for Kuma, fortio is directly exposed to it, since Kuma's proxy here is acting as a plain TCP proxy.
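
To make the mechanism concrete (an illustrative sketch only, assuming nginx's stock keepalive_requests default of 1000 is what drives the reconnects; this ConfigMap is hypothetical and not part of the benchmark's manifests):

```yaml
# Hypothetical ConfigMap sketch showing where the connection-recycling limit lives.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-conf
data:
  nginx.conf: |
    events {}
    http {
      server {
        listen 80;
        # nginx closes a keep-alive connection after this many requests (default: 1000).
        # A pure TCP proxy passes each reconnect straight through to the load generator,
        # while an L7 proxy hides it behind its own upstream connection pool.
        keepalive_requests 1000;
        location / {
          return 200 "ok\n";
        }
      }
    }
```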

> Just curious, have you tested this with a single-cluster or a multi-cluster configuration?

This was just a simple single-cluster test. It was not intended to be a robust set of numbers, just to show that "Istio is slower than Kuma" is inaccurate; the claim should rather be "HTTP processing is slower than TCP processing".

@denis-tingaikin (Contributor) commented:

@howardjohn Got it. It seems like you're correct, and patch #3 is merged.

Our next step is to retest it. Now @VitalyGushin is going to test it internally, and we'll share the results as soon as we get them.

Feel free to share any other thoughts related to testing or setup improvements; we will consider them.

Thanks!

@craigbox commented:

Hi team,

Any update on the results, and are you able to update the original post?

@VitalyGushin commented:

Hello, @craigbox! We have test results on Kind clusters: the difference between Istio and Kuma across all metrics is about 10-20 percent (no longer a several-times difference). We will run the tests on AWS and Equinix Metal, and after that we will update the article.

QPS,Istio,Kuma
0,615.0995527825646,783.5689372260937
1,628.5416222315714,746.2851796469982
2,687.5950500834539,756.2055605654774
3,695.5875152938543,695.0558052728672
4,650.4059679004185,680.786856184325
QPS,Average,Percent to first
Istio,655.4459416583725,100
Kuma,732.3804677791525,111.73773781040197

Average,Istio,Kuma
0,1.5851544880916997,1.2385340386259718
1,1.5501617636093752,1.3001972174058698
2,1.4157923990934709,1.2835611538139404
3,1.399138109857189,1.3975902721801317
4,1.497236785112112,1.4269016125146927
Average,Average,Percent to first
Istio,1.4894967091527693,100
Kuma,1.329356858908121,89.24872748891562

Average Std. Dev.,Istio,Kuma
0,1.6382011474525926,0.41847050484340714
1,0.6693879317136251,0.9238631371509687
2,0.5832977343906867,0.7400979699427236
3,0.9543786440759486,0.9788303162016505
4,0.9960190455333586,0.699716543252542
Average Std. Dev.,Average,Percent to first
Istio,0.9682569006332423,100
Kuma,0.7521956942782584,77.68554954643965

P90,Istio,Kuma
0,2.029668383773342,1.4976087669435942
1,1.9917687169496243,1.6058743581999397
2,1.7246090534979424,1.6070336159903087
3,1.7283216783216784,1.7714655515730786
4,1.9410270024271845,1.8746431254695717
P90,Average,Percent to first
Istio,1.883078966993954,100
Kuma,1.6713250836352986,88.75491219060835

P99,Istio,Kuma
0,3.576653846153841,2.2760844748858444
1,3.05296954314721,2.466167315175097
2,2.748058608058609,2.433535856573705
3,2.6633442622950794,2.6336263736263725
4,2.9762254901960783,2.9462721893491133
P99,Average,Percent to first
Istio,3.0034503499701635,100
Kuma,2.551137241922026,84.9402169057787

P99.9,Istio,Kuma
0,20.96656250000064,3.859700000000107
1,6.142937500000207,8.305500000000022
2,5.749760000000102,6.901928571428807
3,7.132000000000059,9.648000000000641
4,10.40625000000159,6.296571428571656
P99.9,Average,Percent to first
Istio,10.07950200000052,100
Kuma,7.002340000000245,69.47109093286437

@VitalyGushin commented:

@howardjohn The purpose of our research is to find out the best performance achievable with the chosen service meshes, so it would be more correct to switch Istio to L4 rather than Kuma to L7. As far as I understand, these changes will be enough -- is that correct?

@howardjohn (Contributor, Author) commented:

@VitalyGushin I think either is valid, as long as it is clearly stated which is which and the comparison is apples to apples (or it is clearly stated when it is not). Comparing NSM at L3 to Istio/Kuma at L4 is probably closer to apples-to-apples than L3 to L7, so for that purpose it may make sense.
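
If the comparison does keep both meshes at L4, a minimal sketch of the Istio side under its documented protocol-selection rules (the port name below is illustrative, not the benchmark's manifest):

```yaml
# Hypothetical port definition; a "tcp" port-name prefix (or no protocol prefix at all)
# keeps Istio's sidecar from applying HTTP processing, so traffic is proxied as opaque
# TCP, matching Kuma's behavior when appProtocol is not set.
ports:
  - port: 80
    name: tcp-web
```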

As a general service mesh comparison, the vast majority of users run these products in L7 mode, so comparisons outside of that may be a bit confusing to audiences unless clearly stated in the post.

FWIW, if you did want a more direct comparison with NSM, Istio's ambient mode (https://istio.io/latest/docs/ambient/getting-started/) would probably be the better candidate; a user looking to adopt something like NSM would likely favor ambient over Istio sidecars. Note this is still L3 WireGuard vs L4 mTLS, though.
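
For reference, a rough sketch of how a namespace opts into ambient mode (the namespace name is a placeholder; this is not part of the benchmark's setup). With only the label, traffic is handled by ztunnel at L4; adding a waypoint proxy brings in L7 processing:

```yaml
# Hypothetical namespace manifest; the label enrolls its workloads in ambient mode.
apiVersion: v1
kind: Namespace
metadata:
  name: bench
  labels:
    istio.io/dataplane-mode: ambient
```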

@craigbox commented:

The best performance is going to come from the lowest feature set, which somewhat negates the purpose of running these meshes. If your goal is "show the lowest-cost method to get to encryption", that's one thing, but as John points out you need to reflect the other realities, including that you can't get to FIPS compliance with WireGuard.

@VitalyGushin commented:

@howardjohn @craigbox
We have testing results for AWS and Equinix Metal clusters.
If you have no objections, we are ready to publish them in the article:

AWS            QPS      Average latency  P90   P99   P99.9
Istio L7       621.77   1.60             1.80  4.89  9.54
Kuma L7        697.18   1.43             1.60  4.39  9.36

Equinix Metal  QPS      Average latency  P90   P99   P99.9
Istio L7       2209.36  0.45             0.48  0.51  0.79
Kuma L7        2648.79  0.37             0.39  0.42  0.46

Taking into account the comments above, we decided to keep the L7 results for now.
The difference on most metrics is around 10 percent for AWS and about 20 percent for Equinix Metal, in favor of Kuma.

@VitalyGushin commented:

@howardjohn Regarding testing Istio's ambient mode, it's better to send a request by email to [email protected] with more details.

@craigbox commented:

I've been trying to email you there, and both info@ and nsm@ bounce.

nsm@:

We're writing to let you know that the group you tried to contact (nsm) may not exist, or you may not have permission to post messages to the group. A few more details on why you weren't able to post:

(This might be because permissions are not set to allow posts from the internet.)

info@:

550 5.1.1 The email account that you tried to reach does not exist. Please try double-checking the recipient's email address for typos or unnecessary spaces. For more information, go to https://support.google.com/mail/?p=NoSuchUser 4fb4d7f45d1cf-5733bebe37bsor6565145a12.4 - gsmtp

@VitalyGushin commented:

@denis-tingaikin Please fix the @pragmagic.io email permissions

@craigbox commented:

I also tried to email @denis-tingaikin at his outlook.com address :)

@denis-tingaikin (Contributor) commented May 22, 2024 via email
