Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(load_testing): Enable some automatic reconnection handling #159

Merged
merged 5 commits into from
Jun 26, 2024

Conversation

KaylaBrady
Copy link
Collaborator

Summary

Ticket: Repeat load testing at increased scale

What is this PR for?

This adds a couple of changes to the load testing script to support running it at scale.

The main change is to use a long-lived websocket connection with some baked-in retry logic. With this change, I was able to run a load test with 200 users without hitting the ssl.SSLEOFError exception I was seeing before. I did see some logs of reconnect, indicating that automatic reconnection was preformed as expected.

There is some weirdness around the change to use run_forever with threading that I'll comment on specifically.

@KaylaBrady KaylaBrady requested a review from a team as a code owner June 25, 2024 19:51
@KaylaBrady KaylaBrady requested review from boringcactus and removed request for a team June 25, 2024 19:51
@KaylaBrady KaylaBrady marked this pull request as draft June 25, 2024 19:51
@@ -15,7 +15,7 @@ class MobileAppUser(HttpUser, PhoenixChannelUser):
wait_time = between(1, 5)
socket_path = "/socket"

prob_reset_map_data = 0.3
prob_reset_map_data = 0.02
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decreasing this since reseting map data will be rare. This was causing memory usage to spike.

Copy link

Coverage of commit 870c94f

Summary coverage rate:
  lines......: 75.9% (953 of 1255 lines)
  functions..: 72.7% (436 of 600 functions)
  branches...: no data found

Files changed coverage rate: n/a

Download coverage report

Copy link

Coverage of commit f29a4eb

Summary coverage rate:
  lines......: 75.9% (953 of 1255 lines)
  functions..: 72.7% (436 of 600 functions)
  branches...: no data found

Files changed coverage rate: n/a

Download coverage report

)
leave_push.send()
return leave_push.get_reply()

def sleep_with_heartbeat(self, seconds):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pulled from dotcom

Comment on lines 39 to 43
# run_forever is blocking
# https://github.com/websocket-client/websocket-client/issues/980#issuecomment-2065628852
daemon = threading.Thread(target=self.run_forever)
daemon.daemon = True
daemon.start()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw this comment suggesting that threading can be avoided by using rel b/c it is async, but I found run_forever was blocking even when using rel as the dispatcher.

I don't love this threading, and I'm pretty sure it is the reason why keyboard interrupt doesn't work when running locust with a single worker.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'm going to try getting rid of rel entirely. Seems like it isn't strictly necessary websocket-client/websocket-client#969

@KaylaBrady KaylaBrady marked this pull request as ready for review June 25, 2024 20:31
Copy link
Member

@boringcactus boringcactus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's a way to get this all working without rel, that seems like it'd be nice, but if that doesn't pan out, this seems like it's fine.

The previous library was experiencing flaky SSL errors when sending messages and had high CPU usage while running locally, even in distributed mode. Switching this library seems to have resolved those issues - ran tests for 200 users joining & leaving the same sets of stops and encountered client errors only caused by backend failures
Copy link

Coverage of commit 96df63e

Summary coverage rate:
  lines......: 75.9% (953 of 1255 lines)
  functions..: 72.7% (436 of 600 functions)
  branches...: no data found

Files changed coverage rate: n/a

Download coverage report

@KaylaBrady
Copy link
Collaborator Author

@boringcactus could you please re-review? I ended up changing the websocket library to websockets. This seems to have helped the reliability of leaving channels - no more mysterious SSLEof Errors. This also uses less local CPU - I was getting 90% CPU usage warnings with the old version even in distributed mode, but the new version can spawn 200 users not in distributed mode without hitting a CPU warning.

@@ -7,7 +7,7 @@
from typing import Any

import gevent
import websocket
import websockets.sync.client as websockets
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regular async version uses asyncio, which is not supported in locust.

@KaylaBrady KaylaBrady merged commit 6429ae9 into main Jun 26, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants