Renew topic subscriptions only when they need to be #325

Open
chris13524 opened this issue Jan 23, 2024 · 6 comments

@chris13524
Member

chris13524 commented Jan 23, 2024

Currently we renew all topic subscriptions on startup and then daily. This puts a lot of load on the relay all at once and is extremely inefficient. Instead, we should renew each subscription only when necessary.

Renewing a topic involves calling subscribe or batch_subscribe and subsequently publishing a 4050 message to the topic in order to extend the TTL to 30 days.

The basic design involves keeping track of when a topic subscription was last renewed, or when it will expire, and storing this information in the database. The renewal job would scan the database for topics that need to be renewed, renew them, and then update the database entry.
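The scan-renew-update loop described above could be sketched as follows. This is a minimal illustration, not the Notify Server implementation: the `topic_subscription` table, its `renew_at` column, and the `renew` callback (standing in for the subscribe call plus the 4050 publish) are all hypothetical names, and SQLite is used only to keep the example self-contained.

```python
import sqlite3

# Assumption: renew every 28 days, safely inside the 30-day subscription TTL.
RENEW_INTERVAL_SECS = 28 * 24 * 60 * 60


def create_schema(conn):
    # Hypothetical table: one row per topic, storing when its next renewal is due.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS topic_subscription ("
        "topic TEXT PRIMARY KEY, renew_at INTEGER NOT NULL)"
    )


def due_topics(conn, now, limit=50):
    # Oldest-due first, capped so one pass never loads the whole table.
    rows = conn.execute(
        "SELECT topic FROM topic_subscription "
        "WHERE renew_at <= ? ORDER BY renew_at LIMIT ?",
        (now, limit),
    ).fetchall()
    return [row[0] for row in rows]


def run_renewal_pass(conn, now, renew):
    # `renew` is a placeholder for the real relay calls.
    renewed = []
    for topic in due_topics(conn, now):
        renew(topic)
        # Record the next due time only after the renewal succeeded.
        conn.execute(
            "UPDATE topic_subscription SET renew_at = ? WHERE topic = ?",
            (now + RENEW_INTERVAL_SECS, topic),
        )
        renewed.append(topic)
    conn.commit()
    return renewed
```

Storing the next due time (rather than the last renewal time) keeps the scan a simple `renew_at <= now` comparison that an index on `renew_at` could serve.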

This design does not currently require HS (horizontal scaling) or HA (high availability), because topic subscriptions are valid for 30 days and only need to be renewed periodically, say every 28 days. If sequential renewals take 1s each, a single instance renewing continuously can maintain about 2.4 million topics (28*24*60*60 = 2,419,200). Higher counts can be reached by parallelizing the renewals: 50x parallel (as we do currently) would allow about 120 million topics to be maintained. If more than 120 million topics are required, then we should consider increasing the max TTL of subscriptions on the relay or implementing horizontal scaling here.
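The capacity estimate can be reproduced with a few lines of arithmetic. The function name and parameters below are illustrative only; the 1s-per-renewal figure and the 50x parallelism come from the estimate above.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_topics(renew_interval_days, seconds_per_renewal=1.0, parallelism=1):
    # Number of renewals that fit in one renewal window when renewing continuously.
    window_secs = renew_interval_days * SECONDS_PER_DAY
    return int(window_secs / seconds_per_renewal * parallelism)


print(max_topics(28))                  # 2419200
print(max_topics(28, parallelism=50))  # 120960000
```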

With #362 we will have multiple replicas of Notify Server running. However, it's likely that without some type of lock all replicas will attempt to renew the same topics at the same time, resulting in redundant requests. Several options are available to mitigate this:

  • Run the subscription renewal job in a separate ECS task with max_capacity of 1. Since HS and HA are not required, this would ensure that topics are only renewed one time. If HS were later required, this would enable independent scalability, although needing horizontal scaling here seems silly and would point to larger architectural issues. Splitting this into a separate task/service will be a bit of work (e.g. metrics, config, database code) and will make things more complicated to maintain in the future. However, we may want/need this anyway to control the scaling of the publisher service (Horizontal scaling and high availability #362), and the separate task can still share the same binary and boilerplate.
  • Only enable the renewal job on 1 replica at a time, perhaps using a Redis lock to coordinate this.
  • Implement locking on the database rows. This would be similar to the publisher service and would be redundant for non-HS scenarios, but would enable HS if needed in the future.
  • Some type of randomization to avoid replicas attempting to renew the same topics at the same time, e.g. check for renewal daily, but +/- 12 hours. Minor request duplicates are not a significant concern to the health of the relay. Renewing all 600k topics currently takes a couple of hours, whereas renewing only the ones that are due, continuously, needs just 1 topic to be renewed every 4 seconds. With a daily renewal job, each batch is about 20k topics, which takes about 5.5 hours sequentially or about 7 minutes at 50x parallelization. If a batch takes several hours, it is important to renew less frequently to avoid conflicts. Conflicts can also be avoided by renewing more frequently so that each batch takes less time: renewing hourly would only require renewing about 1k topics at a time.
    • 1 million topics / 28 days / 24 hours / 50 at a time = 30 seconds per hourly batch. A replica is then busy renewing for only 1/120 of each hour, which leaves little room for overlap between replicas.
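The batch-size figures in the randomization option can be checked with a small calculation. This is only a sketch of the arithmetic: the function name is made up for illustration, and it assumes 1s per renewal and a 28-day renewal interval as in the estimates above.

```python
def batch_estimate(total_topics, interval_days, batches_per_day,
                   parallelism, secs_per_renewal=1.0):
    # Topics that come due per batch if renewals are spread evenly over the interval.
    topics_per_batch = total_topics / (interval_days * batches_per_day)
    # Wall-clock time for one batch at the given parallelism.
    batch_secs = topics_per_batch * secs_per_renewal / parallelism
    # Fraction of the time between batches that a replica spends renewing.
    busy_fraction = batch_secs / (24 * 60 * 60 / batches_per_day)
    return topics_per_batch, batch_secs, busy_fraction


# 1 million topics, hourly batches, 50 renewals in flight at a time.
topics, secs, busy = batch_estimate(1_000_000, 28, 24, 50)
print(round(topics), round(secs, 1), round(1 / busy))
```

A smaller busy fraction means two unsynchronized replicas are less likely to be working on the same batch at the same moment, which is the reasoning behind renewing more often in smaller batches.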

One more thing: initially all the topics will have been renewed at the same time, so their renewals would still come due together and cause spikes. It would be desirable to add an initial random spread to the renewal times over the course of 30 days, and to add randomness to each renewal date so that the spread stays random over time.
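One possible way to implement both kinds of randomness is sketched below. The constants are assumptions chosen for illustration: a 27-day base interval with +/- 1 day of jitter keeps every renewal between 26 and 28 days, inside the 30-day TTL, and the initial spread is limited to 27 days so that no topic waits longer than the TTL for its first renewal.

```python
import random

DAY = 24 * 60 * 60
TTL_SECS = 30 * DAY            # subscription TTL from the issue description
BASE_INTERVAL_SECS = 27 * DAY  # assumed base renewal interval
JITTER_SECS = 1 * DAY          # assumed jitter applied to every renewal


def initial_renew_at(now, rng):
    # One-off spread: scatter first renewals uniformly across the base interval
    # so that topics last renewed together do not all come due together.
    return now + rng.randrange(0, BASE_INTERVAL_SECS)


def next_renew_at(renewed_at, rng):
    # Ongoing jitter: keeps renewals from re-synchronizing over time.
    return renewed_at + BASE_INTERVAL_SECS + rng.randint(-JITTER_SECS, JITTER_SECS)


rng = random.Random()  # pass a seeded Random in tests for reproducibility
print(initial_renew_at(0, rng), next_renew_at(0, rng))
```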

@chris13524 chris13524 self-assigned this Jan 23, 2024
@arein arein added the accepted label Jan 23, 2024
@chris13524 chris13524 changed the title Renew topic subscriptions Renew topic subscriptions when they need to Jan 26, 2024
@chris13524 chris13524 changed the title Renew topic subscriptions when they need to Renew topic subscriptions only when they need to be Jan 26, 2024
@chris13524
Member Author

Topic subscriptions are technically ephemeral until persistence is implemented in the relay. See discussion.

@heilhead
Contributor

> publishing a 4050 message to the topic in order to extend the TTL to 30 days

The legacy storage implementation requires at least two messages to extend TTL.

@chris13524
Member Author

>> publishing a 4050 message to the topic in order to extend the TTL to 30 days

> The legacy storage implementation requires at least two messages to extend TTL.

Or a message in the 4xxx range, which is what we are doing here

@heilhead
Contributor

>>> publishing a 4050 message to the topic in order to extend the TTL to 30 days

>> The legacy storage implementation requires at least two messages to extend TTL.

> Or a message in the 4xxx range, which is what we are doing here

Oh yeah, the code was modified since I implemented it. My bad.

@chris13524
Member Author

chris13524 commented Feb 20, 2024

Since we switched to IRN, we no longer need to publish a message to extend the subscription TTL to 30 days; the subscription TTL is now 30 days by default. We may revert this, so let's not depend on this behavior for the next couple of weeks.

This changes the math and potentially the strategy (we may not need to touch the database for now). Subscribing to all topics historically only took a couple of minutes when running in parallel.

@chris13524
Member Author

For now, we are renewing all topic subscriptions with the cheap method: #388

In the future we can renew topic subscriptions only when they need to be.

@chris13524 chris13524 removed their assignment Feb 26, 2024