feat: republishing of user pkarr records #78
I agree that we need to keep the load on the DHT low, but let me do some math to show you the "load".
What Is Load? What Is The Actual Rate Limit?
Each node in Mainline implements its own rate limiting. I looked at the code of … Additionally, Mainline has around 10M nodes. In theory, we can read/write 8 kb/s to every node. So yes, there is a rate limit, and yes, we should keep the load low, but the actual limit is quite generous if the load is distributed evenly.
Let's Do Some Math
Mainline has ~10M nodes. One public key is stored on 20 nodes. If the pubkeys are equally distributed, we can publish at least 10M / 20 = 500k public keys every second with one IP address without running into any rate limiting and without overloading even a single node. This is an idealized scenario, obviously, but it still shows what is possible. Assuming we republish each user's packet once per day, a single IP address could publish a lot of packets.
In this idealized scenario, we can support roughly 43 billion users with a single IP address (500k keys/s × 86,400 s/day ≈ 43.2 billion republishes per day). That's a lot.
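For anyone who wants to sanity-check the arithmetic, here is a quick back-of-envelope calculation; all constants are just the assumptions stated above:

```rust
// Back-of-envelope check of the idealized numbers above.
fn main() {
    let nodes: u64 = 10_000_000;        // ~10M Mainline nodes
    let replicas_per_key: u64 = 20;     // each pkarr record lives on ~20 nodes
    let keys_per_second = nodes / replicas_per_key; // 500_000
    let seconds_per_day: u64 = 86_400;
    let keys_per_day = keys_per_second * seconds_per_day; // ~43.2 billion
    println!("{keys_per_second} keys/s, {keys_per_day} keys/day from one IP");
}
```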
Proposal Response
Client-Side Republishing
I don't like client-side republishing, for two reasons:
Homeserver-Driven Republishing
You are right that this includes some complexity. I propose the following:
With this method, we only need to store the …
Risk: In case the packet somehow gets lost on the DHT, the homeserver might not be able to republish. We should accept this risk initially to keep things simple. Additionally, the risk is mitigated by the fact that relays have a huge cache, so the chance that a packet goes missing is very small.
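For completeness, a minimal sketch of what the homeserver-driven variant could look like; the `Dht` trait, the `SignedPacket` stand-in, and the daily cycle are assumptions for illustration, not the actual pkarr API:

```rust
use std::thread;
use std::time::Duration;

// Placeholders for the sketch; pkarr's real types differ.
struct SignedPacket; // a user's signed pkarr record, kept by the homeserver

trait Dht {
    fn publish(&self, packet: &SignedPacket) -> Result<(), String>;
}

/// Republish every stored user packet once per cycle (e.g. daily),
/// spreading the writes out so no single window carries all the load.
fn republish_all(dht: &impl Dht, packets: &[SignedPacket], cycle: Duration) {
    let delay = cycle / packets.len().max(1) as u32;
    for packet in packets {
        // Risk noted above: if the packet was lost on the DHT, the
        // homeserver's stored copy is the only remaining source.
        let _ = dht.publish(packet);
        thread::sleep(delay);
    }
}
```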
Background: DHT Eviction and Need for Republishing
We have observed that the DHT evicts pkarr records after a period of inactivity, causing homeservers to become unresolvable unless their records are periodically republished. While the exact eviction timing is unclear, anecdotal evidence suggests that eviction typically occurs after around 4 days. This 4-day threshold, though somewhat arbitrary, forms the basis of our current republishing strategy. Of course, records will likely still be cached by relays long after being evicted, so republishing may matter less once there are several robust relays in the network.
Potential Approaches: Homeserver vs. Client-Side Republishing
1. Homeserver-Driven Republishing (Alternative Approach)
A natural approach would be for the homeserver itself to handle republishing. Since the homeserver is always online, it could maintain a copy of each user's pkarr record and ensure that these records are kept alive on the DHT. However, this approach faces significant challenges:
Risk of Rate Limiting/Blocking by DHT Peers:
A homeserver hosting many users would need to republish frequently, potentially triggering rate limits or blocks from DHT peers.
Need for Activity Tracking:
To avoid excessive republishing, a homeserver would need to implement smart logic, for example tracking per-user activity to decide which records still need refreshing (a sketch follows at the end of this subsection):
Complexity and Overhead:
Tracking user activity and implementing these smart republishing policies introduces additional complexity and state management on the homeserver side.
Redundancy with Client-Initiated Flows:
If the publishing flow is already triggered by the user during signin, it raises the question of whether the homeserver should handle it at all.
While a homeserver should certainly republish its own pkarr record to remain discoverable, the above challenges make it less ideal for handling user record republishing.
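To make the complexity concrete, here is a rough sketch of the kind of per-user bookkeeping this would require; all names are illustrative, not an existing API:

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime};

/// Hypothetical per-user tracking a homeserver would need: only republish
/// records for users that were recently active and whose record has not
/// been refreshed within the threshold.
struct RepublishTracker {
    last_seen: HashMap<String, SystemTime>,      // user id -> last activity
    last_published: HashMap<String, SystemTime>, // user id -> last DHT publish
}

impl RepublishTracker {
    fn needs_republish(&self, user: &str, threshold: Duration, now: SystemTime) -> bool {
        // Active: the user has been seen within the threshold window.
        let active = self
            .last_seen
            .get(user)
            .map_or(false, |t| now.duration_since(*t).unwrap_or_default() < threshold);
        // Stale: the record has not been published within the threshold window.
        let stale = self
            .last_published
            .get(user)
            .map_or(true, |t| now.duration_since(*t).unwrap_or_default() >= threshold);
        active && stale
    }
}
```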
2. Client-Side Republishing (Proposed Approach)
We propose shifting the responsibility for user record republishing to the client. This approach offers several advantages:
Distributed Publishing Load:
Since republishing occurs on the client side during signin, the load is naturally distributed, avoiding the risk of a single homeserver being rate-limited.
Seamless User Experience:
Republishing can occur transparently in the background when the user signs in, requiring no additional user intervention.
Minimized DHT Traffic:
The client will use a conditional republishing strategy (IfOlderThan), ensuring republishing only occurs if the record is missing or older than the 4-day threshold (see the sketch after this list).
Simplicity for the Homeserver:
The homeserver no longer needs to track user pkarr records or manage republishing logic, reducing operational complexity.
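The conditional check referenced above could look roughly like this; a sketch only, the function name and signature are assumptions, not the actual pkarr API:

```rust
use std::time::{Duration, SystemTime};

/// Rough shape of the IfOlderThan condition: republish only when the
/// resolved record is missing or its timestamp is older than `max_age`.
fn should_republish(record_timestamp: Option<SystemTime>, max_age: Duration) -> bool {
    match record_timestamp {
        // Nothing found on the DHT (or via relays): always republish.
        None => true,
        // Record exists: republish only if it has aged past the threshold.
        Some(ts) => SystemTime::now()
            .duration_since(ts)
            .map_or(true, |age| age > max_age),
    }
}

fn main() {
    let four_days = Duration::from_secs(4 * 24 * 60 * 60);
    // A record published 5 days ago would be republished; a fresh one would not.
    let five_days_ago = SystemTime::now() - Duration::from_secs(5 * 24 * 60 * 60);
    assert!(should_republish(Some(five_days_ago), four_days));
    assert!(!should_republish(Some(SystemTime::now()), four_days));
    assert!(should_republish(None, four_days));
}
```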
Proposed Implementation Details
On Signup:
The signup() method publishes a new pkarr record immediately, ensuring instant discoverability for new users. It disregards any existing record except for the purposes of CaS and race conditions.
On Signin (Background Republishing):
The signin flow extracts the host via extract_host_from_record(...), then calls publish_homeserver with the IfOlderThan strategy, only republishing if the record is missing or older than 4 days.
Public Method for Explicit Republishing:
republish_homeserver(keypair, host) is provided for key management applications. The IfOlderThan strategy ensures that unnecessary DHT spam is avoided.
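A rough sketch of how these three entry points could fit together on the client side; Client, Keypair, the PublishStrategy enum, and the internal helper are placeholders, only the method names and the 4-day threshold come from the proposal above:

```rust
use std::time::Duration;

const REPUBLISH_THRESHOLD: Duration = Duration::from_secs(4 * 24 * 60 * 60); // 4 days

// Placeholder types for the sketch.
struct Keypair;
struct Client;

enum PublishStrategy {
    Force,                 // signup: publish regardless of what is on the DHT
    IfOlderThan(Duration), // signin / explicit republish
}

impl Client {
    /// Signup: publish a fresh pkarr record right away so the new user is
    /// immediately discoverable (existing records only matter for CaS/races).
    fn signup(&self, keypair: &Keypair, host: &str) {
        self.publish_homeserver(keypair, host, PublishStrategy::Force);
    }

    /// Signin: republish in the background, but only if the record is
    /// missing or older than 4 days (IfOlderThan).
    fn signin(&self, keypair: &Keypair, host: &str) {
        self.publish_homeserver(keypair, host, PublishStrategy::IfOlderThan(REPUBLISH_THRESHOLD));
    }

    /// Explicit entry point for key-management applications.
    fn republish_homeserver(&self, keypair: &Keypair, host: &str) {
        self.publish_homeserver(keypair, host, PublishStrategy::IfOlderThan(REPUBLISH_THRESHOLD));
    }

    fn publish_homeserver(&self, _keypair: &Keypair, _host: &str, _strategy: PublishStrategy) {
        // placeholder: resolve the current record, apply the strategy, publish
    }
}
```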
Rationale for the Proposed Approach
Efficiency:
By aligning republishing with the signin process, we avoid introducing additional network operations, leveraging a flow that users trigger naturally.
Reduced Risk of DHT Rate Limiting:
Client-driven republishing avoids a central point of republishing, making it less likely for DHT peers to block or limit requests.
Simplicity and Maintainability:
Homeservers remain focused on their primary responsibilities, without needing to track user activity or implement rate-limiting workarounds.
Cons of the Proposed Approach
While client-side republishing during signin offers several advantages, it also comes with certain drawbacks:
Potential Eviction of Long-Inactive Records:
If a user does not sign in for an extended period, their pkarr record may be evicted from the DHT and become difficult for others to resolve—especially if no pkarr relay has cached it.
Impact on Low-Activity Use Cases:
This limitation may not significantly affect applications where user activity is crucial anyway (e.g., private messaging, social networks). However, it could be problematic for use cases where users are expected to sign in once, publish content, and rarely return (e.g., publishing boards or archival applications). In such scenarios, content could become hard to find unless users actively refresh their records.
Need for Future Discussion:
Addressing the discoverability of long-inactive records may require additional strategies. For example, scheduled republishing mechanisms or homeserver-assisted approaches might need to be revisited for these specific use cases.
Client-side republishing on signin, together with the explicit republish command, by no means fully resolves the republishing challenge. However, it represents a significant step forward, making it easier for client users to keep their records active and discoverable on the DHT.