Propose Solution for Efficient Sync with Many Delete Entries #2066
The ultimate goal of any optimization to the sync process is to enable a client to create data and make it available to the server in the shortest possible time, while also reducing both time and storage requirements by syncing fewer entries. Clients may have varying synchronization requirements, which can be addressed through the following strategies:

1. Clients Concerned Only with New State
   - Behavior: Such a client does not need to be aware of any previous state. It should have the flexibility to either operate independently of the server's current state or sync with the server without retrieving any existing keys.
   - Benefit: This approach allows the client to create data that is immediately available to the server without the overhead of syncing old or irrelevant data.

2. Initial Sync vs. Delta Sync
   - Delta Sync: After a client has performed an initial sync, it can use delta syncs to receive only the changes (additions, updates, deletions) since the last sync.
   - Sync Type Flag: Introducing a `syncType: initial|delta` flag in the sync operation would enable the server to optimize syncs by not sending deleted entries during an initial sync. This would help the client get in sync faster, allowing it to start creating data that is immediately available to the server.
   - Edge Case: If the last entry in the sync process is a deletion, the client might fall out of sync by one entry. This scenario should be handled gracefully to ensure synchronization integrity.

3. Skipping Expired Keys During the Sync
   - Edge Case: As with deletions, if the last entry to sync is the deletion of an expired key, the client may become out of sync by one entry. Handling this scenario is crucial to maintaining consistent synchronization.

Technical Implementation
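The `syncType` idea and its deletion edge case can be sketched as follows. This is a hypothetical illustration, not the real atServer implementation: `CommitEntry`, `entries_to_send`, and the field names are all assumptions. The key point is that the server skips delete entries for an initial sync but still reports the true latest commit id, so a trailing deletion cannot leave the client one entry behind.

```python
# Hypothetical sketch of server-side syncType handling; names are assumptions.
from dataclasses import dataclass

@dataclass
class CommitEntry:
    commit_id: int
    key: str
    op: str  # "update" or "delete"

def entries_to_send(commit_log, since_commit_id, sync_type):
    """Return (entries, latest_commit_id) for a sync request.

    For an initial sync, delete entries are skipped entirely: a brand-new
    client has nothing to delete locally. The latest commit id is computed
    BEFORE filtering, so even if the newest entry is a deletion (the edge
    case above), the client records the correct high-water mark.
    """
    newer = [e for e in commit_log if e.commit_id > since_commit_id]
    latest = max((e.commit_id for e in newer), default=since_commit_id)
    if sync_type == "initial":
        newer = [e for e in newer if e.op != "delete"]
    return newer, latest

log = [
    CommitEntry(1, "key1", "update"),
    CommitEntry(2, "key2", "update"),
    CommitEntry(3, "key2", "delete"),  # the last entry is a deletion
]
entries, latest = entries_to_send(log, since_commit_id=-1, sync_type="initial")
print([e.key for e in entries], latest)  # only updates are sent; latest is 3
```

If the server instead returned only the filtered entries and let the client infer the high-water mark from the last entry it received, the client would record commit id 2 here and re-request from that point forever.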
A client can have four types of synchronization requirements:

Always-Online Clients: These clients need to be continuously connected to the remote secondary for their operations. Their primary focus is on reading from and writing to the server directly, without relying on cached data, so they do not require synchronization at all. For example, SSHNoPorts disables sync completely, and its `put` and `get` calls talk to the remote secondary directly:
```dart
atClientGenerator: (SshnpdParams p) => createAtClientCli(
  // ... (excerpt truncated; the client is created with sync disabled) ...
),
```

```dart
/// PutRequestOptions: parameters that application code can optionally
/// provide when calling put(), e.g. whether to send this update request
/// directly to the remote atServer.
await atClient.put(key, params.toJson(), putRequestOptions: options);
```

Push-Only Clients: These clients are only concerned with sending new data to the server and do not care about any previous state. If they go offline, they can queue requests and push them to the server when they reconnect.

Full Sync Clients: These clients need to fetch data from the server before performing any operations. Applications like Buzz and Wavi typically fall into this category. The default sync behavior in the atClient SDK is designed to cater to these clients.
   - Sync Requirement: These clients must complete synchronization before they can push any new data to the server.

Selective Sync Clients: These clients require only a specific subset of data from the server before creating new data and syncing it back. For example, if a client starts with no data and only needs keys `key1`, `key2`, and `key5` from the server, it will sync those keys and then proceed to interact with the server using that limited dataset.

When developing an application, it is crucial to understand your client's sync requirements, as each type can significantly impact performance.
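The push-only pattern above can be sketched as a small offline queue. This is an illustrative sketch only (the real atClient SDK is Dart, and `PushOnlyClient`/`FakeServer` are made-up names): writes are queued while the server is unreachable and flushed in order on reconnect, and no prior server state is ever fetched.

```python
# Illustrative push-only client; all names here are hypothetical.
from collections import deque

class PushOnlyClient:
    def __init__(self, server):
        self.server = server    # anything with .online and .put(key, value)
        self.pending = deque()  # offline queue, oldest write first

    def put(self, key, value):
        self.pending.append((key, value))
        self.flush()

    def flush(self):
        # Push queued writes in order; stop (keeping the rest) if offline.
        while self.pending and self.server.online:
            key, value = self.pending.popleft()
            self.server.put(key, value)

class FakeServer:
    def __init__(self):
        self.online = False
        self.data = {}
    def put(self, key, value):
        self.data[key] = value

server = FakeServer()
client = PushOnlyClient(server)
client.put("k1", "v1")   # queued: the server is offline
assert server.data == {}
server.online = True
client.flush()           # reconnect: the queue drains to the server
assert server.data == {"k1": "v1"}
```

Note that such a client never needs a commit log at all; its correctness only depends on preserving write order across reconnects.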
Actionable Next Steps Based on the Analysis:
1. Enable the feature to exclude commit log entries for expired-key deletions on the server.
2. Introduce the sync type flag.
3. Analyze sync issues with atcolin.
4. Conduct an architectural discussion to evaluate the need for direct support in the SDK for:
Analyze sync issues with atcolin. @purnimavenkatasubbu, can you start on this, please?
Is your feature request related to a problem? Please describe.
The current synchronization process for atServer is inefficient, particularly when handling large numbers of deletions, which impacts the performance and scalability of the system. The objective of this ticket is to propose a solution that improves synchronization efficiency in scenarios involving numerous delete entries, and that is scalable and independent of Hive.
Current Design Overview:
CRUD Operations:
Data Storage: atServer stores data as key-value pairs.
Key Management: Keys can be created, deleted, or automatically expired using the ttl (time to live) parameter.
Expired Key Cleanup: A cron job deletes expired keys.
Key Storage: All keys are stored in a Hive box named KeyStore.
Commit Log:
Operation Logging: Key creation or updates are logged in a Hive box called CommitLog with an auto-generated sequence number.
Recording Changes: Each operation is recorded with a new sequence number.
Single Entry per Key: The CommitLog maintains one entry per unique key.
In-Memory Compact CommitLog:
In-Memory Representation: atServer keeps an in-memory map of the CommitLog to optimize synchronization.
Sync Efficiency: This map supports efficient synchronization operations.
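The "single entry per unique key" property and the in-memory map can be illustrated with a small compaction sketch. The names below are illustrative, not the actual at_persistence implementation: the map keeps only the newest commit entry per key, which is what makes lookups during sync cheap.

```python
# Sketch of commit log compaction into an in-memory map; names are illustrative.

def compact(commit_log):
    """Keep only the newest (commit_id, op) per key, like the in-memory map."""
    latest = {}
    for commit_id, key, op in commit_log:
        prev = latest.get(key)
        if prev is None or commit_id > prev[0]:
            latest[key] = (commit_id, op)
    return latest

log = [
    (1, "key1", "update"),
    (2, "key1", "update"),   # supersedes commit 1 for key1
    (3, "key2", "update"),
    (4, "key2", "delete"),   # supersedes commit 3 for key2
]
compacted = compact(log)
print(compacted)  # {'key1': (2, 'update'), 'key2': (4, 'delete')}
```

Note that after compaction a deleted key still occupies one entry forever, which is exactly why a server with many deletions accumulates entries that every new client must sync.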
Sync Process:
Client Connections: Multiple clients can be connected to an atServer.
Data Synchronization: Clients sync data with the atServer, which assigns a commit ID. Clients record this ID locally.
Sync Status: A data item with a server commit ID indicates it is synced.
Managing Sync Differences: Clients must update their local commit ID before pushing new data if their ID is lower than the server's latest ID.
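The pull-before-push rule described above can be sketched as follows. All names are hypothetical (the real protocol is the atProtocol sync verb): a client whose local commit id is behind the server's latest first applies the delta, then pushes its outbox, recording the server-assigned commit ids.

```python
# Hypothetical sketch of the pull-before-push sync rule; names are assumptions.

def sync(client_state, server):
    """client_state: {'commit_id': int, 'outbox': [(key, value), ...]}."""
    # 1. Catch up if the server is ahead of the client's local commit id.
    if client_state["commit_id"] < server.latest_commit_id():
        for entry in server.delta_since(client_state["commit_id"]):
            # ...apply the entry to the local keystore here...
            client_state["commit_id"] = max(client_state["commit_id"],
                                            entry["commit_id"])
    # 2. Only now push local changes; the server assigns new commit ids.
    for key, value in client_state["outbox"]:
        client_state["commit_id"] = server.push(key, value)
    client_state["outbox"] = []
    return client_state

class FakeServer:
    def __init__(self):
        self._log = []       # list of {'commit_id', 'key', 'value'}
        self._next_id = 0
    def latest_commit_id(self):
        return self._next_id - 1
    def delta_since(self, commit_id):
        return [e for e in self._log if e["commit_id"] > commit_id]
    def push(self, key, value):
        entry = {"commit_id": self._next_id, "key": key, "value": value}
        self._log.append(entry)
        self._next_id += 1
        return entry["commit_id"]

server = FakeServer()
server.push("k1", "v1")
server.push("k2", "v2")
client = {"commit_id": -1, "outbox": [("k3", "v3")]}
client = sync(client, server)
print(client["commit_id"])  # 2: caught up to id 1, then pushed k3 at id 2
```

The design choice here is that the client never pushes while behind; that guarantees the server-assigned commit ids it records are monotonically increasing and that it never overwrites a newer server value it has not yet seen.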
Current Design Issues:
Inefficient Sync with Many Deletions:
Excessive Syncing of Deleted Keys: New clients must sync all keys, including numerous deletions, leading to significant time and space inefficiencies.
Impact on Sync Performance: Syncing a large number of deletions consumes bandwidth and processing resources, reducing overall efficiency.
Inefficiencies in Key Expiry: Clients that created expired keys also sync deletions, even though they could manage these locally.
Describe the solution you'd like
Propose a Solution for Efficient Sync with Many Delete Entries: Scalable and Hive-Agnostic
Describe alternatives you've considered
No response
Additional context
No response