From 2ff1b8c6d90e844ce8f8caf3d0c571e86b8fdffb Mon Sep 17 00:00:00 2001 From: Xiangying Meng <55571188+liangyepianzhou@users.noreply.github.com> Date: Wed, 18 Oct 2023 15:41:01 +0800 Subject: [PATCH] [improve][pip] PIP-302 Introduce refreshAsync API for TableView (#21271) ## Motivation **Prerequisite:** Since messages are constantly being written into the Topic and there is no read-write lock guarantee, we cannot assure the retrieval of the most up-to-date value. **Implementation Goal:** Record a checkpoint before reading and ensure the retrieval of the latest value of the key up to this checkpoint. **Use Case:** When read and write operations for a certain key do not occur simultaneously, we can refresh the TableView before reading the key to obtain the latest value for this key. --- pip/pip-302.md | 140 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 140 insertions(+) create mode 100644 pip/pip-302.md diff --git a/pip/pip-302.md b/pip/pip-302.md new file mode 100644 index 0000000000000..cf721ee38559b --- /dev/null +++ b/pip/pip-302.md @@ -0,0 +1,140 @@ +# Background Knowledge + +The TableView interface provides a convenient way to access the streaming updatable dataset in a topic by offering a continuously updated key-value map view. The TableView retains the last value of the key which provides you with an almost up-to-date dataset but cannot guarantee you always get the latest data (with the latest written message). + +The TableView can be used to establish a local cache of data. Additionally, clients can register consumers with TableView and specify a listener to scan the map and receive notifications whenever new messages are received. This functionality enables event-driven applications and message monitoring. + +For more detailed information about the TableView, please refer to the [Pulsar documentation](https://pulsar.apache.org/docs/next/concepts-clients/#tableview). + +# Motivation + +When a TableView is created, it retrieves the position of the latest written message and reads all messages from the beginning up to that fetched position. This ensures that the TableView will include any messages written prior to its creation. However, it does not guarantee that the TableView will include any newly added messages during its creation. +Therefore, the value you read from a TableView instance may not be the most recent value, but you will not read an older value once a new value becomes available. It's important to note that this guarantee is not maintained across multiple TableView instances on the same topic. This means that you may receive a newer value from one instance first, and then receive an older value from another instance later. +In addition, we have several other components, such as the transaction buffer snapshot and the topic policies service, that employ a similar mechanism to the TableView. This is because the TableView is not available at that time. However, we cannot replace these implementations with a TableView because they involve multiple TableView instances across brokers within the same system topic, and the data read from these TableViews is not guaranteed to be up-to-date. As a result, subsequent writes may occur based on outdated versions of the data. +For example, in the transaction buffer snapshot, when a broker owns topics within a namespace, it maintains a TableView containing all the transaction buffer snapshots for those topics. It is crucial to ensure that the owner can read the most recently written transaction buffer snapshot when loading a topic (where the topic name serves as the key for the transaction buffer snapshot message). However, the current capabilities provided by TableView do not guarantee this, especially when ownership of the topic is transferred and the TableView of transaction buffer snapshots in the new owner broker is not up-to-date. + +Regarding both the transaction buffer snapshot and topic policies service, updates to a key are only performed by a single writer at a given time until the topic's owner is changed. As a result, it is crucial to ensure that the last written value of this key is read prior to any subsequent writing. By guaranteeing this, all subsequent writes will consistently be based on the most up-to-date value. + +The proposal will introduce a new API to refresh the table view with the latest written data on the topic, ensuring that all subsequent reads are based on the refreshed data. + +```java +tableView.refresh(); +tableView.get(“key”); +``` + +After the refresh, it is ensured that all messages written prior to the refresh will be available to be read. However, it should be noted that the inclusion of newly added messages during or after the refresh is not guaranteed. + +# Goals + +## In Scope + +Providing the capability to refresh the TableView to the last written message of the topic and all the subsequent reads to be conducted using either the refreshed dataset or a dataset that is even more up-to-date than the refreshed one. + +## Out of Scope + + +A static perspective of a TableView at a given moment in time +Read consistency across multiple TableViews on the same topic + +# High-Level Design + +Provide a new API for TableView to support refreshing the dataset of the TableView to the last written message. + +## Design & Implementation Details + +# Public-Facing Changes + +## Public API + +The following changes will be added to the public API of TableView: + +### `refreshAsync()` + +This new API retrieves the position of the latest written message and reads all messages from the beginning up to that fetched position. This ensures that the TableView will include any messages written prior to its refresh. + +```java +/** +* +* Refresh the table view with the latest data in the topic, ensuring that all subsequent reads are based on the refreshed data. +* +* Example usage: +* +* table.refreshAsync().thenApply(__ -> table.get(key)); +* +* This function retrieves the last written message in the topic and refreshes the table view accordingly. +* Once the refresh is complete, all subsequent reads will be performed on the refreshed data or a combination of the refreshed +* data and newly published data. The table view remains synchronized with any newly published data after the refresh. +* +* |x:0|->|y:0|->|z:0|->|x:1|->|z:1|->|x:2|->|y:1|->|y:2| +* +* If a read occurs after the refresh (at the last published message |y:2|), it ensures that outdated data like x=1 is not obtained. +* However, it does not guarantee that the values will always be x=2, y=2, z=1, as the table view may receive updates with newly +* published data. +* +* |x:0|->|y:0|->|z:0|->|x:1|->|z:1|->|x:2|->|y:1|->|y:2| -> |y:3| +* +* Both y=2 or y=3 are possible. Therefore, different readers may receive different values, but all values will be equal to or newer +* than the data refreshed from the last call to the refresh method. +*/ +CompletableFuture refreshAsync(); + +/** +* Refresh the table view with the latest data in the topic, ensuring that all subsequent reads are based on the refreshed data. +* +* @throws PulsarClientException if there is any error refreshing the table view. +*/ +void refresh() throws PulsarClientException; + + +``` + +# Monitoring + +The proposed changes do not introduce any specific monitoring considerations at this time. + +# Security Considerations + +No specific security considerations have been identified for this proposal. + +# Backward & Forward Compatibility + +## Revert + +No specific revert instructions are required for this proposal. + +## Upgrade + +No specific upgrade instructions are required for this proposal. + +# Alternatives + +## Add consistency model policy to TableView +Add new option configuration `STRONG_CONSISTENCY_MODEL` and `EVENTUAL_CONSISTENCY_MODEL` in TableViewConfigurationData. +• `STRONG_CONSISTENCY_MODEL`: any method will be blocked until the latest value is retrieved. +• `EVENTUAL_CONSISTENCY_MODEL`: all methods are non-blocking, but the value retrieved might not be the latest at the time point. + +However, there might be some drawbacks to this approach: +1. As read and write operations might happen simultaneously, we cannot guarantee consistency. If we provide a configuration about consistency, it might confuse users. +2. This operation will block each get operation. We need to add more asynchronous methods. +3. Less flexibility if users don’t want to refresh the TableView for any reads. + +## New method for combining the refresh and get + +Another option is to add new methods for the existing methods to combine the refresh and reads. For example + +CompletableFuture refreshGet(String key); + +It will refresh the dataset of the TableView and perform the get operation based on the refreshed dataset. But we need to add 11 new methods to the public APIs of the TableView. + + +# General Notes + +No additional general notes have been provided. + +# Links + + +* Mailing List discussion thread: +* Mailing List voting thread: