
Async FLASH Design

John Sully edited this page Aug 8, 2023 · 1 revision

Async IStorage Interface

Problem Statement

KeyDB SSD performance is severely limited by its global lock design. Modern SSDs use a pipelined architecture that can process multiple I/O requests in parallel. Because I/O requests are issued during command execution, while the global lock is held, KeyDB cannot issue more than one I/O request at a time. This design seeks to increase the number of I/O requests in flight by moving IStorage from a blocking API to an asynchronous API.

Out Of Scope

This work will initially focus only on O(1) commands. Ensuring that O(n) commands like KEYS work is an additional effort that will not be tracked by this design; however, it is intended that such work will build upon the infrastructure implemented for single I/O commands.

In addition, multiple commands from the same client could be in flight simultaneously if they do not conflict. The work to do this is out of scope for this design but may be completed later to further improve the performance of a single client.

Interface Changes

virtual void insert(const char *key, size_t cchKey, void *data, size_t cb, bool fOverwire) = 0;
virtual bool erase(const char *key, size_t cchKey) = 0;
virtual void retrieve(const char *key, size_t cchKey, callbackSingle fn) const = 0;

The above IStorage APIs will be deleted and replaced with a two-part API. The first part will begin the I/O operation and return a token object, which the client can later use to complete the operation. It is intended that the server will complete work for other clients between initiating and completing the I/O operation.

virtual StorageToken insert(aeEventLoop *el, aePostFunctionProc *callback, const char *key, size_t cchKey, void *data, size_t cb, bool fOverwire) = 0;
virtual void completeInsert(const StorageToken &tok) = 0;
virtual StorageToken erase(aeEventLoop *el, aePostFunctionProc *callback, const char *key, size_t cchKey) = 0;
virtual void completeErase(const StorageToken &tok) = 0;
virtual StorageToken retrieve(aeEventLoop *el, aePostFunctionProc *callback, const char *key, size_t cchKey) const = 0;
virtual void completeRetrieve(const StorageToken &tok, callbackSingle fn) const = 0;
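
The begin/complete split above can be illustrated with a small in-memory toy. This is only a sketch under stated assumptions: ToyStorage, ToyToken, its id field, put() and inFlight() are hypothetical names invented for the example, the aeEventLoop and callback parameters are left out, and the callbackSingle signature shown is assumed rather than taken from this page. It shows two reads being started before either one is completed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Toy token: `id` is a hypothetical bookkeeping field, not part of the design.
struct ToyToken {
    std::uint64_t id;
};

// Assumed shape of the result callback (key, key length, data, data length).
using callbackSingle = std::function<void(const char *, std::size_t,
                                          const void *, std::size_t)>;

class ToyStorage {
public:
    // Part one: record the request and hand back a token. A real backend
    // would submit the read to the device here and return immediately.
    ToyToken retrieve(const char *key, std::size_t cchKey) {
        ToyToken tok{m_nextId++};
        m_pending[tok.id] = std::string(key, cchKey);
        return tok;
    }

    // Part two: finish the request named by the token and deliver the value.
    void completeRetrieve(const ToyToken &tok, callbackSingle fn) {
        auto itPending = m_pending.find(tok.id);
        if (itPending == m_pending.end()) return;  // unknown or already completed
        const std::string key = itPending->second;
        m_pending.erase(itPending);
        auto itData = m_data.find(key);
        if (itData != m_data.end())
            fn(key.data(), key.size(), itData->second.data(), itData->second.size());
    }

    // Helpers for the demo only.
    void put(const std::string &key, const std::string &value) { m_data[key] = value; }
    std::size_t inFlight() const { return m_pending.size(); }

private:
    std::map<std::string, std::string> m_data;
    std::map<std::uint64_t, std::string> m_pending;
    std::uint64_t m_nextId = 1;
};

// Usage: both reads are in flight before either completion runs.
std::string demoTwoReadsInFlight() {
    ToyStorage storage;
    storage.put("a", "1");
    storage.put("b", "2");
    ToyToken tokA = storage.retrieve("a", 1);
    ToyToken tokB = storage.retrieve("b", 1);
    assert(storage.inFlight() == 2);

    std::string out;
    auto append = [&out](const char *, std::size_t, const void *data, std::size_t cb) {
        out.append(static_cast<const char *>(data), cb);
    };
    storage.completeRetrieve(tokB, append);  // completion order need not match start order
    storage.completeRetrieve(tokA, append);
    return out;
}
```

In the toy the work happens inside completeRetrieve(); in the design the device would be working between the two calls, which is where the extra parallelism comes from.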

StorageToken object

struct StorageToken {
    enum class TokenType {
        SingleRead,
        SingleWrite,
        Delete,
    };

    TokenType type;
    aePostFunctionProc *callback;
    client *c;
};

All asynchronous APIs will return a StorageToken object. The token object will have the same base class for all APIs; however, a specific implementation of IStorage may define different subclasses for each API that inherit from the common base class.

The token object will contain bookkeeping information and will be passed to an I/O completion API to get the final result of the underlying operation. The token will be directly supported by the AE event loop architecture allowing notification when the underlying I/O event has completed. Because the token object is associated with a specific event loop, it must always be used on the same thread as the AE event loop it is associated with.

The token will contain the client with the pending command. It is intended that the client struct will store sufficient information to complete the high-level command that triggered the associated I/O operation.

AE Event Loop Integration

The AE event loop will be modified to understand the StorageToken object and provide a callback when the StorageToken is ready. When the I/O event is ready the IStorage implementation will use the aePostFunction() interface to queue a function call on the calling thread.

The aePostFunction() interface will be updated to support a third function type:

typedef void aePostFunctionProc(aeEventLoop *el, StorageToken *token);

The caller of the I/O function may specify a unique callback function for the call. The callback function must run the corresponding “complete” function for the API that started the I/O.
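
One way to picture the callback contract is a toy loop that runs each ready token's callback, with the callback choosing the completion step from the token type. This is a hedged, single-threaded sketch: ToyLoop, ToyToken, ToyPostProc, post(), processPosted() and the log field are hypothetical stand-ins and are not the real AE API, and the completion calls are represented by strings.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

struct ToyLoop;   // hypothetical stand-in for aeEventLoop
struct ToyToken;  // hypothetical stand-in for StorageToken
typedef void ToyPostProc(ToyLoop *el, ToyToken *token);

struct ToyToken {
    enum class TokenType { SingleRead, SingleWrite, Delete };
    TokenType type;
    ToyPostProc *callback;
    std::vector<std::string> *log;  // assumption: only here to observe the call order
};

struct ToyLoop {
    std::queue<ToyToken *> ready;  // tokens whose I/O has finished

    // Stand-in for queueing a function call on the loop's thread.
    void post(ToyToken *tok) { ready.push(tok); }

    // One loop iteration: run the callback of every ready token.
    void processPosted() {
        while (!ready.empty()) {
            ToyToken *tok = ready.front();
            ready.pop();
            tok->callback(this, tok);
        }
    }
};

// The callback runs the "complete" step matching the API that started the I/O.
void onIoReady(ToyLoop *, ToyToken *tok) {
    switch (tok->type) {
    case ToyToken::TokenType::SingleRead:  tok->log->push_back("completeRetrieve"); break;
    case ToyToken::TokenType::SingleWrite: tok->log->push_back("completeInsert");   break;
    case ToyToken::TokenType::Delete:      tok->log->push_back("completeErase");    break;
    }
}

// Usage: a delete and a read become ready, then the loop dispatches both.
std::vector<std::string> demoDispatch() {
    std::vector<std::string> log;
    ToyLoop loop;
    ToyToken read{ToyToken::TokenType::SingleRead, onIoReady, &log};
    ToyToken del{ToyToken::TokenType::Delete, onIoReady, &log};
    loop.post(&del);
    loop.post(&read);
    loop.processPosted();
    assert(loop.ready.empty());
    return log;
}
```

In the design the post would come from the IStorage implementation when the device reports completion; the toy posts from the same thread only to keep the example deterministic.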

Client Pausing

Because command execution is now split into multiple parts, we must delay client execution. Redis already has infrastructure to support this, which we will utilize and extend.

The blockClient() function will be called by the initiator of the I/O operation, with a new btype called BLOCKED_IO. The I/O callback function is expected to call unblockClient() once all pending I/O operations for the command are completed. This means that we must call the completion function even in the event of an I/O error that prevents the actual I/O from completing properly. The completion function may throw in this situation.
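
The error path described above can be sketched as follows. This is a minimal illustration, not the server code: ToyClient, its pendingIo counter, toyBlockClient(), toyUnblockClient(), startCommand(), completeRead() and onIoDone() are all hypothetical names, and the error reply format is an assumption. The point is that the callback always runs the completion function, catches a throw, and still unblocks the client once nothing is pending.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical miniature of the client state; the real client struct is far larger.
struct ToyClient {
    bool blocked = false;
    int pendingIo = 0;  // assumption: count of I/O operations still outstanding
    std::string reply;
};

void toyBlockClient(ToyClient &c) { c.blocked = true; }
void toyUnblockClient(ToyClient &c) { c.blocked = false; }

// Initiator: starts the I/O and blocks the client until it completes.
void startCommand(ToyClient &c) {
    c.pendingIo += 1;
    toyBlockClient(c);
}

// Completion function; throws when the underlying I/O failed.
std::string completeRead(bool ioFailed) {
    if (ioFailed) throw std::runtime_error("I/O error");
    return "value";
}

// I/O callback: always runs the completion function, then unblocks the
// client once no I/O for the command remains pending.
void onIoDone(ToyClient &c, bool ioFailed) {
    try {
        c.reply = completeRead(ioFailed);
    } catch (const std::exception &e) {
        c.reply = std::string("-ERR ") + e.what();
    }
    c.pendingIo -= 1;
    if (c.pendingIo == 0) toyUnblockClient(c);
}

// Usage: a failed read still leaves the client unblocked with an error reply.
bool demoUnblocksOnError() {
    ToyClient c;
    startCommand(c);
    assert(c.blocked);
    onIoDone(c, /*ioFailed=*/true);
    return !c.blocked && c.reply == "-ERR I/O error";
}
```

Catching the exception inside the callback keeps a failed I/O from leaving the client blocked forever.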