Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HLD for Redis Client Manager #1810

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Binary file added doc/mgmt/images/rcm_client.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added doc/mgmt/images/rcm_init.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added doc/mgmt/images/rcm_transactional_client.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
176 changes: 176 additions & 0 deletions doc/mgmt/redis_client_manager.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,176 @@
# Redis Client Manager \(RCM\) HLD #

## Table of Contents
- [Table of Contents](#table-of-contents)
- [Revision](#revision)
- [Scope](#scope)
- [Definition/Abbreviation](#definitionsabbreviations)
- [Overview](#overview)
- [Requirements](#requirements)
- [Architecture Design](#architecture-design)
- [High Level Design](#high-level-design)
- [SAI API](#sai-api)
- [Configuration and Management](#configuration-and-management)
- [Warmboot and Fastboot Design Impact](#warmboot-and-fastboot-design-impact)
- [Memory Consumption](#memory-consumption)
- [Restrictions/Limitations](#restrictionslimitations)
- [Testing Requirements/Design](#testing-requirementsdesign)
- [Open/Action items - if any](#openaction-items---if-any)

### Revision
| Rev | Rev Date | Author(s) | Change Description |
|------|------------|--------------------|--------------------|
| v0.1 | 09/13/2024 | Lucas Rodriguez (Google) | Initial version |

### Scope

This document describes the high-level design using Redis connection pools to connect to Redis from within sonic-mgmt-common. It describes the singleton object that manages these connections and the integration of these connection pools into the existing framework.

### Definitions/Abbreviations

- Redis: An open-source, in-memory data store that's used as a cache, database, and message broker.
- RCM: Redis Client Manager.
- Redis Client: A type defined in go-redis that is used to interact with the Redis databases.
- Redis Transaction: The execution of a group of Redis commands in a single step.
- go-redis: The [Redis API](https://pkg.go.dev/github.com/go-redis/redis/v8) used in sonic-mgmt-common.
- RPC: Remote Procedure Call, used to interface with sonic-mgmt-common.
- CVL: [Config Validation Library](https://github.com/sonic-net/SONiC/blob/master/doc/mgmt/Management%20Framework.md#3228-config-validation-library-cvl)

### Overview

The translib framework that lives in sonic-mgmt-common is responsible for carrying out various config and telemetry related tasks, such as serving gNMI GETs, SETs, and SUBSCRIBEs. In order to do this, translib creates and uses many Redis Clients to read/write the Redis databases. Redis Clients support connection pooling, which consists of keeping a pool of active connections that can be used instead of creating a new one each time one is needed. Connection pools significantly reduce the number of connections made to Redis and reduce stress on the Redis TCP/Unix sockets. Since the connection pools are held and managed by a Redis Client object, these Redis Client objects must be reused instead of being closed after each use.

### Requirements

- Redis Clients with a connection pool must be reused wherever possible instead of creating a new Redis Client each time one is needed, within sonic-mgmt-common.
- Transactional Redis operations must be handled correctly. Connection pools do not support transactions.
- The number of Redis Clients and connections to Redis must be tracked for the life of the server.

### Architecture Design

This design does not change the existing SONiC Architecture. This design adds a new API that is meant to be a replacement for redis.NewClient and redis.Close.

### High-Level Design

This design introduces a new API that manages all Redis connections within sonic-mgmt-common called Redis Client Manager (RCM). With a central manager, Redis Clients can be kept open after an RPC completes, reused, and usage statistics can be tracked. This will also abstract most of the Redis logic away from the user and avoid misuse of the go-redis library. Since the RCM will be the only place in sonic-mgmt-common where new Redis Clients are created, any client/connection creation failure can be exposed here. By pinging a newly created client, UMF can identify and log any connection failures.

The new API will expose four new functions that will replace [redis.NewClient](https://pkg.go.dev/github.com/go-redis/redis/v8#NewClient) and [redis.Close](https://pkg.go.dev/github.com/go-redis/redis/v8#Client.Close).

#### RedisClient(db DBNum)

This RedisClient function takes in a database number, defined in [db.go](https://github.com/sonic-net/sonic-mgmt-common/blob/master/translib/db/db.go#L136), and returns an existing Redis Client to the caller. This Redis Client can be shared across multiple goroutines and has a connection pool. It cannot be used for [Transactional Redis Operations](https://redis.io/docs/latest/develop/interact/transactions/) (such as MULTI, EXEC, PSUBSCRIBE, etc...).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today redis clients are created by db.NewDB(), which is called per translib API (Get, Create, Delete etc). Each translib API handling will involve multiple redis operations of different kinds. All of them are using the same redis client created during db.NewDB() call today. How do you propose to predict the kind of redis operations that will be performed by the app code during db.NewDB() call itself -- which will done by the translib infra?

Are you planning to use IsWriteDisabled flag? If yes, please mention so instead of listing redis operations.


This function is implemented using a cache of Redis Clients that are initialized during the Go init process. One client is created for each DBNum and stored in a map. Initially, each Redis Client will only have 1 active connection in the connection pool but the number of connections in the pool can grow up to a specified max number of connections as needed. The max pool size is set to 20, and the rest of the Redis options are taken from the [existing API](https://github.com/sonic-net/sonic-mgmt-common/blob/master/translib/db/db_redis_opts.go) for these options. This init process looks like this:

![RCM Init](images/rcm_init.png)

When a Redis Client is requested through this new function, the Redis Client is taken from the cache and returned to the caller. Multiple callers can use a given Redis Client at a time and the number of connections in the pool will increase to satisfy simulatenous Redis operations through the same client. All of the connection pool management is handled by go-redis, with more details [here](https://pkg.go.dev/github.com/go-redis/redis/v8#Options). This process looks like this:

![RCM Clients](images/rcm_client.png)

#### TransactionalRedisClient(db DBNum)

The TransactionalRedisClient function takes in a database number, defined in [db.go](https://github.com/sonic-net/sonic-mgmt-common/blob/master/translib/db/db.go#L136), creates a new Redis Client for that database with a PoolSize of 1, and returns it to the caller. This Redis Client is not shared with any other client and it can be used for [Transactional Redis Operations](https://redis.io/docs/latest/develop/interact/transactions/) (such as MULTI, EXEC, PSUBSCRIBE, etc...). This client must be closed after it is done being used, by calling CloseRedisClient.

No initialization is required to support this function. A new client is created using redis.NewClient when this function is called. This process looks like this:

![RCM Transactional Client](images/rcm_transactional_client.png)

#### TransactionalRedisClientWithOpts(opts *redis.Options)

The TransactionalRedisClientWithOpts function takes in [redis.Options](https://pkg.go.dev/github.com/go-redis/redis/v8#Options) struct, creates a new Redis Client with those options and a PoolSize of 1, and returns it to the caller. This Redis Client is not shared with any other client and it can be used for [Transactional Redis Operations](https://redis.io/docs/latest/develop/interact/transactions/) (such as MULTI, EXEC, PSUBSCRIBE, etc...). This client must be closed after it is done being used, by calling CloseRedisClient. The purpose of this function is to provide a more custom client to the caller if necessary.

The implementation of this function is very similar to TransactionalRedisClient.

#### CloseRedisClient(client *redis.Client)

The CloseRedisClient function takes in a [redis.Client](https://pkg.go.dev/github.com/go-redis/redis/v8#Client) object and closes the client if necessary. For shared Redis Clients that use a connection pool, this function is a no-op because those clients must stay open. For unique Redis Clients fetched with TransactionalRedisClient or TransactionalRedisClientWithOpts, this function will close the client. This is necessary to prevent a leak of these clients.

The function is implemented by calling [redis.Close](https://pkg.go.dev/github.com/go-redis/redis/v8#Client.Close) on the client if the pool size of that Client is equal to 1.

#### Counters

RCM keeps counters for the number of requests for Redis Clients. The counters are updated on calls to the functions described above. This is why it is important to refactor all call sites to use these new functions. The counters include:
- curTransactionalClients: The number of active transactional clients that were provided by RCM.
- totalPoolClientsRequested: The number of pool clients that were requested through RCM.
- totalTransactionalClientsRequested: The number of transactional clients that were requested through RCM.

Additionally go-redis has internal counters for Connection Pools that can be accessed for each Redis Client in the cache. More details are provided [here](https://pkg.go.dev/github.com/go-redis/redis/[email protected]/internal/pool#Stats).

The counters can be fetched through another function provided by RCM, called RedisClientManagerCounters().
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider integrating with the existing stats APIs -- https://github.com/sonic-net/sonic-mgmt-common/blob/master/translib/db/db_stats.go


#### Command Line Flag

The connection pools feature can be disabled through a new boolean command line flag: "use_connection_pools"

This flag is set to true by default, enabling connection pools.

When "use_connection_pools" is false, all calls to RedisClient() will just call TransactionalRedisClient(). This will return a unique Redis Client to each caller and bypass the pool feature completely.

#### Refactoring Callsites

In all places where a Redis Client is not used to perform a transactional operation, the call will be refactored to use RedisClient(). This will result in the caller now using a shared Redis Client with a connection pool. This will significantly reduce the number of new Redis Clients and connections to Redis.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please list the callsites you are planning to change.. Today the DB object and its redis client are created once per translib API call and not by the apps. How the "kind of redis operations" performed by apps are predicted by the infra?


In all places where a transactional operation is performed, the call will be refactored to use TransactionalRedisClient(). This will give the caller a new Redis Client that is unique. If more options need to be specified, the call will be refactored to use TransactionalRedisClientWithOpts().

All calls to redis.Close() will be replaced with calls to CloseRedisClient(). This will ensure that Redis Clients that use connection pools are not closed.

### SAI API

This HLD does not include any changes to the SAI API.

### Configuration and Management
Although this change interacts with configuration and management, it should be a transparent change. There is no change in functionality. The purpose of this HLD is to reduce the stress of configuration and management on Redis.

#### CLI/YANG model Enhancements

This HLD does not include any changes to CLI or YANG models.

#### Config DB Enhancements

This HLD does not include any changes to the Config DB.

### Warmboot and Fastboot Design Impact

This HLD does not have any implications on Warmboot/Fastboot.

### Memory Consumption

Increased memory usage from this HLD should be minimal. Increased memory comes from the 10 (DBNum) Redis pool clients that are created during initialization and stay open for the life of the server. There are also new counters that are being introduced that live in memory.

### Restrictions/Limitations

RCM lives in the translib/db package. This package imports CVL, so refactoring calls in CVL to use RCM would create a circular dependency.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be resolved by placing the RCM code in a separate package parallel to translib & cvl


### Testing Requirements/Design

Unit tests are implemented to test RCM's functionality. Since callsites are updated to use RCM, many existing tests provide additional coverage for this feature.

#### Unit Test cases

- TestCreateRedisClient
- Tests the internal RCM function that creates new Redis Clients.
- TestRedisClient
- Tests the exposed RCM function that returns a Redis Client with a connection pool, given a DBNum.
- TestTransactionalRedisClient
- Tests the exposed RCM function that returns a new Redis Client, given a DBNum.
- TestTransactionalRedisClientWithOpts
- Tests the exposed RCM function that returns a new Redis Client, given a redis.Options object.
- TestCloseRedisClient
- Tests the exposed RCM function that closes the given Redis Client, if necessary.
- TestIsTransactionalClient
- Tests the function that returns whether or not a given Redis Client is a transactional one or not.
- TestConnectionPoolDisable
- Tests the command line flag that can be used to enable/disable Redis connection pools.
- TestNilRCM
- Tests the scenario where RCM is nil or never intialized.
- TestRedisCounters
- Tests that the RCM counters are updated correctly given different scenarios.
- TestRedisClientManagerCounters
- Tests the exposed RCM function that returns the counters held by RCM.
- TestRedisClientPoolExhaustion
- Tests the scenario where more requests than connections in a given Redis Client's connection pool are received.

### Open/Action items - if any

RCM can be used across other Go repositories, such as sonic-gnmi and sonic-mgmt-framework. This will further reduce the stress on Redis from those repositories.