Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dpu]: Add DPU database service #17161

Merged
merged 7 commits into from
Nov 17, 2023
Merged

Conversation

Pterosaur
Copy link
Contributor

@Pterosaur Pterosaur commented Nov 14, 2023

Sub PRs:
sonic-net/sonic-host-services#84
#17191

Why I did it

According to the design, the database instances of DPU will be kept in the NPU host.

Work item tracking
  • Microsoft ADO (number only): 25072889

How I did it

To follow the multiple ASIC design, I assume a new platform environment variable NUM_DPU will be defined in the /usr/share/sonic/device/$PLATFORM/platform_env.conf. Based on this number, NPU host will launch a corresponding number of instances for the DPU database.

How to verify it

  1. Defined NUM_DPU in the platform_env.conf
    image

  2. Declare a new field in the config_db for database feature.
    image

  3. remove the existing databse service by 'docker rm -f database' and reboot OS.

We should get multiple database services and global_config as following.
image

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: Ze Gan <[email protected]>
Signed-off-by: Ze Gan <[email protected]>
@Pterosaur Pterosaur requested a review from r12f November 14, 2023 04:39
@Pterosaur
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

Copy link

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@Pterosaur
Copy link
Contributor Author

/azpw run Azure.sonic-buildimage

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-buildimage

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@Pterosaur Pterosaur requested a review from judyjoseph November 15, 2023 01:08
@@ -2,7 +2,7 @@
"INSTANCES": {
"redis":{
"hostname" : "{{HOST_IP}}",
"port" : 6379,
"port" : {{REDIS_PORT}},
"unix_socket_path" : "/var/run/redis{{NAMESPACE_ID}}/redis.sock",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DPU should use different unix socket path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: Ze Gan <[email protected]>
@ganglyu
Copy link
Contributor

ganglyu commented Nov 15, 2023

Why do we have /var/run/redisdpu0/sonic-db/database_global.json?

@ganglyu
Copy link
Contributor

ganglyu commented Nov 15, 2023

Please create a separate PR for sonic yang models

@Pterosaur Pterosaur force-pushed the dpu_database branch 2 times, most recently from a8cceca to 87a9b2c Compare November 15, 2023 12:04
@Pterosaur
Copy link
Contributor Author

Why do we have /var/run/redisdpu0/sonic-db/database_global.json?

Removed.

Signed-off-by: Ze Gan <[email protected]>
Signed-off-by: Ze Gan <[email protected]>
@Pterosaur
Copy link
Contributor Author

/azpw run Azure.sonic-buildimage

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-buildimage

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Ze Gan <[email protected]>


def get_num_dpus():
num_dpus = 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add todo here. we should use platform api to get the dpu number instead of rely on the platform env config. @prgeor .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

lguohan
lguohan previously approved these changes Nov 16, 2023
@@ -32,7 +32,8 @@ RUN apt-get clean -y && \
s/^# unixsocket/unixsocket/; \
s/redis-server.sock/redis.sock/g; \
s/^client-output-buffer-limit pubsub [0-9]+mb [0-9]+mb [0-9]+/client-output-buffer-limit pubsub 0 0 0/; \
s/^notify-keyspace-events ""$/notify-keyspace-events AKE/ \
s/^notify-keyspace-events ""$/notify-keyspace-events AKE/; \
s/^databases [0-9]+$/databases 100/ \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this change expected? do we need 100 databases or it is changed for testing but later forget to remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value of databases is 16 which isn't enough for new databases of DPU.

@@ -94,6 +95,30 @@
"separator": ":",
"instance" : "redis"
}
{% if DATABASE_TYPE is defined and DATABASE_TYPE == "dpudb" %}
,
"DPU_APPL_DB" : {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe i missed some update, so just checking - have we decided to use DPU_APPL_DB or DASH_APPL_DB and etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I want to introduce new databases like, DPU_APPL_DB, which follows chassis design. I feel it's clearer to keep the underlay and overlay objects in different databases.

{% for dpu in range(dpu_cnt) %}
{
"database_type" : "dpudb",
"include" : "../../redisdpu{{dpu}}/sonic-db/database_config.json"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: i wonder if we could use "redis-dpu{{dpu}}"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and we can do it in another change, since this change is already approved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not easy. Because the database folder consists of the prefix "redis" and suffix "$DEV" without the dash "-". This convention has been used in the multi-ASIC scenario.

"port" : {{REDIS_PORT}},
"unix_socket_path" : "/var/run/redis{{DEV}}/redis.sock",
"persistence_for_warm_boot" : "yes",
"database_type": "{{DATABASE_TYPE}}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add a check to add the field "database_type" if DATABASE_TYPE is defined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -59,7 +71,7 @@ if [[ $DATABASE_TYPE == "chassisdb" ]]; then
fi

# copy/generate the database_global.json file if this is global database service in multi asic platform.
if [[ $NAMESPACE_ID == "" ]] && [[ $NAMESPACE_COUNT -gt 1 ]]
if [[ $DATABASE_TYPE == "" ]] && [[ $NAMESPACE_COUNT -gt 1 || $NUM_DPU -gt 1 ]]
Copy link
Contributor

@judyjoseph judyjoseph Nov 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this check $NAMESPACE_ID == "" changed here to use $DATABASE_TYPE ? This check originally meant we are working in the host namespace where namespace is ""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the DPU scenario, all dpu databases are running in the host namespace which means the namespace_id is also empty to the dpu database instance. So, I use database_type == "" to identify the host db instance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand in case of dpu we set the DATABASE_TYPE as below, and for all other cases like single_asic/multi-asic DATABASE_TYPE will be ""

The earlier check below was applicable only to global db which is in host namespace in case of multi-asic/single asic

if [[ $NAMESPACE_ID == "" ]] && [[ $NAMESPACE_COUNT -gt 1 ]]

So now with this change $NAMESPACE_ID == "" --------> $DATABASE_TYPE == "" , for multi-asic platforms it will never go to the else part as DATABASE_TYPE is always "" ?.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I will fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix it in PR: #17221 Please help to check it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking a look.

Signed-off-by: Ze Gan <[email protected]>
@lguohan lguohan merged commit 9f08f88 into sonic-net:master Nov 17, 2023
20 checks passed
lguohan pushed a commit that referenced this pull request Nov 21, 2023
… in multi asic platform (#17221)

Fix bug: #17161 (comment)
multi-asic platforms it will never go to the else part as DATABASE_TYPE is always ""


Microsoft ADO (number only): 25072889

Move the checker NAMESPACE_ID == "" back

Signed-off-by: Ze Gan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants