-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[dpu]: Add DPU database service #17161
Conversation
Signed-off-by: Ze Gan <[email protected]>
Signed-off-by: Ze Gan <[email protected]>
/azpw run |
/AzurePipelines run |
You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list. |
/azpw run Azure.sonic-buildimage |
/AzurePipelines run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
@@ -2,7 +2,7 @@ | |||
"INSTANCES": { | |||
"redis":{ | |||
"hostname" : "{{HOST_IP}}", | |||
"port" : 6379, | |||
"port" : {{REDIS_PORT}}, | |||
"unix_socket_path" : "/var/run/redis{{NAMESPACE_ID}}/redis.sock", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DPU should use different unix socket path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Signed-off-by: Ze Gan <[email protected]>
Why do we have /var/run/redisdpu0/sonic-db/database_global.json? |
Please create a separate PR for sonic yang models |
a8cceca
to
87a9b2c
Compare
Removed. |
87a9b2c
to
330ac8a
Compare
Signed-off-by: Ze Gan <[email protected]>
330ac8a
to
7aa84c9
Compare
Signed-off-by: Ze Gan <[email protected]>
/azpw run Azure.sonic-buildimage |
/AzurePipelines run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
Signed-off-by: Ze Gan <[email protected]>
|
||
|
||
def get_num_dpus(): | ||
num_dpus = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add todo here. we should use platform api to get the dpu number instead of rely on the platform env config. @prgeor .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -32,7 +32,8 @@ RUN apt-get clean -y && \ | |||
s/^# unixsocket/unixsocket/; \ | |||
s/redis-server.sock/redis.sock/g; \ | |||
s/^client-output-buffer-limit pubsub [0-9]+mb [0-9]+mb [0-9]+/client-output-buffer-limit pubsub 0 0 0/; \ | |||
s/^notify-keyspace-events ""$/notify-keyspace-events AKE/ \ | |||
s/^notify-keyspace-events ""$/notify-keyspace-events AKE/; \ | |||
s/^databases [0-9]+$/databases 100/ \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this change expected? do we need 100 databases or it is changed for testing but later forget to remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value of databases is 16 which isn't enough for new databases of DPU.
@@ -94,6 +95,30 @@ | |||
"separator": ":", | |||
"instance" : "redis" | |||
} | |||
{% if DATABASE_TYPE is defined and DATABASE_TYPE == "dpudb" %} | |||
, | |||
"DPU_APPL_DB" : { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe i missed some update, so just checking - have we decided to use DPU_APPL_DB or DASH_APPL_DB and etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I want to introduce new databases like, DPU_APPL_DB
, which follows chassis design. I feel it's clearer to keep the underlay and overlay objects in different databases.
{% for dpu in range(dpu_cnt) %} | ||
{ | ||
"database_type" : "dpudb", | ||
"include" : "../../redisdpu{{dpu}}/sonic-db/database_config.json" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: i wonder if we could use "redis-dpu{{dpu}}"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and we can do it in another change, since this change is already approved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not easy. Because the database folder consists of the prefix "redis" and suffix "$DEV" without the dash "-". This convention has been used in the multi-ASIC scenario.
"port" : {{REDIS_PORT}}, | ||
"unix_socket_path" : "/var/run/redis{{DEV}}/redis.sock", | ||
"persistence_for_warm_boot" : "yes", | ||
"database_type": "{{DATABASE_TYPE}}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we add a check to add the field "database_type" if DATABASE_TYPE is defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -59,7 +71,7 @@ if [[ $DATABASE_TYPE == "chassisdb" ]]; then | |||
fi | |||
|
|||
# copy/generate the database_global.json file if this is global database service in multi asic platform. | |||
if [[ $NAMESPACE_ID == "" ]] && [[ $NAMESPACE_COUNT -gt 1 ]] | |||
if [[ $DATABASE_TYPE == "" ]] && [[ $NAMESPACE_COUNT -gt 1 || $NUM_DPU -gt 1 ]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this check $NAMESPACE_ID == "" changed here to use $DATABASE_TYPE ? This check originally meant we are working in the host namespace where namespace is ""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the DPU scenario, all dpu databases are running in the host namespace which means the namespace_id is also empty to the dpu database instance. So, I use database_type == "" to identify the host db instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understand in case of dpu we set the DATABASE_TYPE as below, and for all other cases like single_asic/multi-asic DATABASE_TYPE will be ""
The earlier check below was applicable only to global db which is in host namespace in case of multi-asic/single asic
if [[ $NAMESPACE_ID == "" ]] && [[ $NAMESPACE_COUNT -gt 1 ]]
So now with this change $NAMESPACE_ID == ""
--------> $DATABASE_TYPE == ""
, for multi-asic platforms it will never go to the else part as DATABASE_TYPE is always "" ?.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I will fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix it in PR: #17221 Please help to check it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking a look.
Signed-off-by: Ze Gan <[email protected]>
df3c296
to
d26601f
Compare
… in multi asic platform (#17221) Fix bug: #17161 (comment) multi-asic platforms it will never go to the else part as DATABASE_TYPE is always "" Microsoft ADO (number only): 25072889 Move the checker NAMESPACE_ID == "" back Signed-off-by: Ze Gan <[email protected]>
Sub PRs:
sonic-net/sonic-host-services#84
#17191
Why I did it
According to the design, the database instances of DPU will be kept in the NPU host.
Work item tracking
How I did it
To follow the multiple ASIC design, I assume a new platform environment variable
NUM_DPU
will be defined in the/usr/share/sonic/device/$PLATFORM/platform_env.conf
. Based on this number, NPU host will launch a corresponding number of instances for the DPU database.How to verify it
Defined
NUM_DPU
in the platform_env.confDeclare a new field in the config_db for database feature.
remove the existing databse service by 'docker rm -f database' and reboot OS.
We should get multiple database services and global_config as following.
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)