Chassis DB Consistency Test Plan

JunhongMao edited this page Aug 3, 2023 · 6 revisions

Revision

Rev   Date         Author        Change Description
0.1   08/02/2023   Junhong Mao   Initial Draft

1. Introduction

This test plan verifies that the chassis database stays consistent when one or more Line-cards are rebooted, pulled out, lose midplane connectivity, or boot into ONIE mode.

The test environment is a Nokia 7250 IXR-10e Interconnect Router.

2. Test Cases

2.1 Database availability checking

2.1.1 Steps

With all line cards (for example, ixre-egl-board40 and ixre-egl-board41) plugged in and working normally, log in to the chassis CPM board (for example, ixre-cpm-chassis15).

Verify the database by running the shell script db-con.sh below on the supervisor:

$ cat db-con.sh
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_NEIGH|ixre-egl-board40*"
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_INTERFACE|ixre-egl-board40*"
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_LAG_MEMBER_TABLE|ixre-egl-board40*"
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_LAG_TABLE|ixre-egl-board40*"
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_LAG_ID_TABLE|ixre-egl-board40"
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_NEIGH|ixre-egl-board41*"
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_INTERFACE|ixre-egl-board41*"
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_LAG_MEMBER_TABLE|ixre-egl-board41*"
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_LAG_TABLE|ixre-egl-board41*"
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_LAG_ID_TABLE|ixre-egl-board41"
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_LAG_ID_SET"
redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_LAG_ID_TABLE"
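db-con.sh repeats the same five queries for every line card. As a sketch only, the same command list could be generated in a loop. The helper names card_key_patterns and print_dump_commands are hypothetical; the host, port, database number, and table names are copied from db-con.sh above.

```shell
# Sketch: generate the redis-dump commands of db-con.sh for any set of cards.
# card_key_patterns and print_dump_commands are hypothetical helper names.

card_key_patterns() {
    local card="$1" table
    # Tables that db-con.sh queries with a trailing wildcard.
    for table in SYSTEM_NEIGH SYSTEM_INTERFACE SYSTEM_LAG_MEMBER_TABLE SYSTEM_LAG_TABLE; do
        printf '%s|%s*\n' "$table" "$card"
    done
    # db-con.sh queries this table without a trailing wildcard.
    printf '%s|%s\n' SYSTEM_LAG_ID_TABLE "$card"
}

# Print the commands instead of running them, so the list can be reviewed
# on a machine that has no access to the chassis database.
print_dump_commands() {
    local card pattern
    for card in "$@"; do
        card_key_patterns "$card" | while IFS= read -r pattern; do
            printf 'redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "%s"\n' "$pattern"
        done
    done
    # Chassis-wide keys queried once at the end of db-con.sh.
    printf 'redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "%s"\n' SYSTEM_LAG_ID_SET SYSTEM_LAG_ID_TABLE
}
```

Running print_dump_commands ixre-egl-board40 ixre-egl-board41 prints the same twelve commands, in the same order, as db-con.sh; piping that output to sh on the supervisor would execute them.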

2.1.2 Pass/Fail Criteria

Pass

The dumped contents are valid and follow the format below:

{
  "SYSTEM_NEIGH|ixre-egl-board40|asic0|Ethernet-IB0|3.3.3.1": {
    "expireat": 1690815616.4330785,
    "ttl": -0.001,
    "type": "hash",
    "value": {
      "encap_index": "1074790404",
      "neigh": "40:7c:7d:bb:26:15"
    }
  },
  ...... 

Fail

The dumped contents are empty.
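The empty-versus-populated distinction can be checked mechanically. The sketch below classifies a single redis-dump result read from standard input. The function name dump_state is hypothetical, and it assumes that a dump with no matching keys is either blank or a bare "{}"; confirm this against the actual output on the supervisor. It only detects emptiness and does not validate the JSON format.

```shell
# Sketch: classify one redis-dump result read from stdin.
# Assumption: a dump with no matching keys is blank or a bare "{}".
dump_state() {
    local compact
    # Remove all whitespace so "{ }" and "{}" followed by a newline compare equal.
    compact="$(tr -d ' \t\r\n')"
    case "$compact" in
        ''|'{}') echo empty ;;
        *)       echo populated ;;
    esac
}
```

For example, redis-dump -H 10.6.0.100 -p 6380 -d 12 -y -k "SYSTEM_NEIGH|ixre-egl-board40*" | dump_state prints either empty or populated.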

2.2 Database consistency when one or more Line-cards reboot

2.2.1 Steps

Reboot one or more Line-cards by running the following command on each Line-card:

sudo reboot

Run db-con.sh on the supervisor to check whether the related contents are cleaned up as part of the reboot process.

2.2.2 Pass/Fail Criteria

Pass

The contents are cleaned up during boot and become valid again afterwards.

Valid content follows the format below. The contents are empty if they were cleaned up.

{
  "SYSTEM_NEIGH|ixre-egl-board40|asic0|Ethernet-IB0|3.3.3.1": {
    "expireat": 1690815616.4330785,
    "ttl": -0.001,
    "type": "hash",
    "value": {
      "encap_index": "1074790404",
      "neigh": "40:7c:7d:bb:26:15"
    }
  },
  ...... 
  

Fail

Otherwise
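The pass condition for this case depends on the order of observations: a clean-up must be seen, and the entries must be present again at the end. A minimal sketch of that rule is shown below. It assumes the tester records each db-con.sh check as one of two hypothetical labels, empty or populated, in time order; the function name reboot_verdict is also hypothetical.

```shell
# Sketch: decide the 2.2 verdict from observations given in time order.
# Each argument is the hypothetical label "empty" or "populated".
reboot_verdict() {
    local seen_empty=no last=none state
    for state in "$@"; do
        if [ "$state" = empty ]; then
            seen_empty=yes      # a clean-up was observed at some point
        fi
        last="$state"           # remember the most recent observation
    done
    # Pass only if a clean-up was seen and the entries are back at the end.
    if [ "$seen_empty" = yes ] && [ "$last" = populated ]; then
        echo pass
    else
        echo fail
        return 1
    fi
}
```

For example, reboot_verdict populated empty empty populated prints pass, while reboot_verdict populated populated prints fail because no clean-up was ever observed.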

2.3 Database consistency when one or more Line-cards were pulled out

2.3.1 Steps

Pull out one or more Line-cards.

(1) Run db-con.sh on the supervisor both before and after 30 minutes have elapsed since the Line-card was pulled out.

(2) Check the related syslog messages with the command below:

tail -f /var/log/syslog

2.3.2 Pass/Fail Criteria

Valid content follows the format below.

{
  "SYSTEM_NEIGH|ixre-egl-board40|asic0|Ethernet-IB0|3.3.3.1": {
    "expireat": 1690815616.4330785,
    "ttl": -0.001,
    "type": "hash",
    "value": {
      "encap_index": "1074790404",
      "neigh": "40:7c:7d:bb:26:15"
    }
  },
  ...... 

The contents are empty if they were cleaned up.

A sample of the syslog output is shown below:

Aug  1 20:41:49.069227 ixre-cpm-chassis15 NOTICE pmon#chassisd: Module LINE-CARD0|ixre-egl-board40 is down for long time. Initiating chassis app db clean up
Aug  1 20:41:49.083447 ixre-cpm-chassis15 NOTICE pmon#chassisd: Cleaned up chassis app db entries for LINE-CARD0(ixre-egl-board40)/asic0
Aug  1 20:41:49.095707 ixre-cpm-chassis15 NOTICE pmon#chassisd: Cleaned up chassis app db entries for LINE-CARD0(ixre-egl-board40)/asic1

Pass

The related contents remain valid until 30 minutes have elapsed and are cleaned up once the 30 minutes are up.

Fail

Otherwise
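Rather than reading tail output by eye, the chassisd clean-up messages can be filtered per line card. The sketch below assumes the message format matches the syslog sample in this section exactly; the function name cleaned_asics is hypothetical.

```shell
# Sketch: list the ASICs whose chassis app db entries chassisd reports as
# cleaned up for one line card. Reads syslog lines from stdin.
# Assumption: messages follow the format of the syslog sample above.
cleaned_asics() {
    local card="$1"
    # In a basic regular expression, "(" and ")" are literal and \( \) capture
    # the text after the slash, which is the ASIC name.
    sed -n "s|.*pmon#chassisd: Cleaned up chassis app db entries for [^(]*(${card})/\(.*\)\$|\1|p"
}
```

For example, cleaned_asics ixre-egl-board40 < /var/log/syslog prints one ASIC name per matching line. Reading /var/log/syslog may require sudo.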

2.4 Database consistency when one or more Line-cards lose midplane interface connectivity

2.4.1 Steps

Simulate the midplane connectivity loss with the command below:

admin@ixre-egl-board40:~$ sudo ifconfig eth1-midplane down

After the midplane interface of one or more Line-cards is shut down, each affected Line-card reboots itself after 60 seconds because it is unable to reach the CPM. A message like the one below is logged in syslog.

ixre-egl-board40 login: 23-08-02 20:51:21.826 sr_device_mgr: Rebooting - Unable to reach CPM. Reboot self.

During the Line-card boot process, the database is cleaned up.

After the boot process completes, the database is repopulated.

Then check the syslog (with tail) and the database (with db-con.sh).

2.4.2 Pass/Fail Criteria

Pass

(1) There is no database clean-up within 60 seconds after the midplane connectivity loss,

(2) the database is cleaned up during the Line-card boot process that follows the midplane connectivity loss,

(3) and the database returns to normal after the Line-card boots up.

Fail

Otherwise
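Criterion (1) is a timing check that reduces to simple arithmetic once both events have timestamps. The sketch below assumes the tester has converted the time of the connectivity loss and the time of the first observed clean-up to epoch seconds (for example with date +%s); the function name cleanup_outside_grace is hypothetical.

```shell
# Sketch: criterion (1) of 2.4 as arithmetic on epoch seconds.
# Succeeds when the clean-up happened at least <grace> seconds after the loss.
cleanup_outside_grace() {
    local loss_epoch="$1" cleanup_epoch="$2" grace="${3:-60}"
    [ $((cleanup_epoch - loss_epoch)) -ge "$grace" ]
}
```

For example, cleanup_outside_grace 1000 1075 succeeds (clean-up 75 seconds after the loss), while cleanup_outside_grace 1000 1030 fails because the clean-up fell inside the 60-second window.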

2.5 Database consistency when one or more Line-cards boot into ONIE mode

2.5.1 Steps

Reboot one or more Line-cards by running the command below in a console session rather than an SSH session.

~ sudo reboot

In the GNU GRUB menu, select ONIE as shown below.

 ┌────────────────────────────────────────────────────────────────────────────┐
 │                                                                            │
 │                                                                            │
 │*SONiC-OS-msft-2205-ndk.0-dirty-20230726.220238                             │
 │ SONiC-OS-msft-2205-ndk.0-dirty-20230723.234148                             │
 │ ONIE                                                                       │

This way, the default image (for example, SONiC-OS-msft-2205-ndk.0-dirty-20230726.220238) is not booted.

(1) Run db-con.sh on the supervisor both during the first 30 minutes and after 30 minutes have elapsed.

(2) Check the related syslog messages with the command below:

tail -f /var/log/syslog

2.5.2 Pass/Fail Criteria

Pass

The database is not cleaned up until 30 minutes have elapsed.

Fail

Otherwise
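Because this case compares the database state during and after a 30-minute window, it helps to record the elapsed time with each check. The sketch below runs an arbitrary probe command at a fixed interval and prefixes each result with the seconds elapsed since the first sample. The function name sample_probe, the interval, and the sample count are all hypothetical choices.

```shell
# Sketch: run a probe command repeatedly and print "<elapsed seconds> <output>".
# Usage: sample_probe <interval seconds> <sample count> <command> [args...]
sample_probe() {
    local interval="$1" count="$2" start now i=0
    shift 2
    start="$(date +%s)"
    while [ "$i" -lt "$count" ]; do
        now="$(date +%s)"
        # "$@" is the probe command; its output is captured for this sample.
        printf '%s %s\n' "$((now - start))" "$("$@")"
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

As one possible use, sample_probe 300 8 sh -c './db-con.sh | wc -c' would print the size of the dump every five minutes for 35 minutes, assuming db-con.sh is in the current directory. A sharp drop in size after the 30-minute mark would indicate the clean-up.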