Skip to content

Instance failed to start, channel at capacity (32) #7983

Closed
@leftwo

Description

@leftwo

I have some Instances stuck in starting/stopping. All on sled 14, sled

root@oxz_switch0:~# omdb db instance list 2> /dev/null | grep -v running | grep -v stopped                                                                                     
ID                                   STATE    PROPOLIS_ID                          SLED_ID                              HOST_SERIAL NAME                      
051a3617-ec9a-4401-9916-afd3a99be85b starting 315a55c4-a549-4fe2-878d-90408fc3eda4 34a3ed99-5aa2-4828-b4b9-116e6a1767d6 BRM42220026 aaa12-inst    
da2d563b-24dc-455c-863a-aecdc5154e5e starting 9c445772-584b-4a05-81c7-acbae9a816fe 34a3ed99-5aa2-4828-b4b9-116e6a1767d6 BRM42220026 new12-inst
9971a086-b519-4764-bcea-64a3bdb34f1b stopping 9f5f8062-891e-4374-ab38-e985f39cd1c1 34a3ed99-5aa2-4828-b4b9-116e6a1767d6 BRM42220026 new11-inst
b246f449-51be-4da3-86c3-31397a9418dd stopping cb921ea9-ade2-47fc-97a7-ab6f38253116 34a3ed99-5aa2-4828-b4b9-116e6a1767d6 BRM42220026 new13-inst
33fca24f-6e8f-4e1d-9aaf-ecc8faefeeeb starting e04e30ee-2a22-4989-ba39-79074c0c6d9f 34a3ed99-5aa2-4828-b4b9-116e6a1767d6 BRM42220026 aaa13-inst

Looking at aaa13-inst (the bottom one):

root@oxz_switch0:~# omdb db instance info 33fca24f-6e8f-4e1d-9aaf-ecc8faefeeeb 2> /dev/null

== INSTANCE ====================================================================
                        ID: 33fca24f-6e8f-4e1d-9aaf-ecc8faefeeeb
                project ID: a4be3fc6-790c-4d90-b64b-c8ed32028a25
                      name: aaa13-inst
               description: aaa13-inst host
                created at: 2025-04-15 00:44:57.140357 UTC
          last modified at: 2025-04-15 00:44:57.140357 UTC

== CONFIGURATION ===============================================================
                     vCPUs: 2
                    memory: 8 GiB
                  hostname: aaa13-inst
                 boot disk: None
              auto-restart:
                  InstanceAutoRestart {
                      policy: None,
                      cooldown: None,
                  }

== RUNTIME STATE ===============================================================
               nexus state: Vmm
(i)     external API state: Starting
           last updated at: 2025-04-15T17:11:28.738765Z (generation 11)
       needs reincarnation: false
             karmic status: saṃsāra (reincarnation enabled)
      last reincarnated at: None
             active VMM ID: Some(e04e30ee-2a22-4989-ba39-79074c0c6d9f)
             target VMM ID: None
              migration ID: None
              updater lock: UNLOCKED at generation: 13

== ACTIVE VMM ==================================================================
                        ID: e04e30ee-2a22-4989-ba39-79074c0c6d9f
               instance ID: 33fca24f-6e8f-4e1d-9aaf-ecc8faefeeeb
                created at: 2025-04-15 17:11:46.775187 UTC
                     state: starting
                updated at: 2025-04-15T17:11:47.601450Z (generation 2)
          propolis address: fd00:1122:3344:101::1:14:12400
                   sled ID: 34a3ed99-5aa2-4828-b4b9-116e6a1767d6

== ATTACHED DISKS ==============================================================
# ID                                   SIZE   STATE    NAME       
0 536a6543-b089-4572-bde8-0b0635539a32 13 GiB attached aaa13-disk 

On 14 where propolis server is running:

BRM42220026 # /opt/oxide/oxlog/oxlog logs --current --archived --show-empty oxz_propolis-server_e04e30ee-2a22-4989-ba39-79074c0c6d9f | xargs ls -l
-rw-r--r--   1 root     root        2637 Apr 15 17:16 /pool/ext/03fee42e-4aa0-404c-8404-beae944ece8f/crypt/debug/oxz_propolis-server_e04e30ee-2a22-4989-ba39-79074c0c6d9f/system-illumos-propolis-server:default.log.1744737116
-rw-r--r--   1 root     root           0 Apr 15 17:15 /pool/ext/60403bed-a679-4ac1-82dc-8b920b974898/crypt/zone/oxz_propolis-server_e04e30ee-2a22-4989-ba39-79074c0c6d9f/root/var/svc/log/system-illumos-propolis-server:default.log

The propolis log looks usual, and is just waiting:

BRM42220026 # cat /pool/ext/03fee42e-4aa0-404c-8404-beae944ece8f/crypt/debug/oxz_propolis-server_e04e30ee-2a22-4989-ba39-79074c0c6d9f/system-illumos-propolis-server:default.log.1744737116 | looker
[ Apr 15 17:11:54 Enabled. ]
[ Apr 15 17:11:54 Rereading configuration. ]
[ Apr 15 17:11:55 Rereading configuration. ]
[ Apr 15 17:11:55 Executing start method ("/opt/oxide/lib/svc/manifest/propolis/propolis.sh"). ]
+ . /lib/svc/share/smf_include.sh
++ SMF_EXIT_OK=0
++ SMF_EXIT_NODAEMON=94
++ SMF_EXIT_ERR_FATAL=95
++ SMF_EXIT_ERR_CONFIG=96
++ SMF_EXIT_MON_DEGRADE=97
++ SMF_EXIT_MON_OFFLINE=98
++ SMF_EXIT_ERR_NOSMF=99
++ SMF_EXIT_ERR_PERM=100
++ svcprop -c -p config/datalink svc:/system/illumos/propolis-server:default
+ DATALINK=oxControlInstance20
++ svcprop -c -p config/gateway svc:/system/illumos/propolis-server:default
+ GATEWAY=fd00:1122:3344:101::1
++ svcprop -c -p config/listen_addr svc:/system/illumos/propolis-server:default
+ LISTEN_ADDR=fd00:1122:3344:101::1:14
++ svcprop -c -p config/listen_port svc:/system/illumos/propolis-server:default
+ LISTEN_PORT=12400
++ svcprop -c -p config/metric_addr svc:/system/illumos/propolis-server:default
+ METRIC_ADDR=dns
+ [[ oxControlInstance20 == unknown ]]
+ [[ fd00:1122:3344:101::1 == unknown ]]
+ ipadm delete-if oxControlInstance20
ipadm: Could not delete oxControlInstance20: Interface does not exist
+ true
+ ipadm create-if -t oxControlInstance20
+ ipadm set-ifprop -t -p mtu=9000 -m ipv4 oxControlInstance20
+ ipadm set-ifprop -t -p mtu=9000 -m ipv6 oxControlInstance20
+ ipadm show-addr oxControlInstance20/ll
ipadm: Could not get address: Object not found
+ ipadm create-addr -t -T addrconf oxControlInstance20/ll
+ ipadm show-addr oxControlInstance20/omicron6
ipadm: Could not get address: Object not found
+ ipadm create-addr -t -T static -a fd00:1122:3344:101::1:14 oxControlInstance20/omicron6
+ route get -inet6 default -inet6 fd00:1122:3344:101::1
default: not in table
+ route add -inet6 default -inet6 fd00:1122:3344:101::1
add net default: gateway fd00:1122:3344:101::1
+ args=('run' '/opt/oxide/propolis-server/blob/OVMF_CODE.fd' "[$LISTEN_ADDR]:$LISTEN_PORT" '--metric-addr' "$METRIC_ADDR")
+ ctrun -l child -o noorphan,regent /opt/oxide/propolis-server/bin/propolis-server run /opt/oxide/propolis-server/blob/OVMF_CODE.fd '[fd00:1122:3344:101::1:14]:12400' --metric-addr dns
[ Apr 15 17:11:56 Method "start" exited with status 0. ]
17:11:56.502Z INFO propolis-server: Starting server...
17:11:56.503Z INFO propolis-server: listening
    local_addr = [fd00:1122:3344:101::1:14]:12400

Search for e04e30ee, the propolis ID (zone ID) for aaa13-inst

BRM23230018 # for log in $(/opt/oxide/oxlog/oxlog logs --archived --current $(zoneadm list | grep nexus)); do echo "search $log for aaa13" >> /tmp/fff;grep aaa13-inst $log | looker >> /tmp/fff; don

Here is the stop in the nexus log:
search /pool/ext/268b43d0-37df-4956-ac61-ea92abeb975d/crypt/debug/oxz_nexus_e05c41d4-3e97-4ba9-a260-8b5bedd17b9f/oxide-nexus:default.log.1744738200 for aaa13

17:11:24.318Z DEBG e05c41d4-3e97-4ba9-a260-8b5bedd17b9f (dropshot_external): roles
   actor_id = f0356066-0677-4bac-a557-3891f02dee37
   authenticated = true
   local_addr = 172.30.2.6:443
   method = POST
   remote_addr = 172.20.2.15:40101
   req_id = c2115bc8-a87f-4611-8724-d67111793fbd
   roles = RoleSet { roles: {} }
   uri = /v1/instances/aaa13-inst/stop?project=alan

Then, the start in the nexus log on 17:

17:11:46.663Z DEBG e05c41d4-3e97-4ba9-a260-8b5bedd17b9f (dropshot_external): roles
    actor_id = f0356066-0677-4bac-a557-3891f02dee37
    authenticated = true
    local_addr = 172.30.2.6:443
    method = POST
    remote_addr = 172.20.2.15:51002
    req_id = 19ea8b0a-51c1-4574-bfa7-e7ca4bd11e4a
    roles = RoleSet { roles: {} }
    uri = /v1/instances/aaa13-inst/start?project=alan

Saga create for instance start:

17:11:46.708Z INFO e05c41d4-3e97-4ba9-a260-8b5bedd17b9f (ServerContext): preparing saga
    file = nexus/src/app/saga.rs:255
    saga_id = ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6
    saga_name = instance-start

…

Finish??

17:11:47.701Z INFO e05c41d4-3e97-4ba9-a260-8b5bedd17b9f (ServerContext): saga finished
    file = /home/alan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/steno-0.4.1/src/sec.rs:1025
    result = success
    saga_id = ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6
    saga_name = instance-start
    sec_id = e05c41d4-3e97-4ba9-a260-8b5bedd17b9f
17:11:47.701Z INFO e05c41d4-3e97-4ba9-a260-8b5bedd17b9f (ServerContext): saga finished
    file = nexus/src/app/saga.rs:404
    saga_id = ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6
    saga_name = instance-start

omdb saga output for the instance start saga.

root@oxz_switch0:~# /tmp/omdb-saga db sagas show ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6
 id                                   | time_created                   | name           | state                 
--------------------------------------+--------------------------------+----------------+-----------------------
 ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:46.708453 UTC | instance-start | SagaCachedState(Done) 

                             saga id | event time                     | node id                                   | event type | data
------------------------------------ | ------------------------------ | ----------------------------------------- | ---------- | ---
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:46.713511 UTC |  10: start                                | started    | 
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:46.716112 UTC |  10: start                                | succeeded  | 
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:46.719560 UTC |   0: instance_start.generate_propolis_id  | started    | 
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:46.722653 UTC |   0: instance_start.generate_propolis_id  | succeeded  | "e04e30ee-2a22-4989-ba39-79074c0c6d9f"
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:46.725685 UTC |   1: instance_start.alloc_server          | started    | 
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:46.758035 UTC |   1: instance_start.alloc_server          | succeeded  | "34a3ed99-5aa2-4828-b4b9-116e6a1767d6"
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:46.761691 UTC |   2: instance_start.alloc_propolis_ip     | started    | 
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:46.769357 UTC |   2: instance_start.alloc_propolis_ip     | succeeded  | "fd00:1122:3344:101::1:14"
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:46.772964 UTC |   3: instance_start.create_vmm_record     | started    | 
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:46.779976 UTC |   3: instance_start.create_vmm_record     | succeeded  | {"id":"e04e30ee-2a22-4989-ba39-79074c0c6d9f","instance_id":"33fca24f-6e8f-4e1d-9aaf-ecc8faefeeeb","propolis_ip":"fd00:1122:3344:101::1:14/128","propolis_port":12400,"runtime":{"gen":1,"state":"Creating","time_state_updated":"2025-04-15T17:11:46.775187Z"},"sled_id":"34a3ed99-5aa2-4828-b4b9-116e6a1767d6","time_created":"2025-04-15T17:11:46.775187Z","time_deleted":null}
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:46.783131 UTC |   4: instance_start.mark_as_starting      | started    | 
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:46.835031 UTC |   4: instance_start.mark_as_starting      | succeeded  | {"auto_restart":{"cooldown":null,"policy":null},"boot_disk_id":null,"hostname":"aaa13-inst","identity":{"description":"aaa13-inst host","id":"33fca24f-6e8f-4e1d-9aaf-ecc8faefeeeb","name":"aaa13-inst","time_created":"2025-04-15T00:44:57.140357Z","time_deleted":null,"time_modified":"2025-04-15T00:44:57.140357Z"},"memory":8589934592,"ncpus":2,"project_id":"a4be3fc6-790c-4d90-b64b-c8ed32028a25","runtime_state":{"dst_propolis_id":null,"gen":11,"migration_id":null,"nexus_state":"Vmm","propolis_id":"e04e30ee-2a22-4989-ba39-79074c0c6d9f","time_last_auto_restarted":null,"time_updated":"2025-04-15T17:11:28.738765Z"},"updater_gen":13,"updater_id":null,"user_data":[35,99,108,111,117,100,45,99,111,110,102,105,103,10,109,97,110,97,103,101,95,101,116,99,95,104,111,115,116,115,58,32,116,114,117,101,10]}
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:46.837925 UTC |   5: instance_start.dpd_ensure            | started    | 
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:47.214648 UTC |   5: instance_start.dpd_ensure            | succeeded  | 
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:47.219940 UTC |   6: instance_start.v2p_ensure            | started    | 
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:47.224423 UTC |   6: instance_start.v2p_ensure            | succeeded  | 
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:47.227440 UTC |   7: instance_start.ensure_registered     | started    | 
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:47.614684 UTC |   7: instance_start.ensure_registered     | succeeded  | {"id":"e04e30ee-2a22-4989-ba39-79074c0c6d9f","instance_id":"33fca24f-6e8f-4e1d-9aaf-ecc8faefeeeb","propolis_ip":"fd00:1122:3344:101::1:14/128","propolis_port":12400,"runtime":{"gen":2,"state":"Starting","time_state_updated":"2025-04-15T17:11:47.601450025Z"},"sled_id":"34a3ed99-5aa2-4828-b4b9-116e6a1767d6","time_created":"2025-04-15T17:11:46.775187Z","time_deleted":null}
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:47.618591 UTC |   8: instance_start.add_virtual_resources | started    | 
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:47.633850 UTC |   8: instance_start.add_virtual_resources | succeeded  | 
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:47.636340 UTC |   9: instance_start.ensure_running        | started    | 
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:47.684363 UTC |   9: instance_start.ensure_running        | succeeded  | 
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:47.688496 UTC |  11: end                                  | started    | 
ffcdd938-e1da-4607-a8d1-f3c5df0ee4f6 | 2025-04-15 17:11:47.691436 UTC |  11: end                                  | succeeded  | 

Then, sled agent on 14. I'm just showing logs for "aaa13-inst" or "e04e30ee-2a22-4989-ba39-79074c0c6d9f"

Instance starting up:

2025-04-15 17:11:47.603Z INFO SledAgent/644 (InstanceManager) on BRM42220026: ensuring instance is registered
    file = sled-agent/src/instance_manager.rs:584
    hardware = InstanceHardware { properties: InstanceProperties { ncpus: InstanceCpuCount(2), memory: ByteCount(8589934592), hostname: Hostname("aaa13-inst") }, <output removed>
    instance_id = 33fca24f-6e8f-4e1d-9aaf-ecc8faefeeeb
    metadata = InstanceMetadata { silo_id: 17acdd30-316f-43d8-a586-23a64b0899a8, project_id: a4be3fc6-790c-4d90-b64b-c8ed32028a25 }
    migration_id = None
    propolis_addr = [fd00:1122:3344:101::1:14]:12400
    propolis_id = e04e30ee-2a22-4989-ba39-79074c0c6d9f
    vmm_runtime = VmmRuntimeState { state: Starting, gen: Generation(2), time_updated: 2025-04-15T17:11:47.601450025Z }
2025-04-15 17:11:47.603Z INFO SledAgent/644 (InstanceManager) on BRM42220026: registering new instance
    file = sled-agent/src/instance_manager.rs:617
    instance_id = 33fca24f-6e8f-4e1d-9aaf-ecc8faefeeeb
    migration_id = None
    propolis_id = e04e30ee-2a22-4989-ba39-79074c0c6d9f
2025-04-15 17:11:47.603Z INFO SledAgent/644 (InstanceManager) on BRM42220026: initializing new Instance
    file = sled-agent/src/instance.rs:1769
    instance_id = 33fca24f-6e8f-4e1d-9aaf-ecc8faefeeeb
    migration_id = None
    propolis_id = e04e30ee-2a22-4989-ba39-79074c0c6d9f
    state = InstanceInitialState { hardware: InstanceHardware { properties: InstanceProperties { ncpus: InstanceCpuCount(2), memory: ByteCount(8589934592), hostname: Hostname("aaa13-inst") } <output removed>
2025-04-15 17:11:47.603Z INFO SledAgent/644 (dropshot (SledAgent)) on BRM42220026: request completed
    file = /home/alan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:867
    latency_us = 1469
    local_addr = [fd00:1122:3344:101::1]:12345
    method = PUT
    remote_addr = [fd00:1122:3344:103::4]:53416
    req_id = e1db1ec4-ffd2-4d12-b8c7-dbafae398972
    response_code = 200
    uri = /vmms/e04e30ee-2a22-4989-ba39-79074c0c6d9f
2025-04-15 17:11:47.683Z INFO SledAgent/644 (dropshot (SledAgent)) on BRM42220026: request completed
    file = /home/alan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:867
    latency_us = 229
    local_addr = [fd00:1122:3344:101::1]:12345
    method = PUT
    remote_addr = [fd00:1122:3344:103::4]:65332
    req_id = 3a3e5142-ea47-466d-b703-40f9ca68ff05
    response_code = 200
    uri = /vmms/e04e30ee-2a22-4989-ba39-79074c0c6d9f/state
2025-04-15 17:11:47.768Z INFO SledAgent/644 (InstanceManager) on BRM42220026: Configuring new Omicron zone: oxz_propolis-server_e04e30ee-2a22-4989-ba39-79074c0c6d9f
    file = illumos-utils/src/zone.rs:302
    instance_id = 33fca24f-6e8f-4e1d-9aaf-ecc8faefeeeb
    propolis_id = e04e30ee-2a22-4989-ba39-79074c0c6d9f
2025-04-15 17:11:47.809Z INFO SledAgent/644 (InstanceManager) on BRM42220026: Installing Omicron zone: oxz_propolis-server_e04e30ee-2a22-4989-ba39-79074c0c6d9f
    file = illumos-utils/src/zone.rs:340
    instance_id = 33fca24f-6e8f-4e1d-9aaf-ecc8faefeeeb
    propolis_id = e04e30ee-2a22-4989-ba39-79074c0c6d9f
2025-04-15 17:11:49.142Z INFO SledAgent/644 (InstanceManager) on BRM42220026: Profile for oxz_propolis-server_e04e30ee-2a22-4989-ba39-79074c0c6d9f:
    <!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
    <service_bundle type="profile" name="omicron">
      <service version="1" type="service" name="system/illumos/propolis-server">
        <instance enabled="true" name="default">
          <property_group type="application" name="config">
            <propval type="astring" name="datalink" value='oxControlInstance20'/>
            <propval type="astring" name="gateway" value='fd00:1122:3344:101::1'/>
            <propval type="astring" name="listen_addr" value='fd00:1122:3344:101::1:14'/>
            <propval type="astring" name="listen_port" value='12400'/>
            <propval type="astring" name="metric_addr" value='dns'/>
          </property_group>
        </instance>
      </service>
    </service_bundle>
    file = sled-agent/src/profile.rs:34
    instance_id = 33fca24f-6e8f-4e1d-9aaf-ecc8faefeeeb
    propolis_id = e04e30ee-2a22-4989-ba39-79074c0c6d9f
2025-04-15 17:11:49.142Z INFO SledAgent/644 (InstanceManager) on BRM42220026: Booting oxz_propolis-server_e04e30ee-2a22-4989-ba39-79074c0c6d9f zone
    file = illumos-utils/src/running_zone.rs:482
    instance_id = 33fca24f-6e8f-4e1d-9aaf-ecc8faefeeeb
    propolis_id = e04e30ee-2a22-4989-ba39-79074c0c6d9f
    zone = oxz_propolis-server_e04e30ee-2a22-4989-ba39-79074c0c6d9f
2025-04-15 17:11:49.410Z WARN SledAgent/644 (InstanceManager) on BRM42220026: wait for service svc:/milestone/single-user:default in zone Some("oxz_propolis-server_e04e30ee-2a22-4989-ba39-79074c0c6d9f") failed: Property not found. retry in 37.022827ms
    file = illumos-utils/src/svc.rs:36
    instance_id = 33fca24f-6e8f-4e1d-9aaf-ecc8faefeeeb
    propolis_id = e04e30ee-2a22-4989-ba39-79074c0c6d9f
    zone = oxz_propolis-server_e04e30ee-2a22-4989-ba39-79074c0c6d9f
2025-04-15 17:11:49.501Z WARN SledAgent/644 (InstanceManager) on BRM42220026: wait for service svc:/milestone/single-user:default in zone Some("oxz_propolis-server_e04e30ee-2a22-4989-ba39-79074c0c6d9f") failed: Property not found. retry in 60.48713ms
    file = illumos-utils/src/svc.rs:36
    instance_id = 33fca24f-6e8f-4e1d-9aaf-ecc8faefeeeb
    propolis_id = e04e30ee-2a22-4989-ba39-79074c0c6d9f
    zone = oxz_propolis-server_e04e30ee-2a22-4989-ba39-79074c0c6d9f
2025-04-15 17:11:49.592Z WARN SledAgent/644 (InstanceManager) on BRM42220026: wait for service svc:/milestone/single-user:default in zone Some("oxz_propolis-server_e04e30ee-2a22-4989-ba39-79074c0c6d9f") failed: Property not found. retry in 244.00409ms
    file = illumos-utils/src/svc.rs:36
    instance_id = 33fca24f-6e8f-4e1d-9aaf-ecc8faefeeeb
    propolis_id = e04e30ee-2a22-4989-ba39-79074c0c6d9f
    zone = oxz_propolis-server_e04e30ee-2a22-4989-ba39-79074c0c6d9f

wait for service continues for a while, then this:

2025-04-15 17:11:55.500Z WARN SledAgent/644 (InstanceManager) on BRM42220026: wait for service svc:/milestone/single-user:default in zone Some("oxz_propolis-server_e04e30ee-2a22-4989-ba39-79074c0c6d9f") failed: Property not found. retry in 1.082165435s
    file = illumos-utils/src/svc.rs:36
    instance_id = 33fca24f-6e8f-4e1d-9aaf-ecc8faefeeeb
    propolis_id = e04e30ee-2a22-4989-ba39-79074c0c6d9f
    zone = oxz_propolis-server_e04e30ee-2a22-4989-ba39-79074c0c6d9f
2025-04-15 17:11:56.669Z INFO SledAgent/644 (InstanceManager) on BRM42220026: Started propolis in zone: oxz_propolis-server_e04e30ee-2a22-4989-ba39-79074c0c6d9f
    file = sled-agent/src/instance.rs:2320
    instance_id = 33fca24f-6e8f-4e1d-9aaf-ecc8faefeeeb
    propolis_id = e04e30ee-2a22-4989-ba39-79074c0c6d9f
2025-04-15 17:11:56.729Z INFO SledAgent/644 (InstanceManager) on BRM42220026: Propolis SMF service is online
    file = sled-agent/src/instance.rs:2331
    instance_id = 33fca24f-6e8f-4e1d-9aaf-ecc8faefeeeb
    propolis_id = e04e30ee-2a22-4989-ba39-79074c0c6d9f
2025-04-15 17:12:53.302Z WARN SledAgent/644 (dropshot (SledAgent)) on BRM42220026: request handling cancelled (client disconnected)
    file = /home/alan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:801
    latency_us = 59994932
    local_addr = [fd00:1122:3344:101::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:101::21]:55219
    req_id = 16cf7cd8-bc56-47f0-9b7e-b1dcb3b7d2b3
    uri = /vmms/e04e30ee-2a22-4989-ba39-79074c0c6d9f/state
2025-04-15 17:13:38.591Z WARN SledAgent/644 (dropshot (SledAgent)) on BRM42220026: request handling cancelled (client disconnected)
    file = /home/alan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:801
    latency_us = 60002289
    local_addr = [fd00:1122:3344:101::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:103::4]:55904
    req_id = b72721d7-3de9-4603-88f3-e01092d371fe
    uri = /vmms/e04e30ee-2a22-4989-ba39-79074c0c6d9f/state

keeps going

2025-04-15 17:15:59.047Z WARN SledAgent/644 (dropshot (SledAgent)) on BRM42220026: request handling cancelled (client disconnected)
    file = /home/alan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:801
    latency_us = 60002002
    local_addr = [fd00:1122:3344:101::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:101::21]:46229
    req_id = 5e9bc566-7533-436f-a723-3be65508657f
    uri = /vmms/e04e30ee-2a22-4989-ba39-79074c0c6d9f/state
2025-04-15 17:16:26.189Z INFO SledAgent/644 (DumpSetup-worker) on BRM42220026: Archiving 105 log files from oxz_propolis-server_e04e30ee-2a22-4989-ba39-79074c0c6d9f zone
    file = sled-agent/src/dump_setup.rs:1001
2025-04-15 17:16:39.001Z WARN SledAgent/644 (dropshot (SledAgent)) on BRM42220026: request handling cancelled (client disconnected)
    file = /home/alan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:801
    latency_us = 60002291
    local_addr = [fd00:1122:3344:101::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:103::4]:42128
    req_id = 5f6ff7c4-73d9-4193-b6ea-88805cb42af5
    uri = /vmms/e04e30ee-2a22-4989-ba39-79074c0c6d9f/state

keeps going with client disconnected till we get our first channel at capacity:

2025-04-15 17:23:39.865Z WARN SledAgent/644 (dropshot (SledAgent)) on BRM42220026: request handling cancelled (client disconnected)
    file = /home/alan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:801
    latency_us = 60002119
    local_addr = [fd00:1122:3344:101::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:103::4]:54526
    req_id = 73bbf99d-8ce3-4773-808d-44a9b95be82a
    uri = /vmms/e04e30ee-2a22-4989-ba39-79074c0c6d9f/state
2025-04-15 17:23:40.023Z INFO SledAgent/644 (dropshot (SledAgent)) on BRM42220026: request completed
    error_message_external = Service Unavailable
    error_message_internal = Failed to send request to Instance: channel at capacity (32)
    file = /home/alan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:855
    latency_us = 126
    local_addr = [fd00:1122:3344:101::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:103::4]:54635
    req_id = 7d2b0372-4d11-476a-8ded-237c34f4b66a
    response_code = 503
    uri = /vmms/e04e30ee-2a22-4989-ba39-79074c0c6d9f/state
2025-04-15 17:24:00.073Z WARN SledAgent/644 (dropshot (SledAgent)) on BRM42220026: request handling cancelled (client disconnected)
    file = /home/alan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:801
    latency_us = 60003144
    local_addr = [fd00:1122:3344:101::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:101::21]:40151
    req_id = d7791bcf-d50d-4ed4-88fc-9e483984741e
    uri = /vmms/e04e30ee-2a22-4989-ba39-79074c0c6d9f/state
2025-04-15 17:24:00.195Z INFO SledAgent/644 (dropshot (SledAgent)) on BRM42220026: request completed
    error_message_external = Service Unavailable
    error_message_internal = Failed to send request to Instance: channel at capacity (32)
    file = /home/alan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:855
    latency_us = 142
    local_addr = [fd00:1122:3344:101::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:101::21]:55477
    req_id = 6e9bc399-44c1-445c-ac55-800186b8d007
    response_code = 503
    uri = /vmms/e04e30ee-2a22-4989-ba39-79074c0c6d9f/state
2025-04-15 17:24:14.433Z WARN SledAgent/644 (dropshot (SledAgent)) on BRM42220026: request handling cancelled (client disconnected)
    file = /home/alan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:801
    latency_us = 60002699
    local_addr = [fd00:1122:3344:101::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:12a::30]:47426
    req_id = e8591ff3-e95b-4f36-865e-27a672c1d5ee
    uri = /vmms/e04e30ee-2a22-4989-ba39-79074c0c6d9f/state
2025-04-15 17:24:14.558Z INFO SledAgent/644 (dropshot (SledAgent)) on BRM42220026: request completed
    error_message_external = Service Unavailable
    error_message_internal = Failed to send request to Instance: channel at capacity (32)
    file = /home/alan/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/dropshot-0.16.0/src/server.rs:855
    latency_us = 139
    local_addr = [fd00:1122:3344:101::1]:12345
    method = GET
    remote_addr = [fd00:1122:3344:12a::30]:36786
    req_id = b7dce702-73d5-425f-a036-2463e05f0000
    response_code = 503
    uri = /vmms/e04e30ee-2a22-4989-ba39-79074c0c6d9f/state

There are a few more interleaved client disconnect messages, but it goes to be all channel at capacity messages from this point on.

Metadata

Metadata

Assignees

No one assigned

    Labels

    expungeexpunge sled or disk issues

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions