stream mgmt: expose CANCEL_MOVE nicely #724

Open
philpennock opened this issue Mar 2, 2023 · 8 comments
@philpennock
Member

Request from you-know-who via me: give nats stream a subcommand, or an option thereof, to syntactically sugar cancelling a move:

nats req '$JS.API.ACCOUNT.STREAM.CANCEL_MOVE.${ACCT}.${STREAM}' ''

Tested and confirmed this works in both system and stream-owner accounts.
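
For anyone scripting this outside the CLI, a minimal sketch of the same request via nats.go; the URL, credentials, account and stream names below are placeholders, and the connection needs permission to call this API:

package main

import (
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	// Placeholder URL; connect with system- or stream-owner-account
	// credentials that are authorized for the JS API.
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// Subject layout from above:
	// $JS.API.ACCOUNT.STREAM.CANCEL_MOVE.<account>.<stream>
	// "ONE" and "ORDERS" are placeholder account and stream names.
	subj := "$JS.API.ACCOUNT.STREAM.CANCEL_MOVE.ONE.ORDERS"
	resp, err := nc.Request(subj, nil, 2*time.Second)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(resp.Data)) // raw JSON API response
}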

@jarretlavallee

We have hit a use case for this in some instances. This would be nice to have.

@ripienaar
Collaborator

Question: how are you initiating moves? The CLI also does not allow initiating moves using the move API.

The reason these 2 APIs are not in the nats CLI is that they are just really badly designed and reuse data structures they should not; generally I just don't like them.

So I'm asking a bit more about how it's used so I can see what I can do.

@ripienaar
Collaborator

Ah, a stream update in the account can initiate the move. This API is really bad; I might need to look at getting some fixes into the server before I am happy to add it to the CLI. Things like: when the server isn't clustered it will just time out, etc.

@jarretlavallee

The common way that I have seen to move a stream is to update the placement tags.

nats stream edit testStream --tags cloud:was
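
For reference, a sketch of the same tag change through the nats.go JetStream management API; the stream name, subjects and other config values are placeholders and must match the existing stream definition, since UpdateStream takes the full config:

package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL) // placeholder URL/creds
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Per the thread, changing the placement (tags or cluster) on an
	// otherwise unchanged config is what triggers the move. All values
	// here are placeholders; keep the rest of the existing config.
	_, err = js.UpdateStream(&nats.StreamConfig{
		Name:      "testStream",
		Subjects:  []string{"test.>"},
		Replicas:  3,
		Storage:   nats.FileStorage,
		Placement: &nats.Placement{Tags: []string{"cloud:was"}},
	})
	if err != nil {
		log.Fatal(err)
	}
}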

@ripienaar
Collaborator

I would simply not use this feature tbh.

If you initiate a stream move via other means, like changing the placement cluster, and you then cancel the move, the stream config reflects the new cluster config but the stream gets put back in the original cluster, leaving the config inconsistent with reality.

So I am not adding this to the CLI till we revisit how this works, sorry. @jarretlavallee, could you please open an issue in the server?

$ nats s edit --cluster sfo LON_MIRROR -f
$ nats req '$JS.API.ACCOUNT.STREAM.CANCEL_MOVE.one.LON_MIRROR' ''
$ nats s info LON_MIRROR
Information for Stream LON_MIRROR created 2023-02-02 07:53:52

              Replicas: 3
               Storage: File
     Placement Cluster: sfo
....
Cluster Information:

                  Name: nyc
         Cluster Group: S-R3F-uw46mpks
                Leader: n3-nyc
               Replica: n1-nyc, current, seen 88ms ago
               Replica: n2-nyc, current, seen 88ms ago

I really would not advise using this :)

@jarretlavallee

That is fair. The only times I have had to use it were in response to stalled moves and other failures. It has been a recovery option in those scenarios. I understand that if it is in the CLI, it could be used operationally and cause the issues above.

@ripienaar
Collaborator

Some other observed problems with this: after initiating a move from lon to sfo and cancelling it, some sfo servers still had the stream on disk and recovered it at start:

Jan 30 13:54:12 n2-sfo nats-server[2500521]: [1] [INF]   Starting restore for stream 'ADM6CMOXUMFKRJTPGLFY5DGYUJNLQV5SGFZMXCMTWV3CKB6Z43GQ3L6C > BIG'
Jan 30 13:54:12 n2-sfo nats-server[2500521]: [1] [INF]   Restored 61,440 messages for stream 'ADM6CMOXUMFKRJTPGLFY5DGYUJNLQV5SGFZMXCMTWV3CKB6Z43GQ3L6C > BIG' in 1ms

But then appeared to delete it.

I also saw a lot of read loop warnings AFTER cancelling on the origin cluster, which suggests something was still happening there after the cancel:

Jan 30 13:50:35 n2-lon nats-server[2273043]: [1] [WRN] 157.245.38.117:56350 - cid:759 - Readloop processing time: 2.001583862s
Jan 30 13:50:37 n2-lon nats-server[2273043]: [1] [WRN] 157.245.38.117:56350 - cid:759 - Readloop processing time: 2.329004291s
Jan 30 13:50:41 n2-lon nats-server[2273043]: [1] [WRN] 157.245.38.117:56350 - cid:759 - Readloop processing time: 4.070602157s
Jan 30 13:50:47 n2-lon nats-server[2273043]: [1] [WRN] 157.245.38.117:56350 - cid:759 - Readloop processing time: 5.041428719s
Jan 30 13:50:50 n2-lon nats-server[2273043]: [1] [WRN] 157.245.38.117:56350 - cid:759 - Readloop processing time: 3.518233028s

@ripienaar
Collaborator

ripienaar commented Jan 30, 2025

The origin cluster logged a lot of this AFTER the cancel:

Jan 30 14:00:24 n2-lon nats-server[2273043]: [1] [WRN] Catchup for stream 'ADM6CMOXUMFKRJTPGLFY5DGYUJNLQV5SGFZMXCMTWV3CKB6Z43GQ3L6C > BIG' stalled
Jan 30 14:00:27 n2-lon nats-server[2273043]: [1] [WRN] Catchup for stream 'ADM6CMOXUMFKRJTPGLFY5DGYUJNLQV5SGFZMXCMTWV3CKB6Z43GQ3L6C > BIG' stalled
Jan 30 14:00:27 n2-lon nats-server[2273043]: [1] [WRN] Catchup for stream 'ADM6CMOXUMFKRJTPGLFY5DGYUJNLQV5SGFZMXCMTWV3CKB6Z43GQ3L6C > BIG' stalled

but

Cluster Information:

                    Name: lon
           Cluster Group: S-R3F-7ELRENun
                  Leader: n2-lon
                 Replica: n1-lon, current, seen 20ms ago
                 Replica: n3-lon, current, seen 19ms ago

Stream info showed the cluster as up to date, so I think internally the stream isn't fully aware the move is cancelled.
