
[Bug]: dstack marks instance as terminated without terminating it #1551

Closed
jvstme opened this issue Aug 14, 2024 · 15 comments

Labels: bug (Something isn't working), major

@jvstme (Collaborator) commented Aug 14, 2024

Steps to reproduce

> cat .dstack.yml 
type: dev-environment
ide: vscode

> dstack apply --spot-auto -b runpod -y

Then wait until the run is running and switch off the network on the dstack-server host.

Actual behaviour

The run is marked as failed and the instance is marked as terminated. However, the instance actually still exists in RunPod, and the user is billed for it.

Expected behaviour

The instance is not marked as terminated until it is actually deleted in RunPod.

dstack version

master

Server logs

[09:21:18] DEBUG    dstack._internal.core.services.ssh.tunnel:73 SSH tunnel failed: b'ssh: connect to host 194.68.245.18 port 22056: Network is                
                    unreachable\r\n'                                                                                                                           
I0000 00:00:1723620079.263387 1605744 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork
[09:21:19] DEBUG    dstack._internal.core.services.ssh.tunnel:73 SSH tunnel failed: b'ssh: connect to host 194.68.245.18 port 22056: Network is                
                    unreachable\r\n'                                                                                                                           
           WARNING  dstack._internal.server.background.tasks.process_running_jobs:259 job(e3ec13)polite-starfish-1-0-0: failed because runner is not available 
                    or return an error,  age=0:03:00.121137                                                                                                    
           INFO     dstack._internal.server.background.tasks.process_runs:338 run(5dd434)polite-starfish-1: run status has changed RUNNING -> TERMINATING      
[09:21:21] DEBUG    dstack._internal.server.services.jobs:238 job(e3ec13)polite-starfish-1-0-0: stopping container                                             
           INFO     dstack._internal.server.services.jobs:269 job(e3ec13)polite-starfish-1-0-0: instance 'polite-starfish-1-0' has been released, new status is
                    TERMINATING                                                                                                                                
           INFO     dstack._internal.server.services.jobs:286 job(e3ec13)polite-starfish-1-0-0: job status is FAILED, reason: INTERRUPTED_BY_NO_CAPACITY       
[09:21:22] INFO     dstack._internal.server.services.runs:932 run(5dd434)polite-starfish-1: run status has changed TERMINATING -> FAILED, reason: JOB_FAILED   
[09:21:23] ERROR    dstack._internal.server.background.tasks.process_instances:763 Got exception when terminating instance polite-starfish-1-0                 
                    Traceback (most recent call last):                                                                                                         

[... long stack trace ...]
                                                                                            
                    requests.exceptions.ConnectionError: HTTPSConnectionPool(host='api.runpod.io', port=443): Max retries exceeded with url:                   
                    /graphql?api_key=***** (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at     
                    0x7f5fb5a98a90>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))                                  
           INFO     dstack._internal.server.background.tasks.process_instances:773 Instance polite-starfish-1-0 terminated

Additional information

I reproduced this issue on RunPod and Vast.ai but not on OCI. Maybe the behavior differs between container-based and VM-based backends. On OCI, dstack makes many attempts to delete the instance and only marks it as terminated after succeeding, which is the expected behavior.

Ideally, the job should also not be marked as failed if the connectivity issues are on dstack-server's side rather than on the instance's side. However, this condition is difficult to detect, so it is out of scope for this issue.

jvstme added the bug label Aug 14, 2024
@github-actions (bot) commented:

This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label Sep 20, 2024
@peterschmidt85 (Contributor) commented:

@jvstme is this issue still relevant?

@jvstme (Collaborator, Author) commented Sep 20, 2024

@peterschmidt85, yes, I just reproduced it - same behavior

jvstme removed the stale label Sep 20, 2024
@jvstme (Collaborator, Author) commented Oct 17, 2024

The same seems to be relevant for TensorDock — an instance was marked as terminated in dstack, yet it is still running in TensorDock. I don't have server logs for this case, but it is known that the TensorDock API is experiencing issues at the moment.

Update: the same happens on GCP.

@peterschmidt85 (Contributor) commented:

I think this should be easy to reproduce by simulating a 500 error, no? It could also be covered by a test. Then we ensure that the instance remains in terminating and that dstack eventually ensures the instance is terminated (e.g. it has either finished on the cloud side or no longer exists).
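
A rough sketch of what such a test could look like (the `FakeBackend` and `process_terminating_instance` names are illustrative stand-ins, not dstack's actual internals):

```python
import requests


class FakeBackend:
    """Stands in for a backend compute client whose terminate call fails."""

    def __init__(self, fail: bool = True):
        self.fail = fail
        self.terminate_calls = 0

    def terminate_instance(self, instance_id: str) -> None:
        self.terminate_calls += 1
        if self.fail:
            # Simulate a backend outage (e.g. a 500 or a dropped connection).
            raise requests.exceptions.ConnectionError("backend unavailable")


def process_terminating_instance(instance: dict, backend: FakeBackend) -> None:
    """Desired behavior: only mark terminated if the backend call succeeds."""
    try:
        backend.terminate_instance(instance["id"])
    except requests.exceptions.RequestException:
        # Keep the instance in "terminating" so a later run can retry.
        return
    instance["status"] = "terminated"


def test_instance_stays_terminating_when_backend_errors():
    backend = FakeBackend(fail=True)
    instance = {"id": "polite-starfish-1-0", "status": "terminating"}

    process_terminating_instance(instance, backend)

    assert backend.terminate_calls == 1
    assert instance["status"] == "terminating"  # not prematurely "terminated"
```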

@r4victor (Collaborator) commented:

The current behavior is to notify the admin via logger.error if there is an unexpected error when terminating an instance, and then mark the instance as terminated anyway. An unexpected error may also happen when the instance has actually been terminated (a weird backend behavior), so not marking the instance as terminated would result in the instance being shown as running/terminating in dstack while it is already terminated in the backend. dstack Sky would charge users for that. Possibly we can default to not marking it terminated, but then there should be an easy way to manually handle such situations, such as marking the instance as terminated via the UI.
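
For illustration, the flow described above roughly amounts to the following (simplified pseudocode of the current behavior, not dstack's actual code):

```python
import logging

logger = logging.getLogger(__name__)


def terminate(instance, backend):
    try:
        backend.terminate_instance(instance.backend_id)
    except Exception:
        # Current behavior: notify the admin via an error log...
        logger.error(
            "Got exception when terminating instance %s", instance.name, exc_info=True
        )
    # ...and mark the instance as terminated regardless of whether the
    # backend call actually succeeded.
    instance.status = "terminated"
    logger.info("Instance %s terminated", instance.name)
```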

@peterschmidt85 (Contributor) commented:

Happy to discuss. I'm a bit skeptical that notifying the admin is useful.

dstackai deleted a comment from peterschmidt85 Oct 17, 2024
@jvstme (Collaborator, Author) commented Oct 17, 2024

I think it is useful but not sufficient in simpler open-source setups, where server logs can easily be lost or go unnoticed.

Maybe dstack could perform an additional API request to the backend to verify that the instance was terminated. If it wasn't, or if this request fails too, retry termination.

Similar verification requests could also be useful to avoid marking instances as terminated before they are actually fully terminated in the backend (currently dstack marks instances as terminated immediately after requesting termination, not after termination finishes). This could prevent issues like #1744 and improve observability.
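
A sketch of what such a verification step might look like; `backend.get_instance` and the status values are assumptions for illustration, not dstack's actual API:

```python
import requests


def finish_termination(instance, backend) -> bool:
    """Return True only once the backend confirms the instance is gone."""
    try:
        remote = backend.get_instance(instance.backend_id)  # hypothetical call
    except requests.exceptions.RequestException:
        # Cannot verify: keep the instance in "terminating" and retry later.
        return False
    if remote is None or remote.status in ("terminated", "deleted"):
        instance.status = "terminated"
        return True
    # The instance still exists in the backend: request termination again.
    backend.terminate_instance(instance.backend_id)
    return False
```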

@github-actions (bot) commented:

This issue is stale because it has been open for 30 days with no activity.

@github-actions (bot) commented:

This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label Dec 30, 2024
@peterschmidt85 (Contributor) commented:

@jvstme is this relevant/major?

@jvstme (Collaborator, Author) commented Dec 30, 2024

Yes, I just reproduced it with AWS (unintentionally, luckily I looked at the logs).

This issue looks major to me, as it is likely to lead to unnecessary charges when dstack-server is run without Sentry or other alerting tools.

Considering the opinions above, I can suggest the following:

  1. In case of instance termination errors, keep the instance in terminating and retry termination indefinitely.
  2. To prevent instances from getting stuck in terminating while they are actually terminated, do one or both of the following:
    • allow users to manually mark instances as terminated in UI or CLI;
    • make additional requests to the backend to determine if the instance still exists.

jvstme added the major label and removed the stale label Dec 30, 2024
@jvstme (Collaborator, Author) commented Jan 10, 2025

Actually, the simplest solution we could start with is to retry termination for 5-10 minutes and only give up and mark the instance as terminated if none of the attempts succeed. While not ideal, this would cover many cases, such as short-term network or backend outages.
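
A minimal sketch of that idea, assuming the instance stores a hypothetical `termination_started_at` timestamp and a periodic background task calls this function (timings and names are illustrative):

```python
from datetime import datetime, timedelta, timezone

TERMINATION_RETRY_WINDOW = timedelta(minutes=10)


def process_terminating_instance(instance, backend):
    now = datetime.now(timezone.utc)
    if instance.termination_started_at is None:
        instance.termination_started_at = now
    try:
        backend.terminate_instance(instance.backend_id)
    except Exception:
        if now - instance.termination_started_at < TERMINATION_RETRY_WINDOW:
            # Stay in "terminating"; the background task will retry this call.
            return
        # Give up after the retry window to avoid instances stuck in
        # "terminating" forever, even though termination may not have succeeded.
    instance.status = "terminated"
```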

It is also necessary for Vultr, since Vultr's termination API consistently fails during some intermediate instance states; see this comment.

jvstme self-assigned this Jan 13, 2025
peterschmidt85 changed the title from "[Bug]: dstack marks instance as terminated without terminating it" to "[Bug]: Show instances as terminating until they are fully terminated" Jan 13, 2025
@r4victor (Collaborator) commented:

@jvstme Shouldn't it be closed by #2190?

jvstme changed the title from "[Bug]: Show instances as terminating until they are fully terminated" back to "[Bug]: dstack marks instance as terminated without terminating it" Jan 21, 2025
@jvstme (Collaborator, Author) commented Jan 21, 2025

#2190 adds termination retries, which is enough to handle network or backend outages that don't last longer than ~15 minutes.

jvstme closed this as completed Jan 21, 2025