
Error: create backup target: create cluster session: gocql: unable to create session: unable to discover protocol version: dial tcp :0->10.12.11.254:9142: connect: connection refused during backup nemesis #4206

Open
1 of 2 tasks
cezarmoise opened this issue Jan 13, 2025 · 6 comments
Labels
bug Something isn't working

Comments

@cezarmoise

cezarmoise commented Jan 13, 2025

Packages

Scylla version: 2024.2.3-20250108.931ce203dcf5 with build-id 8612dedae0090301c8dfd7bd937671874aa68fb3
Scylla Manager version: 3.4.1-0.20250107.48d43ab3e

Kernel Version: 5.15.0-1076-aws

Issue description

  • This issue is a regression.
  • It is unknown if this issue is a regression.


Looks related to #4079

2025-01-12 08:15:59.894: (DisruptionEvent Severity.ERROR) period_type=end event_id=93b3a891-6c5a-4bdd-b9d0-3db73a59aea8 duration=27s: nemesis_name=MgmtBackupSpecificKeyspaces target_node=Node longevity-100gb-4h-2024-2-db-node-983f1d24-4 [54.145.15.0 | 10.12.11.254] errors=Encountered an error on sctool command: backup -c bda64310-9cff-4e4c-bfd7-abd9b50268a5 --keyspace keyspace1  --location s3:manager-backup-tests-us-east-1 : Encountered a bad command exit code!
Command: 'sudo sctool backup -c bda64310-9cff-4e4c-bfd7-abd9b50268a5 --keyspace keyspace1  --location s3:manager-backup-tests-us-east-1 '
Exit code: 1
Stdout:
Stderr:
Error: create backup target: create cluster session: gocql: unable to create session: unable to discover protocol version: dial tcp :0->10.12.11.254:9142: connect: connection refused
Trace ID: Xu456FRUS_euwBzkM5v0_w (grep in scylla-manager logs)
Traceback (most recent call last):
File "/home/ubuntu/scylla-cluster-tests/sdcm/mgmt/cli.py", line 1131, in run
res = self.manager_node.remoter.sudo(f"sctool {cmd}")
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/base.py", line 123, in sudo
return self.run(cmd=cmd,
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 614, in run
result = _run()
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 72, in inner
return func(*args, **kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 605, in _run
return self._run_execute(cmd, timeout, ignore_status, verbose, new_session, watchers)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 538, in _run_execute
result = connection.run(**command_kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 625, in run
return self._complete_run(channel, exception, timeout_reached, timeout, result, warn, stdout, stderr)
File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 660, in _complete_run
raise UnexpectedExit(result)
sdcm.remote.libssh2_client.exceptions.UnexpectedExit: Encountered a bad command exit code!
Command: 'sudo sctool backup -c bda64310-9cff-4e4c-bfd7-abd9b50268a5 --keyspace keyspace1  --location s3:manager-backup-tests-us-east-1 '
Exit code: 1
Stdout:
Stderr:
Error: create backup target: create cluster session: gocql: unable to create session: unable to discover protocol version: dial tcp :0->10.12.11.254:9142: connect: connection refused
Trace ID: Xu456FRUS_euwBzkM5v0_w (grep in scylla-manager logs)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 5294, in wrapper
result = method(*args[1:], **kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 2916, in disrupt_mgmt_backup_specific_keyspaces
self._mgmt_backup(backup_specific_tables=True)
File "/home/ubuntu/scylla-cluster-tests/sdcm/sct_events/group_common_events.py", line 534, in wrapper
return decorated_func(*args, **kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/sct_events/group_common_events.py", line 519, in inner_func
return func(*args, **kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 3072, in _mgmt_backup
mgr_task = mgr_cluster.create_backup_task(location_list=[location, ], keyspace_list=non_test_keyspaces)
File "/home/ubuntu/scylla-cluster-tests/sdcm/mgmt/cli.py", line 596, in create_backup_task
res = self.sctool.run(cmd=cmd, parse_table_res=False)
File "/home/ubuntu/scylla-cluster-tests/sdcm/mgmt/cli.py", line 1134, in run
raise ScyllaManagerError(f"Encountered an error on sctool command: {cmd}: {ex}") from ex
sdcm.mgmt.common.ScyllaManagerError: Encountered an error on sctool command: backup -c bda64310-9cff-4e4c-bfd7-abd9b50268a5 --keyspace keyspace1  --location s3:manager-backup-tests-us-east-1 : Encountered a bad command exit code!
Command: 'sudo sctool backup -c bda64310-9cff-4e4c-bfd7-abd9b50268a5 --keyspace keyspace1  --location s3:manager-backup-tests-us-east-1 '
Exit code: 1
Stdout:
Stderr:
Error: create backup target: create cluster session: gocql: unable to create session: unable to discover protocol version: dial tcp :0->10.12.11.254:9142: connect: connection refused
Trace ID: Xu456FRUS_euwBzkM5v0_w (grep in scylla-manager logs)

Logs at that trace ID:

Jan 12 08:15:58 longevity-100gb-4h-2024-2-monitor-node-983f1d24-1 scylla-manager[7905]: {"L":"INFO","T":"2025-01-12T08:15:58.938Z","N":"backup","M":"Generating backup target","cluster_id":"bda64310-9cff-4e4c-bfd7-abd9b50268a5","_trace_id":"Xu456FRUS_euwBzkM5v0_w"}
Jan 12 08:15:58 longevity-100gb-4h-2024-2-monitor-node-983f1d24-1 scylla-manager[7905]: {"L":"INFO","T":"2025-01-12T08:15:58.946Z","N":"cluster.client","M":"Checking hosts connectivity","hosts":["10.12.10.110","10.12.10.119","10.12.10.199","10.12.11.254","10.12.9.154","10.12.9.176"],"_trace_id":"Xu456FRUS_euwBzkM5v0_w"}
Jan 12 08:15:58 longevity-100gb-4h-2024-2-monitor-node-983f1d24-1 scylla-manager[7905]: {"L":"INFO","T":"2025-01-12T08:15:58.947Z","N":"cluster.client","M":"Host check OK","host":"10.12.9.176","_trace_id":"Xu456FRUS_euwBzkM5v0_w"}
Jan 12 08:15:58 longevity-100gb-4h-2024-2-monitor-node-983f1d24-1 scylla-manager[7905]: {"L":"INFO","T":"2025-01-12T08:15:58.947Z","N":"cluster.client","M":"Host check OK","host":"10.12.10.199","_trace_id":"Xu456FRUS_euwBzkM5v0_w"}
Jan 12 08:15:58 longevity-100gb-4h-2024-2-monitor-node-983f1d24-1 scylla-manager[7905]: {"L":"INFO","T":"2025-01-12T08:15:58.947Z","N":"cluster.client","M":"Host check OK","host":"10.12.10.119","_trace_id":"Xu456FRUS_euwBzkM5v0_w"}
Jan 12 08:15:58 longevity-100gb-4h-2024-2-monitor-node-983f1d24-1 scylla-manager[7905]: {"L":"INFO","T":"2025-01-12T08:15:58.947Z","N":"cluster.client","M":"Host check OK","host":"10.12.9.154","_trace_id":"Xu456FRUS_euwBzkM5v0_w"}
Jan 12 08:15:58 longevity-100gb-4h-2024-2-monitor-node-983f1d24-1 scylla-manager[7905]: {"L":"INFO","T":"2025-01-12T08:15:58.947Z","N":"cluster.client","M":"Host check OK","host":"10.12.10.110","_trace_id":"Xu456FRUS_euwBzkM5v0_w"}
Jan 12 08:15:58 longevity-100gb-4h-2024-2-monitor-node-983f1d24-1 scylla-manager[7905]: {"L":"INFO","T":"2025-01-12T08:15:58.947Z","N":"cluster.client","M":"Host check OK","host":"10.12.11.254","_trace_id":"Xu456FRUS_euwBzkM5v0_w"}
Jan 12 08:15:58 longevity-100gb-4h-2024-2-monitor-node-983f1d24-1 scylla-manager[7905]: {"L":"INFO","T":"2025-01-12T08:15:58.947Z","N":"cluster.client","M":"Done checking hosts connectivity","_trace_id":"Xu456FRUS_euwBzkM5v0_w"}
Jan 12 08:15:58 longevity-100gb-4h-2024-2-monitor-node-983f1d24-1 scylla-manager[7905]: {"L":"INFO","T":"2025-01-12T08:15:58.947Z","N":"backup","M":"Checking accessibility of remote locations","_trace_id":"Xu456FRUS_euwBzkM5v0_w"}
Jan 12 08:15:59 longevity-100gb-4h-2024-2-monitor-node-983f1d24-1 scylla-manager[7905]: {"L":"INFO","T":"2025-01-12T08:15:59.203Z","N":"backup","M":"Location check OK","host":"10.12.10.119","location":"s3:manager-backup-tests-us-east-1","_trace_id":"Xu456FRUS_euwBzkM5v0_w"}
Jan 12 08:15:59 longevity-100gb-4h-2024-2-monitor-node-983f1d24-1 scylla-manager[7905]: {"L":"INFO","T":"2025-01-12T08:15:59.226Z","N":"backup","M":"Location check OK","host":"10.12.9.176","location":"s3:manager-backup-tests-us-east-1","_trace_id":"Xu456FRUS_euwBzkM5v0_w"}
Jan 12 08:15:59 longevity-100gb-4h-2024-2-monitor-node-983f1d24-1 scylla-manager[7905]: {"L":"INFO","T":"2025-01-12T08:15:59.228Z","N":"backup","M":"Location check OK","host":"10.12.10.110","location":"s3:manager-backup-tests-us-east-1","_trace_id":"Xu456FRUS_euwBzkM5v0_w"}
Jan 12 08:15:59 longevity-100gb-4h-2024-2-monitor-node-983f1d24-1 scylla-manager[7905]: {"L":"INFO","T":"2025-01-12T08:15:59.243Z","N":"backup","M":"Location check OK","host":"10.12.11.254","location":"s3:manager-backup-tests-us-east-1","_trace_id":"Xu456FRUS_euwBzkM5v0_w"}
Jan 12 08:15:59 longevity-100gb-4h-2024-2-monitor-node-983f1d24-1 scylla-manager[7905]: {"L":"INFO","T":"2025-01-12T08:15:59.250Z","N":"backup","M":"Location check OK","host":"10.12.9.154","location":"s3:manager-backup-tests-us-east-1","_trace_id":"Xu456FRUS_euwBzkM5v0_w"}
Jan 12 08:15:59 longevity-100gb-4h-2024-2-monitor-node-983f1d24-1 scylla-manager[7905]: {"L":"INFO","T":"2025-01-12T08:15:59.281Z","N":"backup","M":"Location check OK","host":"10.12.10.199","location":"s3:manager-backup-tests-us-east-1","_trace_id":"Xu456FRUS_euwBzkM5v0_w"}
Jan 12 08:15:59 longevity-100gb-4h-2024-2-monitor-node-983f1d24-1 scylla-manager[7905]: {"L":"INFO","T":"2025-01-12T08:15:59.282Z","N":"backup","M":"Done checking accessibility of remote locations","_trace_id":"Xu456FRUS_euwBzkM5v0_w"}
Jan 12 08:15:59 longevity-100gb-4h-2024-2-monitor-node-983f1d24-1 scylla-manager[7905]: {"L":"INFO","T":"2025-01-12T08:15:59.284Z","N":"cluster","M":"Get session","cluster_id":"bda64310-9cff-4e4c-bfd7-abd9b50268a5","_trace_id":"Xu456FRUS_euwBzkM5v0_w"}
Jan 12 08:15:59 longevity-100gb-4h-2024-2-monitor-node-983f1d24-1 scylla-manager[7905]: {"L":"INFO","T":"2025-01-12T08:15:59.284Z","N":"cluster","M":"Creating new Scylla HTTP client","cluster_id":"bda64310-9cff-4e4c-bfd7-abd9b50268a5","_trace_id":"Xu456FRUS_euwBzkM5v0_w"}
Jan 12 08:15:59 longevity-100gb-4h-2024-2-monitor-node-983f1d24-1 scylla-manager[7905]: {"L":"INFO","T":"2025-01-12T08:15:59.295Z","N":"cluster.client","M":"Measuring datacenter latencies","dcs":["us-east"],"_trace_id":"Xu456FRUS_euwBzkM5v0_w"}

Impact


How frequently does it reproduce?


Also reproduced 3 more times here:
https://argus.scylladb.com/tests/scylla-cluster-tests/ee111e7b-a2fc-47bf-bdd0-d12a9fbab3b3

Installation details

Cluster size: 6 nodes (i4i.4xlarge)

Scylla Nodes used in this run:

  • longevity-100gb-4h-2024-2-db-node-983f1d24-9 (34.207.118.134 | 10.12.11.147) (shards: 14)
  • longevity-100gb-4h-2024-2-db-node-983f1d24-8 (54.225.44.220 | 10.12.8.57) (shards: 14)
  • longevity-100gb-4h-2024-2-db-node-983f1d24-7 (204.236.249.251 | 10.12.10.199) (shards: 14)
  • longevity-100gb-4h-2024-2-db-node-983f1d24-6 (44.222.202.250 | 10.12.9.154) (shards: 14)
  • longevity-100gb-4h-2024-2-db-node-983f1d24-5 (54.221.81.211 | 10.12.10.119) (shards: 14)
  • longevity-100gb-4h-2024-2-db-node-983f1d24-4 (54.145.15.0 | 10.12.11.254) (shards: 14)
  • longevity-100gb-4h-2024-2-db-node-983f1d24-3 (54.226.90.250 | 10.12.11.77) (shards: 14)
  • longevity-100gb-4h-2024-2-db-node-983f1d24-2 (54.242.165.1 | 10.12.10.110) (shards: 14)
  • longevity-100gb-4h-2024-2-db-node-983f1d24-1 (54.82.96.228 | 10.12.9.176) (shards: 14)

OS / Image: ami-08db3369214fc3796 (NO RUNNER: NO RUNNER)

Test: longevity-100gb-4h-test
Test id: 983f1d24-0d5a-4e2d-9c1a-62b04957bfad
Test name: enterprise-2024.2/longevity/longevity-100gb-4h-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 983f1d24-0d5a-4e2d-9c1a-62b04957bfad
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 983f1d24-0d5a-4e2d-9c1a-62b04957bfad

Logs:

Jenkins job URL
Argus

@cezarmoise added the bug label on Jan 13, 2025
@timtimb0t

Packages

Scylla version: 6.3.0~dev-20250108.e51b2075dacc with build-id 1ffc83e51d7f78126ce77667ff1140f5f4913518

Kernel Version: 6.8.0-1020-azure

Installation details

Cluster size: 4 nodes (Standard_L16s_v3)

Scylla Nodes used in this run:

  • longevity-tls-1tb-7d-master-db-node-6cafd6b8-eastus-7 (null | 10.0.0.7) (shards: 14)
  • longevity-tls-1tb-7d-master-db-node-6cafd6b8-eastus-6 (null | 10.0.0.7) (shards: 14)
  • longevity-tls-1tb-7d-master-db-node-6cafd6b8-eastus-5 (null | 10.0.0.14) (shards: 14)
  • longevity-tls-1tb-7d-master-db-node-6cafd6b8-eastus-4 (null | 10.0.0.8) (shards: 14)
  • longevity-tls-1tb-7d-master-db-node-6cafd6b8-eastus-3 (null | 10.0.0.7) (shards: 14)
  • longevity-tls-1tb-7d-master-db-node-6cafd6b8-eastus-2 (null | 10.0.0.6) (shards: 14)
  • longevity-tls-1tb-7d-master-db-node-6cafd6b8-eastus-1 (null | 10.0.0.5) (shards: 14)

OS / Image: /subscriptions/6c268694-47ab-43ab-b306-3c5514bc4112/resourceGroups/SCYLLA-IMAGES/providers/Microsoft.Compute/images/scylla-6.3.0-dev-x86_64-2025-01-09T11-18-00 (NO RUNNER: NO RUNNER)

Test: longevity-1tb-5days-azure-test
Test id: 6cafd6b8-6434-4948-a6fd-7b1a25a8c8cf
Test name: scylla-master/tier1/longevity-1tb-5days-azure-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 6cafd6b8-6434-4948-a6fd-7b1a25a8c8cf
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 6cafd6b8-6434-4948-a6fd-7b1a25a8c8cf

Logs:

Jenkins job URL
Argus

@timtimb0t

Packages

Scylla version: 6.3.0~dev-20250108.e51b2075dacc with build-id 1ffc83e51d7f78126ce77667ff1140f5f4913518

Kernel Version: 6.8.0-1021-aws

Installation details

Cluster size: 6 nodes (i4i.4xlarge)

Scylla Nodes used in this run:

  • longevity-tls-50gb-3d-master-db-node-2dcd4a4a-9 (34.251.4.141 | 10.4.22.20) (shards: 14)
  • longevity-tls-50gb-3d-master-db-node-2dcd4a4a-8 (34.241.26.106 | 10.4.20.131) (shards: 14)
  • longevity-tls-50gb-3d-master-db-node-2dcd4a4a-7 (3.254.126.153 | 10.4.22.124) (shards: -1)
  • longevity-tls-50gb-3d-master-db-node-2dcd4a4a-6 (52.208.146.125 | 10.4.22.186) (shards: 14)
  • longevity-tls-50gb-3d-master-db-node-2dcd4a4a-5 (54.247.14.114 | 10.4.22.170) (shards: 14)
  • longevity-tls-50gb-3d-master-db-node-2dcd4a4a-4 (54.155.14.93 | 10.4.22.97) (shards: 14)
  • longevity-tls-50gb-3d-master-db-node-2dcd4a4a-3 (54.194.32.152 | 10.4.20.122) (shards: 14)
  • longevity-tls-50gb-3d-master-db-node-2dcd4a4a-2 (54.220.6.171 | 10.4.20.202) (shards: 14)
  • longevity-tls-50gb-3d-master-db-node-2dcd4a4a-16 (54.76.98.223 | 10.4.23.157) (shards: 14)
  • longevity-tls-50gb-3d-master-db-node-2dcd4a4a-15 (52.19.27.230 | 10.4.22.242) (shards: 14)
  • longevity-tls-50gb-3d-master-db-node-2dcd4a4a-14 (54.76.156.247 | 10.4.21.108) (shards: 14)
  • longevity-tls-50gb-3d-master-db-node-2dcd4a4a-13 (52.210.61.57 | 10.4.20.224) (shards: 14)
  • longevity-tls-50gb-3d-master-db-node-2dcd4a4a-12 (52.212.247.74 | 10.4.21.143) (shards: 14)
  • longevity-tls-50gb-3d-master-db-node-2dcd4a4a-11 (52.213.210.215 | 10.4.23.96) (shards: 14)
  • longevity-tls-50gb-3d-master-db-node-2dcd4a4a-10 (54.171.113.110 | 10.4.22.53) (shards: 14)
  • longevity-tls-50gb-3d-master-db-node-2dcd4a4a-1 (54.247.50.94 | 10.4.23.234) (shards: 14)

OS / Image: ami-0419ef0a7ad763693 (NO RUNNER: NO RUNNER)

Test: longevity-50gb-3days-test
Test id: 2dcd4a4a-e69e-492e-b62c-eb5f73fd311d
Test name: scylla-master/tier1/longevity-50gb-3days-test
Test method: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 2dcd4a4a-e69e-492e-b62c-eb5f73fd311d
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 2dcd4a4a-e69e-492e-b62c-eb5f73fd311d

Logs:

Jenkins job URL
Argus

@VAveryanov8
Collaborator

I think what's happening in this issue is:

  1. Due to the missing native_transport_port_ssl: 9142 parameter in scylla.yaml, ScyllaDB doesn't listen on port 9142.
  2. Scylla Manager correctly detects that SSL is enabled and tries to use the CQL SSL port (9142). Previously, Scylla Manager could incorrectly use 9042 (see #4079: Scylla Manager, under certain conditions, is unable to use only SSL port (9142) to restore data).
  3. As ScyllaDB is not listening on port 9142, Scylla Manager gets the dial tcp :0->10.12.11.254:9142: connect: connection refused error.

So the issue can be fixed by adding the native_transport_port_ssl: 9142 parameter to scylla.yaml, e.g. as sketched below.
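A minimal sketch of that fix, assuming the default scylla.yaml location and service name (it would need to be applied on every DB node):

```sh
# Sketch only: enable the dedicated CQL-over-TLS port
# (assumes client_encryption_options is already enabled on this node).
echo 'native_transport_port_ssl: 9142' | sudo tee -a /etc/scylla/scylla.yaml

# Restart Scylla so the new port takes effect, then confirm it is listening.
sudo systemctl restart scylla-server
ss -ltn | grep 9142
```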

On the other hand, it's interesting why scylla-manager tries to use 9142 when it is not set in scylla.yaml. As far as I know, scylla-manager gets configuration parameters from the Scylla REST API, so I suspect ScyllaDB might be returning port 9142 even if it's not listening on it. Note: this is just a theory and needs validation; one way to check it is sketched below.
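One possible way to check that theory on an affected node (a sketch; it assumes the Scylla REST API listens on the default port 10000 and exposes options under /v2/config/):

```sh
# What does the node report for the SSL CQL port?
# (the /v2/config path is an assumption about the config API)
curl -s http://localhost:10000/v2/config/native_transport_port_ssl

# Compare with the CQL ports the node is actually listening on.
ss -ltn | grep -E '9042|9142'
```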

@Michal-Leszczynski what do you think about this?

@Michal-Leszczynski
Collaborator

@VAveryanov8 you are right in terms of what happened!

There is this really old issue: the Scylla API used for querying the config returns the default value for an unspecified option, so disabling the SSL port and setting it to the default value look the same from SM's point of view. That's also the sole reason why we needed to add the following flags to the sctool cluster add command:

      --force-non-ssl-session-port   Forces Scylla Manager to always use the non-SSL port for TLS-enabled cluster CQL sessions.
      --force-tls-disabled           Forces Scylla Manager to always disable TLS for the cluster's CQL session, even if TLS is enabled in scylla.yaml.

Enabling SSL while keeping the dedicated SSL port disabled is supported by Scylla:

# Enabling native transport encryption in client_encryption_options allows you to either use
# encryption for the standard port or to use a dedicated, additional port along with the unencrypted
# standard native_transport_port.
# Enabling client encryption and keeping native_transport_port_ssl disabled will use encryption
# for native_transport_port. Setting native_transport_port_ssl to a different value
# from native_transport_port will use encryption for native_transport_port_ssl while
# keeping native_transport_port unencrypted.
#native_transport_port_ssl: 9142

So we shouldn't force-enable the SSL port in this test, but rather include the --force-non-ssl-session-port flag when adding the cluster to SM with the sctool cluster add command (see the sketch below).
cc: @timtimb0t @cezarmoise
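For reference, the suggested workaround would look roughly like this when (re)adding the cluster to Scylla Manager (host, name, and auth token below are placeholders, not values from this run; only --force-non-ssl-session-port is the flag under discussion):

```sh
sctool cluster add \
  --host <db-node-ip> \
  --name <cluster-name> \
  --auth-token <agent-auth-token> \
  --force-non-ssl-session-port
```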

But in general, it would be nice if the original issue got fixed. I'm only worried that, after this many years, SM/Operator/Cloud could somehow rely on this behavior, but that is just speculation.

@dimakr

dimakr commented Jan 14, 2025

Reproduced with the 2024.1.15 patch release (Scylla version 2024.1.15-20250112.ae485295dcda, build-id 2cdf61613e37f62bdf0ce4d78ca2bc4b0b859c9c) in the test runs longevity-100gb-4h-fips-test, longevity-50gb-3days-test, and longevity-150gb-asymmetric-cluster-12h-test.

@fruch
Contributor

fruch commented Jan 26, 2025

> @VAveryanov8 you are right in terms of what happened! [...] So we shouldn't force-enable the SSL port in this test, but rather include the `--force-non-ssl-session-port` flag when adding the cluster to SM with the `sctool cluster add` command. cc: @timtimb0t @cezarmoise

@Michal-Leszczynski that's all good and nice, but now all of the setups we have that enable CQL TLS are not working with Manager (and they were working just fine in previous Manager releases). I would say this is a regression in Manager that we should flush out.

The option has always had a default in Scylla (and ships commented out in the yaml), and we never set it in SCT; a quick check is sketched below.
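For context, a quick way to confirm the option is left commented out on a node (a sketch; the path assumes the default package layout):

```sh
# The default scylla.yaml ships the option commented out, so Scylla only
# listens on the dedicated TLS port if the option is set explicitly.
grep -n 'native_transport_port_ssl' /etc/scylla/scylla.yaml
```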
