[RayCluster][Feature] add redis username to head pod from GcsFaultToleranceOptions #2760

Open. Wants to merge 1 commit into master.

win5923 commented Jan 16, 2025

Why are these changes needed?

This PR addresses the following selected part:
[Screenshot: the selected part of the issue]

The current example YAML file for GCS FT uses Redis 5.0.9, which does not support ACLs (they were introduced in Redis 6.0). As a result, setting redisUsername causes a CrashLoopBackOff.

Ref: https://redis.io/docs/latest/operate/oss_and_stack/management/security/acl/
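
The version constraint above can be expressed as a quick check. ACLs, and with them named users, arrived in Redis 6.0, so a minimal sketch might look like the following. It assumes image tags of the form `redis:<major>.<minor>.<patch>`; the helper name `supports_acl` is hypothetical and is not part of KubeRay or Redis.

```python
def supports_acl(image: str) -> bool:
    """Return True if a Redis image tag such as 'redis:7.4.2' is 6.0 or newer.

    Assumption: the tag is a plain dotted version. Tags such as 'latest'
    or '7-alpine' are not handled by this sketch.
    """
    tag = image.rsplit(":", 1)[-1]
    major, minor = (int(part) for part in tag.split(".")[:2])
    return (major, minor) >= (6, 0)


print(supports_acl("redis:5.0.9"))  # version in the current example YAML -> False
print(supports_acl("redis:7.4.2"))  # version used in the manual test -> True
```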

Manual Tests

kind: ConfigMap
apiVersion: v1
metadata:
  name: redis-config
  labels:
    app: redis
data:
  redis.conf: |-
    dir /data
    port 6379
    bind 0.0.0.0
    appendonly yes
    protected-mode no
    requirepass 5241590000000000
    pidfile /data/redis-6379.pid

    user username on >5241590000000000 ~* +@all
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  labels:
    app: redis
spec:
  type: ClusterIP
  ports:
    - name: redis
      port: 6379
  selector:
    app: redis
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  labels:
    app: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7.4.2  # Use a version that supports ACL (Redis 6.0 or newer).
          command:
            - "sh"
            - "-c"
            - "redis-server /usr/local/etc/redis/redis.conf"
          ports:
            - containerPort: 6379
          volumeMounts:
            - name: config
              mountPath: /usr/local/etc/redis/redis.conf
              subPath: redis.conf
      volumes:
        - name: config
          configMap:
            name: redis-config
---
# Redis username and password
apiVersion: v1
kind: Secret
metadata:
  name: redis-password-secret
type: Opaque
data:
  # echo -n "username" | base64
  # echo -n "5241590000000000" | base64
  username: dXNlcm5hbWU=
  password: NTI0MTU5MDAwMDAwMDAwMA==
---
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-external-redis-3
spec:
  rayVersion: 'nightly'
  gcsFaultToleranceOptions:
    redisAddress: redis:6379
    redisUsername:
      valueFrom:
        secretKeyRef:
          name: redis-password-secret
          key: username
    redisPassword:
      valueFrom:
        secretKeyRef:
          name: redis-password-secret
          key: password
  headGroupSpec:
    rayStartParams:
      num-cpus: "0"
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:nightly
            resources:
              limits:
                cpu: "1"
              requests:
                cpu: "1"
            ports:
              - containerPort: 6379
                name: redis
              - containerPort: 8265
                name: dashboard
              - containerPort: 10001
                name: client
            volumeMounts:
              - mountPath: /tmp/ray
                name: ray-logs
              - mountPath: /home/ray/samples
                name: ray-example-configmap
        volumes:
          - name: ray-logs
            emptyDir: {}
          - name: ray-example-configmap
            configMap:
              name: ray-example
              defaultMode: 0777
              items:
                - key: detached_actor.py
                  path: detached_actor.py
                - key: increment_counter.py
                  path: increment_counter.py
  workerGroupSpecs:
    - replicas: 1
      minReplicas: 1
      maxReplicas: 10
      groupName: small-group
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:nightly
              volumeMounts:
                - mountPath: /tmp/ray
                  name: ray-logs
              resources:
                limits:
                  cpu: "1"
                requests:
                  cpu: "1"
          volumes:
            - name: ray-logs
              emptyDir: {}
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: ray-example
data:
  detached_actor.py: |
    import ray

    @ray.remote(num_cpus=1)
    class Counter:
      def __init__(self):
          self.value = 0

      def increment(self):
          self.value += 1
          return self.value

    ray.init(namespace="default_namespace")
    Counter.options(name="counter_actor", lifetime="detached").remote()
  increment_counter.py: |
    import ray

    ray.init(namespace="default_namespace")
    counter = ray.get_actor("counter_actor")
    print(ray.get(counter.increment.remote()))
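
The Secret in the manifests above stores the username and password base64-encoded, as its `echo -n ... | base64` comments indicate. As a sanity check, the sketch below decodes both values with Python's standard `base64` module and compares them with the ACL user defined in `redis.conf`. The variable names are illustrative only.

```python
import base64

# Values copied from the Secret's `data` field in the manifest.
secret_data = {
    "username": "dXNlcm5hbWU=",
    "password": "NTI0MTU5MDAwMDAwMDAwMA==",
}

decoded = {
    key: base64.b64decode(value).decode("utf-8")
    for key, value in secret_data.items()
}

# These must match the `user username on >5241590000000000 ~* +@all` ACL rule.
assert decoded["username"] == "username"
assert decoded["password"] == "5241590000000000"
print(decoded)
```

If either assertion fails, the head pod would present credentials that Redis rejects, so it is worth running this kind of check before applying a modified Secret.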


Related issue number

Resolves #2720

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

win5923 marked this pull request as draft January 16, 2025 14:38
win5923 force-pushed the redis/username branch 2 times, most recently from f224524 to b786001, January 16, 2025 14:45
win5923 marked this pull request as ready for review January 16, 2025 16:24

win5923 commented Jan 16, 2025

@rueian PTAL when you are free

Successfully merging this pull request may close these issues.

[RayCluster][Feature] add GcsFaultToleranceOptions to the RayCluster CRD [2/N]