Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CELEBORN-886][HELM] Support multiple celeborn clusters in the same K8s namespace #1804

Closed
wants to merge 5 commits into from

Conversation

zwangsheng
Copy link
Contributor

@zwangsheng zwangsheng commented Aug 10, 2023

What changes were proposed in this pull request?

Use built-in helm var {{ .Release.Name }} to naming kubernetes resources, default with user input celeborn

close #1762

helm install celeborn charts/celeborn
// release name is celeborn
helm install celeborn-beta charts/celeborn
// release name is celeborn-beta

Why are the changes needed?

Support multiple celeborn clusters in the same k8s namespace

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

GA and IT

@codecov
Copy link

codecov bot commented Aug 10, 2023

Codecov Report

Merging #1804 (5494c6a) into main (1bf9399) will decrease coverage by 0.24%.
Report is 9 commits behind head on main.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #1804      +/-   ##
==========================================
- Coverage   46.69%   46.45%   -0.24%     
==========================================
  Files         162      163       +1     
  Lines       10123    10135      +12     
  Branches      933      934       +1     
==========================================
- Hits         4726     4707      -19     
- Misses       5092     5117      +25     
- Partials      305      311       +6     

see 5 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@@ -18,7 +18,7 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: celeborn-conf
name: {{ .Values.celebornClusterPrefix }}-conf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea looks good, I'm concern with the name, maybe celebornResourcePrefix.

Another idea is to use helm release name as we discussed offline, but it requires to do investigation whether it's designed/suitable for such cases before applying.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to Helm.sh#Release described, Release suits our scene.

Release
When a chart is installed, the Helm library creates a release to track that installation.
A single chart may be installed many times into the same cluster, and create many different releases. For example, one can install three PostgreSQL databases by running helm install three times with a different release name.

Instead of adding a new var in values.yml, we can use the built-in { .Release.name }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM, thanks for investigating.

@zwangsheng zwangsheng force-pushed the CELEBORN-886 branch 2 times, most recently from 46ebd21 to 8f73fb5 Compare August 14, 2023 07:42
@pan3793 pan3793 changed the title [CELEBORN-886][HELM]Support multiple celeborn clusters in the same k8s namespace [CELEBORN-886][HELM] Support multiple celeborn clusters in the same K8s namespace Aug 14, 2023
@pan3793
Copy link
Member

pan3793 commented Aug 14, 2023

LGTM, was this patch tested? and should we update the docs accordingly?

@zwangsheng
Copy link
Contributor Author

zwangsheng commented Aug 14, 2023

LGTM, was this patch tested? and should we update the docs accordingly?

Local dry-run with helm install celeborn-beta charts/celeborn --dry-run , following is output:

NAME: celeborn-beta
LAST DEPLOYED: Tue Aug 15 10:29:30 2023
NAMESPACE: default
STATUS: pending-install
REVISION: 1
TEST SUITE: None
HOOKS:
MANIFEST:
---
# Source: celeborn/templates/configmap.yaml
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

apiVersion: v1
kind: ConfigMap
metadata:
  name: celeborn-beta-conf
  labels:
    helm.sh/chart: celeborn-0.1.0
    app.kubernetes.io/name: celeborn
    app.kubernetes.io/instance: celeborn-beta
    app.kubernetes.io/version: "0.2.1"
    app.kubernetes.io/managed-by: Helm
    helm.sh/chart: celeborn-0.1.0
    app.kubernetes.io/instance: celeborn-beta
    app.kubernetes.io/version: "0.2.1"
    app.kubernetes.io/managed-by: Helm
data:
  celeborn-defaults.conf: |-
    celeborn.master.endpoints=celeborn-beta-master-0.celeborn-beta-master-svc.default.svc.cluster.local,celeborn-beta-master-1.celeborn-beta-master-svc.default.svc.cluster.local,celeborn-beta-master-2.celeborn-beta-master-svc.default.svc.cluster.local,
    celeborn.master.ha.node.0.host=celeborn-beta-master-0.celeborn-beta-master-svc.default.svc.cluster.local
    celeborn.master.ha.node.1.host=celeborn-beta-master-1.celeborn-beta-master-svc.default.svc.cluster.local
    celeborn.master.ha.node.2.host=celeborn-beta-master-2.celeborn-beta-master-svc.default.svc.cluster.local
    celeborn.master.ha.ratis.raft.server.storage.dir=/mnt/celeborn_ratis
    celeborn.worker.storage.dirs=/mnt/disk1:disktype=HDD,/mnt/disk2:disktype=HDD,/mnt/disk3:disktype=HDD,/mnt/disk4:disktype=HDD
    celeborn.application.heartbeat.timeout=120s
    celeborn.master.ha.enabled=true
    celeborn.master.metrics.prometheus.port=9098
    celeborn.metrics.enabled=true
    celeborn.push.stageEnd.timeout=120s
    celeborn.rpc.dispatcher.numThreads=4
    celeborn.rpc.io.clientThreads=64
    celeborn.rpc.io.numConnectionsPerPeer=2
    celeborn.rpc.io.serverThreads=64
    celeborn.shuffle.chunk.size=8m
    celeborn.worker.fetch.io.threads=32
    celeborn.worker.flusher.buffer.size=256K
    celeborn.worker.heartbeat.timeout=120s
    celeborn.worker.metrics.prometheus.port=9096
    celeborn.worker.monitor.disk.enabled=false
    celeborn.worker.push.io.threads=32

  celeborn-env.sh: |
    CELEBORN_MASTER_JAVA_OPTS="-XX:-PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:gc-master.out -Dio.netty.leakDetectionLevel=advanced"
    CELEBORN_MASTER_MEMORY="2g"
    CELEBORN_NO_DAEMONIZE="1"
    CELEBORN_WORKER_JAVA_OPTS="-XX:-PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:gc-worker.out -Dio.netty.leakDetectionLevel=advanced"
    CELEBORN_WORKER_MEMORY="2g"
    CELEBORN_WORKER_OFFHEAP_MEMORY="12g"
    TZ="Asia/Shanghai" 

  log4j2.xml: |-
    <?xml version="1.0" encoding="UTF-8"?>
    <!--
    ~ Licensed to the Apache Software Foundation (ASF) under one or more
    ~ contributor license agreements.  See the NOTICE file distributed with
    ~ this work for additional information regarding copyright ownership.
    ~ The ASF licenses this file to You under the Apache License, Version 2.0
    ~ (the "License"); you may not use this file except in compliance with
    ~ the License.  You may obtain a copy of the License at
    ~
    ~     http://www.apache.org/licenses/LICENSE-2.0
    ~
    ~ Unless required by applicable law or agreed to in writing, software
    ~ distributed under the License is distributed on an "AS IS" BASIS,
    ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    ~ See the License for the specific language governing permissions and
    ~ limitations under the License.
    -->
    <!--
    ~ Extra logging related to initialization of Log4j.
    ~ Set to debug or trace if log4j initialization is failing.
    -->
    <Configuration status="INFO">
        <Appenders>
            <Console name="stdout" target="SYSTEM_OUT">
                <!--
                  ~ In the pattern layout configuration below, we specify an explicit `%ex` conversion
                  ~ pattern for logging Throwables. If this was omitted, then (by default) Log4J would
                  ~ implicitly add an `%xEx` conversion pattern which logs stacktraces with additional
                  ~ class packaging information. That extra information can sometimes add a substantial
                  ~ performance overhead, so we disable it in our default logging config.
                  -->
                <PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p [%t] %c{1}: %m%n%ex"/>
            </Console>
            <RollingRandomAccessFile name="file" fileName="${env:CELEBORN_LOG_DIR}/celeborn.log"
                                     filePattern="${env:CELEBORN_LOG_DIR}/celeborn.log.%d-%i">
                <PatternLayout pattern="%d{yy/MM/dd HH:mm:ss,SSS} %p [%t] %c{1}: %m%n%ex"/>
                <Policies>
                    <SizeBasedTriggeringPolicy size="200 MB"/>
                </Policies>
                <DefaultRolloverStrategy max="7">
                    <Delete basePath="${env:CELEBORN_LOG_DIR}" maxDepth="1">
                        <IfFileName glob="celeborn.log*">
                            <IfAny>
                                <IfAccumulatedFileSize exceeds="1 GB" />
                                <IfAccumulatedFileCount exceeds="10" />
                            </IfAny>
                        </IfFileName>
                    </Delete>
                </DefaultRolloverStrategy>
            </RollingRandomAccessFile>
        </Appenders>

        <Loggers>
            <Root level="INFO">
                <AppenderRef ref="stdout"/>
                <AppenderRef ref="file"/>
            </Root>
            <Logger name="org.apache.hadoop.hdfs" level="WARN" additivity="false">
                <Appender-ref ref="stdout" level="WARN" />
                <Appender-ref ref="file" level="WARN"/>
            </Logger>
        </Loggers>
    </Configuration>

  metrics.properties: >-
    *.sink.prometheusServlet.class=org.apache.celeborn.common.metrics.sink.PrometheusServlet
---
# Source: celeborn/templates/master-service.yaml
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

apiVersion: v1
kind: Service
metadata:
  name: celeborn-beta-master-svc
  labels:
    helm.sh/chart: celeborn-0.1.0
    app.kubernetes.io/name: celeborn
    app.kubernetes.io/instance: celeborn-beta
    app.kubernetes.io/version: "0.2.1"
    app.kubernetes.io/managed-by: Helm
    helm.sh/chart: celeborn-0.1.0
    app.kubernetes.io/instance: celeborn-beta
    app.kubernetes.io/version: "0.2.1"
    app.kubernetes.io/managed-by: Helm
spec:
  type: ClusterIP
  ports:
    - port: 9097
      targetPort: 9097
      protocol: TCP
      name: celeborn-master
  clusterIP: None
  selector:
    app.kubernetes.io/name: celeborn
    app.kubernetes.io/version: "0.2.1"
    app.kubernetes.io/role: master
    app.kubernetes.io/instance: celeborn-beta
---
# Source: celeborn/templates/worker-service.yaml
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

apiVersion: v1
kind: Service
metadata:
  name: celeborn-beta-worker-svc
  labels:
    helm.sh/chart: celeborn-0.1.0
    app.kubernetes.io/name: celeborn
    app.kubernetes.io/instance: celeborn-beta
    app.kubernetes.io/version: "0.2.1"
    app.kubernetes.io/managed-by: Helm
    helm.sh/chart: celeborn-0.1.0
    app.kubernetes.io/instance: celeborn-beta
    app.kubernetes.io/version: "0.2.1"
    app.kubernetes.io/managed-by: Helm
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app.kubernetes.io/name: celeborn
    app.kubernetes.io/version: "0.2.1"
    app.kubernetes.io/role: worker
    app.kubernetes.io/instance: celeborn-beta
---
# Source: celeborn/templates/master-statefulset.yaml
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: celeborn-beta-master
  labels:
    helm.sh/chart: celeborn-0.1.0
    app.kubernetes.io/name: celeborn
    app.kubernetes.io/instance: celeborn-beta
    app.kubernetes.io/version: "0.2.1"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/role: master
    helm.sh/chart: celeborn-0.1.0
    app.kubernetes.io/instance: celeborn-beta
    app.kubernetes.io/version: "0.2.1"
    app.kubernetes.io/managed-by: Helm
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: celeborn
      app.kubernetes.io/instance: celeborn-beta
      app.kubernetes.io/version: "0.2.1"
      app.kubernetes.io/role: master
      app.kubernetes.io/instance: celeborn-beta
  serviceName: celeborn-beta-master-svc
  replicas: 3
  template:
    metadata:
      labels:
        app.kubernetes.io/name: celeborn
        app.kubernetes.io/instance: celeborn-beta
        app.kubernetes.io/version: "0.2.1"
        app.kubernetes.io/role: master
        app.kubernetes.io/tag: 0.1.1-6badd20
        app.kubernetes.io/instance: celeborn-beta
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                - celeborn
              - key: app.kubernetes.io/role
                operator: In
                values:
                - master
            topologyKey: kubernetes.io/hostname
      securityContext:
        fsGroup: 10006
        runAsGroup: 10006
        runAsUser: 10006
      hostNetwork: false
      dnsPolicy: ClusterFirst
      initContainers:
      - name: chown-celeborn-master-volume
        image: alpine:3.18
        imagePullPolicy: Always
        securityContext:
          runAsUser: 0
        command:
        - chown
        - 10006:10006
        - /mnt/celeborn_ratis
        volumeMounts:
          - name: celeborn-master-vol-0
            mountPath: /mnt/celeborn_ratis
      containers:
      - name: celeborn
        image: "aliyunemr/remote-shuffle-service:0.1.1-6badd20"
        imagePullPolicy: Always
        command:
          - "/usr/bin/tini"
          - "--"
          - "/bin/sh"
          - '-c'
          - "until nslookup celeborn-beta-master-0.celeborn-beta-master-svc.default.svc.cluster.local && nslookup celeborn-beta-master-1.celeborn-beta-master-svc.default.svc.cluster.local && nslookup celeborn-beta-master-2.celeborn-beta-master-svc.default.svc.cluster.local && true; do echo waiting for master; sleep 2; done && exec /opt/celeborn/sbin/start-master.sh"
        resources:
            null
        ports:
          - containerPort: 9097
          - containerPort: 9098
            name: metrics
            protocol: TCP
        volumeMounts:
          - mountPath: /opt/celeborn/conf
            name: celeborn-beta-volume
            readOnly: true
          - name: celeborn-beta-master-vol-0
            mountPath: /mnt/celeborn_ratis
        env:
          - name: CELEBORN_MASTER_JAVA_OPTS
            value: "-XX:-PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:gc-master.out -Dio.netty.leakDetectionLevel=advanced"
          - name: CELEBORN_MASTER_MEMORY
            value: "2g"
          - name: CELEBORN_NO_DAEMONIZE
            value: "1"
          - name: CELEBORN_WORKER_JAVA_OPTS
            value: "-XX:-PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:gc-worker.out -Dio.netty.leakDetectionLevel=advanced"
          - name: CELEBORN_WORKER_MEMORY
            value: "2g"
          - name: CELEBORN_WORKER_OFFHEAP_MEMORY
            value: "12g"
          - name: TZ
            value: "Asia/Shanghai"
      terminationGracePeriodSeconds: 30 
      volumes:
        - configMap:
            name: celeborn-beta-conf
          name: celeborn-beta-volume
        - name: celeborn-beta-master-vol-0
          hostPath:
            path: /mnt/celeborn_ratis/master
            type: DirectoryOrCreate
---
# Source: celeborn/templates/worker-statefulset.yaml
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: celeborn-beta-worker
  labels:
    helm.sh/chart: celeborn-0.1.0
    app.kubernetes.io/name: celeborn
    app.kubernetes.io/instance: celeborn-beta
    app.kubernetes.io/version: "0.2.1"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/role: worker
    helm.sh/chart: celeborn-0.1.0
    app.kubernetes.io/instance: celeborn-beta
    app.kubernetes.io/version: "0.2.1"
    app.kubernetes.io/managed-by: Helm
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: celeborn
      app.kubernetes.io/instance: celeborn-beta
      app.kubernetes.io/version: "0.2.1"
      app.kubernetes.io/role: worker
      app.kubernetes.io/instance: celeborn-beta
  serviceName: celeborn-beta-worker
  replicas: 5
  template:
    metadata:
      labels:
        app.kubernetes.io/name: celeborn
        app.kubernetes.io/instance: celeborn-beta
        app.kubernetes.io/version: "0.2.1"
        app.kubernetes.io/role: worker
        app.kubernetes.io/tag: 0.1.1-6badd20
        app.kubernetes.io/instance: celeborn-beta
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                - celeborn
              - key: app.kubernetes.io/role
                operator: In
                values:
                - worker
            topologyKey: kubernetes.io/hostname
      securityContext:
        fsGroup: 10006
        runAsGroup: 10006
        runAsUser: 10006
      hostNetwork: false
      dnsPolicy: ClusterFirst
      initContainers:
      - name: chown-celeborn-worker-volume
        image: alpine:3.18
        imagePullPolicy: Always
        securityContext:
          runAsUser: 0
        command:
        - chown
        - 10006:10006
        - /mnt/disk1
        - /mnt/disk2
        - /mnt/disk3
        - /mnt/disk4
        volumeMounts:
        - name: celeborn-worker-vol-0
          mountPath: /mnt/disk1
        - name: celeborn-worker-vol-1
          mountPath: /mnt/disk2
        - name: celeborn-worker-vol-2
          mountPath: /mnt/disk3
        - name: celeborn-worker-vol-3
          mountPath: /mnt/disk4
      containers:
      - name: celeborn
        image: "aliyunemr/remote-shuffle-service:0.1.1-6badd20"
        imagePullPolicy: Always
        command:
          - "/usr/bin/tini"
          - "--"
          - "/bin/sh"
          - '-c'
          - "until nslookup celeborn-beta-master-0.celeborn-beta-master-svc.default.svc.cluster.local && nslookup celeborn-beta-master-1.celeborn-beta-master-svc.default.svc.cluster.local && nslookup celeborn-beta-master-2.celeborn-beta-master-svc.default.svc.cluster.local && true; do echo waiting for master; sleep 2; done && exec /opt/celeborn/sbin/start-worker.sh"
        resources:
            null
        ports:
          - containerPort: 9096
            name: metrics
            protocol: TCP
        volumeMounts:
          - name: celeborn-beta-worker-vol-0
            mountPath: /mnt/disk1
          - name: celeborn-beta-worker-vol-1
            mountPath: /mnt/disk2
          - name: celeborn-beta-worker-vol-2
            mountPath: /mnt/disk3
          - name: celeborn-beta-worker-vol-3
            mountPath: /mnt/disk4
        env:
          - name: CELEBORN_MASTER_JAVA_OPTS
            value: "-XX:-PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:gc-master.out -Dio.netty.leakDetectionLevel=advanced"
          - name: CELEBORN_MASTER_MEMORY
            value: "2g"
          - name: CELEBORN_NO_DAEMONIZE
            value: "1"
          - name: CELEBORN_WORKER_JAVA_OPTS
            value: "-XX:-PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:gc-worker.out -Dio.netty.leakDetectionLevel=advanced"
          - name: CELEBORN_WORKER_MEMORY
            value: "2g"
          - name: CELEBORN_WORKER_OFFHEAP_MEMORY
            value: "12g"
          - name: TZ
            value: "Asia/Shanghai"
      terminationGracePeriodSeconds: 30
      volumes:
        - configMap:
            name: celeborn-beta-conf
          name: celeborn-beta-volume
        - name: celeborn-beta-worker-vol-0
          hostPath:
            path: /mnt/disk1/worker
            type: DirectoryOrCreate
        - name: celeborn-beta-worker-vol-1
          hostPath:
            path: /mnt/disk2/worker
            type: DirectoryOrCreate
        - name: celeborn-beta-worker-vol-2
          hostPath:
            path: /mnt/disk3/worker
            type: DirectoryOrCreate
        - name: celeborn-beta-worker-vol-3
          hostPath:
            path: /mnt/disk4/worker
            type: DirectoryOrCreate
---
# Source: celeborn/templates/prometheus-podmonitor.yaml
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

NOTES:
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

Celeborn

As for user guide doc, i'd make following PR to update Deploy Celeborn On K8s

@pan3793
Copy link
Member

pan3793 commented Aug 14, 2023

The volume name looks need to update.

@zwangsheng
Copy link
Contributor Author

zwangsheng commented Aug 14, 2023

The volume name looks need to update.

The name of HostPath and EmptyDir are only valid in pod scope and are not applied for as kubernetes resource.

We can ignore that, but for the sake of seriousness, we can just call it with release name.

@zwangsheng
Copy link
Contributor Author

Updated local dry-run output.

@@ -40,12 +40,15 @@ spec:
- {{ .Release.Namespace }}
selector:
matchLabels:
app.kubernetes.io/name: {{ .Chart.Name }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while rolling updating, there are maybe two version(i.e. v1, v2) both existing. If you add "app.kubernetes.io/version" as a matchLabel, then app.kubernetes.io/version equals v2 and then v1's pods will not be monitered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sound reasonable to me, i'd like to remove version as a match label.

Copy link
Contributor

@zhongqiangczq zhongqiangczq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

waitinfuture pushed a commit that referenced this pull request Aug 22, 2023
…8s namespace

### What changes were proposed in this pull request?
Use built-in helm var {{ .Release.Name }} to naming kubernetes resources, default with user input `celeborn`

close #1762
```
helm install celeborn charts/celeborn
// release name is celeborn
helm install celeborn-beta charts/celeborn
// release name is celeborn-beta
```

### Why are the changes needed?
Support multiple celeborn clusters in the same k8s namespace

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
GA and IT

Closes #1804 from zwangsheng/CELEBORN-886.

Authored-by: zwangsheng <[email protected]>
Signed-off-by: zky.zhoukeyong <[email protected]>
(cherry picked from commit 86d78d9)
Signed-off-by: zky.zhoukeyong <[email protected]>
@waitinfuture
Copy link
Contributor

Thanks! Merged to main/0.3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] support create multiple celeborn clusters(for flink) in one kubernetes namespace
4 participants