Quick Start Guide

For a more detailed guide on how to use, compose, and work with SparkApplications, please refer to the User Guide. If you are running the Kubernetes Operator for Apache Spark on Google Kubernetes Engine and want to use Google Cloud Storage (GCS) and/or BigQuery for reading/writing data, also refer to the GCP guide. The Kubernetes Operator for Apache Spark will simply be referred to as the operator for the rest of this guide.

Installation

To install the operator, use the Helm chart.

$ helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
$ helm install incubator/sparkoperator

Installing the chart will create a namespace spark-operator, set up RBAC for the operator to run in the namespace. It will also set up RBAC for driver pods of your Spark applications to be able to manipulate executor pods. In addition, the chart will create a Deployment named sparkoperator in namespace spark-operator. The chart by default enables a Mutating Admission Webhook for Spark pod customization. A webhook service called spark-webhook and a secret storing the x509 certificate called spark-webhook-certs are created for that purpose. To install the operator without the mutating admission webhook on a Kubernetes cluster, install the chart with the flag enableWebhook=false:

$ helm install incubator/sparkoperator --set enableWebhook=false

Due to a known issue in GKE, you will need to first grant yourself cluster-admin privileges before you can create custom roles and role bindings on a GKE cluster versioned 1.6 and up. Run the following command before installing the chart on GKE:

$ kubectl create clusterrolebinding <user>-cluster-admin-binding --clusterrole=cluster-admin --user=<user>@<domain>

Now you should see the operator running in the cluster by checking the status of the Deployment.

$ kubectl describe deployment sparkoperator -n spark-operator

Metrics

The operator exposes a set of metrics via the metric endpoint to be scraped by Prometheus. The Helm chart by default installs the operator with the additional flag to enable metrics (-enable-metrics=true) as well as other annotations used by Prometheus to scrape the metric endpoint. To install the operator without metrics enabled, pass the appropriate flag during helm install:

$ helm install incubator/sparkoperator --set enableMetrics=false

If enabled, the operator generates the following metrics:

Metric	Description
`spark_app_submit_count`	Total number of SparkApplication submitted by the Operator.
`spark_app_success_count`	Total number of SparkApplication which completed successfully.
`spark_app_failure_count`	Total number of SparkApplication which failed to complete.
`spark_app_running_count`	Total number of SparkApplication which are currently running.
`spark_app_success_execution_time_microseconds`	Execution time for applications which succeeded.
`spark_app_failure_execution_time_microseconds`	Execution time for applications which failed.
`spark_app_executor_success_count`	Total number of Spark Executors which completed successfully.
`spark_app_executor_failure_count`	Total number of Spark Executors which failed.
`spark_app_executor_running_count`	Total number of Spark Executors which are currently running.

The following is a list of all the configurations the operators supports for metrics:

-enable-metrics=true
-metrics-port=10254
-metrics-endpoint=/metrics
-metrics-prefix=myServiceName 
-metrics-label=label1Key
-metrics-label=label2Key

All configs except -enable-metrics are optional. If port and/or endpoint are specified, please ensure that the annotations prometheus.io/port, prometheus.io/path and containerPort in spark-operator-with-metrics.yaml are updated as well.

A note about metrics-labels: In Prometheus, every unique combination of key-value label pair represents a new time series, which can dramatically increase the amount of data stored. Hence labels should not be used to store dimensions with high cardinality with potentially a large or unbounded value range.

Additionally, these metrics are best-effort for the current operator run and will be reset on an operator restart. Also some of these metrics are generated by listening to pod state updates for the driver/executors and deleting the pods outside the operator might lead to incorrect metric values for some of these metrics.

Configuration

The operator is typically deployed and run using the Helm chart. However, users can still run it outside a Kubernetes cluster and make it talk to the Kubernetes API server of a cluster by specifying path to kubeconfig, which can be done using the -kubeconfig flag.

The operator uses multiple workers in the SparkApplication controller and and the submission runner. The number of worker threads to use in the three places are controlled using command-line flags -controller-threads and -submission-threads, respectively. The default values for the flags are 10 and 3, respectively.

The operator enables cache resynchronization so periodically the informers used by the operator will re-list existing objects it manages and re-trigger resource events. The resynchronization interval in seconds can be configured using the flag -resync-interval, with a default value of 30 seconds.

By default, the operator will install the CustomResourceDefinitions for the custom resources it managers. This can be disabled by setting the flag -install-crds=false..

The mutating admission webhook is an optional component and can be enabled or disabled using the -enable-webhook flag, which defaults to false.

By default, the operator will manage custom resource objects of the managed CRD types for the whole cluster. It can be configured to manage only the custom resource objects in a specific namespace with the flag -namespace=<namespace>

Upgrade

To upgrade the the operator, e.g., to use a newer version container image with a new tag, run the following command with updated parameters for the Helm release:

$ helm upgrade <YOUR-HELM-RELEASE-NAME> --set operatorImageName=org/image --set operatorVersion=newTag

Refer to the Helm documentation for more details on helm upgrade.

Running the Examples

To run the Spark Pi example, run the following command:

$ kubectl apply -f examples/spark-pi.yaml

This will create a SparkApplication object named spark-pi. Check the object by running the following command:

$ kubectl get sparkapplications spark-pi -o=yaml

This will show something similar to the following:

apiVersion: sparkoperator.k8s.io/v1alpha1
kind: SparkApplication
metadata:
  ...
spec:
  deps: {}
  driver:
    coreLimit: 200m
    cores: 0.1
    labels:
      version: 2.3.0
    memory: 512m
    serviceAccount: spark
  executor:
    cores: 1
    instances: 1
    labels:
      version: 2.3.0
    memory: 512m
  image: gcr.io/ynli-k8s/spark:v2.3.0
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
  mainClass: org.apache.spark.examples.SparkPi
  mode: cluster
  restartPolicy: Never
  type: Scala
status:
  appId: spark-pi-2402118027
  applicationState:
    state: COMPLETED
  completionTime: 2018-02-20T23:33:55Z
  driverInfo:
    podName: spark-pi-83ba921c85ff3f1cb04bef324f9154c9-driver
    webUIAddress: 35.192.234.248:31064
    webUIPort: 31064
    webUIServiceName: spark-pi-2402118027-ui-svc
  executorState:
    spark-pi-83ba921c85ff3f1cb04bef324f9154c9-exec-1: COMPLETED
  submissionTime: 2018-02-20T23:32:27Z

To check events for the SparkApplication object, run the following command:

$ kubectl describe sparkapplication spark-pi

This will show the events similarly to the following:

Events:
  Type    Reason                      Age   From            Message
  ----    ------                      ----  ----            -------
  Normal  SparkApplicationAdded       5m    spark-operator  SparkApplication spark-pi was added, enqueued it for submission
  Normal  SparkApplicationTerminated  4m    spark-operator  SparkApplication spark-pi terminated with state: COMPLETED

The operator submits the Spark Pi example to run once it receives an event indicating the SparkApplication object was added.

Using the Mutating Admission Webhook

The Kubernetes Operator for "Apache Spark comes with an optional mutating admission webhook for customizing Spark driver and executor pods based on the specification in SparkApplication objects, e.g., mounting user-specified ConfigMaps and volumes, and setting pod affinity/anti-affinity.

The webhook requires a X509 certificate for TLS for pod admission requests and responses between the Kubernetes API server and the webhook server running inside the operator. For that, the certificate and key files must be accessible by the webhook server. The Spark Operator ships with a tool at hack/gencerts.sh for generating the CA and server certificate and putting the certificate and key files into a secret named spark-webhook-certs in namespace sparkoperator. This secret will be mounted into the Spark Operator pod.

Run the following command to create secret with certificate and key files using Batch Job, and install the Spark Operator Deployment with the mutating admission webhook:

$ kubectl apply -f manifest/spark-operator-with-webhook.yaml

This will create a Deployment named sparkoperator and a Service named spark-webhook for the webhook in namespace sparkoperator.

If the operator is installed via the Helm chart using the default settings (i.e. with webhook enabled), the above steps are all automated for you.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

quick-start-guide.md

quick-start-guide.md

Quick Start Guide

Table of Contents

Installation

Metrics

Configuration

Upgrade

Running the Examples

Using the Mutating Admission Webhook

Files

quick-start-guide.md

Latest commit

History

quick-start-guide.md

File metadata and controls

Quick Start Guide

Table of Contents

Installation

Metrics

Configuration

Upgrade

Running the Examples

Using the Mutating Admission Webhook