Flyte can execute Spark jobs natively on a Kubernetes Cluster, which manages a virtual cluster’s lifecycle, spin-up, and tear down. It leverages the open-sourced Spark On K8s Operator and can be enabled without signing up for any service. This is like running a transient spark cluster — a type of cluster spun up for a specific Spark job and torn down after completion.
To install the plugin, run the following command:
pip install flytekitplugins-spark
To configure Spark in the Flyte deployment's backend, follow Step 1, 2, and 3.
All examples showcasing execution of Spark jobs using the plugin can be found in the documentation.