Spark Configuration

The exact meaning and default value of each Spark config can be found in the Spark configuration documentation. A default of None means the option is left unset, so Spark's native default applies.

Configure a PySpark session via environment variables

Generated by generate-config-docs.py. Run python ./generate_config_docs.py to update this file.

Source code: sparglim/config/configer.py

Available environment variables for SparkEnvConfiger:

Default config:

  • SPAGLIM_APP_NAME: spark.app.name, default: Sparglim.
  • SPAGLIM_DEPLOY_MODE: spark.submit.deployMode, default: client.
  • SPARGLIM_SCHEDULER_MODE: spark.scheduler.mode, default: FAIR.
  • SPARGLIM_UI_PORT: spark.ui.port, default: None.
  • SPARGLIM_DRIVER_JAVA_OPTIONS: spark.driver.defaultJavaOptions, default: None.
  • SPARGLIM_EXECUTOR_JAVA_OPTIONS: spark.executor.defaultJavaOptions, default: None.
  • SPARGLIM_DRIVER_JAVA_EXTRA_OPTIONS: spark.driver.extraJavaOptions, default: None.
  • SPARGLIM_EXECUTOR_JAVA_EXTRA_OPTIONS: spark.executor.extraJavaOptions, default: None.
  • S3_ACCESS_KEY or AWS_ACCESS_KEY_ID: spark.hadoop.fs.s3a.access.key, default: None.
  • S3_SECRET_KEY or AWS_SECRET_ACCESS_KEY: spark.hadoop.fs.s3a.secret.key, default: None.
  • S3_ENTRY_POINT: spark.hadoop.fs.s3a.endpoint, default: None.
  • S3_ENTRY_POINT_REGION or AWS_DEFAULT_REGION: spark.hadoop.fs.s3a.endpoint.region, default: None.
  • S3_PATH_STYLE_ACCESS: spark.hadoop.fs.s3a.path.style.access, default: None.
  • S3_MAGIC_COMMITTER: spark.hadoop.fs.s3a.bucket.all.committer.magic.enabled, default: None.
  • SPARGIM_KERBEROS_KEYTAB: spark.kerberos.keytab, default: None.
  • SPARGIM_KERBEROS_PRINCIPAL: spark.kerberos.principal, default: None.
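
Two conventions in the list above are worth making concrete: an entry such as S3_ACCESS_KEY or AWS_ACCESS_KEY_ID accepts either variable, and a default of None means the key is not set at all, so Spark's native default applies. The following is a minimal, hypothetical sketch of that resolution logic, not Sparglim's implementation. It assumes the first-listed variable wins when both are set, and the ENV_TABLE and resolve_conf names are illustrative only.

```python
import os
from typing import Mapping, Optional

# Hypothetical subset of the table above: (env var names in lookup order, Spark key, default).
# Assumption: the first listed name takes precedence; the source does not confirm this.
ENV_TABLE = [
    (("SPAGLIM_APP_NAME",), "spark.app.name", "Sparglim"),
    (("SPARGLIM_SCHEDULER_MODE",), "spark.scheduler.mode", "FAIR"),
    (("SPARGLIM_UI_PORT",), "spark.ui.port", None),
    (("S3_ACCESS_KEY", "AWS_ACCESS_KEY_ID"), "spark.hadoop.fs.s3a.access.key", None),
    (("S3_ENTRY_POINT_REGION", "AWS_DEFAULT_REGION"), "spark.hadoop.fs.s3a.endpoint.region", None),
]


def resolve_conf(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Map environment variables to Spark config keys, skipping keys that end up None."""
    conf: dict[str, str] = {}
    for names, spark_key, default in ENV_TABLE:
        # First non-empty variable wins; a blank value counts as unset here.
        value: Optional[str] = next((env[n] for n in names if env.get(n)), default)
        if value is not None:  # None: leave the key out so Spark's own default applies
            conf[spark_key] = value
    return conf


print(resolve_conf({"AWS_ACCESS_KEY_ID": "minio", "SPARGLIM_UI_PORT": "4041"}))
# -> {'spark.app.name': 'Sparglim', 'spark.scheduler.mode': 'FAIR', 'spark.ui.port': '4041', 'spark.hadoop.fs.s3a.access.key': 'minio'}
```

Dropping None entries, rather than writing them as empty strings, is what lets Spark fall back to its own defaults.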

config_basic() configures the following:

  • SPAGLIM_APP_NAME: spark.app.name, default: Sparglim.
  • SPAGLIM_DEPLOY_MODE: spark.submit.deployMode, default: client.
  • SPARGLIM_SCHEDULER_MODE: spark.scheduler.mode, default: FAIR.
  • SPARGLIM_UI_PORT: spark.ui.port, default: None.
  • SPARGLIM_DRIVER_JAVA_OPTIONS: spark.driver.defaultJavaOptions, default: None.
  • SPARGLIM_EXECUTOR_JAVA_OPTIONS: spark.executor.defaultJavaOptions, default: None.
  • SPARGLIM_DRIVER_JAVA_EXTRA_OPTIONS: spark.driver.extraJavaOptions, default: None.
  • SPARGLIM_EXECUTOR_JAVA_EXTRA_OPTIONS: spark.executor.extraJavaOptions, default: None.
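
Each JVM gets two option strings here. According to the Spark configuration documentation, spark.driver.defaultJavaOptions (and its executor counterpart) is prepended to spark.driver.extraJavaOptions, so SPARGLIM_DRIVER_JAVA_OPTIONS acts as a base and SPARGLIM_DRIVER_JAVA_EXTRA_OPTIONS adds to it (likewise for the executor variables). The helper below is a hypothetical illustration of that ordering, not Spark or Sparglim code; the option values are placeholders.

```python
from typing import Optional


def effective_java_options(default_opts: Optional[str], extra_opts: Optional[str]) -> str:
    """Combine JVM option strings in the documented order: the defaultJavaOptions
    value comes first and extraJavaOptions is appended after it.
    Unset (None) or blank values are skipped. Hypothetical helper."""
    parts = [opts.strip() for opts in (default_opts, extra_opts) if opts and opts.strip()]
    return " ".join(parts)


# e.g. SPARGLIM_DRIVER_JAVA_OPTIONS="-XX:+UseG1GC"
#      SPARGLIM_DRIVER_JAVA_EXTRA_OPTIONS="-Duser.timezone=UTC"
print(effective_java_options("-XX:+UseG1GC", "-Duser.timezone=UTC"))
# -> -XX:+UseG1GC -Duser.timezone=UTC
```

Because the extra options come last, they are the natural place for per-job flags layered on top of a shared baseline.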

config_s3() configures the following:

  • S3_ACCESS_KEY or AWS_ACCESS_KEY_ID: spark.hadoop.fs.s3a.access.key, default: None.
  • S3_SECRET_KEY or AWS_SECRET_ACCESS_KEY: spark.hadoop.fs.s3a.secret.key, default: None.
  • S3_ENTRY_POINT: spark.hadoop.fs.s3a.endpoint, default: None.
  • S3_ENTRY_POINT_REGION or AWS_DEFAULT_REGION: spark.hadoop.fs.s3a.endpoint.region, default: None.
  • S3_PATH_STYLE_ACCESS: spark.hadoop.fs.s3a.path.style.access, default: None.
  • S3_MAGIC_COMMITTER: spark.hadoop.fs.s3a.bucket.all.committer.magic.enabled, default: None.

config_kerberos() configures the following:

  • SPARGIM_KERBEROS_KEYTAB: spark.kerberos.keytab, default: None.
  • SPARGIM_KERBEROS_PRINCIPAL: spark.kerberos.principal, default: None.

config_local() configures the following:

  • SPARGLIM_MASTER: spark.master, default: local[*].
  • SPARGLIM_LOCAL_MEMORY: spark.driver.memory, default: 512m.

config_connect_client() configures the following:

  • SPARGLIM_REMOTE: spark.remote, default: sc://localhost:15002.

config_connect_server() configures the following:

  • SPARGLIM_CONNECT_SERVER_PORT: spark.connect.grpc.binding.port, default: None.
  • SPARGLIM_CONNECT_GRPC_ARROW_MAXBS: spark.connect.grpc.arrow.maxBatchSize, default: None.
  • SPARGLIM_CONNECT_GRPC_MAXIM: spark.connect.grpc.maxInboundMessageSize, default: None.
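
The client default sc://localhost:15002 targets port 15002, which is also Spark Connect's own default for spark.connect.grpc.binding.port. Leaving SPARGLIM_CONNECT_SERVER_PORT at None therefore keeps a client and server on the same host aligned; if you change the server port, SPARGLIM_REMOTE has to change with it. The two helpers below are a hypothetical sketch of building and checking such a URL with the standard library only; they are not part of Sparglim or PySpark.

```python
from urllib.parse import urlsplit

# Spark Connect's default gRPC port (spark.connect.grpc.binding.port).
SPARK_CONNECT_DEFAULT_PORT = 15002


def connect_url(host: str = "localhost", port: int = SPARK_CONNECT_DEFAULT_PORT) -> str:
    """Build an sc:// URL suitable for SPARGLIM_REMOTE (hypothetical helper)."""
    return f"sc://{host}:{port}"


def connect_port(url: str) -> int:
    """Return the port a client would dial, falling back to the Spark Connect default."""
    parts = urlsplit(url)
    if parts.scheme != "sc":
        raise ValueError(f"expected an sc:// URL, got {url!r}")
    return parts.port or SPARK_CONNECT_DEFAULT_PORT


server_port = 16000  # e.g. SPARGLIM_CONNECT_SERVER_PORT="16000" on the server
remote = connect_url("spark-connect", server_port)  # value to export as SPARGLIM_REMOTE
print(remote, connect_port(remote) == server_port)
# -> sc://spark-connect:16000 True
```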

config_k8s() configures the following:

  • SPARGLIM_MASTER: spark.master, default: k8s://https://kubernetes.default.svc.
  • SPARGLIM_K8S_NAMESPACE: spark.kubernetes.namespace, default: None.
  • SPARGLIM_K8S_IMAGE: spark.kubernetes.container.image, default: wh1isper/spark-executor:3.4.1.
  • SPARGLIM_K8S_IMAGE_PULL_SECRETS: spark.kubernetes.container.image.pullSecrets, default: None.
  • SPARGLIM_K8S_IMAGE_PULL_POLICY: spark.kubernetes.container.image.pullPolicy, default: IfNotPresent.
  • SPARK_EXECUTOR_NUMS: spark.executor.instances, default: 3.
  • SPARGLIM_K8S_EXECUTOR_LABEL_LIST: spark.kubernetes.executor.label.*, default: sparglim-executor. A comma-separated string is converted into multiple labels.
  • SPARGLIM_K8S_EXECUTOR_ANNOTATION_LIST: spark.kubernetes.executor.annotation.*, default: sparglim-executor. A comma-separated string is converted into multiple annotations.
  • SPARGLIM_DRIVER_HOST: spark.driver.host, default: None.
  • SPARGLIM_DRIVER_BINDADDRESS: spark.driver.bindAddress, default: 0.0.0.0.
  • SPARGLIM_DRIVER_POD_NAME: spark.kubernetes.driver.pod.name, default: None.
  • SPARGLIM_K8S_EXECUTOR_REQUEST_CORES: spark.kubernetes.executor.cores, default: None.
  • SPARGLIM_K8S_EXECUTOR_LIMIT_CORES: spark.kubernetes.executor.limit.cores, default: None.
  • SPARGLIM_EXECUTOR_REQUEST_MEMORY: spark.executor.memory, default: 512m.
  • SPARGLIM_EXECUTOR_LIMIT_MEMORY: spark.executor.memoryOverhead, default: None.
  • SPARGLIM_K8S_GPU_VENDOR: spark.executor.resource.gpu.vendor, default: nvidia.com.
  • SPARGLIM_K8S_GPU_DISCOVERY_SCRIPT: spark.executor.resource.gpu.discoveryScript, default: /opt/spark/examples/src/main/scripts/getGpusResources.sh.
  • SPARGLIM_K8S_GPU_AMOUNT: spark.executor.resource.gpu.amount, default: None.
  • SPARGLIM_RAPIDS_SQL_ENABLED: spark.rapids.sql.enabled, default: None.
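
Note that, despite its name, SPARGLIM_EXECUTOR_LIMIT_MEMORY sets spark.executor.memoryOverhead rather than a separate limit. On Kubernetes, Spark sizes each executor pod's memory as spark.executor.memory plus the memory overhead (plus PySpark or off-heap memory when those are configured), and when the overhead is None Spark computes its own default. The sketch below approximates that arithmetic for explicitly set values; the helper names are hypothetical and it ignores the PySpark and off-heap terms.

```python
import re

# MiB per unit for Spark-style size strings. A bare number is treated as MiB here,
# matching the MiB default Spark uses for these memory settings.
_MIB_PER_UNIT = {"k": 1 / 1024, "m": 1, "g": 1024, "t": 1024 * 1024}


def to_mib(size: str) -> float:
    """Parse sizes such as '512m', '2g', '2gb' or '1024' into MiB."""
    match = re.fullmatch(r"(\d+)(?:([kmgt])b?)?", size.strip().lower())
    if match is None:
        raise ValueError(f"unsupported size string: {size!r}")
    number, unit = match.groups()
    return int(number) * _MIB_PER_UNIT[unit or "m"]


def executor_pod_mib(memory: str, overhead: str) -> float:
    """Approximate executor pod memory: spark.executor.memory + memoryOverhead."""
    return to_mib(memory) + to_mib(overhead)


# Default SPARGLIM_EXECUTOR_REQUEST_MEMORY (512m) plus a hypothetical
# SPARGLIM_EXECUTOR_LIMIT_MEMORY of 512m.
print(executor_pod_mib("512m", "512m"))
# -> 1024.0
```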

TIPS

S3 secrets and tokens (and similar settings) only need to be configured on the driver or the Connect server; setting them on a Connect client has no effect.