CRD reference

The following table lists the CRD fields that can be defined by the user:

| CRD field | Remarks |
|---|---|
| apiVersion | spark.stackable.tech/v1alpha1 |
| kind | SparkApplication |
| metadata.name | Application name. Mandatory. |
| spec.mode | cluster or client. Currently only cluster is supported. Mandatory. |
| spec.image | User-supplied image containing spark-job dependencies that will be copied to the specified volume mount. |
| spec.sparkImage | Spark image deployed to the driver and executor Pods; it must contain the Spark environment needed by the job, e.g. docker.stackable.tech/stackable/spark-k8s:3.5.0-stackable0.0.0-dev. Mandatory. |
| spec.sparkImagePullPolicy | Optional enum (one of Always, IfNotPresent or Never) that determines the pull policy of the Spark job image. |
| spec.sparkImagePullSecrets | An optional list of references to secrets in the same namespace to use for pulling any of the images used by a SparkApplication resource. Each reference has a single property (name) that must contain a reference to a valid secret. |
| spec.mainApplicationFile | The actual application file that will be called by spark-submit. Mandatory. |
| spec.mainClass | The main class/entry point for JVM artifacts. |
| spec.args | Arguments passed directly to the job artifact. |
| spec.s3connection | S3 connection specification. See the S3 resources for more details. |
| spec.sparkConf | A map of key/value strings that will be passed directly to spark-submit. |
| spec.deps.requirements | A list of Python packages that will be installed via pip. |
| spec.deps.packages | A list of packages that is passed directly to spark-submit. |
| spec.deps.excludePackages | A list of excluded packages that is passed directly to spark-submit. |
| spec.deps.repositories | A list of repositories that is passed directly to spark-submit. |
| spec.volumes | A list of volumes. |
| spec.volumes.name | The volume name. |
| spec.volumes.persistentVolumeClaim.claimName | The persistent volume claim backing the volume. |
| spec.job.resources | Resources specification for the initiating Job. |
| spec.driver.resources | Resources specification for the driver Pod. |
| spec.driver.volumeMounts | A list of mounted volumes for the driver. |
| spec.driver.volumeMounts.name | Name of mount. |
| spec.driver.volumeMounts.mountPath | Volume mount path. |
| spec.driver.affinity | Driver Pod placement affinity. See Pod Placement for details. |
| spec.driver.logging | Logging aggregation for the driver Pod. See Logging for details. |
| spec.executor.resources | Resources specification for the executor Pods. |
| spec.executor.replicas | Number of executor instances launched for this job. |
| spec.executor.volumeMounts | A list of mounted volumes for each executor. |
| spec.executor.volumeMounts.name | Name of mount. |
| spec.executor.volumeMounts.mountPath | Volume mount path. |
| spec.executor.affinity | Executor Pod placement affinity. See Pod Placement for details. |
| spec.executor.logging | Logging aggregation for the executor Pods. See Logging for details. |
| spec.logFileDirectory.bucket | S3 bucket definition where applications should publish events for the Spark History server. |
| spec.logFileDirectory.prefix | Prefix to use when storing events for the Spark History server. |
| spec.driver.jvmSecurity | A list of JVM security properties to pass on to the driver VM. The TTL of DNS caches is especially important. |
| spec.executor.jvmSecurity | A list of JVM security properties to pass on to the executor VM. The TTL of DNS caches is especially important. |
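The sketches below are illustrative, not normative: every concrete value (application name, image tags, paths, class names, package coordinates, sizes) is hypothetical, and only the field paths themselves come from the table above. The first sketch shows how the core submission fields fit together in a SparkApplication manifest:

```yaml
---
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: example-spark-app                  # hypothetical application name
spec:
  mode: cluster                            # only cluster mode is currently supported
  sparkImage: docker.stackable.tech/stackable/spark-k8s:3.5.0-stackable0.0.0-dev
  sparkImagePullPolicy: IfNotPresent       # one of Always, IfNotPresent, Never
  sparkImagePullSecrets:
    - name: my-registry-secret             # hypothetical secret in the same namespace
  mainApplicationFile: local:///stackable/spark/jobs/example-app.jar   # hypothetical artifact
  mainClass: org.example.SparkJob          # hypothetical JVM entry point
  args:
    - "--input"
    - "/data/input"                        # hypothetical arguments passed to the job artifact
  sparkConf:                               # key/value strings handed straight to spark-submit
    spark.kubernetes.submission.waitAppCompletion: "false"
  deps:
    requirements:
      - tabulate==0.8.9                    # hypothetical pip requirement (PySpark jobs)
    packages:
      - org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0   # hypothetical Maven coordinate
    excludePackages:
      - org.apache.kafka:kafka-clients     # hypothetical exclusion
    repositories:
      - https://repo1.maven.org/maven2     # hypothetical extra repository
```

A real job would normally use either a JVM artifact with mainClass and deps.packages or a Python script with deps.requirements; both are shown together here only to illustrate where each field lives.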
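A second sketch, assuming the common Stackable cpu/memory resource structure (which the table above does not spell out), shows a user-supplied dependency image together with volumes, volume mounts, resources and executor replicas:

```yaml
spec:
  image: docker.stackable.tech/stackable/my-spark-deps:1.0.0   # hypothetical job-dependency image
  volumes:
    - name: job-deps                       # hypothetical volume name
      persistentVolumeClaim:
        claimName: spark-job-deps-pvc      # hypothetical PVC backing the volume
  job:
    resources:                             # assumed cpu/memory shape; values are illustrative
      cpu:
        min: 100m
        max: 500m
      memory:
        limit: 512Mi
  driver:
    resources:
      cpu:
        min: "1"
        max: "2"
      memory:
        limit: 2Gi
    volumeMounts:
      - name: job-deps
        mountPath: /dependencies           # hypothetical mount path for the copied dependencies
  executor:
    replicas: 3                            # number of executor instances
    resources:
      cpu:
        min: "1"
        max: "4"
      memory:
        limit: 4Gi
    volumeMounts:
      - name: job-deps
        mountPath: /dependencies
```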
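Finally, a sketch of the Spark History server log directory and the JVM security properties. Only the field paths spec.logFileDirectory.bucket, spec.logFileDirectory.prefix, spec.driver.jvmSecurity and spec.executor.jvmSecurity come from this reference; the bucket reference and the name/value shape of each property are assumptions, so check the S3 resources and Spark History server documentation for the exact schema:

```yaml
spec:
  logFileDirectory:
    bucket:
      reference: spark-history-s3-bucket  # assumed: reference to a predefined S3 bucket resource
    prefix: eventlogs/                     # prefix under which the job publishes its events
  driver:
    jvmSecurity:                           # assumed name/value shape for JVM security properties
      - name: networkaddress.cache.ttl     # DNS cache TTL, the property most commonly tuned
        value: "30"
  executor:
    jvmSecurity:
      - name: networkaddress.cache.ttl
        value: "30"
```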