
Packages:

sparkoperator.k8s.io/v1beta2

Package v1beta2 is the v1beta2 version of the API.

Resource Types:

ScheduledSparkApplication

Field Description
apiVersion
string
sparkoperator.k8s.io/v1beta2
kind
string
ScheduledSparkApplication
metadata
Kubernetes meta/v1.ObjectMeta
Refer to the Kubernetes API documentation for the fields of the metadata field.
spec
ScheduledSparkApplicationSpec


schedule
string

Schedule is a cron schedule on which the application should run.

template
SparkApplicationSpec

Template is a template from which SparkApplication instances can be created.

suspend
bool
(Optional)

Suspend is a flag telling the controller to suspend subsequent runs of the application if set to true. Defaults to false.

concurrencyPolicy
ConcurrencyPolicy

ConcurrencyPolicy is the policy governing concurrent SparkApplication runs.

successfulRunHistoryLimit
int32
(Optional)

SuccessfulRunHistoryLimit is the number of past successful runs of the application to keep. Defaults to 1.

failedRunHistoryLimit
int32
(Optional)

FailedRunHistoryLimit is the number of past failed runs of the application to keep. Defaults to 1.

status
ScheduledSparkApplicationStatus
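
As a rough illustration of how the fields above fit together, the following is a minimal sketch of a ScheduledSparkApplication manifest. The resource name, namespace, image, and JAR path are hypothetical, and the enum values (`Allow`, `Scala`, `cluster`, `Never`) are assumptions, since this reference does not list them; verify them against the operator version in use.

```yaml
# Sketch only: names, image, and paths are placeholders.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: ScheduledSparkApplication
metadata:
  name: spark-pi-scheduled        # hypothetical
  namespace: default              # hypothetical
spec:
  schedule: "*/10 * * * *"        # cron schedule: every 10 minutes
  suspend: false                  # default
  concurrencyPolicy: Allow        # assumed enum value
  successfulRunHistoryLimit: 1    # default
  failedRunHistoryLimit: 3
  template:                       # a SparkApplicationSpec
    type: Scala                   # assumed enum value
    mode: cluster                 # assumed enum value
    sparkVersion: "3.1.1"
    image: example.com/spark:3.1.1                              # hypothetical
    mainClass: org.apache.spark.examples.SparkPi
    mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar  # hypothetical
    restartPolicy:
      type: Never                 # assumed enum value
    driver:
      cores: 1
      memory: "512m"
    executor:
      instances: 1
      cores: 1
      memory: "512m"
```

The `status` block is written by the controller and is normally not set by hand.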

SparkApplication

SparkApplication represents a Spark application running on and using Kubernetes as a cluster manager.

Field Description
apiVersion
string
sparkoperator.k8s.io/v1beta2
kind
string
SparkApplication
metadata
Kubernetes meta/v1.ObjectMeta
Refer to the Kubernetes API documentation for the fields of the metadata field.
spec
SparkApplicationSpec


type
SparkApplicationType

Type tells the type of the Spark application.

sparkVersion
string

SparkVersion is the version of Spark the application uses.

mode
DeployMode

Mode is the deployment mode of the Spark application.

image
string
(Optional)

Image is the container image for the driver, executor, and init-container. Any custom container images for the driver, executor, or init-container take precedence over this.

imagePullPolicy
string
(Optional)

ImagePullPolicy is the image pull policy for the driver, executor, and init-container.

imagePullSecrets
[]string
(Optional)

ImagePullSecrets is the list of image-pull secrets.

mainClass
string
(Optional)

MainClass is the fully-qualified main class of the Spark application. This only applies to Java/Scala Spark applications.

mainApplicationFile
string
(Optional)

MainApplicationFile is the path to a bundled JAR, Python, or R file of the application.

arguments
[]string
(Optional)

Arguments is a list of arguments to be passed to the application.

sparkConf
map[string]string
(Optional)

SparkConf carries user-specified Spark configuration properties as they would be passed with the “--conf” option in spark-submit.

hadoopConf
map[string]string
(Optional)

HadoopConf carries user-specified Hadoop configuration properties as they would be passed with the “--conf” option in spark-submit. The SparkApplication controller automatically adds the prefix “spark.hadoop.” to Hadoop configuration properties.

sparkConfigMap
string
(Optional)

SparkConfigMap carries the name of the ConfigMap containing Spark configuration files such as log4j.properties. The controller sets the environment variable SPARK_CONF_DIR to the path where the ConfigMap is mounted.

hadoopConfigMap
string
(Optional)

HadoopConfigMap carries the name of the ConfigMap containing Hadoop configuration files such as core-site.xml. The controller sets the environment variable HADOOP_CONF_DIR to the path where the ConfigMap is mounted.

volumes
[]Kubernetes core/v1.Volume
(Optional)

Volumes is the list of Kubernetes volumes that can be mounted by the driver and/or executors.

driver
DriverSpec

Driver is the driver specification.

executor
ExecutorSpec

Executor is the executor specification.

deps
Dependencies
(Optional)

Deps captures all possible types of dependencies of a Spark application.

restartPolicy
RestartPolicy

RestartPolicy defines whether and under which conditions the controller should restart an application.

nodeSelector
map[string]string
(Optional)

NodeSelector is the Kubernetes node selector to be added to the driver and executor pods. This field is mutually exclusive with nodeSelector at podSpec level (driver or executor). This field will be deprecated in future versions (at SparkApplicationSpec level).

failureRetries
int32
(Optional)

FailureRetries is the number of times to retry a failed application before giving up. This is best effort and actual retry attempts can be >= the value specified.

retryInterval
int64
(Optional)

RetryInterval is the unit of intervals in seconds between submission retries.

pythonVersion
string
(Optional)

This sets the major Python version of the docker image used to run the driver and executor containers. Can either be 2 or 3, default 2.

memoryOverheadFactor
string
(Optional)

This sets the Memory Overhead Factor that will allocate memory to non-JVM memory. For JVM-based jobs this value will default to 0.10, for non-JVM jobs 0.40. Value of this field will be overridden by Spec.Driver.MemoryOverhead and Spec.Executor.MemoryOverhead if they are set.

monitoring
MonitoringSpec
(Optional)

Monitoring configures how monitoring is handled.

batchScheduler
string
(Optional)

BatchScheduler configures which batch scheduler will be used for scheduling.

timeToLiveSeconds
int64
(Optional)

TimeToLiveSeconds defines the Time-To-Live (TTL) duration in seconds for this SparkApplication after its termination. The SparkApplication object will be garbage collected once more than TimeToLiveSeconds have elapsed since its termination.

batchSchedulerOptions
BatchSchedulerConfiguration
(Optional)

BatchSchedulerOptions provides fine-grained control over batch scheduling.

sparkUIOptions
SparkUIConfiguration
(Optional)

SparkUIOptions allows configuring the Service and the Ingress that expose the Spark UI.

status
SparkApplicationStatus
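
For orientation, here is a minimal sketch of a SparkApplication manifest using the fields listed above. The resource name, namespace, image, service account, and JAR path are hypothetical, and the enum values (`Scala`, `cluster`, `Never`) are assumptions not stated in this reference.

```yaml
# Sketch only: names, image, and paths are placeholders.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi                  # hypothetical
  namespace: default              # hypothetical
spec:
  type: Scala                     # assumed enum value
  mode: cluster                   # assumed enum value
  sparkVersion: "3.1.1"
  image: example.com/spark:3.1.1  # hypothetical; used for driver and executors
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar  # hypothetical
  arguments:
    - "1000"
  restartPolicy:
    type: Never                   # assumed enum value
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark         # hypothetical service account
  executor:
    instances: 2
    cores: 1
    memory: "512m"
```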

ApplicationState

(Appears on: SparkApplicationStatus)

ApplicationState tells the current state of the application and an error message in case of failures.

Field Description
state
ApplicationStateType
errorMessage
string

ApplicationStateType (string alias)

(Appears on: ApplicationState)

ApplicationStateType represents the type of the current state of an application.

BatchSchedulerConfiguration

(Appears on: SparkApplicationSpec)

BatchSchedulerConfiguration configures how a Spark application is batch scheduled.

Field Description
queue
string
(Optional)

Queue is the resource queue to which the application belongs. It is used by the Volcano batch scheduler.

priorityClassName
string
(Optional)

PriorityClassName is the name of the Kubernetes PriorityClass resource. It is used by the Volcano batch scheduler.
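
A possible fragment combining `batchScheduler` with these options is sketched below. The scheduler name `volcano`, the queue name, and the PriorityClass name are assumptions for illustration; the queue and PriorityClass would have to exist in the cluster.

```yaml
# Sketch only: fragment of a SparkApplication spec.
spec:
  batchScheduler: volcano             # assumed scheduler name
  batchSchedulerOptions:
    queue: default                    # hypothetical queue
    priorityClassName: high-priority  # hypothetical PriorityClass
```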

ConcurrencyPolicy (string alias)

(Appears on: ScheduledSparkApplicationSpec)

Dependencies

(Appears on: SparkApplicationSpec)

Dependencies specifies all possible types of dependencies of a Spark application.

Field Description
jars
[]string
(Optional)

Jars is a list of JAR files the Spark application depends on.

files
[]string
(Optional)

Files is a list of files the Spark application depends on.

pyFiles
[]string
(Optional)

PyFiles is a list of Python files the Spark application depends on.
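
A short sketch of a `deps` block follows. All paths and URIs are hypothetical; which URI schemes work depends on the Spark image and its configured file systems.

```yaml
# Sketch only: fragment of a SparkApplication spec with placeholder paths.
spec:
  deps:
    jars:
      - local:///opt/spark-jars/extra-lib.jar   # hypothetical, already in the image
    files:
      - local:///opt/app/conf/lookup.csv        # hypothetical
    pyFiles:
      - local:///opt/app/helpers.zip            # hypothetical
```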

DeployMode (string alias)

(Appears on: SparkApplicationSpec)

DeployMode describes the type of deployment of a Spark application.

DriverInfo

(Appears on: SparkApplicationStatus)

DriverInfo captures information about the driver.

Field Description
webUIServiceName
string
webUIPort
int32

UI Details for the UI created via ClusterIP service accessible from within the cluster.

webUIAddress
string
webUIIngressName
string

Ingress Details if an ingress for the UI was created.

webUIIngressAddress
string
podName
string

DriverSpec

(Appears on: SparkApplicationSpec)

DriverSpec is the specification of the driver.

Field Description
SparkPodSpec
SparkPodSpec

(Members of SparkPodSpec are embedded into this type.)

podName
string
(Optional)

PodName is the name of the driver pod that the user creates. This is used for the in-cluster client mode in which the user creates a client pod where the driver of the user application runs. It’s an error to set this field if Mode is not in-cluster-client.

coreRequest
string
(Optional)

CoreRequest is the physical CPU core request for the driver. Maps to spark.kubernetes.driver.request.cores that is available since Spark 3.0.

javaOptions
string
(Optional)

JavaOptions is a string of extra JVM options to pass to the driver. For instance, GC settings or other logging.

lifecycle
Kubernetes core/v1.Lifecycle
(Optional)

Lifecycle is for running preStop or postStart commands.
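
The driver-specific fields might be combined as in the sketch below. The JVM flags and the preStop command are illustrative choices, not recommendations from this reference; `lifecycle` follows the Kubernetes core/v1 Lifecycle shape.

```yaml
# Sketch only: fragment of a SparkApplication spec.
spec:
  driver:
    cores: 1
    coreRequest: "500m"             # maps to spark.kubernetes.driver.request.cores
    javaOptions: "-XX:+UseG1GC"     # illustrative JVM option
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 5"]   # illustrative command
```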

DriverState (string alias)

DriverState tells the current state of a spark driver.

ExecutorSpec

(Appears on: SparkApplicationSpec)

ExecutorSpec is the specification of the executor.

Field Description
SparkPodSpec
SparkPodSpec

(Members of SparkPodSpec are embedded into this type.)

instances
int32
(Optional)

Instances is the number of executor instances.

coreRequest
string
(Optional)

CoreRequest is the physical CPU core request for the executors. Maps to spark.kubernetes.executor.request.cores that is available since Spark 2.4.

javaOptions
string
(Optional)

JavaOptions is a string of extra JVM options to pass to the executors. For instance, GC settings or other logging.

deleteOnTermination
bool
(Optional)

DeleteOnTermination specifies whether executor pods should be deleted in case of failure or normal termination. Maps to spark.kubernetes.executor.deleteOnTermination that is available since Spark 3.0.
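
An executor block using these fields could look like the following sketch. The values are illustrative; keeping terminated executor pods (`deleteOnTermination: false`) is shown only as one reason to set the field, for example to inspect logs after a failure.

```yaml
# Sketch only: fragment of a SparkApplication spec.
spec:
  executor:
    instances: 2
    cores: 1
    coreRequest: "500m"             # maps to spark.kubernetes.executor.request.cores
    javaOptions: "-XX:+UseG1GC"     # illustrative JVM option
    deleteOnTermination: false      # keep executor pods after they terminate
```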

ExecutorState (string alias)

(Appears on: SparkApplicationStatus)

ExecutorState tells the current state of an executor.

GPUSpec

(Appears on: SparkPodSpec)

Field Description
name
string

Name is the GPU resource name, such as nvidia.com/gpu or amd.com/gpu.

quantity
int64

Quantity is the number of GPUs to request for driver or executor.
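
As a sketch, a GPU request on the executor could be written as below. It assumes the cluster exposes the `nvidia.com/gpu` resource through a device plugin.

```yaml
# Sketch only: fragment of a SparkApplication spec.
spec:
  executor:
    instances: 1
    gpu:
      name: "nvidia.com/gpu"   # resource name advertised by the device plugin
      quantity: 1              # GPUs requested per executor pod
```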

MonitoringSpec

(Appears on: SparkApplicationSpec)

MonitoringSpec defines the monitoring specification.

Field Description
exposeDriverMetrics
bool

ExposeDriverMetrics specifies whether to expose metrics on the driver.

exposeExecutorMetrics
bool

ExposeExecutorMetrics specifies whether to expose metrics on the executors.

metricsProperties
string
(Optional)

MetricsProperties is the content of a custom metrics.properties for configuring the Spark metric system. If not specified, the content in spark-docker/conf/metrics.properties will be used.

metricsPropertiesFile
string
(Optional)

MetricsPropertiesFile is the container local path of file metrics.properties for configuring the Spark metric system. If not specified, value /etc/metrics/conf/metrics.properties will be used.

prometheus
PrometheusSpec
(Optional)

Prometheus is for configuring the Prometheus JMX exporter.

NameKey

(Appears on: SparkPodSpec)

NameKey represents the name and key of a SecretKeyRef.

Field Description
name
string
key
string

NamePath

(Appears on: SparkPodSpec)

NamePath is a pair of a name and a path to which the named object should be mounted.

Field Description
name
string
path
string

PrometheusSpec

(Appears on: MonitoringSpec)

PrometheusSpec defines the Prometheus specification when Prometheus is to be used for collecting and exposing metrics.

Field Description
jmxExporterJar
string

JmxExporterJar is the path to the Prometheus JMX exporter jar in the container.

port
int32
(Optional)

Port is the port of the HTTP server run by the Prometheus JMX exporter. If not specified, 8090 will be used as the default.

configFile
string
(Optional)

ConfigFile is the path to the custom Prometheus configuration file provided in the Spark image. ConfigFile takes precedence over Configuration, which is shown below.

configuration
string
(Optional)

Configuration is the content of the Prometheus configuration needed by the Prometheus JMX exporter. If not specified, the content in spark-docker/conf/prometheus.yaml will be used. Configuration has no effect if ConfigFile is set.
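
The monitoring and Prometheus fields above might be combined as in this sketch. The JMX exporter JAR path is hypothetical and must match where the JAR actually lives in the Spark image.

```yaml
# Sketch only: fragment of a SparkApplication spec.
spec:
  monitoring:
    exposeDriverMetrics: true
    exposeExecutorMetrics: true
    prometheus:
      jmxExporterJar: /prometheus/jmx_prometheus_javaagent.jar  # hypothetical path in the image
      port: 8090                                                # the documented default
```

Because `configFile` takes precedence over `configuration`, setting both is redundant; the sketch sets neither and relies on the default configuration.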

RestartPolicy

(Appears on: SparkApplicationSpec)

RestartPolicy is the policy on whether and under which conditions the controller should restart a terminated application. It completely defines the actions to be taken on any kind of failure during an application run.

Field Description
type
RestartPolicyType

Type specifies the RestartPolicyType.

onSubmissionFailureRetries
int32
(Optional)

OnSubmissionFailureRetries is the number of times to retry submitting an application before giving up. This is best effort and actual retry attempts can be >= the value specified due to caching. These are required if RestartPolicy is OnFailure.

onFailureRetries
int32
(Optional)

OnFailureRetries is the number of times to retry running an application before giving up.

onSubmissionFailureRetryInterval
int64
(Optional)

OnSubmissionFailureRetryInterval is the interval in seconds between retries on failed submissions.

onFailureRetryInterval
int64
(Optional)

OnFailureRetryInterval is the interval in seconds between retries on failed runs.
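
A restart policy with retry limits could be sketched as follows. The type value `OnFailure` is taken from the field description above; the retry counts and intervals are arbitrary illustrative numbers.

```yaml
# Sketch only: fragment of a SparkApplication spec.
spec:
  restartPolicy:
    type: OnFailure
    onFailureRetries: 3                     # retries for failed runs
    onFailureRetryInterval: 10              # seconds between run retries
    onSubmissionFailureRetries: 5           # retries for failed submissions
    onSubmissionFailureRetryInterval: 20    # seconds between submission retries
```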

RestartPolicyType (string alias)

(Appears on: RestartPolicy)

ScheduleState (string alias)

(Appears on: ScheduledSparkApplicationStatus)

ScheduledSparkApplicationSpec

(Appears on: ScheduledSparkApplication)

Field Description
schedule
string

Schedule is a cron schedule on which the application should run.

template
SparkApplicationSpec

Template is a template from which SparkApplication instances can be created.

suspend
bool
(Optional)

Suspend is a flag telling the controller to suspend subsequent runs of the application if set to true. Defaults to false.

concurrencyPolicy
ConcurrencyPolicy

ConcurrencyPolicy is the policy governing concurrent SparkApplication runs.

successfulRunHistoryLimit
int32
(Optional)

SuccessfulRunHistoryLimit is the number of past successful runs of the application to keep. Defaults to 1.

failedRunHistoryLimit
int32
(Optional)

FailedRunHistoryLimit is the number of past failed runs of the application to keep. Defaults to 1.

ScheduledSparkApplicationStatus

(Appears on: ScheduledSparkApplication)

Field Description
lastRun
Kubernetes meta/v1.Time

LastRun is the time when the last run of the application started.

nextRun
Kubernetes meta/v1.Time

NextRun is the time when the next run of the application will start.

lastRunName
string

LastRunName is the name of the SparkApplication for the most recent run of the application.

pastSuccessfulRunNames
[]string

PastSuccessfulRunNames keeps the names of SparkApplications for past successful runs.

pastFailedRunNames
[]string

PastFailedRunNames keeps the names of SparkApplications for past failed runs.

scheduleState
ScheduleState

ScheduleState is the current scheduling state of the application.

reason
string

Reason tells why the ScheduledSparkApplication is in the particular ScheduleState.

SecretInfo

(Appears on: SparkPodSpec)

SecretInfo captures information of a secret.

Field Description
name
string
path
string
secretType
SecretType

SecretType (string alias)

(Appears on: SecretInfo)

SecretType tells the type of a secret.

SparkApplicationSpec

(Appears on: SparkApplication, ScheduledSparkApplicationSpec)

SparkApplicationSpec describes the specification of a Spark application using Kubernetes as a cluster manager. It carries every piece of information a spark-submit command takes and recognizes.

Field Description
type
SparkApplicationType

Type tells the type of the Spark application.

sparkVersion
string

SparkVersion is the version of Spark the application uses.

mode
DeployMode

Mode is the deployment mode of the Spark application.

image
string
(Optional)

Image is the container image for the driver, executor, and init-container. Any custom container images for the driver, executor, or init-container take precedence over this.

imagePullPolicy
string
(Optional)

ImagePullPolicy is the image pull policy for the driver, executor, and init-container.

imagePullSecrets
[]string
(Optional)

ImagePullSecrets is the list of image-pull secrets.

mainClass
string
(Optional)

MainClass is the fully-qualified main class of the Spark application. This only applies to Java/Scala Spark applications.

mainApplicationFile
string
(Optional)

MainApplicationFile is the path to a bundled JAR, Python, or R file of the application.

arguments
[]string
(Optional)

Arguments is a list of arguments to be passed to the application.

sparkConf
map[string]string
(Optional)

SparkConf carries user-specified Spark configuration properties as they would be passed with the “--conf” option in spark-submit.

hadoopConf
map[string]string
(Optional)

HadoopConf carries user-specified Hadoop configuration properties as they would be passed with the “--conf” option in spark-submit. The SparkApplication controller automatically adds the prefix “spark.hadoop.” to Hadoop configuration properties.

sparkConfigMap
string
(Optional)

SparkConfigMap carries the name of the ConfigMap containing Spark configuration files such as log4j.properties. The controller sets the environment variable SPARK_CONF_DIR to the path where the ConfigMap is mounted.

hadoopConfigMap
string
(Optional)

HadoopConfigMap carries the name of the ConfigMap containing Hadoop configuration files such as core-site.xml. The controller sets the environment variable HADOOP_CONF_DIR to the path where the ConfigMap is mounted.

volumes
[]Kubernetes core/v1.Volume
(Optional)

Volumes is the list of Kubernetes volumes that can be mounted by the driver and/or executors.

driver
DriverSpec

Driver is the driver specification.

executor
ExecutorSpec

Executor is the executor specification.

deps
Dependencies
(Optional)

Deps captures all possible types of dependencies of a Spark application.

restartPolicy
RestartPolicy

RestartPolicy defines whether and under which conditions the controller should restart an application.

nodeSelector
map[string]string
(Optional)

NodeSelector is the Kubernetes node selector to be added to the driver and executor pods. This field is mutually exclusive with nodeSelector at podSpec level (driver or executor). This field will be deprecated in future versions (at SparkApplicationSpec level).

failureRetries
int32
(Optional)

FailureRetries is the number of times to retry a failed application before giving up. This is best effort and actual retry attempts can be >= the value specified.

retryInterval
int64
(Optional)

RetryInterval is the unit of intervals in seconds between submission retries.

pythonVersion
string
(Optional)

This sets the major Python version of the docker image used to run the driver and executor containers. Can either be 2 or 3, default 2.

memoryOverheadFactor
string
(Optional)

This sets the Memory Overhead Factor that will allocate memory to non-JVM memory. For JVM-based jobs this value will default to 0.10, for non-JVM jobs 0.40. Value of this field will be overridden by Spec.Driver.MemoryOverhead and Spec.Executor.MemoryOverhead if they are set.

monitoring
MonitoringSpec
(Optional)

Monitoring configures how monitoring is handled.

batchScheduler
string
(Optional)

BatchScheduler configures which batch scheduler will be used for scheduling.

timeToLiveSeconds
int64
(Optional)

TimeToLiveSeconds defines the Time-To-Live (TTL) duration in seconds for this SparkApplication after its termination. The SparkApplication object will be garbage collected once more than TimeToLiveSeconds have elapsed since its termination.

batchSchedulerOptions
BatchSchedulerConfiguration
(Optional)

BatchSchedulerOptions provides fine-grained control over batch scheduling.

sparkUIOptions
SparkUIConfiguration
(Optional)

SparkUIOptions allows configuring the Service and the Ingress that expose the Spark UI.
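
Several of the optional spec-level fields can be illustrated with one fragment. This is a sketch for a Python application; the file path, ConfigMap name, and property values are hypothetical, and `Python` and `cluster` are assumed enum values. Note that `pythonVersion` and `memoryOverheadFactor` are typed as strings, so they are quoted.

```yaml
# Sketch only: fragment of a SparkApplication spec (driver, executor, and restartPolicy omitted).
spec:
  type: Python                      # assumed enum value
  pythonVersion: "3"
  mode: cluster                     # assumed enum value
  sparkVersion: "3.1.1"
  mainApplicationFile: local:///opt/app/main.py   # hypothetical
  arguments:
    - "--date=2021-01-01"           # hypothetical application argument
  sparkConf:
    spark.eventLog.enabled: "true"  # as with --conf spark.eventLog.enabled=true
  hadoopConf:
    fs.gs.project.id: my-project    # controller adds the spark.hadoop. prefix
  sparkConfigMap: spark-conf        # hypothetical ConfigMap; SPARK_CONF_DIR points at its mount path
  memoryOverheadFactor: "0.4"       # ignored where driver/executor memoryOverhead is set
  timeToLiveSeconds: 3600           # garbage collect one hour after termination
```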

SparkApplicationStatus

(Appears on: SparkApplication)

SparkApplicationStatus describes the current status of a Spark application.

Field Description
sparkApplicationId
string

SparkApplicationID is set by the Spark distribution (via the spark.app.id config) on the driver and executor pods.

submissionID
string

SubmissionID is a unique ID of the current submission of the application.

lastSubmissionAttemptTime
Kubernetes meta/v1.Time

LastSubmissionAttemptTime is the time for the last application submission attempt.

terminationTime
Kubernetes meta/v1.Time

TerminationTime is the time when the application runs to completion, if it does.

driverInfo
DriverInfo

DriverInfo has information about the driver.

applicationState
ApplicationState

AppState tells the overall application state.

executorState
map[string]ExecutorState

ExecutorState records the state of executors by executor Pod names.

executionAttempts
int32

ExecutionAttempts is the total number of attempts to run a submitted application to completion. Incremented upon each attempted run of the application and reset upon invalidation.

submissionAttempts
int32

SubmissionAttempts is the total number of attempts to submit an application to run. Incremented upon each attempted submission of the application and reset upon invalidation and rerun.

SparkApplicationType (string alias)

(Appears on: SparkApplicationSpec)

SparkApplicationType describes the type of a Spark application.

SparkPodSpec

(Appears on: DriverSpec, ExecutorSpec)

SparkPodSpec defines common things that can be customized for a Spark driver or executor pod. TODO: investigate if we should use v1.PodSpec and limit what can be set instead.

Field Description
cores
int32
(Optional)

Cores maps to spark.driver.cores or spark.executor.cores for the driver and executors, respectively.

coreLimit
string
(Optional)

CoreLimit specifies a hard limit on CPU cores for the pod.

memory
string
(Optional)

Memory is the amount of memory to request for the pod.

memoryOverhead
string
(Optional)

MemoryOverhead is the amount of off-heap memory to allocate in cluster mode, in MiB unless otherwise specified.

gpu
GPUSpec
(Optional)

GPU specifies GPU requirement for the pod.

image
string
(Optional)

Image is the container image to use. Overrides Spec.Image if set.

configMaps
[]NamePath
(Optional)

ConfigMaps carries information of other ConfigMaps to add to the pod.

secrets
[]SecretInfo
(Optional)

Secrets carries information of secrets to add to the pod.

env
[]Kubernetes core/v1.EnvVar
(Optional)

Env carries the environment variables to add to the pod.

envVars
map[string]string
(Optional)

EnvVars carries the environment variables to add to the pod. Deprecated. Consider using env instead.

envFrom
[]Kubernetes core/v1.EnvFromSource
(Optional)

EnvFrom is a list of sources to populate environment variables in the container.

envSecretKeyRefs
map[string]NameKey
(Optional)

EnvSecretKeyRefs holds a mapping from environment variable names to SecretKeyRefs. Deprecated. Consider using env instead.

labels
map[string]string
(Optional)

Labels are the Kubernetes labels to be added to the pod.

annotations
map[string]string
(Optional)

Annotations are the Kubernetes annotations to be added to the pod.

volumeMounts
[]Kubernetes core/v1.VolumeMount
(Optional)

VolumeMounts specifies the volumes listed in “.spec.volumes” to mount into the main container’s filesystem.

affinity
Kubernetes core/v1.Affinity
(Optional)

Affinity specifies the affinity/anti-affinity settings for the pod.

tolerations
[]Kubernetes core/v1.Toleration
(Optional)

Tolerations specifies the tolerations listed in “.spec.tolerations” to be applied to the pod.

securityContext
Kubernetes core/v1.PodSecurityContext
(Optional)

SecurityContext specifies the PodSecurityContext to apply.

schedulerName
string
(Optional)

SchedulerName specifies the scheduler that will be used for scheduling.

sidecars
[]Kubernetes core/v1.Container
(Optional)

Sidecars is a list of sidecar containers that run alongside the main Spark container.

initContainers
[]Kubernetes core/v1.Container
(Optional)

InitContainers is a list of init-containers that run to completion before the main Spark container.

hostNetwork
bool
(Optional)

HostNetwork indicates whether to request host networking for the pod or not.

nodeSelector
map[string]string
(Optional)

NodeSelector is the Kubernetes node selector to be added to the driver and executor pods. This field is mutually exclusive with nodeSelector at SparkApplication level (which will be deprecated).

dnsConfig
Kubernetes core/v1.PodDNSConfig
(Optional)

DnsConfig specifies the DNS settings for the pod, following the Kubernetes specifications.

terminationGracePeriodSeconds
int64
(Optional)

TerminationGracePeriodSeconds is the termination grace period in seconds for the pod.

serviceAccount
string
(Optional)

ServiceAccount is the name of the custom Kubernetes service account used by the pod.
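
Since DriverSpec and ExecutorSpec both embed SparkPodSpec, the fields above appear directly under `driver` or `executor`. The sketch below shows a driver using several of them. Every name, path, label, and value is hypothetical, and the `secretType` value `Generic` is an assumption, since this reference does not enumerate SecretType values.

```yaml
# Sketch only: fragment of a SparkApplication spec with placeholder names.
spec:
  volumes:
    - name: scratch                 # referenced by volumeMounts below
      emptyDir: {}
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    memoryOverhead: "128m"
    serviceAccount: spark           # hypothetical service account
    labels:
      team: data-eng                # hypothetical label
    env:
      - name: LOG_LEVEL             # preferred over the deprecated envVars
        value: "INFO"
    configMaps:
      - name: app-config            # NamePath: ConfigMap name and mount path
        path: /mnt/config
    secrets:
      - name: app-credentials       # SecretInfo: secret name, mount path, type
        path: /mnt/secrets
        secretType: Generic         # assumed enum value
    volumeMounts:
      - name: scratch
        mountPath: /tmp/scratch
    tolerations:
      - key: dedicated
        operator: Equal
        value: spark
        effect: NoSchedule
```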

SparkUIConfiguration

(Appears on: SparkApplicationSpec)

SparkUIConfiguration holds configuration parameters specific to the Spark UI.

Field Description
servicePort
int32
(Optional)

ServicePort allows configuring the port at the Service level, which may differ from the targetPort. The targetPort should be the same as the one defined in spark.ui.port.

ingressAnnotations
map[string]string
(Optional)

IngressAnnotations is a map of key-value pairs of annotations that may be added to the Ingress object, e.g. to specify nginx as the ingress.class.

ingressTLS
[]Kubernetes extensions/v1beta1.IngressTLS
(Optional)

IngressTLS is useful if SSL certificates need to be declared on the Ingress object.
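
A possible `sparkUIOptions` block is sketched below. The host name and TLS secret name are hypothetical, and the annotation assumes an NGINX ingress controller is installed; `ingressTLS` entries follow the Kubernetes IngressTLS shape (`hosts`, `secretName`).

```yaml
# Sketch only: fragment of a SparkApplication spec with placeholder names.
spec:
  sparkUIOptions:
    servicePort: 80                         # Service port; the targetPort follows spark.ui.port
    ingressAnnotations:
      kubernetes.io/ingress.class: nginx    # assumes an NGINX ingress controller
    ingressTLS:
      - hosts:
          - spark-ui.example.com            # hypothetical host
        secretName: spark-ui-tls            # hypothetical TLS secret
```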


Generated with gen-crd-api-reference-docs on git commit 43fb5e7.