The steps described on this page can be followed to run a distributed Spark application with Kubernetes on Bright 9.1.
1. Software versions
The Docker image used for Spark provides the following main software versions:
- Operating System: Debian GNU/Linux 10
- Apache Spark: 3.1.1
- OpenJDK: 8
2. Prerequisites
The steps described in this article have been tested with this environment:
- Kubernetes: 1.18.15 (default values for cm-kubernetes-setup are sufficient)
- Docker version: 19.03.15 (provided by cm-kubernetes-setup)
- Docker registry (default values for cm-docker-registry-setup are sufficient)
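The reported versions can be checked on the head node before starting, for example (the module names below are the ones provided by a default Bright setup):
# module load kubernetes
# module load docker
# kubectl version --short
# docker --version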
3. Locally install Spark
In order to run Spark applications, the spark-submit binary is required.
The Spark distribution containing it can be downloaded and unpacked as follows:
# sudo yum install -y git java-1.8.0-openjdk-devel
# wget https://apache.mirror.wearetriple.com/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz
# tar -zxvf spark-3.1.1-bin-hadoop3.2.tgz
# cd spark-3.1.1-bin-hadoop3.2/
# export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
# export PATH=$PATH:$JAVA_HOME/bin
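A quick sanity check can now be run to confirm that the local installation works:
# ./bin/spark-submit --version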
4. Image creation
On a cluster where the Docker registry is reachable at hostname head-node-name on port 5000, run:
# module load docker
# ./bin/docker-image-tool.sh -r head-node-name:5000/brightcomputing -t v3.1.1 -f ./kubernetes/dockerfiles/spark/Dockerfile build
# ./bin/docker-image-tool.sh -r head-node-name:5000/brightcomputing -t v3.1.1 -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile build
# docker push head-node-name:5000/brightcomputing/spark:v3.1.1
# docker push head-node-name:5000/brightcomputing/spark-py:v3.1.1
It should now be possible to pull the images just created:
# docker pull head-node-name:5000/brightcomputing/spark:v3.1.1
# docker pull head-node-name:5000/brightcomputing/spark-py:v3.1.1
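If the pulls succeed, the images should also show up in the local Docker image list, for example:
# docker images | grep spark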
5. Configure Kubernetes for Spark
# module load kubernetes
# cat << EOF > spark.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: default
secrets:
- name: spark-token-gmdfz
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-role
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
- kind: ServiceAccount
  name: spark
  namespace: default
EOF
# kubectl apply -f spark.yaml
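The service account and its permissions can be verified before submitting any job; the following commands are one possible check:
# kubectl get serviceaccount spark -n default
# kubectl get clusterrolebinding spark-role
# kubectl auth can-i create pods -n default --as=system:serviceaccount:default:spark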
6. Run Spark with Kubernetes
Kubernetes can now be used to start a Spark cluster.
The following example computes an approximation of π. The application is submitted to the Kubernetes API server, which is reachable on the head node at localhost:10443; Kubernetes then starts the Spark driver in a pod.
In addition, 3 Spark executors will be started by Kubernetes.
# module load kubernetes
# module load spark
# spark-submit \
    --master k8s://https://localhost:10443 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=3 \
    --conf spark.kubernetes.container.image=head-node-name:5000/brightcomputing/spark-py:v3.1.1 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar
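The spark-py image that was pushed earlier can be used in the same way for a Python application. The following submission is only a sketch: the pi.py path is assumed to match the default examples layout of the Spark 3.1.1 image, and spark-pi-python is an arbitrary application name:
# spark-submit \
    --master k8s://https://localhost:10443 \
    --deploy-mode cluster \
    --name spark-pi-python \
    --conf spark.executor.instances=3 \
    --conf spark.kubernetes.container.image=head-node-name:5000/brightcomputing/spark-py:v3.1.1 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    local:///opt/spark/examples/src/main/python/pi.py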
Kubernetes will now schedule the Spark pods. Their status will switch from Pending to Running to Succeeded.
The final output in the terminal should be similar to the following:
21/06/02 14:48:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/06/02 14:48:08 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
21/06/02 14:48:10 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
21/06/02 14:48:13 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: spark-pi-0895fe79ccc45249-driver
namespace: default
labels: spark-app-selector -> spark-4af7eeb956a74017bf3a7f05cbd74fb7, spark-role -> driver
pod uid: 3059061d-5132-42a4-9b51-dee8e613b2f9
creation time: 2021-06-02T12:48:12Z
service account name: spark
volumes: spark-local-dir-1, spark-conf-volume-driver, spark-token-rvz28
node name: node003
start time: 2021-06-02T12:48:12Z
phase: Pending
container status:
container name: spark-kubernetes-driver
container image: gt-kb-check-spark:5000/brightcomputing/spark-py:v3.1.1
container state: waiting
pending reason: ContainerCreating
21/06/02 14:48:13 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: spark-pi-0895fe79ccc45249-driver
namespace: default
labels: spark-app-selector -> spark-4af7eeb956a74017bf3a7f05cbd74fb7, spark-role -> driver
pod uid: 3059061d-5132-42a4-9b51-dee8e613b2f9
creation time: 2021-06-02T12:48:12Z
service account name: spark
volumes: spark-local-dir-1, spark-conf-volume-driver, spark-token-rvz28
node name: node003
start time: 2021-06-02T12:48:12Z
phase: Pending
container status:
container name: spark-kubernetes-driver
container image: gt-kb-check-spark:5000/brightcomputing/spark-py:v3.1.1
container state: waiting
pending reason: ContainerCreating
21/06/02 14:48:13 INFO LoggingPodStatusWatcherImpl: Waiting for application spark-pi with submission ID default:spark-pi-0895fe79ccc45249-driver to finish…
21/06/02 14:48:14 INFO LoggingPodStatusWatcherImpl: Application status for spark-4af7eeb956a74017bf3a7f05cbd74fb7 (phase: Pending)
21/06/02 14:48:14 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: spark-pi-0895fe79ccc45249-driver
namespace: default
labels: spark-app-selector -> spark-4af7eeb956a74017bf3a7f05cbd74fb7, spark-role -> driver
pod uid: 3059061d-5132-42a4-9b51-dee8e613b2f9
creation time: 2021-06-02T12:48:12Z
service account name: spark
volumes: spark-local-dir-1, spark-conf-volume-driver, spark-token-rvz28
node name: node003
start time: 2021-06-02T12:48:12Z
phase: Pending
container status:
container name: spark-kubernetes-driver
container image: gt-kb-check-spark:5000/brightcomputing/spark-py:v3.1.1
container state: waiting
pending reason: ContainerCreating
21/06/02 14:48:15 INFO LoggingPodStatusWatcherImpl: Application status for spark-4af7eeb956a74017bf3a7f05cbd74fb7 (phase: Pending)
21/06/02 14:48:36 INFO LoggingPodStatusWatcherImpl: Application status for spark-4af7eeb956a74017bf3a7f05cbd74fb7 (phase: Pending)
21/06/02 14:48:36 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: spark-pi-0895fe79ccc45249-driver
namespace: default
labels: spark-app-selector -> spark-4af7eeb956a74017bf3a7f05cbd74fb7, spark-role -> driver
pod uid: 3059061d-5132-42a4-9b51-dee8e613b2f9
creation time: 2021-06-02T12:48:12Z
service account name: spark
volumes: spark-local-dir-1, spark-conf-volume-driver, spark-token-rvz28
node name: node003
start time: 2021-06-02T12:48:12Z
phase: Running
container status:
container name: spark-kubernetes-driver
container image: gt-kb-check-spark:5000/brightcomputing/spark-py:v3.1.1
container state: running
container started at: 2021-06-02T12:48:36Z
21/06/02 14:48:37 INFO LoggingPodStatusWatcherImpl: Application status for spark-4af7eeb956a74017bf3a7f05cbd74fb7 (phase: Running)
21/06/02 14:49:22 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: spark-pi-0895fe79ccc45249-driver
namespace: default
labels: spark-app-selector -> spark-4af7eeb956a74017bf3a7f05cbd74fb7, spark-role -> driver
pod uid: 3059061d-5132-42a4-9b51-dee8e613b2f9
creation time: 2021-06-02T12:48:12Z
service account name: spark
volumes: spark-local-dir-1, spark-conf-volume-driver, spark-token-rvz28
node name: node003
start time: 2021-06-02T12:48:12Z
phase: Succeeded
container status:
container name: spark-kubernetes-driver
container image: gt-kb-check-spark:5000/brightcomputing/spark-py:v3.1.1
container state: terminated
container started at: 2021-06-02T12:48:36Z
container finished at: 2021-06-02T12:49:21Z
exit code: 0
termination reason: Completed
21/06/02 14:49:22 INFO LoggingPodStatusWatcherImpl: Application status for spark-4af7eeb956a74017bf3a7f05cbd74fb7 (phase: Succeeded)
21/06/02 14:49:22 INFO LoggingPodStatusWatcherImpl: Container final statuses:
container name: spark-kubernetes-driver
container image: gt-kb-check-spark:5000/brightcomputing/spark-py:v3.1.1
container state: terminated
container started at: 2021-06-02T12:48:36Z
container finished at: 2021-06-02T12:49:21Z
exit code: 0
termination reason: Completed
21/06/02 14:49:22 INFO LoggingPodStatusWatcherImpl: Application spark-pi with submission ID default:spark-pi-0895fe79ccc45249-driver finished
21/06/02 14:49:22 INFO ShutdownHookManager: Shutdown hook called
21/06/02 14:49:22 INFO ShutdownHookManager: Deleting directory /tmp/spark-8ade5ae9-a24a-4da2-950f-3359b510b7cb
By monitoring pods with kubectl, it is possible to see that 4 pods have been started: 1 driver and 3 executors:
# kubectl get pods
NAME                            READY   STATUS              RESTARTS   AGE
spark-pi-1622477525687-driver   1/1     Running             0          16s
spark-pi-1622477525687-exec-1   0/1     ContainerCreating   0          9s
spark-pi-1622477525687-exec-2   0/1     Pending             0          9s
spark-pi-1622477525687-exec-3   0/1     Pending             0          8s
At the end of the run, only the Spark driver pod remains, with Completed status:
# kubectl get pods
NAME                            READY   STATUS      RESTARTS   AGE
spark-pi-1622477525687-driver   0/1     Completed   0          75s
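Once its log is no longer needed (it is inspected in the next step), the completed driver pod can be removed manually, for example:
# kubectl delete pod spark-pi-1622477525687-driver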
The computed value of π can be found by inspecting the log of the driver pod:
# kubectl logs spark-pi-1622477525687-driver
++ id -u
myuid=0
++ id -g
mygid=0
set +e
++ getent passwd 0
uidentry=root:x:0:0:root:/root:/bin/bash
set -e
'[' -z root:x:0:0:root:/root:/bin/bash ']'
SPARK_K8S_CMD=driver
case "$SPARK_K8S_CMD" in
shift 1
SPARK_CLASSPATH=':/opt/spark/jars/*'
sed 's/[^=]*=\(.*\)/\1/g'
grep SPARK_JAVA_OPT_
env
sort -t_ -k4 -n
readarray -t SPARK_EXECUTOR_JAVA_OPTS
'[' -n '' ']'
'[' -n '' ']'
PYSPARK_ARGS=
'[' -n '' ']'
R_ARGS=
'[' -n '' ']'
'[' '' == 2 ']'
'[' '' == 3 ']'
case "$SPARK_K8S_CMD" in
CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.29.152.138 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi spark-internal
21/05/31 16:12:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/05/31 16:12:12 INFO SparkContext: Running Spark version 2.4.4
21/05/31 16:12:12 INFO SparkContext: Submitted application: Spark Pi
21/05/31 16:12:12 INFO SecurityManager: Changing view acls to: root
21/05/31 16:12:12 INFO SecurityManager: Changing modify acls to: root
21/05/31 16:12:12 INFO SecurityManager: Changing view acls groups to:
21/05/31 16:12:12 INFO SecurityManager: Changing modify acls groups to:
21/05/31 16:12:12 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
21/05/31 16:12:12 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
21/05/31 16:12:12 INFO SparkEnv: Registering MapOutputTracker
21/05/31 16:12:12 INFO SparkEnv: Registering BlockManagerMaster
21/05/31 16:12:12 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/05/31 16:12:12 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/05/31 16:12:12 INFO DiskBlockManager: Created local directory at /var/data/spark-b18be789-d365-41c3-b750-28fbc161db77/blockmgr-62e61fec-dc30-4df0-b01a-803e802d27c1
21/05/31 16:12:12 INFO MemoryStore: MemoryStore started with capacity 413.9 MB
21/05/31 16:12:12 INFO SparkEnv: Registering OutputCommitCoordinator
21/05/31 16:12:13 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/05/31 16:12:13 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://spark-pi-1622477525687-driver-svc.default.svc:4040
21/05/31 16:12:13 INFO SparkContext: Added JAR file:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar at spark://spark-pi-1622477525687-driver-svc.default.svc:7078/jars/spark-examples_2.11-2.4.4.jar with timestamp 1622477533267
21/05/31 16:12:14 INFO ExecutorPodsAllocator: Going to request 3 executors from Kubernetes.
21/05/31 16:12:14 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
21/05/31 16:12:14 INFO NettyBlockTransferService: Server created on spark-pi-1622477525687-driver-svc.default.svc:7079
21/05/31 16:12:14 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/05/31 16:12:14 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-pi-1622477525687-driver-svc.default.svc, 7079, None)
21/05/31 16:12:14 INFO BlockManagerMasterEndpoint: Registering block manager spark-pi-1622477525687-driver-svc.default.svc:7079 with 413.9 MB RAM, BlockManagerId(driver, spark-pi-1622477525687-driver-svc.default.svc, 7079, None)
21/05/31 16:12:14 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-pi-1622477525687-driver-svc.default.svc, 7079, None)
21/05/31 16:12:14 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-pi-1622477525687-driver-svc.default.svc, 7079, None)
21/05/31 16:12:27 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.141.0.1:35422) with ID 1
21/05/31 16:12:28 INFO BlockManagerMasterEndpoint: Registering block manager 172.29.112.144:38665 with 413.9 MB RAM, BlockManagerId(1, 172.29.112.144, 38665, None)
21/05/31 16:12:44 INFO KubernetesClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
21/05/31 16:12:44 INFO SparkContext: Starting job: reduce at SparkPi.scala:38
21/05/31 16:12:45 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
21/05/31 16:12:45 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
21/05/31 16:12:45 INFO DAGScheduler: Parents of final stage: List()
21/05/31 16:12:45 INFO DAGScheduler: Missing parents: List()
21/05/31 16:12:45 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
21/05/31 16:12:45 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1936.0 B, free 413.9 MB)
21/05/31 16:12:45 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1256.0 B, free 413.9 MB)
21/05/31 16:12:45 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on spark-pi-1622477525687-driver-svc.default.svc:7079 (size: 1256.0 B, free: 413.9 MB)
21/05/31 16:12:45 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1161
21/05/31 16:12:45 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1))
21/05/31 16:12:45 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
21/05/31 16:12:45 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 172.29.112.144, executor 1, partition 0, PROCESS_LOCAL, 7885 bytes)
21/05/31 16:12:45 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.29.112.144:38665 (size: 1256.0 B, free: 413.9 MB)
21/05/31 16:12:46 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 172.29.112.144, executor 1, partition 1, PROCESS_LOCAL, 7885 bytes)
21/05/31 16:12:46 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 719 ms on 172.29.112.144 (executor 1) (1/2)
21/05/31 16:12:46 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 47 ms on 172.29.112.144 (executor 1) (2/2)
21/05/31 16:12:46 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
21/05/31 16:12:46 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 0.972 s
21/05/31 16:12:46 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 1.107471 s
Pi is roughly 3.1383356916784586
21/05/31 16:12:46 INFO SparkUI: Stopped Spark web UI at http://spark-pi-1622477525687-driver-svc.default.svc:4040
21/05/31 16:12:46 INFO KubernetesClusterSchedulerBackend: Shutting down all executors
21/05/31 16:12:46 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down
21/05/31 16:12:46 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
21/05/31 16:12:46 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/05/31 16:12:46 INFO MemoryStore: MemoryStore cleared
21/05/31 16:12:46 INFO BlockManager: BlockManager stopped
21/05/31 16:12:46 INFO BlockManagerMaster: BlockManagerMaster stopped
21/05/31 16:12:46 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/05/31 16:12:46 INFO SparkContext: Successfully stopped SparkContext
21/05/31 16:12:46 INFO ShutdownHookManager: Shutdown hook called
21/05/31 16:12:46 INFO ShutdownHookManager: Deleting directory /tmp/spark-c60b2d8f-6be2-4c6c-bafa-e7af352b6715
21/05/31 16:12:46 INFO ShutdownHookManager: Deleting directory /var/data/spark-b18be789-d365-41c3-b750-28fbc161db77/spark-137a193e-0278-4ab1-8bd4-cb448e8aa47f
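Since the driver log can be long, the result line can also be extracted directly, for example:
# kubectl logs spark-pi-1622477525687-driver | grep "Pi is roughly"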