
How to Deploy Spark with Kubernetes on Bright 9.1 with CentOS8

The steps described on this page can be followed to run a distributed Spark application using Kubernetes on Bright 9.1.

1. Software versions

The Docker image used for Spark provides the following main software versions:

  • Operating System: Debian GNU/Linux 10
  • Apache Spark: 3.1.1
  • OpenJDK: 8
2. Prerequisites

The steps described in this article have been tested with the following environment (verification commands are shown after the list):

  • Kubernetes: 1.18.15 (default values for cm-kubernetes-setup are sufficient)
  • Docker version: 19.03.15 (provided by cm-kubernetes-setup)
  • Docker registry (default values for cm-docker-registry-setup are sufficient)
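
The versions in use can be verified beforehand (a quick sanity check; the exact output will differ per cluster):

# module load kubernetes
# module load docker
# kubectl version --short
# docker version --format '{{.Server.Version}}'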
3. Locally install Spark

In order to run Spark applications, the spark-submit binary is required.

The binary can be downloaded as follows:

# sudo yum install -y git java-1.8.0-openjdk-devel
# wget https://apache.mirror.wearetriple.com/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz
# tar -zxvf spark-3.1.1-bin-hadoop3.2.tgz
# cd spark-3.1.1-bin-hadoop3.2/
# export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
# export PATH=$PATH:$JAVA_HOME/bin
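
The local installation can be verified by printing the Spark version (an optional check; run it from the spark-3.1.1-bin-hadoop3.2/ directory with JAVA_HOME set as above):

# ./bin/spark-submit --version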
4. Image creation

In a cluster with a Docker registry reachable at hostname head-node-name on port 5000, run:

# module load docker
# ./bin/docker-image-tool.sh -r head-node-name:5000/brightcomputing -t v3.1.1 -f ./kubernetes/dockerfiles/spark/Dockerfile build
# ./bin/docker-image-tool.sh -r head-node-name:5000/brightcomputing -t v3.1.1 -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile build
# docker push head-node-name:5000/brightcomputing/spark:v3.1.1 
# docker push head-node-name:5000/brightcomputing/spark-py:v3.1.1

It should now be possible to pull the images just created:

# docker pull head-node-name:5000/brightcomputing/spark:v3.1.1
# docker pull head-node-name:5000/brightcomputing/spark-py:v3.1.1
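
If the pulls fail, the registry contents can be listed with the standard Docker Registry HTTP API (assuming here that the Bright registry serves HTTPS on port 5000; drop -k if its certificate is trusted):

# curl -k https://head-node-name:5000/v2/_catalog

The response should include the brightcomputing/spark and brightcomputing/spark-py repositories.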
5. Configure Kubernetes for Spark

Spark needs a Kubernetes service account with sufficient privileges to create and delete the driver and executor pods. The following manifest creates a spark service account and binds it to the built-in edit ClusterRole:
# module load kubernetes
# cat << EOF > spark.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: default
secrets:
- name: spark-token-gmdfz
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-role
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
- kind: ServiceAccount
  name: spark
  namespace: default
EOF

# kubectl apply -f spark.yaml
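
Whether the service account exists and is allowed to manage pods can be checked as follows (a sanity check; the second command should print yes):

# kubectl get serviceaccount spark -n default
# kubectl auth can-i create pods --as=system:serviceaccount:default:spark -n default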
6. Run Spark with Kubernetes

Kubernetes can now be used to run Spark applications.

The following example approximates the value of π. The application is submitted in cluster mode: the Spark driver runs in a pod, and 3 Spark executor pods are started by Kubernetes.
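
The master URL passed to spark-submit must point at the Kubernetes API server; in this setup it is reachable on the head node at port 10443, which can be confirmed beforehand (an optional check):

# module load kubernetes
# kubectl cluster-info

With the API server address confirmed, the application can be submitted: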

# module load kubernetes
# module load spark
# spark-submit \
    --master k8s://https://localhost:10443 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=3 \
    --conf spark.kubernetes.container.image=head-node-name:5000/brightcomputing/spark-py:v3.1.1 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar

Kubernetes will now schedule the Spark pods; their status will switch from Pending to Running to Succeeded. Note that the local:// scheme in the application path refers to a jar already present inside the container image, so nothing is uploaded from the submitting host. The final output in the terminal should be similar to the following:

21/06/02 14:48:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
 Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
 21/06/02 14:48:08 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
 21/06/02 14:48:10 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
 21/06/02 14:48:13 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
      pod name: spark-pi-0895fe79ccc45249-driver
      namespace: default
      labels: spark-app-selector -> spark-4af7eeb956a74017bf3a7f05cbd74fb7, spark-role -> driver
      pod uid: 3059061d-5132-42a4-9b51-dee8e613b2f9
      creation time: 2021-06-02T12:48:12Z
      service account name: spark
      volumes: spark-local-dir-1, spark-conf-volume-driver, spark-token-rvz28
      node name: node003
      start time: 2021-06-02T12:48:12Z
      phase: Pending
      container status: 
          container name: spark-kubernetes-driver
          container image: gt-kb-check-spark:5000/brightcomputing/spark-py:v3.1.1
          container state: waiting
          pending reason: ContainerCreating
 21/06/02 14:48:13 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
      pod name: spark-pi-0895fe79ccc45249-driver
      namespace: default
      labels: spark-app-selector -> spark-4af7eeb956a74017bf3a7f05cbd74fb7, spark-role -> driver
      pod uid: 3059061d-5132-42a4-9b51-dee8e613b2f9
      creation time: 2021-06-02T12:48:12Z
      service account name: spark
      volumes: spark-local-dir-1, spark-conf-volume-driver, spark-token-rvz28
      node name: node003
      start time: 2021-06-02T12:48:12Z
      phase: Pending
      container status: 
          container name: spark-kubernetes-driver
          container image: gt-kb-check-spark:5000/brightcomputing/spark-py:v3.1.1
          container state: waiting
          pending reason: ContainerCreating
 21/06/02 14:48:13 INFO LoggingPodStatusWatcherImpl: Waiting for application spark-pi with submission ID default:spark-pi-0895fe79ccc45249-driver to finish…
 21/06/02 14:48:14 INFO LoggingPodStatusWatcherImpl: Application status for spark-4af7eeb956a74017bf3a7f05cbd74fb7 (phase: Pending)
 21/06/02 14:48:14 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
      pod name: spark-pi-0895fe79ccc45249-driver
      namespace: default
      labels: spark-app-selector -> spark-4af7eeb956a74017bf3a7f05cbd74fb7, spark-role -> driver
      pod uid: 3059061d-5132-42a4-9b51-dee8e613b2f9
      creation time: 2021-06-02T12:48:12Z
      service account name: spark
      volumes: spark-local-dir-1, spark-conf-volume-driver, spark-token-rvz28
      node name: node003
      start time: 2021-06-02T12:48:12Z
      phase: Pending
      container status: 
          container name: spark-kubernetes-driver
          container image: gt-kb-check-spark:5000/brightcomputing/spark-py:v3.1.1
          container state: waiting
          pending reason: ContainerCreating
 21/06/02 14:48:15 INFO LoggingPodStatusWatcherImpl: Application status for spark-4af7eeb956a74017bf3a7f05cbd74fb7 (phase: Pending)
 21/06/02 14:48:36 INFO LoggingPodStatusWatcherImpl: Application status for spark-4af7eeb956a74017bf3a7f05cbd74fb7 (phase: Pending)
 21/06/02 14:48:36 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
      pod name: spark-pi-0895fe79ccc45249-driver
      namespace: default
      labels: spark-app-selector -> spark-4af7eeb956a74017bf3a7f05cbd74fb7, spark-role -> driver
      pod uid: 3059061d-5132-42a4-9b51-dee8e613b2f9
      creation time: 2021-06-02T12:48:12Z
      service account name: spark
      volumes: spark-local-dir-1, spark-conf-volume-driver, spark-token-rvz28
      node name: node003
      start time: 2021-06-02T12:48:12Z
      phase: Running
      container status: 
          container name: spark-kubernetes-driver
          container image: gt-kb-check-spark:5000/brightcomputing/spark-py:v3.1.1
          container state: running
          container started at: 2021-06-02T12:48:36Z
 21/06/02 14:48:37 INFO LoggingPodStatusWatcherImpl: Application status for spark-4af7eeb956a74017bf3a7f05cbd74fb7 (phase: Running)
 21/06/02 14:49:22 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
      pod name: spark-pi-0895fe79ccc45249-driver
      namespace: default
      labels: spark-app-selector -> spark-4af7eeb956a74017bf3a7f05cbd74fb7, spark-role -> driver
      pod uid: 3059061d-5132-42a4-9b51-dee8e613b2f9
      creation time: 2021-06-02T12:48:12Z
      service account name: spark
      volumes: spark-local-dir-1, spark-conf-volume-driver, spark-token-rvz28
      node name: node003
      start time: 2021-06-02T12:48:12Z
      phase: Succeeded
      container status: 
          container name: spark-kubernetes-driver
          container image: gt-kb-check-spark:5000/brightcomputing/spark-py:v3.1.1
          container state: terminated
          container started at: 2021-06-02T12:48:36Z
          container finished at: 2021-06-02T12:49:21Z
          exit code: 0
          termination reason: Completed
 21/06/02 14:49:22 INFO LoggingPodStatusWatcherImpl: Application status for spark-4af7eeb956a74017bf3a7f05cbd74fb7 (phase: Succeeded)
 21/06/02 14:49:22 INFO LoggingPodStatusWatcherImpl: Container final statuses:
  container name: spark-kubernetes-driver  container image: gt-kb-check-spark:5000/brightcomputing/spark-py:v3.1.1  container state: terminated  container started at: 2021-06-02T12:48:36Z  container finished at: 2021-06-02T12:49:21Z  exit code: 0  termination reason: Completed
 21/06/02 14:49:22 INFO LoggingPodStatusWatcherImpl: Application spark-pi with submission ID default:spark-pi-0895fe79ccc45249-driver finished
 21/06/02 14:49:22 INFO ShutdownHookManager: Shutdown hook called
 21/06/02 14:49:22 INFO ShutdownHookManager: Deleting directory /tmp/spark-8ade5ae9-a24a-4da2-950f-3359b510b7cb

Monitoring pods with kubectl shows that 4 pods have been started: 1 driver and 3 executors:

# kubectl get pods
 NAME                            READY   STATUS              RESTARTS   AGE
 spark-pi-1622477525687-driver   1/1     Running             0          16s
 spark-pi-1622477525687-exec-1   0/1     ContainerCreating   0          9s
 spark-pi-1622477525687-exec-2   0/1     Pending             0          9s
 spark-pi-1622477525687-exec-3   0/1     Pending             0          8s

At the end of the run, only the Spark driver pod remains, with Completed status:

# kubectl get pods
 NAME                            READY   STATUS      RESTARTS   AGE
 spark-pi-1622477525687-driver   0/1     Completed   0          75s

The computed value of π can be read by inspecting the log of the driver pod:

# kubectl logs spark-pi-1622477525687-driver
++ id -u
 myuid=0
 ++ id -g
 mygid=0
 set +e
 ++ getent passwd 0
 uidentry=root:x:0:0:root:/root:/bin/bash
 set -e
 '[' -z root:x:0:0:root:/root:/bin/bash ']'
 SPARK_K8S_CMD=driver
 case "$SPARK_K8S_CMD" in
 shift 1
 SPARK_CLASSPATH=':/opt/spark/jars/*'
 sed 's/[^=]=(.)/\1/g'
 grep SPARK_JAVA_OPT_
 env
 sort -t_ -k4 -n
 readarray -t SPARK_EXECUTOR_JAVA_OPTS
 '[' -n '' ']'
 '[' -n '' ']'
 PYSPARK_ARGS=
 '[' -n '' ']'
 R_ARGS=
 '[' -n '' ']'
 '[' '' == 2 ']'
 '[' '' == 3 ']'
 case "$SPARK_K8S_CMD" in
 CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
 exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.29.152.138 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi spark-internal
 21/05/31 16:12:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
 Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
 21/05/31 16:12:12 INFO SparkContext: Running Spark version 2.4.4
 21/05/31 16:12:12 INFO SparkContext: Submitted application: Spark Pi
 21/05/31 16:12:12 INFO SecurityManager: Changing view acls to: root
 21/05/31 16:12:12 INFO SecurityManager: Changing modify acls to: root
 21/05/31 16:12:12 INFO SecurityManager: Changing view acls groups to: 
 21/05/31 16:12:12 INFO SecurityManager: Changing modify acls groups to: 
 21/05/31 16:12:12 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
 21/05/31 16:12:12 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
 21/05/31 16:12:12 INFO SparkEnv: Registering MapOutputTracker
 21/05/31 16:12:12 INFO SparkEnv: Registering BlockManagerMaster
 21/05/31 16:12:12 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
 21/05/31 16:12:12 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
 21/05/31 16:12:12 INFO DiskBlockManager: Created local directory at /var/data/spark-b18be789-d365-41c3-b750-28fbc161db77/blockmgr-62e61fec-dc30-4df0-b01a-803e802d27c1
 21/05/31 16:12:12 INFO MemoryStore: MemoryStore started with capacity 413.9 MB
 21/05/31 16:12:12 INFO SparkEnv: Registering OutputCommitCoordinator
 21/05/31 16:12:13 INFO Utils: Successfully started service 'SparkUI' on port 4040.
 21/05/31 16:12:13 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://spark-pi-1622477525687-driver-svc.default.svc:4040
 21/05/31 16:12:13 INFO SparkContext: Added JAR file:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar at spark://spark-pi-1622477525687-driver-svc.default.svc:7078/jars/spark-examples_2.11-2.4.4.jar with timestamp 1622477533267
 21/05/31 16:12:14 INFO ExecutorPodsAllocator: Going to request 3 executors from Kubernetes.
 21/05/31 16:12:14 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
 21/05/31 16:12:14 INFO NettyBlockTransferService: Server created on spark-pi-1622477525687-driver-svc.default.svc:7079
 21/05/31 16:12:14 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
 21/05/31 16:12:14 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-pi-1622477525687-driver-svc.default.svc, 7079, None)
 21/05/31 16:12:14 INFO BlockManagerMasterEndpoint: Registering block manager spark-pi-1622477525687-driver-svc.default.svc:7079 with 413.9 MB RAM, BlockManagerId(driver, spark-pi-1622477525687-driver-svc.default.svc, 7079, None)
 21/05/31 16:12:14 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-pi-1622477525687-driver-svc.default.svc, 7079, None)
 21/05/31 16:12:14 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-pi-1622477525687-driver-svc.default.svc, 7079, None)
 21/05/31 16:12:27 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.141.0.1:35422) with ID 1
 21/05/31 16:12:28 INFO BlockManagerMasterEndpoint: Registering block manager 172.29.112.144:38665 with 413.9 MB RAM, BlockManagerId(1, 172.29.112.144, 38665, None)
 21/05/31 16:12:44 INFO KubernetesClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
 21/05/31 16:12:44 INFO SparkContext: Starting job: reduce at SparkPi.scala:38
 21/05/31 16:12:45 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
 21/05/31 16:12:45 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
 21/05/31 16:12:45 INFO DAGScheduler: Parents of final stage: List()
 21/05/31 16:12:45 INFO DAGScheduler: Missing parents: List()
 21/05/31 16:12:45 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
 21/05/31 16:12:45 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1936.0 B, free 413.9 MB)
 21/05/31 16:12:45 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1256.0 B, free 413.9 MB)
 21/05/31 16:12:45 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on spark-pi-1622477525687-driver-svc.default.svc:7079 (size: 1256.0 B, free: 413.9 MB)
 21/05/31 16:12:45 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1161
 21/05/31 16:12:45 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1))
 21/05/31 16:12:45 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
 21/05/31 16:12:45 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 172.29.112.144, executor 1, partition 0, PROCESS_LOCAL, 7885 bytes)
 21/05/31 16:12:45 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.29.112.144:38665 (size: 1256.0 B, free: 413.9 MB)
 21/05/31 16:12:46 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 172.29.112.144, executor 1, partition 1, PROCESS_LOCAL, 7885 bytes)
 21/05/31 16:12:46 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 719 ms on 172.29.112.144 (executor 1) (1/2)
 21/05/31 16:12:46 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 47 ms on 172.29.112.144 (executor 1) (2/2)
 21/05/31 16:12:46 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
 21/05/31 16:12:46 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 0.972 s
 21/05/31 16:12:46 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 1.107471 s
 Pi is roughly 3.1383356916784586
 21/05/31 16:12:46 INFO SparkUI: Stopped Spark web UI at http://spark-pi-1622477525687-driver-svc.default.svc:4040
 21/05/31 16:12:46 INFO KubernetesClusterSchedulerBackend: Shutting down all executors
 21/05/31 16:12:46 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down
 21/05/31 16:12:46 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
 21/05/31 16:12:46 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
 21/05/31 16:12:46 INFO MemoryStore: MemoryStore cleared
 21/05/31 16:12:46 INFO BlockManager: BlockManager stopped
 21/05/31 16:12:46 INFO BlockManagerMaster: BlockManagerMaster stopped
 21/05/31 16:12:46 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
 21/05/31 16:12:46 INFO SparkContext: Successfully stopped SparkContext
 21/05/31 16:12:46 INFO ShutdownHookManager: Shutdown hook called
 21/05/31 16:12:46 INFO ShutdownHookManager: Deleting directory /tmp/spark-c60b2d8f-6be2-4c6c-bafa-e7af352b6715
 21/05/31 16:12:46 INFO ShutdownHookManager: Deleting directory /var/data/spark-b18be789-d365-41c3-b750-28fbc161db77/spark-137a193e-0278-4ab1-8bd4-cb448e8aa47f 
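
When the output is no longer needed, the completed driver pod and, optionally, the Spark service account objects can be removed (pod names will differ per run):

# kubectl delete pod spark-pi-1622477525687-driver
# kubectl delete -f spark.yaml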