Deploy Spark using CMDaemon
This example deploys the Spark master on the Head Node. All workers are deployed on the nodes in the default category.
Install the package
Execute the following on the Head Node; the example below is for RHEL 8:
# yum install -y git wget
# SPARK_VERSION="spark-2.4.1-bin-hadoop2.7"
# INSTALL_DIR="/cm/shared/apps/spark/"
# MODULE_DIR="/cm/shared/modulefiles/spark/"
# mkdir -p "${INSTALL_DIR}"
# mkdir -p "${MODULE_DIR}"
# wget "https://archive.apache.org/dist/spark/spark-2.4.1/${SPARK_VERSION}.tgz"
# tar -xzvf "${SPARK_VERSION}.tgz" -C "${INSTALL_DIR}"
# pushd "${INSTALL_DIR}"
# ln -sr "${SPARK_VERSION}/" current
# cd current
# git clone https://github.com/Bright-Computing/bare-metal-spark.git
# cd bare-metal-spark/
# mv modulefile/2.4.1 "${MODULE_DIR}"
# cp -pr scripts/ ..
# popd
Note: these instructions deploy Spark to /cm/shared. The scripts contained in the bare-metal-spark git repository refer to this path. If the installation directory is changed, the paths in the scripts have to be changed accordingly.
Install the service files
Then execute the following (also on the Head Node):
# pushd /cm/shared/apps/spark/current/scripts
# systemctl link $PWD/spark-master.service
# systemctl enable spark-master
# systemctl start spark-master
# cp -prv $PWD/spark-worker@.service /usr/lib/systemd/system/
# cp -prv $PWD/spark-worker@.service /cm/images/default-image/usr/lib/systemd/system/
# systemctl enable spark-worker@master:7077.service
# systemctl start spark-worker@master:7077.service
# chroot /cm/images/default-image systemctl enable spark-worker@master:7077.service
# popd
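The spark-worker@.service unit is a systemd template; the instance name after the @ (master:7077 here) is expected to encode the host and port of the master that the worker connects to. Before configuring CMDaemon, it is worth verifying that both units started on the Head Node:
# systemctl status spark-master
# systemctl status spark-worker@master:7077.service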
Configure the Generic roles for the master and worker nodes in cmsh
[headnode]% configurationoverlay
[headnode->configurationoverlay]% add spark-master
[headnode->configurationoverlay*[spark-master*]]% set nodes master
[headnode->configurationoverlay*[spark-master*]]% roles
[headnode->configurationoverlay*[spark-master*]->roles]% assign generic::spark-master
[headnode->configurationoverlay*[spark-master*]->roles*[Generic::spark-master*]]% set services spark-master
[headnode->configurationoverlay*[spark-master*]->roles*[Generic::spark-master*]]% ..
[headnode->configurationoverlay*[spark-master*]->roles*]% ..
[headnode->configurationoverlay*[spark-master*]]% ..
[headnode->configurationoverlay*]% add spark-worker
[headnode->configurationoverlay*[spark-worker*]]% set categories default
[headnode->configurationoverlay*[spark-worker*]]% roles
[headnode->configurationoverlay*[spark-worker*]->roles]% assign generic::spark-worker
[headnode->configurationoverlay*[spark-worker*]->roles*[Generic::spark-worker*]]% set services spark-worker@master:7077
[headnode->configurationoverlay*[spark-worker*]->roles*[Generic::spark-worker*]]% ..
[headnode->configurationoverlay*[spark-worker*]->roles*]% ..
[headnode->configurationoverlay*[spark-worker*]]% ..
[headnode->configurationoverlay*]% list
Name (key)           Priority   Nodes                          Categories       Roles
-------------------- ---------- ------------------------------ ---------------- ---------------------
spark-master         500        headnode                                        Generic::spark-master
spark-worker         500                                       default          Generic::spark-worker
[headnode->configurationoverlay*]% commit
Successfully committed 2 ConfigurationOverlays
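After the commit, CMDaemon manages the two services through the assigned roles. They can also be inspected per node from the device mode in cmsh (a sketch; the exact output depends on the cluster):
[headnode]% device use master
[headnode->device[master]]% services
[headnode->device[master]->services]% list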
Provision the compute nodes
Reboot the compute nodes so that they boot into the updated software image, in which the spark-worker service has been enabled.
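This can be done from cmsh, for example; the -c flag is assumed here to select every node in the default category:
[headnode]% device
[headnode->device]% reboot -c default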
Test the results
# module load spark/2.4.1
# spark-submit --master local --class org.apache.spark.examples.SparkPi /cm/shared/apps/spark/current/examples/jars/spark-examples_2.11-2.4.1.jar
20/02/07 14:12:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/02/07 14:12:16 INFO SparkContext: Running Spark version 2.4.1
20/02/07 14:12:16 INFO SparkContext: Submitted application: Spark Pi
20/02/07 14:12:16 INFO SecurityManager: Changing view acls to: root
20/02/07 14:12:16 INFO SecurityManager: Changing modify acls to: root
20/02/07 14:12:16 INFO SecurityManager: Changing view acls groups to:
20/02/07 14:12:16 INFO SecurityManager: Changing modify acls groups to:
20/02/07 14:12:16 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
20/02/07 14:12:17 INFO Utils: Successfully started service 'sparkDriver' on port 37293.
20/02/07 14:12:17 INFO SparkEnv: Registering MapOutputTracker
20/02/07 14:12:17 INFO SparkEnv: Registering BlockManagerMaster
20/02/07 14:12:17 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/02/07 14:12:17 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/02/07 14:12:17 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-e1f540fd-4f3c-4774-917b-d30aaa4dcc07
20/02/07 14:12:17 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
20/02/07 14:12:17 INFO SparkEnv: Registering OutputCommitCoordinator
20/02/07 14:12:17 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/02/07 14:12:17 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://ma-c-02-07-b90-dev-c8u0.cm.cluster:4040
20/02/07 14:12:17 INFO SparkContext: Added JAR file:/cm/shared/apps/spark/2.4.1/examples/jars/spark-examples_2.11-2.4.1.jar at spark://ma-c-02-07-b90-dev-c8u0.cm.cluster:37293/jars/spark-examples_2.11-2.4.1.jar with timestamp 1581081137459
20/02/07 14:12:17 INFO Executor: Starting executor ID driver on host localhost
20/02/07 14:12:17 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33549.
20/02/07 14:12:17 INFO NettyBlockTransferService: Server created on ma-c-02-07-b90-dev-c8u0.cm.cluster:33549
20/02/07 14:12:17 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/02/07 14:12:17 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, ma-c-02-07-b90-dev-c8u0.cm.cluster, 33549, None)
20/02/07 14:12:17 INFO BlockManagerMasterEndpoint: Registering block manager ma-c-02-07-b90-dev-c8u0.cm.cluster:33549 with 366.3 MB RAM, BlockManagerId(driver, ma-c-02-07-b90-dev-c8u0.cm.cluster, 33549, None)
20/02/07 14:12:17 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, ma-c-02-07-b90-dev-c8u0.cm.cluster, 33549, None)
20/02/07 14:12:17 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, ma-c-02-07-b90-dev-c8u0.cm.cluster, 33549, None)
20/02/07 14:12:18 INFO SparkContext: Starting job: reduce at SparkPi.scala:38
20/02/07 14:12:18 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
20/02/07 14:12:18 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
20/02/07 14:12:18 INFO DAGScheduler: Parents of final stage: List()
20/02/07 14:12:18 INFO DAGScheduler: Missing parents: List()
20/02/07 14:12:18 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
20/02/07 14:12:18 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1936.0 B, free 366.3 MB)
20/02/07 14:12:18 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1256.0 B, free 366.3 MB)
20/02/07 14:12:18 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on ma-c-02-07-b90-dev-c8u0.cm.cluster:33549 (size: 1256.0 B, free: 366.3 MB)
20/02/07 14:12:18 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1161
20/02/07 14:12:18 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1))
20/02/07 14:12:18 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
20/02/07 14:12:18 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7866 bytes)
20/02/07 14:12:18 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
20/02/07 14:12:18 INFO Executor: Fetching spark://ma-c-02-07-b90-dev-c8u0.cm.cluster:37293/jars/spark-examples_2.11-2.4.1.jar with timestamp 1581081137459
20/02/07 14:12:18 INFO TransportClientFactory: Successfully created connection to ma-c-02-07-b90-dev-c8u0.cm.cluster/10.141.255.254:37293 after 34 ms (0 ms spent in bootstraps)
20/02/07 14:12:18 INFO Utils: Fetching spark://ma-c-02-07-b90-dev-c8u0.cm.cluster:37293/jars/spark-examples_2.11-2.4.1.jar to /tmp/spark-ea8c13ef-1bab-473d-bf84-fb48e7c4d764/userFiles-ac2a6630-9404-48f2-a1c3-e9a9fd4836aa/fetchFileTemp3346364518087292857.tmp
20/02/07 14:12:18 INFO Executor: Adding file:/tmp/spark-ea8c13ef-1bab-473d-bf84-fb48e7c4d764/userFiles-ac2a6630-9404-48f2-a1c3-e9a9fd4836aa/spark-examples_2.11-2.4.1.jar to class loader
20/02/07 14:12:18 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 824 bytes result sent to driver
20/02/07 14:12:18 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7866 bytes)
20/02/07 14:12:18 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
20/02/07 14:12:18 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 270 ms on localhost (executor driver) (1/2)
20/02/07 14:12:18 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 824 bytes result sent to driver
20/02/07 14:12:18 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 57 ms on localhost (executor driver) (2/2)
20/02/07 14:12:18 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 0.609 s
20/02/07 14:12:18 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
20/02/07 14:12:18 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 0.671583 s
Pi is roughly 3.145075725378627
20/02/07 14:12:18 INFO SparkUI: Stopped Spark web UI at http://ma-c-02-07-b90-dev-c8u0.cm.cluster:4040
20/02/07 14:12:18 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/02/07 14:12:18 INFO MemoryStore: MemoryStore cleared
20/02/07 14:12:18 INFO BlockManager: BlockManager stopped
20/02/07 14:12:18 INFO BlockManagerMaster: BlockManagerMaster stopped
20/02/07 14:12:18 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/02/07 14:12:18 INFO SparkContext: Successfully stopped SparkContext
20/02/07 14:12:18 INFO ShutdownHookManager: Shutdown hook called
20/02/07 14:12:18 INFO ShutdownHookManager: Deleting directory /tmp/spark-bf5770af-99e9-4963-8aeb-f22791bada54
20/02/07 14:12:18 INFO ShutdownHookManager: Deleting directory /tmp/spark-ea8c13ef-1bab-473d-bf84-fb48e7c4d764
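The run above uses --master local, so it only exercises the local Spark installation. To test the deployed cluster itself, the same example can be submitted against the standalone master, assuming the Head Node resolves as master and the master listens on the default port 7077 (matching the spark-worker@master:7077 instances configured above):
# module load spark/2.4.1
# spark-submit --master spark://master:7077 --class org.apache.spark.examples.SparkPi /cm/shared/apps/spark/current/examples/jars/spark-examples_2.11-2.4.1.jar
The Spark master web UI, which listens on port 8080 of the master node by default, should then show the registered workers and the completed application.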