Apache Oozie is a workflow scheduler system to manage Apache Hadoop jobs. Here we describe how to add Oozie to a pre-existing Hadoop instance “hdp230”, based on Hortonworks HDP 2.3.0. We then show how to use it to run MapReduce jobs.
1. Add oozie group/user to head node and Hadoop nodes
Execute the following commands on the active head node and in the chroot environment for the software image(s) used by compute nodes.
# /usr/bin/getent group oozie || /usr/sbin/groupadd -r oozie
# /usr/bin/getent passwd oozie || /usr/sbin/useradd --comment "Oozie" --shell /bin/bash -m -r -g oozie --home /var/run/oozie oozie
2. Add stanzas (if needed) in core-site.xml (all Hadoop nodes)
The following two stanzas should be present in core-site.xml
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
If core-site.xml does not include the stanzas, they can be added using the following commands. The first command edits the file on the head node, and the second edits it on the Hadoop nodes, which are assumed to be in the ‘default’ category:
# sed -i.bak 's/<\/configuration>/ <property>\n<name>hadoop\.proxyuser\.oozie\.hosts<\/name>\n <value>\*<\/value>\n<\/property>\n\n <property>\n <name>hadoop\.proxyuser\.oozie\.groups<\/name>\n <value>\*<\/value>\n<\/property>\n\n<\/configuration>/' /etc/hadoop/hdp230/core-site.xml
# pdsh -g category=default "sed -i.bak 's/<\/configuration>/<property>\n <name>hadoop\.proxyuser\.oozie\.hosts<\/name>\n <value>\*<\/value>\n <\/property>\n\n <property>\n <name>hadoop\.proxyuser\.oozie\.groups<\/name>\n <value>\*<\/value>\n<\/property>\n\n<\/configuration>/' /etc/hadoop/hdp230/core-site.xml"
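Because the stanzas are only needed when they are missing, the edit can be guarded so that running it twice does not duplicate them. The following is a minimal sketch, assuming GNU sed (as shipped with CentOS); the function name add_oozie_proxyuser is hypothetical and not part of Hadoop or Oozie.

```shell
# Hypothetical helper: insert the Oozie proxyuser stanzas before
# </configuration>, but only if the hosts property is not already present.
add_oozie_proxyuser() {
    conf=$1
    if grep -q '<name>hadoop.proxyuser.oozie.hosts</name>' "$conf"; then
        echo "proxyuser stanzas already present in $conf"
        return 0
    fi
    # GNU sed expands \n in the replacement to a newline; a .bak copy is kept.
    sed -i.bak 's|</configuration>|<property>\n<name>hadoop.proxyuser.oozie.hosts</name>\n<value>*</value>\n</property>\n<property>\n<name>hadoop.proxyuser.oozie.groups</name>\n<value>*</value>\n</property>\n</configuration>|' "$conf"
}

# Usage (as root), with the path used in this article:
#   add_oozie_proxyuser /etc/hadoop/hdp230/core-site.xml
```

The guard only looks for the hosts property, so a file that contains one stanza but not the other would still need to be fixed by hand.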
3. Restart all Hadoop services to apply modifications
# /cm/local/apps/cluster-tools/hadoop/cm-hadoop-maint -i hdp230 --restart
4. Download Oozie and unpack it
Execute the following commands as root on the active head node. The Ext-2.2 library is needed by the Oozie web console.
# cd /tmp/
# curl -O http://s3.amazonaws.com/public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.3.0.0/tars/oozie-4.2.0.2.3.0.0-2557-distro.tar.gz
# cd /cm/shared/apps/hadoop/Hortonworks
# tar xvzf /tmp/oozie-4.2.0.2.3.0.0-2557-distro.tar.gz
# cd oozie-4.2.0.2.3.0.0-2557
# tar xvzf oozie-examples.tar.gz
# mkdir libext
# cd libext
# curl -O http://dev.sencha.com/deploy/ext-2.2.zip
5. Change ownership permissions for some directories
# cd /cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557/
# mkdir logs
# chown oozie:oozie logs
# mkdir data
# chown oozie:oozie data
# chown -R oozie:oozie oozie-server
6. Create Oozie database
# su - oozie
$ cd /cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557/bin
$ ./ooziedb.sh create -run
7. Prepare WAR file
# su - oozie
$ cd /cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557/bin
$ ./oozie-setup.sh prepare-war
8. Create directory for oozie in HDFS
# module load hadoop
# su -c 'hdfs dfs -mkdir /user/oozie' hdfs
# su -c 'hdfs dfs -chown oozie:oozie /user/oozie' hdfs
9. Upload sharelib to HDFS
Substitute node001 with the NameNode hostname.
# su - oozie
$ cd /cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557/bin
$ ./oozie-setup.sh sharelib create -fs hdfs://node001:8020 -locallib /cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557/oozie-sharelib-4.2.0.2.3.0.0-2557.tar.gz
10. Edit Oozie configuration
# cd /cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557/conf
# nano oozie-site.xml
Modify <value> to be consistent with the Hadoop configuration directory path:
<property>
<name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
<value>*=/etc/hadoop/hdp230</value>
</property>
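After editing, it can be useful to read the value back and confirm that the directory it names exists. The sketch below is one way to do that with standard text tools; get_xml_property is a hypothetical helper, and it assumes the simple <name>/<value> layout shown above rather than parsing XML in general.

```shell
# Hypothetical helper: print the <value> that directly follows a given <name>.
# The file is flattened to one line so the two tags may sit on separate lines.
get_xml_property() {
    file=$1
    name=$2
    tr -d '\n' < "$file" |
        sed -n "s|.*<name>${name}</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p"
}

# Usage, from the Oozie conf directory:
#   value=$(get_xml_property oozie-site.xml oozie.service.HadoopAccessorService.hadoop.configurations)
#   [ -d "${value#*=}" ] || echo "Hadoop configuration directory not found: ${value#*=}"
```

The expansion ${value#*=} strips the leading "*=" so that only the directory path is tested.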
11. Start Oozie
Oozie should be started as the oozie user. Use ‘run’ to run it in the foreground, or ‘start’ to run it in the background. Log files can be found in /cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557/logs
# su - oozie
$ cd /cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557/bin/
$ ./oozied.sh run
or
$ ./oozied.sh start
12. Check web console
The Oozie web console is available on the head node at http://localhost:11000/oozie
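The server can also be checked from the command line with the Oozie client's admin -status sub-command, which reports "System mode: NORMAL" when the server is running normally. The wrapper below is a small sketch: the function name oozie_is_up and the OOZIE_BIN and OOZIE_URL variables are hypothetical, with defaults taken from the paths and port used in this article.

```shell
# Assumed defaults, based on the install path and port used in this article.
OOZIE_URL=${OOZIE_URL:-http://localhost:11000/oozie}
OOZIE_BIN=${OOZIE_BIN:-/cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557/bin/oozie}

# Hypothetical helper: succeeds only if the server reports normal system mode.
oozie_is_up() {
    "$OOZIE_BIN" admin -oozie "$OOZIE_URL" -status 2>/dev/null |
        grep -q 'System mode: NORMAL'
}

# Usage (as the oozie user):
#   oozie_is_up && echo "Oozie is running" || echo "Oozie is not reachable"
```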
13. Edit Oozie job configuration
# cd /cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557/examples/apps/map-reduce
# nano job.properties
Using nano or another text editor, the following properties should be changed:
nameNode=hdfs://node001:8020
jobTracker=node003:8032
Here node001 is the NameNode and node003 is the ResourceManager (YARN server), which uses the default port 8032.
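If the edit needs to be scripted rather than done in an editor, the two properties can be set non-interactively. This is a minimal sketch; set_property is a hypothetical helper, and it assumes plain key=value lines whose values contain no ‘|’ or ‘&’ characters.

```shell
# Hypothetical helper: set key=value in a properties file, replacing the
# existing line for that key or appending one if the key is absent.
set_property() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$file"; then
        sed -i.bak "s|^${key}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Usage, from the map-reduce example directory:
#   set_property job.properties nameNode hdfs://node001:8020
#   set_property job.properties jobTracker node003:8032
```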
14. Upload examples to HDFS
# su - oozie
$ cd /cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557
$ module load hadoop
$ hdfs dfs -put examples examples
15. Run job
# su - oozie
$ cd /cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557/bin
$ ./oozie job -oozie http://localhost:11000/oozie -config ../examples/apps/map-reduce/job.properties -run
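On submission the client prints a line of the form "job: <workflow ID>". Capturing that ID makes it possible to query the job afterwards with the client's -info option. The sketch below shows one way to do this; parse_job_id is a hypothetical helper name.

```shell
# Hypothetical helper: read "oozie job ... -run" output on stdin and print
# the workflow ID from the "job: <id>" line.
parse_job_id() {
    sed -n 's/^job: *//p'
}

# Usage (as the oozie user, from the Oozie bin directory):
#   job_id=$(./oozie job -oozie http://localhost:11000/oozie \
#       -config ../examples/apps/map-reduce/job.properties -run | parse_job_id)
#   ./oozie job -oozie http://localhost:11000/oozie -info "$job_id"
```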
16. Check web consoles
The Oozie web console (http://localhost:11000/oozie) should show the submitted job.
The YARN web console (http://node003:8088) should show the corresponding application, with:
type = MAPREDUCE
name = oozie:launcher:T=map-reduce:W=map-reduce-wf:A=mr-node:ID=0000000-141218162900779-oozie-oozi-W
17. Check job results
# su - oozie
$ module load hadoop
$ hdfs dfs -cat /user/oozie/examples/output-data/map-reduce/*