
How can I add Apache Oozie to my Hadoop instance?

Apache Oozie is a workflow scheduler system to manage Apache Hadoop jobs. Here we describe how to add Oozie to a pre-existing Hadoop instance, “hdp230”, based on Hortonworks HDP 2.3.0. We then show how to use it to run MapReduce jobs.

1. Add oozie group/user to head node and Hadoop nodes

Execute the following commands on the active head node and in the chroot environment for the software image(s) used by compute nodes.

# /usr/bin/getent group oozie || /usr/sbin/groupadd -r oozie
# /usr/bin/getent passwd oozie || /usr/sbin/useradd --comment "Oozie" --shell /bin/bash -m -r -g oozie --home /var/run/oozie oozie

2. Add stanzas (if needed) in core-site.xml (all Hadoop nodes)

The following two stanzas should be present in core-site.xml:
  <property>
    <name>hadoop.proxyuser.oozie.hosts</name>
    <value>*</value>
  </property>

  <property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <value>*</value>
  </property>

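If core-site.xml does not include these stanzas, they can be added with the following commands. The first edits the file on the head node; the second uses pdsh to make the same edit on the Hadoop nodes, assuming that they are in the ‘default’ category: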

# sed -i.bak 's/<\/configuration>/ <property>\n<name>hadoop\.proxyuser\.oozie\.hosts<\/name>\n <value>\*<\/value>\n<\/property>\n\n <property>\n <name>hadoop\.proxyuser\.oozie\.groups<\/name>\n <value>\*<\/value>\n<\/property>\n\n<\/configuration>/' /etc/hadoop/hdp230/core-site.xml

# pdsh -g category=default "sed -i.bak 's/<\/configuration>/<property>\n <name>hadoop\.proxyuser\.oozie\.hosts<\/name>\n <value>\*<\/value>\n <\/property>\n\n <property>\n <name>hadoop\.proxyuser\.oozie\.groups<\/name>\n <value>\*<\/value>\n<\/property>\n\n<\/configuration>/' /etc/hadoop/hdp230/core-site.xml"
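Running either sed command a second time would add a second copy of the stanzas. A minimal sketch of an idempotent alternative follows, assuming GNU sed (standard on CentOS/RHEL); the function name add_oozie_proxyuser is hypothetical. It checks for the hosts property first and only edits the file when that property is missing.

```shell
# Hypothetical helper: add the Oozie proxyuser stanzas to a core-site.xml
# only if they are not already there, so re-running it is harmless.
# Assumes GNU sed (-i and \n in the replacement text).
add_oozie_proxyuser() {
  local conf="$1"   # path to core-site.xml

  # Already configured: leave the file untouched.
  if grep -q '<name>hadoop\.proxyuser\.oozie\.hosts</name>' "$conf"; then
    return 0
  fi

  # Insert both stanzas just before </configuration>, keeping a .bak copy.
  sed -i.bak 's|</configuration>|  <property>\n    <name>hadoop.proxyuser.oozie.hosts</name>\n    <value>*</value>\n  </property>\n  <property>\n    <name>hadoop.proxyuser.oozie.groups</name>\n    <value>*</value>\n  </property>\n</configuration>|' "$conf"
}
```

On the head node it would be called as add_oozie_proxyuser /etc/hadoop/hdp230/core-site.xml. To use it on the Hadoop nodes, the function would need to be placed in a script available on those nodes and run through pdsh.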

3. Restart all Hadoop services to apply modifications
# /cm/local/apps/cluster-tools/hadoop/cm-hadoop-maint -i hdp230 --restart

4. Download Oozie and unpack it

Execute the following commands as root on the active head node. The Ext-2.2 library is needed by the Oozie web console.

# cd /tmp/
# curl -O http://s3.amazonaws.com/public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.3.0.0/tars/oozie-4.2.0.2.3.0.0-2557-distro.tar.gz
# cd /cm/shared/apps/hadoop/Hortonworks
# tar xvzf /tmp/oozie-4.2.0.2.3.0.0-2557-distro.tar.gz
# cd oozie-4.2.0.2.3.0.0-2557
# tar xvzf oozie-examples.tar.gz
# mkdir libext
# cd libext
# curl -O http://dev.sencha.com/deploy/ext-2.2.zip

5. Create the logs and data directories and set ownership
# cd /cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557/
# mkdir logs
# chown oozie:oozie logs
# mkdir data
# chown oozie:oozie data
# chown -R oozie:oozie oozie-server

6. Create Oozie database
# su - oozie
$ cd /cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557/bin
$ ./ooziedb.sh create -run

7. Prepare WAR file
# su - oozie
$ cd /cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557/bin
$ ./oozie-setup.sh prepare-war

8. Create directory for oozie in HDFS
# module load hadoop
# su -c 'hdfs dfs -mkdir /user/oozie' hdfs
# su -c 'hdfs dfs -chown oozie:oozie /user/oozie' hdfs

9. Upload sharelib to HDFS

Substitute node001 with the NameNode hostname.
# su - oozie
$ cd /cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557/bin
$ ./oozie-setup.sh sharelib create -fs hdfs://node001:8020 -locallib /cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557/oozie-sharelib-4.2.0.2.3.0.0-2557.tar.gz

10. Edit Oozie configuration
# cd /cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557/conf
# nano oozie-site.xml

Modify the <value> so that it matches the Hadoop configuration directory path:

  <property>
    <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
    <value>*=/etc/hadoop/hdp230</value>
  </property>
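As an alternative to editing the file in nano, the value can be set non-interactively. The sketch below assumes GNU sed and that the <value> element sits on a single line inside the same <property> block as the <name>; set_hadoop_conf_dir is a hypothetical helper name, and a directory path containing | or & would need escaping.

```shell
# Hypothetical helper: set oozie.service.HadoopAccessorService.hadoop.configurations
# in oozie-site.xml to "*=<Hadoop conf dir>". Assumes GNU sed and a one-line <value>.
set_hadoop_conf_dir() {
  local site="$1"      # path to oozie-site.xml
  local confdir="$2"   # Hadoop configuration directory, e.g. /etc/hadoop/hdp230

  # Only within the lines from the matching <name> to the next </property>,
  # replace the existing <value> with *=<confdir>. A .bak copy is kept.
  sed -i.bak "/<name>oozie\.service\.HadoopAccessorService\.hadoop\.configurations<\/name>/,/<\/property>/ s|<value>.*</value>|<value>*=${confdir}</value>|" "$site"
}
```

For example, from the conf directory: set_hadoop_conf_dir oozie-site.xml /etc/hadoop/hdp230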

11. Start Oozie

Start Oozie as the oozie user. Use ‘run’ to run it in the foreground, or ‘start’ to run it in the background. Log files are written to /cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557/logs.
# su - oozie
$ cd /cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557/bin/
$ ./oozied.sh run

or

$ ./oozied.sh start
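When Oozie is started in the background, it can take a little while before it accepts requests. The following sketch polls the admin status endpoint of the Oozie Web Services API (/v1/admin/status), which returns a small JSON document such as {"systemMode":"NORMAL"} once the server is ready. The function names, retry count and delay are assumptions for illustration, and curl is assumed to be installed.

```shell
# Succeed if the JSON on stdin reports "systemMode":"NORMAL".
# Whitespace is stripped first so formatting differences do not matter.
oozie_is_normal() {
  tr -d '[:space:]' | grep -q '"systemMode":"NORMAL"'
}

# wait_for_oozie [URL] [TRIES] [DELAY_SECONDS]
# Hypothetical helper: poll the status endpoint until NORMAL or give up.
wait_for_oozie() {
  local url="${1:-http://localhost:11000/oozie}"
  local tries="${2:-30}"
  local delay="${3:-2}"
  local i=1
  while [ "$i" -le "$tries" ]; do
    # -s: no progress output; -f: fail on HTTP errors instead of printing them
    if curl -sf "${url}/v1/admin/status" | oozie_is_normal; then
      echo "Oozie is up at ${url}"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Oozie did not report NORMAL mode at ${url}" >&2
  return 1
}
```

Calling wait_for_oozie right after ./oozied.sh start blocks until the server answers or roughly a minute has passed. The Oozie client offers a similar check: ./oozie admin -oozie http://localhost:11000/oozie -status should report System mode: NORMAL.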

12. Check web console

The Oozie web console is available on the head node at http://localhost:11000

13. Edit Oozie job configuration
# cd /cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557/examples/apps/map-reduce
# nano job.properties

Using nano or another text editor, change the following properties:

nameNode=hdfs://node001:8020
jobTracker=node003:8032

Here node001 is the NameNode and node003 is the YARN ResourceManager, listening on its default port, 8032. Substitute the hostnames used in your cluster.
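The two properties can also be set with sed instead of an editor. In this sketch the hostnames are passed as arguments, the ports 8020 and 8032 are the ones used in this article, set_job_endpoints is a hypothetical helper name, and GNU sed is assumed.

```shell
# Hypothetical helper: point job.properties at the NameNode and ResourceManager.
# Ports 8020 (NameNode) and 8032 (ResourceManager) follow this article.
set_job_endpoints() {
  local props="$1"    # path to job.properties
  local nnhost="$2"   # NameNode hostname
  local rmhost="$3"   # ResourceManager hostname

  # Rewrite only the nameNode= and jobTracker= lines; keep a .bak copy.
  sed -i.bak \
    -e "s|^nameNode=.*|nameNode=hdfs://${nnhost}:8020|" \
    -e "s|^jobTracker=.*|jobTracker=${rmhost}:8032|" \
    "$props"
}
```

For the cluster described here: set_job_endpoints job.properties node001 node003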

14. Upload examples to HDFS

# su - oozie
$ cd /cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557
$ module load hadoop
$ hdfs dfs -put examples examples

15. Run job
# su - oozie
$ cd /cm/shared/apps/hadoop/Hortonworks/oozie-4.2.0.2.3.0.0-2557/bin
$ ./oozie job -oozie http://localhost:11000/oozie -config ../examples/apps/map-reduce/job.properties -run
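The submit command prints the workflow ID on a line of the form job: <id>. The sketch below captures that ID and polls the workflow with ./oozie job -info until it leaves the PREP or RUNNING state. It assumes the -info output contains a Status : <state> line, as the Oozie 4.x client prints; the helper names and the 10-second, 60-try polling limits are hypothetical choices.

```shell
# Oozie server URL used by the helpers below (same as in the step above).
OOZIE_URL="${OOZIE_URL:-http://localhost:11000/oozie}"

# Read "job: <id>" from stdin and print just the ID.
extract_job_id() {
  sed -n 's/^job: *//p'
}

# Read "oozie job -info" output from stdin and print the workflow status.
job_status() {
  sed -n 's/^Status *: *//p' | head -n 1
}

# Hypothetical helper: poll every 10 s (at most 60 times) and print the final state.
# Must be run from the Oozie bin directory, where ./oozie lives.
wait_for_job() {
  local id="$1" status i=0
  while [ "$i" -lt 60 ]; do
    status=$(./oozie job -oozie "$OOZIE_URL" -info "$id" | job_status)
    case "$status" in
      PREP|RUNNING) sleep 10 ;;
      "") echo "could not read status for $id" >&2; return 1 ;;
      *) echo "$status"; return 0 ;;
    esac
    i=$((i + 1))
  done
  return 1
}
```

Typical use from the bin directory:

$ id=$(./oozie job -oozie http://localhost:11000/oozie -config ../examples/apps/map-reduce/job.properties -run | extract_job_id)
$ wait_for_job "$id"

which should print the final state, such as SUCCEEDED or KILLED.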

16. Check web consoles
The Oozie web console (http://localhost:11000) should show the submitted job.
The YARN web console (http://node003:8088) should show the corresponding application, with the following type and name (the job ID at the end of the name differs for each submission):
type = MAPREDUCE
name = oozie:launcher:T=map-reduce:W=map-reduce-wf:A=mr-node:ID=0000000-141218162900779-oozie-oozi-W

17. Check job results
# su - oozie
$ module load hadoop
$ hdfs dfs -cat /user/oozie/examples/output-data/map-reduce/*

Updated on October 13, 2020
