Set Up Hadoop on Ubuntu Server 14.04 LTS

Some notes on getting Hadoop installed on Ubuntu LTS in a local VM. This is only suitable for dev environments. For production, additional steps such as setting up dedicated user accounts are advisable.

1) Install ssh:

sudo apt-get install ssh

2) Install the Java Development Kit (JDK) and set the JAVA_HOME variable:

sudo apt-get install default-jdk

Open /etc/environment in a text editor and add the path to your Java JDK (if JAVA_HOME is not already defined):

JAVA_HOME="/usr/lib/jvm/java-1.7.0-openjdk-amd64"

Note that I’ve seen some articles suggest using the export JAVA_HOME=... syntax from the terminal. A value set this way only lasts for the current shell session, so it is gone after you log out or reboot (though adding the export line to ~/.bashrc is ok).
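If you are not sure which path to use for JAVA_HOME, one way to find a candidate is to resolve the javac symlink and strip the trailing bin/javac. The snippet below is a minimal sketch of that idea, not part of the original steps: java_home_from is a hypothetical helper name, and it assumes GNU readlink and the usual <jdk>/bin/javac layout.

```shell
# Derive a JAVA_HOME candidate from the resolved location of a JDK binary.
# Assumption: the JDK uses the usual <jdk>/bin/javac layout.
java_home_from() {
    # $1: path to a javac binary, possibly a chain of symlinks
    resolved="$(readlink -f "$1")" || return 1
    # Strip the binary name and the bin directory to get the JDK root.
    dirname "$(dirname "$resolved")"
}

if command -v javac >/dev/null 2>&1; then
    echo "JAVA_HOME candidate: $(java_home_from "$(command -v javac)")"
else
    echo "javac not found on PATH; is the JDK installed?"
fi
```

Compare the printed directory with the value you put in /etc/environment. Either should work as long as it points at the JDK root.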

3) Get the latest version of Hadoop. The Apache Hadoop releases page lists the latest tarball (2.7.2 at the time of writing):

wget http://apache.mirror.anlx.net/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz

4) Unpack the tarball:

tar zxvf hadoop-2.7.2.tar.gz
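An interrupted download leaves a truncated tarball, and tar then fails partway through unpacking. As a rough sketch, you could check that the archive is readable before extracting it. unpack_checked is a hypothetical helper name, not something Hadoop ships, and this only detects a damaged file; it does not verify the download against Apache's published checksums.

```shell
# Unpack a gzipped tarball only if tar can read its full listing.
unpack_checked() {
    # $1: tarball path, $2: destination directory
    tarball="$1"
    dest="$2"
    if ! tar tzf "$tarball" >/dev/null 2>&1; then
        echo "Archive $tarball is missing or corrupt; try downloading it again." >&2
        return 1
    fi
    mkdir -p "$dest" && tar zxf "$tarball" -C "$dest"
}

# Example (run from the directory holding the download):
#   unpack_checked hadoop-2.7.2.tar.gz .
```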

5) (Optionally) move the folder to /usr/local/bin:

sudo mv hadoop-2.7.2 /usr/local/bin/hadoop

6) Now check everything is working:

/usr/local/bin/hadoop/bin/hadoop

The terminal should display the help guide.
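If the help guide does not appear, the usual suspects are a wrong install path or a missing JAVA_HOME, since Hadoop's launcher scripts rely on that variable to find Java. The following is a small preflight sketch under those assumptions. check_hadoop_install is a hypothetical helper, and the directory passed to it is whatever you chose in step 5.

```shell
# Preflight check before running Hadoop from an explicit install directory.
check_hadoop_install() {
    # $1: Hadoop install directory, e.g. /usr/local/bin/hadoop from step 5
    launcher="$1/bin/hadoop"
    if [ ! -x "$launcher" ]; then
        echo "No executable launcher at $launcher" >&2
        return 1
    fi
    if [ -z "${JAVA_HOME:-}" ] || [ ! -d "$JAVA_HOME" ]; then
        echo "JAVA_HOME is unset or not a directory" >&2
        return 1
    fi
    echo "ok: $launcher"
}

# Example:
#   check_hadoop_install /usr/local/bin/hadoop && /usr/local/bin/hadoop/bin/hadoop version
```

The hadoop version subcommand prints the installed release, which is a quicker sanity check than reading through the help text.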

7) Typing the full path every time is a pain. (Optionally) add hadoop to the PATH environment variable so you can run it by name:

Edit /etc/environment in a text editor (such as nano, run with sudo). Append the following to the end of the PATH variable, inside the quotes:

:/usr/local/bin/hadoop/bin

So it should read something like this:

PATH="/usr/local/sbin:/usr/local/bin:/usr/local/bin/hadoop/bin"

Log out and back in so the change to /etc/environment takes effect, then verify by typing hadoop into your terminal.
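If you would rather not edit /etc/environment, a per-user alternative is to extend PATH from ~/.bashrc. Because ~/.bashrc is read by every new interactive shell, it helps to append the directory only when it is not already there. This is a minimal sketch of that pattern. path_append is a hypothetical helper name, and the Hadoop directory assumes the location from step 5.

```shell
# Append a directory to PATH only if it is not already listed.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                      # already present, nothing to do
        *) PATH="${PATH:+$PATH:}$1" ;;    # add it at the end
    esac
}

path_append /usr/local/bin/hadoop/bin
export PATH
```

Note that this is shell code, so it belongs in ~/.bashrc and not in /etc/environment, which only holds plain KEY="value" lines and is not run as a script.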
