Getting Started With Hadoop On Mac OS

Getting started is difficult but doable!

If you have been struggling to get started with Hadoop, the first step is to set it up properly. This tutorial will help you set up Hadoop with a basic configuration on Mac OS. Don’t worry, you don’t need an Ubuntu machine to get started unless you are planning a multi-node cluster.

Go through these simple steps to set up a Hadoop single-node cluster on Mac OS.

This tutorial has been tested on Mac OS X, version 10.11.16.

Prerequisites – Java installed on your system. You can check whether Java is installed by running java -version. It should print something like the following –

java version "1.8.0_73"
Java(TM) SE Runtime Environment (build 1.8.0_73-b02)
Java HotSpot(TM) 64-Bit Server VM (build 25.73-b02, mixed mode)
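If you would rather script this check, here is a minimal sketch. The helper name have_cmd is made up for this example; it only relies on the shell's built-in command -v, and the messages are placeholders.

```shell
# Return success if the named program can be found on PATH.
# have_cmd is an illustrative helper name, not part of Java or Hadoop.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# java -version prints its banner on stderr and exits non-zero if no JDK works.
if have_cmd java && java -version; then
    echo "Java is available"
else
    echo "Java not found or not working; install a JDK before continuing" >&2
fi
```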


1. Creating Separate User – For security and administration reasons, it is recommended that you create a dedicated Hadoop operating system user. You can create a new user from Launchpad->System Preferences->Users & Groups. If you create the user hadoop, create the account as a Standard user. Then log out and log back in as that user.

2. SSH localhost – Hadoop needs to be able to establish an ssh connection to localhost without being prompted for a password or passphrase. We can achieve this by generating an ssh key and adding it to $HOME/.ssh/authorized_keys. Generate an ssh key like this –

ssh-keygen -t rsa -P ""

You will be asked to enter the file in which to save the key. The default value is /Users/hadoop/.ssh/id_rsa. Because of -P "", no passphrase is required to use this key file. Once the key has been created, you can authorize it by adding it to authorized_keys –

cat $HOME/.ssh/ >> $HOME/.ssh/authorized_keys

You can now try ssh localhost, and it should not ask for a password. You can then log out from localhost.
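The key-authorization step can also be written so that running it twice does not add the key twice. This is only a sketch: authorize_key is a hypothetical helper name, and the paths in the usage note assume the key was saved at the default location mentioned above, in which case ssh-keygen writes the public half next to it as id_rsa.pub.

```shell
# authorize_key PUBKEY AUTH_FILE
# Append PUBKEY's contents to AUTH_FILE unless that exact line is already there.
# Hypothetical helper name; both paths are supplied by the caller.
authorize_key() {
    pub_file=$1
    auth_file=$2
    [ -r "$pub_file" ] || { echo "cannot read $pub_file" >&2; return 1; }
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    # -x: whole-line match, -F: fixed strings, -f: read patterns from the key file
    if ! grep -qxFf "$pub_file" "$auth_file"; then
        cat "$pub_file" >> "$auth_file"
    fi
    # With its default StrictModes setting, sshd rejects key files that
    # are writable by group or others, so restrict it to the owner.
    chmod 600 "$auth_file"
}
```

Assuming the default key path, you would call it as authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys" and then verify with ssh -o BatchMode=yes localhost true, which fails instead of prompting if the key is not accepted. On Mac OS, Remote Login also has to be switched on under System Preferences->Sharing before ssh localhost can connect at all.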

3. Download and install Hadoop – The next step is to download and install Hadoop. Go to the Hadoop downloads page and download the latest version (the commands below use hadoop-2.6.0). We assume the archive has been downloaded to $HOME/Downloads. Hadoop is distributed as a gzip-compressed tar file. Uncompress the file using gunzip –

gunzip $HOME/Downloads/hadoop-2.6.0.tar.gz

The above command removes the .gz extension and leaves a plain .tar file. We need to extract Hadoop into the /usr/local directory. Go to /usr/local and run the following command to extract it.

sudo tar xvf $HOME/Downloads/hadoop-2.6.0.tar

Hadoop will now be located at /usr/local/hadoop-2.6.0. To make the current version of Hadoop easier to access, we can optionally create a symlink at /usr/local/hadoop. The symlink can be created using the following command –

sudo ln -s hadoop-2.6.0 hadoop

Now we need to give the hadoop user ownership of the installed files. You can use the chown command for this.

sudo chown -R hadoop:staff hadoop-2.6.0 hadoop

Yayy!! Hadoop has been installed successfully.
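As an aside, the gunzip and tar steps above can be combined, since tar can decompress gzip on the fly. The sketch below wraps that in a small function; extract_archive is a hypothetical name, and it assumes you still have the original .tar.gz (that is, you have not already run gunzip on it).

```shell
# extract_archive ARCHIVE DEST_DIR
# Unpack a .tar.gz into DEST_DIR in one step, without a separate gunzip.
# Hypothetical helper name; -z filters through gzip, -C switches directory first.
extract_archive() {
    archive=$1
    dest=$2
    [ -f "$archive" ] || { echo "no such archive: $archive" >&2; return 1; }
    mkdir -p "$dest"
    tar -xzf "$archive" -C "$dest"
}
```

For this tutorial the equivalent one-liner would be sudo tar -xzf $HOME/Downloads/hadoop-2.6.0.tar.gz -C /usr/local, which also saves you from having to cd into /usr/local first.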

4. Basic Configuration – 

4.1 Updating Login Profile: We need to update our PATH to include the Hadoop executables. We also need to export some Hadoop environment variables. Add the following lines to $HOME/.bash_profile:

export HADOOP_PREFIX="/usr/local/hadoop"
export HADOOP_CONF_DIR="${HADOOP_PREFIX}/etc/hadoop"

export PATH="${PATH}:${HADOOP_PREFIX}/bin:${HADOOP_PREFIX}/sbin"

You can load these changes using the following command:

. $HOME/.bash_profile
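To confirm that the profile changes took effect, you can check whether the Hadoop directories actually ended up on PATH. The following is a rough sketch; path_contains is a hypothetical helper name, and the fallback prefix is simply the install location used in this tutorial.

```shell
# path_contains DIR
# Succeed if DIR is exactly one of the colon-separated entries of PATH.
# Hypothetical helper name used only for this check.
path_contains() {
    case ":${PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Fall back to this tutorial's install location if HADOOP_PREFIX is unset.
prefix="${HADOOP_PREFIX:-/usr/local/hadoop}"
for dir in "$prefix/bin" "$prefix/sbin"; do
    if path_contains "$dir"; then
        echo "on PATH: $dir"
    else
        echo "missing from PATH: $dir"
    fi
done
```

Wrapping PATH in colons before matching means a directory is only reported when it is a complete entry, so /usr/local/hadoop does not falsely match /usr/local/hadoop/bin.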

4.2 : This is the main configuration file. We need to tell Hadoop the location of the Java home directory, i.e. the directory that directly contains the bin directory of the java program. On Mac OS, /usr/libexec/java_home is a helper program that prints this directory rather than being the directory itself, so we set JAVA_HOME from its output.

Now, you can update the file –

vi /usr/local/hadoop/etc/hadoop/

export JAVA_HOME="$(/usr/libexec/java_home)"
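If you want the same setup to survive on a machine where the Mac OS java_home helper is missing, a small fallback can help. This is a sketch under assumptions: detect_java_home is a hypothetical function name, and the fallback simply trusts whatever JAVA_HOME is already set in the environment.

```shell
# detect_java_home [HELPER]
# Print the Java home directory. Prefers the Mac OS helper program, which
# prints the JDK home on stdout, and falls back to an already-set JAVA_HOME.
# detect_java_home is a hypothetical helper name for this sketch.
detect_java_home() {
    helper=${1:-/usr/libexec/java_home}
    if [ -x "$helper" ] && dir=$("$helper" 2>/dev/null) && [ -n "$dir" ]; then
        printf '%s\n' "$dir"
    elif [ -n "${JAVA_HOME:-}" ]; then
        printf '%s\n' "$JAVA_HOME"
    else
        echo "could not determine JAVA_HOME" >&2
        return 1
    fi
}
```

A typical use would be JAVA_HOME=$(detect_java_home) && export JAVA_HOME, so that JAVA_HOME is only exported when a directory was actually found.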

4.3 HDFS Config : This is the basic configuration of the Hadoop Distributed File System (HDFS). HDFS site-specific settings go in hdfs-site.xml. We will specify both the name node and data node configuration in this file.

4.3.1 NameNode : The name node is the centrepiece of HDFS. It does not store any file data itself; it stores the locations of all the files and directories within HDFS. Clients ask the name node where a file is located, which makes the name node the most important part of HDFS.

As we are configuring Hadoop as a single-node cluster, the name node is the current machine. We specify the name node config in hdfs-site.xml.

4.3.2 DataNode : Data nodes are the individual machines within the Hadoop cluster. A cluster can consist of one or many data nodes. Data nodes store the actual data in the form of blocks. The name node talks to the data nodes when it needs to access files in the system.

Our local computer acts as both name node and data node here.
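For reference, here is a sketch of what a minimal single-node hdfs-site.xml could look like, written out from the shell. The property names dfs.replication, dfs.namenode.name.dir and dfs.datanode.data.dir are standard Hadoop 2.x settings, but the values, the storage directory and the function name write_hdfs_site are assumptions for illustration, not necessarily the configuration this tutorial was tested with.

```shell
# write_hdfs_site TARGET_FILE DATA_ROOT
# Write a minimal single-node hdfs-site.xml to TARGET_FILE.
# DATA_ROOT is a hypothetical absolute directory for HDFS storage.
write_hdfs_site() {
    target=$1
    data_root=$2
    cat > "$target" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Only one data node exists, so keep a single copy of each block -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- Where the name node keeps its file system metadata -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${data_root}/namenode</value>
  </property>
  <!-- Where the data node keeps its blocks -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${data_root}/datanode</value>
  </property>
</configuration>
EOF
}
```

You could call it as write_hdfs_site "$HADOOP_CONF_DIR/hdfs-site.xml" /usr/local/hadoop/data, where the data directory is again just an example. Replication is set to 1 because a single machine cannot hold copies of a block on different nodes.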

5. Quick Test – Hadoop is now ready with a basic HDFS config. Try running the following command –

hadoop version

You should see the Hadoop version printed.

Run the following commands to format the name node and start the name node and data node.

/usr/local/hadoop/bin/hadoop namenode -format


You should see log messages saying that the name node and data node have started. You can stop the processing using
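For reference, a stock Hadoop 2.x install ships start-dfs.sh and stop-dfs.sh under its sbin directory, and the JDK ships jps for listing running Java processes. The sketch below assumes those are the scripts meant here and strings the whole cycle together; run and hdfs_cycle are hypothetical helper names, and setting DRY_RUN only prints the commands so you can review them first.

```shell
# run CMD...: execute the command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# hdfs_cycle [PREFIX]
# Format the name node, start HDFS, list Java processes, then stop HDFS.
# run and hdfs_cycle are hypothetical helper names for this sketch.
hdfs_cycle() {
    prefix=${1:-/usr/local/hadoop}
    # Formatting wipes any existing HDFS metadata, so do it only on first setup.
    run "$prefix/bin/hadoop" namenode -format || return 1
    # Starts the HDFS daemons, including the name node and data node.
    run "$prefix/sbin/start-dfs.sh" || return 1
    # Lists running Java processes so you can see which daemons are up.
    run jps || return 1
    # Stops the HDFS daemons again.
    run "$prefix/sbin/stop-dfs.sh"
}
```

Running DRY_RUN=1 hdfs_cycle prints the four commands without touching anything, which is a safe way to check the paths before formatting for real.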

There you go! All set to Hadoopify yourselves. Please drop a comment in the comments section if you face any issues with the above configuration; I would be happy to help.
