Prerequisites
- Access to a terminal window/command line
- Sudo or root privileges on local/remote machines
Use the following command to update your system before initiating a new installation:
sudo apt update
Type the following command in your terminal to install OpenJDK 8:
sudo apt install openjdk-8-jdk -y
Once the installation process is complete, verify the current Java version:
java -version; javac -version
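If you script your setup, a quick sanity check along these lines (a minimal sketch using the standard command -v shell builtin) confirms both the runtime and the compiler are on the PATH before you continue:
# Confirm both the JRE and the compiler resolve on the PATH
command -v java >/dev/null && command -v javac >/dev/null && echo "Java toolchain found" || echo "Java toolchain missing"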
Install OpenSSH on Ubuntu
Install the OpenSSH server and client using the following command:
sudo apt install openssh-server openssh-client -y
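On Ubuntu the OpenSSH server starts automatically after installation; if you want to confirm the service is active before continuing, systemctl can report its status (the unit is named ssh on Ubuntu):
sudo systemctl status ssh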
Create Hadoop User
Use the adduser command to create a new Hadoop user:
sudo adduser ashwin
Switch to the newly created user and enter the corresponding password:
su - ashwin
Generate an SSH key pair and define the location it is to be stored in:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Use the cat command to store the public key as authorized_keys in the .ssh directory:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Set the permissions for your user with the chmod command:
chmod 0600 ~/.ssh/authorized_keys
Verify that the newly created user (ashwin in this tutorial) can SSH to localhost:
ssh localhost
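Because Hadoop's start-up scripts depend on passwordless SSH, it is worth testing the key-based login non-interactively; with BatchMode enabled, ssh fails immediately instead of falling back to a password prompt:
ssh -o BatchMode=yes localhost 'echo passwordless SSH OK'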
Download and Install Hadoop on Ubuntu
Visit the official Apache Hadoop project page and select the version of Hadoop you want to implement (this tutorial uses 3.2.1).
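If you prefer to stay in the terminal, a wget download along the following lines also works; the URL below is an example for version 3.2.1 (older releases are kept on archive.apache.org) and should be adjusted to the mirror and version you selected:
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz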
Once the download is complete, extract the archive with the tar command:
tar xzf hadoop-3.2.1.tar.gz
Single Node Hadoop Deployment (Pseudo-Distributed Mode)
Edit the .bashrc shell configuration file using a text editor of your choice (we will be using nano):
nano ~/.bashrc
Define the Hadoop environment variables by adding the following content to the end of the file:
#Hadoop Related Options
export HADOOP_HOME=/home/ashwin/hadoop-3.2.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Apply the changes to the current running environment with the following command:
source ~/.bashrc
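To confirm the variables took effect, echo one of them and ask the hadoop binary (now on the PATH) to report its version:
echo $HADOOP_HOME
hadoop version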
Use the previously defined $HADOOP_HOME variable to access the hadoop-env.sh file:
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Uncomment the $JAVA_HOME variable (i.e., remove the # sign) and add the full path to the OpenJDK installation on your system. If you have installed the same version as presented in the first part of this tutorial, add the following line:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
If you need help locating the correct Java path, resolve the javac symlink; the OpenJDK directory is the output minus the trailing /bin/javac:
readlink -f /usr/bin/javac
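As a convenience, the lookup can be collapsed into a single line; this sketch assumes javac is on the PATH and strips the trailing /bin/javac from the resolved symlink:
dirname $(dirname $(readlink -f $(which javac)))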
Open the core-site.xml file in a text editor:
nano $HADOOP_HOME/etc/hadoop/core-site.xml
Add the following configuration to override the default values for the temporary directory and add your HDFS URL to replace the default local file system setting:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/ashwin/tmpdata</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>
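Do not forget to create the temporary directory referenced above before moving on (adjust the path if you customized the value):
mkdir -p /home/ashwin/tmpdata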
Use the following command to open the hdfs-site.xml file for editing:
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Add the following configuration to the file and, if needed, adjust the NameNode and DataNode directories to your custom locations:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/ashwin/dfsdata/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/ashwin/dfsdata/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
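Likewise, create the NameNode and DataNode directories before formatting HDFS (the brace expansion is a bash feature):
mkdir -p /home/ashwin/dfsdata/{namenode,datanode}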
Use the following command to access the mapred-site.xml file and define MapReduce values:
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
Add the following configuration to change the default MapReduce framework name value to yarn:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Open the yarn-site.xml file in a text editor:
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
Append the following configuration to the file:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>127.0.0.1</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
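Before starting any daemons, it can be worth checking that the four edited files are still well-formed XML; this sketch assumes xmllint is available (on Ubuntu it ships in the libxml2-utils package):
for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
  xmllint --noout $HADOOP_HOME/etc/hadoop/$f && echo "$f OK"
done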
It is important to format the NameNode before starting Hadoop services for the first time:
hdfs namenode -format
Navigate to the hadoop-3.2.1/sbin directory and execute the following command to start all Hadoop daemons (HDFS and YARN):
./start-all.sh
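If you prefer to bring the layers up separately (HDFS first, then YARN), the same sbin directory provides dedicated scripts:
./start-dfs.sh
./start-yarn.sh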
Type this simple command to check if all the daemons are active and running as Java processes:
jps
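On a healthy single-node deployment, the output lists one entry per daemon plus Jps itself; the process IDs will differ on your machine:
11280 NameNode
11416 DataNode
11648 SecondaryNameNode
11882 ResourceManager
12021 NodeManager
12398 Jps
You can also confirm the cluster is up in a browser: in Hadoop 3.x the NameNode UI listens on http://localhost:9870 and the YARN ResourceManager UI on http://localhost:8088.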
Done!