How to Install Hadoop with Step by Step Configuration on Linux Ubuntu

  1. Install Java: Hadoop requires Java to be installed on your system. You can install Java by running the following command:
sudo apt-get install default-jdk
  1. Download and extract Hadoop: You can download the latest version of Hadoop from the official website (https://hadoop.apache.org/releases.html). Once you have downloaded the file, extract it using the following command:
tar -xzvf hadoop-x.x.x.tar.gz

Replace x.x.x with the version of Hadoop you downloaded.
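The extraction step can be wrapped in a small helper so that the resulting directory is easy to reuse later. This is only a sketch: the function name `extract_hadoop` and the destination directory are illustrative choices, not part of Hadoop, and it assumes the archive unpacks into a directory named after the tarball, as the official release archives do.

```shell
# Hypothetical helper: unpack a Hadoop release tarball and print the
# directory it was extracted to.
#   $1 = path to hadoop-x.x.x.tar.gz
#   $2 = destination directory (created if missing)
extract_hadoop() {
  archive=$1
  dest=$2
  mkdir -p "$dest" || return 1
  tar -xzf "$archive" -C "$dest" || return 1
  # The release archive unpacks into a directory named after the tarball.
  echo "$dest/$(basename "$archive" .tar.gz)"
}

# Example use (keep x.x.x as the version you downloaded):
#   HADOOP_HOME=$(extract_hadoop hadoop-x.x.x.tar.gz "$HOME")
#   export HADOOP_HOME
```

Keeping the path in a variable such as `HADOOP_HOME` makes the later steps, which refer to the "Hadoop directory", easier to follow.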

  1. Configure Hadoop: Navigate to the etc/hadoop directory in the extracted Hadoop directory and edit the hadoop-env.sh file by uncommenting the JAVA_HOME variable and setting it to the path where Java is installed.
export JAVA_HOME=/usr/lib/jvm/default-java
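If you are unsure where Java lives on your machine, the path can usually be derived from the `java` binary itself. The sketch below assumes the common JDK layout in which the binary sits at `<JAVA_HOME>/bin/java`; the function name `java_home_from_bin` is made up for this example.

```shell
# Hypothetical helper: given the full path of a java binary,
# print the JDK root two levels above it (<JAVA_HOME>/bin/java).
java_home_from_bin() {
  dirname "$(dirname "$1")"
}

# Example use: resolve the symlinks behind the java command, then derive
# the value to put in hadoop-env.sh.
#   java_home_from_bin "$(readlink -f "$(command -v java)")"
```

The printed directory is a candidate value for the `export JAVA_HOME=` line in hadoop-env.sh.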
  1. Configure HDFS: In the same etc/hadoop directory, edit the core-site.xml file so that its configuration element contains the following property:
<configuration>
   <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
   </property>
</configuration>

This makes HDFS the default file system and points clients at the NameNode on localhost, port 9000.
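A quick way to confirm that the edit was saved as intended is to read the value back from the file. The following is a minimal sketch, not a real XML parser: the function name `get_property` is invented here, and it assumes each `<name>` and `<value>` element sits on its own line, as in the snippet above.

```shell
# Hypothetical helper: print the <value> that follows a given <name>
# in a Hadoop *-site.xml file.
#   $1 = XML file, $2 = property name
get_property() {
  awk -v want="$2" '
    /<name>/  { gsub(/.*<name>|<\/name>.*/, ""); name = $0; next }
    /<value>/ { gsub(/.*<value>|<\/value>.*/, ""); if (name == want) print }
  ' "$1"
}

# Example use, run from the Hadoop directory:
#   get_property etc/hadoop/core-site.xml fs.defaultFS
```

Once the Hadoop commands are on your PATH, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop actually resolved, which is the more reliable check.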

Next, edit the hdfs-site.xml file by adding the following configuration:

<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <property>
      <name>dfs.namenode.name.dir</name>
      <value>/hadoop/hdfs/namenode</value>
   </property>
   <property>
      <name>dfs.datanode.data.dir</name>
      <value>/hadoop/hdfs/datanode</value>
   </property>
</configuration>

This sets the replication factor to 1, which suits a single-node setup with only one DataNode, and sets the local directories in which the NameNode and the DataNode store their data.
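The user that runs Hadoop needs write access to the two directories named above, so it is safest to create them before starting any daemons. A minimal sketch, assuming the /hadoop/hdfs layout from the configuration above; the function name `prepare_hdfs_dirs` is illustrative only.

```shell
# Hypothetical helper: create the NameNode and DataNode storage
# directories under a common base directory.
#   $1 = base directory, e.g. /hadoop/hdfs
prepare_hdfs_dirs() {
  base=$1
  mkdir -p "$base/namenode" "$base/datanode"
}

# Example use: /hadoop sits at the filesystem root, so create the base
# with sudo and hand it to your own user before creating the subdirectories.
#   sudo mkdir -p /hadoop/hdfs
#   sudo chown "$USER" /hadoop/hdfs
#   prepare_hdfs_dirs /hadoop/hdfs
```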

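  1. Start Hadoop: On a fresh installation, first format the NameNode once by running bin/hdfs namenode -format from the Hadoop directory. The start scripts connect to localhost over SSH, so passwordless SSH to localhost must also be set up. Then start the Hadoop daemons by running the following command from the Hadoop directory: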
sbin/start-all.sh

This starts the HDFS daemons (NameNode, DataNode and SecondaryNameNode) and the YARN daemons (ResourceManager and NodeManager), which schedule and run MapReduce jobs.
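The `jps` tool that ships with the JDK lists running Java processes, which gives a quick check that the daemons came up. The sketch below reads `jps` output and prints any daemon that is absent; the function name `missing_daemons` is made up, and the list assumes a single-node setup where start-all.sh launches NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager.

```shell
# Hypothetical helper: read `jps` output on stdin and print the name of
# every expected Hadoop daemon that does not appear in it.
missing_daemons() {
  jps_output=$(cat)
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -w matches whole words, so "NameNode" does not match "SecondaryNameNode".
    printf '%s\n' "$jps_output" | grep -qw "$daemon" || echo "$daemon"
  done
}

# Example use: prints nothing when all five daemons are running.
#   jps | missing_daemons
```

If a daemon is reported as missing, its log file under the logs directory of the Hadoop installation is the first place to look.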

  1. Verify the installation: Open the NameNode web interface at http://localhost:9870/ (Hadoop 3.x; on Hadoop 2.x the address is http://localhost:50070/). It shows the status of HDFS. Running jobs are listed in the YARN ResourceManager web interface at http://localhost:8088/.
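The same check can be done from the terminal. The NameNode web port changed from 50070 to 9870 in Hadoop 3.0, so the sketch below picks the port from the version string before probing it with curl. Both function names are invented for this example, and it assumes curl is installed and the default ports have not been overridden in hdfs-site.xml.

```shell
# Hypothetical helper: print the default NameNode web UI port for a
# given Hadoop version string (1.x/2.x use 50070, later releases 9870).
namenode_ui_port() {
  case $1 in
    [12]|[12].*) echo 50070 ;;
    *)           echo 9870 ;;
  esac
}

# Hypothetical helper: succeed if the NameNode web UI answers on localhost.
namenode_ui_up() {
  curl -sf -o /dev/null "http://localhost:$(namenode_ui_port "$1")/"
}

# Example use (keep x.x.x as the version you installed):
#   namenode_ui_up x.x.x && echo "NameNode web UI is reachable"
```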

That’s it! You have successfully installed Hadoop on Linux Ubuntu and configured it for use.