- Install Java: Hadoop requires Java to be installed on your system. You can install Java by running the following command:
sudo apt-get install default-jdk
- Download and extract Hadoop: You can download the latest version of Hadoop from the official website (https://hadoop.apache.org/releases.html). Once you have downloaded the file, extract it using the following command:
tar -xzvf hadoop-x.x.x.tar.gz
Replace x.x.x with the version of Hadoop you downloaded.
- Configure Hadoop: Navigate to the etc/hadoop directory in the extracted Hadoop directory and edit the hadoop-env.sh file by uncommenting the JAVA_HOME variable and setting it to the path where Java is installed.
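One way to find the value for JAVA_HOME is to resolve the javac binary to its real location and strip the trailing /bin/javac. The following is a minimal sketch, assuming the JDK was installed through apt so that javac is on the PATH; the helper name java_home_from is hypothetical.

```shell
# Hypothetical helper: strip the trailing /bin/javac from a resolved path.
java_home_from() {
  printf '%s\n' "${1%/bin/javac}"
}

# Resolve symlinks such as /usr/bin/javac to the real JDK location,
# then print the line to place in etc/hadoop/hadoop-env.sh.
if command -v javac >/dev/null 2>&1; then
  echo "export JAVA_HOME=$(java_home_from "$(readlink -f "$(command -v javac)")")"
fi
```

The printed export line can be pasted into hadoop-env.sh in place of the commented-out JAVA_HOME entry.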
- Configure Hadoop files: Navigate to the etc/hadoop directory and edit the core-site.xml file by adding the following configuration:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
This sets the default file system to HDFS.
Next, edit the hdfs-site.xml file by adding the following configuration:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/hadoop/hdfs/datanode</value>
  </property>
</configuration>
This sets the replication factor to 1 and configures the location of the HDFS name node and data node directories.
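The NameNode and DataNode directories need to be writable by the user that runs Hadoop, and for paths directly under / that usually means creating them in advance. A minimal sketch follows; the helper name make_hdfs_dirs is hypothetical, and the sudo and chown commands in the comments assume that your own user account will run Hadoop.

```shell
# Hypothetical helper: create the NameNode and DataNode directories under a root.
make_hdfs_dirs() {
  root="$1"
  mkdir -p "$root/namenode" "$root/datanode"
}

# For the paths used in hdfs-site.xml the root is /hadoop/hdfs.
# Creating /hadoop usually needs root, followed by a chown to the Hadoop user:
#   sudo mkdir -p /hadoop/hdfs && sudo chown -R "$USER" /hadoop
#   make_hdfs_dirs /hadoop/hdfs
```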
- Start Hadoop: Start the Hadoop daemon processes by running the following command from the Hadoop directory:
This will start the Hadoop distributed file system (HDFS) and the Hadoop MapReduce framework.
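As a sketch of this step, assuming a Hadoop 2.x or 3.x layout with scripts under bin/ and sbin/: the NameNode is formatted once before the first start, then the HDFS and YARN daemons are started. The function name start_hadoop and its "format" argument are hypothetical, and the start scripts typically need passwordless SSH to localhost.

```shell
# Hypothetical helper: start a single-node Hadoop from its install directory.
start_hadoop() {
  hadoop_home="$1"
  # Refuse to continue if the expected start script is missing.
  if [ ! -x "$hadoop_home/sbin/start-dfs.sh" ]; then
    echo "start-dfs.sh not found under $hadoop_home/sbin" >&2
    return 1
  fi
  # First run only: pass "format" as the second argument to initialise
  # the NameNode directory. This erases any existing HDFS metadata.
  if [ "${2:-}" = "format" ]; then
    "$hadoop_home/bin/hdfs" namenode -format || return 1
  fi
  # HDFS daemons: NameNode, DataNode, SecondaryNameNode.
  "$hadoop_home/sbin/start-dfs.sh" || return 1
  # YARN daemons: ResourceManager, NodeManager (these run MapReduce jobs).
  "$hadoop_home/sbin/start-yarn.sh" || return 1
  # List running Java processes to confirm the daemons are up.
  jps
}
```

From the extracted Hadoop directory, the first run would be start_hadoop "$PWD" format, and later runs start_hadoop "$PWD".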
- Verify the installation: You can verify the installation by opening the NameNode web interface at http://localhost:50070/ (Hadoop 2.x) or http://localhost:9870/ (Hadoop 3.x). This interface reports the status of HDFS. Running MapReduce jobs are listed in the YARN ResourceManager interface, by default at http://localhost:8088/.
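The same check can be done from a terminal. The sketch below picks the NameNode web port from the Hadoop version (the default moved from 50070 to 9870 in Hadoop 3.x); the helper name namenode_ui_url is hypothetical, and the sketch assumes the default ports have not been overridden in hdfs-site.xml.

```shell
# Hypothetical helper: default NameNode web UI address for a given Hadoop version.
namenode_ui_url() {
  case "$1" in
    1.*|2.*) echo "http://localhost:50070/" ;;  # Hadoop 1.x and 2.x default
    *)       echo "http://localhost:9870/" ;;   # Hadoop 3.x default
  esac
}
```

For example, with version 3.3.6 used purely as an illustration, curl -sf "$(namenode_ui_url 3.3.6)" >/dev/null && echo "NameNode UI is up" prints the message only when the interface responds.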
That’s it! You have successfully installed Hadoop on Ubuntu Linux and configured it as a single-node setup.