Ubuntu Installation
Version: Ubuntu Desktop 18.04.2 LTS
Installation Tutorial: Install Ubuntu Desktop
Disable Auto Update
Disable Auto Shut Down and Sleep
Notice: In the login details step, set the computer's name to master/slave1/slave2/slave3 (one per machine) and set the username to hadoop on all machines.
Hadoop Environment Setup
Pre-installation Setup
Checking Hostname
```shell
hadoop@slave1:~$ hostname
```
Checking Current IP Address
```shell
hadoop@slave1:~$ hostname -I
```
Install vim
```shell
hadoop@slave1:~$ cd /
hadoop@slave1:/$ sudo apt install vim
```
Add IP Addresses
Insert the information from the table below into the /etc/hosts file.

| IP Addresses | Hostnames |
| --- | --- |
| 10.22.17.39 | master |
| 10.22.16.84 | slave1 |
| 10.22.17.150 | slave2 |
| 10.22.17.79 | slave3 |
Command to open and insert information:
```shell
hadoop@slave1:/etc$ sudo vim hosts
```
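After editing, the appended entries mirror the table above (IP address, whitespace, hostname):

```
10.22.17.39   master
10.22.16.84   slave1
10.22.17.150  slave2
10.22.17.79   slave3
```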
Check if connections to other machines can be established:
```shell
hadoop@slave1:/etc$ ping master
```
Java JDK Installation
Version: Java SE Development Kit 8u211 (Requires Registration)
Extract the Files to /usr/lib/jvm/
Add Java’s Path into $PATH
Open the /etc/profile file to write the Java path into:
```shell
hadoop@slave1:~$ cd /etc
hadoop@slave1:/etc$ sudo vim profile
```
Insert the following code into the file:
```shell
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_211
export PATH=$PATH:$JAVA_HOME/bin
```
Source the file to apply the changes:
```shell
hadoop@slave1:~$ source /etc/profile
```
Notice: You may need to restart your computer to apply the changes permanently.
Check Java Version and Path
```shell
hadoop@slave1:~$ java -version
hadoop@slave1:~$ which java
```
Setup SSH
Set this up so the machines can connect to each other without entering passwords.
```shell
# Installing SSH
hadoop@slave1:~$ sudo apt install openssh-server
```
Add these lines at the end of the sshd_config file (RSAAuthentication is ignored by OpenSSH 7.4 and later; PubkeyAuthentication is the setting that matters):
```
RSAAuthentication yes
PubkeyAuthentication yes
```
Restart the SSH service and copy the SSH id to the other machines:
```shell
hadoop@slave1:~/.ssh$ service ssh restart
hadoop@slave1:~/.ssh$ ssh-copy-id hadoop@master
```
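The key pair has to exist before ssh-copy-id can distribute it. A minimal sketch of what the key setup amounts to, assuming an RSA key and the default authorized_keys layout (a scratch directory is used here so the sketch is safe to run anywhere; on the real machines the files live in ~/.ssh):

```shell
# Generate an RSA key pair with no passphrase in a scratch directory.
KEYDIR=$(mktemp -d)
ssh-keygen -t rsa -N "" -f "$KEYDIR/id_rsa" -q

# Authorizing the key is just appending the public half to
# authorized_keys on the target machine, with strict permissions --
# this is essentially what ssh-copy-id does remotely.
cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys"
chmod 600 "$KEYDIR/authorized_keys"
```
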
Hadoop Installation
Install Hadoop
- Version: Hadoop 3.1.2 (Binary)
- Move Hadoop folder to /usr/ folder
- Change folder name into hadoop
- Make tmp folder inside of the hadoop folder
```shell
hadoop@slave1:/$ sudo mv hadoop-3.1.2/ /usr/
```
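The three bullet steps can be sketched end-to-end (here against a scratch directory so it runs without sudo; on the real machines the root is /usr):

```shell
# Scratch stand-in for /usr so the sketch runs without sudo.
ROOT=$(mktemp -d)
mkdir "$ROOT/hadoop-3.1.2"             # stand-in for the extracted tarball

# Move the Hadoop folder under the target root and rename it to "hadoop".
mv "$ROOT/hadoop-3.1.2" "$ROOT/hadoop"

# Make the tmp folder inside the hadoop folder.
mkdir "$ROOT/hadoop/tmp"
```
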
Add Hadoop’s Path into $PATH
Open the /etc/profile file to write the Hadoop path into:
```shell
hadoop@slave1:~$ cd /etc
hadoop@slave1:/etc$ sudo vim profile
```
Insert the following code into the file:
```shell
export HADOOP_HOME=/usr/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```
Source the file to apply the changes:
```shell
hadoop@slave1:~$ source /etc/profile
```
Notice: You may need to restart your computer to apply the changes permanently.
Configure Hadoop
Change the configuration settings in the 5 following files:
```shell
# Finding the paths of each file
hadoop@slave1:~$ cd /usr/hadoop/etc/hadoop
```
5 paths:
- hadoop-env.sh - hadoop/etc/hadoop/hadoop-env.sh
- core-site.xml - hadoop/etc/hadoop/core-site.xml
- hdfs-site.xml - hadoop/etc/hadoop/hdfs-site.xml
- mapred-site.xml - hadoop/etc/hadoop/mapred-site.xml
- yarn-site.xml - hadoop/etc/hadoop/yarn-site.xml
Configure hadoop-env.sh:
```shell
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_211
```
Configure core-site.xml:
```xml
<configuration>
  ...
</configuration>
```
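The block above lost everything but its opening tag. A typical core-site.xml for this cluster, assuming the standard fs.defaultFS and hadoop.tmp.dir properties pointed at the master host and the tmp folder created earlier, would be:

```xml
<configuration>
  <!-- Assumption: default filesystem on the master's NameNode port. -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <!-- Assumption: the tmp folder created under /usr/hadoop. -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/hadoop/tmp</value>
  </property>
</configuration>
```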
Configure hdfs-site.xml:
```xml
<configuration>
  ...
</configuration>
```
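The original hdfs-site.xml contents were also lost. A plausible version for a one-master/three-slave cluster (replication factor and storage paths are assumptions) would be:

```xml
<configuration>
  <!-- Assumption: three-way replication across the slave nodes. -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Assumption: NameNode/DataNode storage under the Hadoop tmp dir. -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/hadoop/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/hadoop/tmp/dfs/data</value>
  </property>
</configuration>
```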
Configure mapred-site.xml:
```xml
<configuration>
  ...
</configuration>
```
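A typical mapred-site.xml for running MapReduce on YARN with Hadoop 3.x (the HADOOP_MAPRED_HOME entries are standard for 3.x; the path is assumed from the install steps above):

```xml
<configuration>
  <!-- Run MapReduce jobs on YARN. -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- Hadoop 3.x needs HADOOP_MAPRED_HOME visible to MR processes. -->
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/hadoop</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/hadoop</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/hadoop</value>
  </property>
</configuration>
```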
Configure yarn-site.xml:
```xml
<configuration>
  <property>
    ...
  </property>
</configuration>
```
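Only a bare property tag survived above. A typical yarn-site.xml for this layout, assuming the ResourceManager runs on the master, would be:

```xml
<configuration>
  <!-- Assumption: ResourceManager runs on the master host. -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <!-- Shuffle service required for MapReduce on YARN. -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```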
Open the workers file and list every worker node, one hostname per line:
```shell
hadoop@slave1:/usr/hadoop/etc/hadoop$ vim workers
```
Here is the content of the workers file:
```
master
slave1
slave2
slave3
```
Spark Environment Setup
Install Spark
- Version: Spark 2.4.3 (Binary)
- Move Spark folder to /usr/hadoop/ folder
- Change folder name into spark
Add Paths into $PATH
Open the .bashrc file:
```shell
hadoop@master:~$ vim .bashrc
```
In .bashrc file, insert the following lines:
```shell
export HADOOP_HOME=/usr/hadoop
```
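Only the first export of the block survived. Given the folder layout above (Spark at /usr/hadoop/spark), a plausible fuller set of .bashrc exports would be:

```shell
export HADOOP_HOME=/usr/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=$HADOOP_HOME/spark
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin
```
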
Source the file to apply the changes:
```shell
hadoop@master:~$ source .bashrc
```
Notice: You may need to restart your computer to apply the changes permanently.
Configure Spark
Configure the spark-defaults.conf file:
```shell
# Change the template into the actual file
hadoop@master:/usr/hadoop/spark/conf$ cp spark-defaults.conf.template spark-defaults.conf
```
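For a Spark-on-YARN cluster like this one, the file would at minimum point Spark at YARN; a sketch (property names are standard Spark settings, the chosen values are assumptions):

```
spark.master             yarn
spark.submit.deployMode  client
```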
Checking Spark version
```shell
hadoop@master:/usr/hadoop$ spark-shell --version
```
Setup Public Jupyter Notebook
After the installation of jupyter:
```shell
hadoop@master:~$ jupyter notebook --generate-config
```
Uncomment the following lines in ~/.jupyter/jupyter_notebook_config.py and adjust the values:
```python
c.NotebookApp.ip = 'master'
```
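A publicly reachable notebook typically needs a few more of these settings; a sketch (the port and flags are assumptions, not from the original):

```python
c.NotebookApp.ip = 'master'          # listen on the master host, not localhost
c.NotebookApp.port = 8888            # assumption: default port
c.NotebookApp.open_browser = False   # headless server, no local browser
```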
Notice: After changing the password and logging in for the first time, set allow_password_change to False or comment it out.
```python
c.NotebookApp.allow_password_change = False
```
Extras
Change Machines’ Username
The username must be the same on every machine: when Hadoop connects to another machine over SSH, it uses the local username as the default remote username.
```shell
# Here should be the format for username@hostname on each machine
hadoop@master / hadoop@slave1 / hadoop@slave2 / hadoop@slave3
```
If the username was set wrong by mistake when installing the system, change it with the following commands:
```shell
hadoop@slave1:~$ sudo passwd root
hadoop@slave1:~$ sudo usermod -l hadoop old_username   # old_username is a placeholder
```
Notice: These commands can only be run while logged in as a different user. Create a temporary user, log out of the account you want to rename, log in as the temporary user, and run the commands above from its terminal.
Docker (Suspended - Not in use)
Login to Docker:
```shell
hadoop@slave1:~$ sudo docker login
```
Pull -> Run
Useful Commands
General Commands
```shell
# Format the namenode - ONLY RUN ONCE
hadoop@master:/usr/hadoop$ hdfs namenode -format
```
Resetting $PATH (in case $PATH is overwritten by mistake):
```shell
export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
```
References
Hadoop
Hadoop Environment Configuration
Java JDK
How to install the JDK on Ubuntu Linux (OpenJDK)
Differences between OpenJDK and Oracle JDK
Spark
How to set up PySpark for your Jupyter notebook
Install Spark 2.3.x on YARN with Hadoop 3.x
Jupyter Notebook
Tutorial on setting up public jupyter notebook
HDFS
Using hdfs command line to manage files and directories on Hadoop