How a Write Operation Is Executed in the Hadoop Distributed File System
Description:
Master →
Public IP → 54.56.155.165
Private IP → 172.31.32.16
Datanode1 →
Public IP → 13.232.54.129
Private IP → 172.31.39.85
Datanode2 →
Public IP → 100.26.156.190
Private IP → 172.31.49.174
Datanode3 →
Public IP → 52.66.51.125
Private IP → 172.31.11.231
Client →
Public IP → 13.235.24.36
Private IP → 172.31.35.22
To set up the Hadoop cluster we have to follow this procedure:
Step 1: Setting up the master node:
The master is also called the NameNode in HDFS. To set up the NameNode we have to configure hdfs-site.xml and core-site.xml.
→ On the NameNode we have to make a central storage directory; all the DataNodes contribute their storage to the cluster through this NameNode. So we make a directory named /nn on the NameNode.
→ Set up the hdfs-site.xml file on the master:
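A minimal sketch of this file, assuming Hadoop 1.x property names (dfs.name.dir points at the /nn directory we just created):

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>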
→ Set up the core-site.xml file on the master:
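A minimal sketch of this file; port 9001 is an assumption (9000 is also common), and 0.0.0.0 makes the NameNode listen on all interfaces:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>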
In core-site.xml we have given the IP 0.0.0.0 so that anyone can connect, but to keep the cluster private we have added firewall rules (security group rules) to the AWS master node instance.
→ The firewall rules on the master node allow in only the DataNodes and the client node.
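For example, one such rule can be added with the AWS CLI; sg-xxxxxxxx here is a hypothetical security-group ID, and the /32 CIDR admits only DataNode1's private IP on the NameNode port assumed above:

> aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol tcp --port 9001 --cidr 172.31.39.85/32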
→ Format the NameNode directory:
> hadoop namenode -format
→ Start the NameNode:
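The NameNode daemon can be started with the same hadoop-daemon.sh script we use for the DataNodes below:

> hadoop-daemon.sh start namenode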
Step 2: Setting up the slave nodes:
Slave nodes are also called DataNodes. For this Hadoop cluster we set up three DataNodes.
We make a directory on each DataNode so that every DataNode can contribute its storage:
- datanode 1 → /dn1
- datanode 2 → /dn2
- datanode 3 → /dn3
Now we have to configure hdfs-site.xml and core-site.xml on all the DataNodes:
hdfs-site.xml file in DataNode1 →
hdfs-site.xml file in DataNode2 →
hdfs-site.xml file in DataNode3 →
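The three files differ only in the value of dfs.data.dir; a sketch for DataNode1, again assuming Hadoop 1.x property names:

<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn1</value> <!-- /dn2 on DataNode2, /dn3 on DataNode3 -->
  </property>
</configuration>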
core-site.xml setup in all DataNodes:
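Every DataNode points at the master over its private IP; the port must match whatever the master's core-site.xml uses (9001 was our assumption):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://172.31.32.16:9001</value>
  </property>
</configuration>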
→ Start each DataNode with the command:
> hadoop-daemon.sh start datanode
Step 3: Check the status of the cluster
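The status can be checked from the master (or the client) with the dfsadmin report, which lists every live DataNode and the storage it contributes:

> hadoop dfsadmin -report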
Step 4: Setting up the client node:
Now the client only has to configure core-site.xml, pointing it at the NameNode.
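The client's core-site.xml looks the same as the DataNodes' one, again assuming the master's private IP and port 9001:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://172.31.32.16:9001</value>
  </property>
</configuration>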
Now the client will upload a file:
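For example (file.txt is just a placeholder name for whatever the client uploads):

> hadoop fs -put file.txt /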
Step 5: Run the tcpdump command on all the DataNodes. It lets us observe how the file write is performed.
Note: tcpdump is not pre-installed on an AWS EC2 instance. We have to install it with:
> yum install tcpdump
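On each DataNode we then capture the HDFS data-transfer traffic; 50010 is Hadoop's default DataNode transfer port, and eth0 is assumed to be the instance's network interface:

> tcpdump -n -i eth0 tcp port 50010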
Running tcpdump on all the DataNodes, we observed that the data is written by the client directly to a DataNode, not to the master. The client writes the data to only one DataNode; that DataNode then copies the data to another DataNode, and so on, replicating the data serially through the remaining DataNodes.
→ In our case the client first writes the data to DataNode3:
→ DataNode3 then writes the data to DataNode1:
→ DataNode1 receiving the data from DataNode3:
→ DataNode1 then writes the data to DataNode2:
→ DataNode2 receiving the data from DataNode1:
Block Diagram:
Team Details:
- Ganesh Kumar Kansara (Team Leader) (NameNode)
- Vinay Pasi (Datanode1) (Client)
- Laveena Jethani (Datanode2)
- Tejashwini Kottha (Datanode3)