How a Write Operation Is Executed in the Hadoop Distributed File System
Description:
Master →
Public IP → 54.56.155.165
Private IP → 172.31.32.16
Datanode1 →
Public IP → 13.232.54.129
Private IP → 172.31.39.85
Datanode2 →
Public IP → 100.26.156.190
Private IP → 172.31.49.174
Datanode3 →
Public IP → 52.66.51.125
Private IP → 172.31.11.231
Client →
Public IP → 13.235.24.36
Private IP → 172.31.35.22
To set up the Hadoop cluster we have to follow this procedure:
Step 1: Setting up the master node:
The master is also called the NameNode in HDFS. To set up the NameNode we have to configure hdfs-site.xml and core-site.xml.
→ On the NameNode we have to make a central storage directory; all the DataNodes contribute their storage to the cluster through this NameNode. So we make a directory named /nn on the NameNode.
→ Set up the hdfs-site.xml file on the master:
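A minimal sketch of this file, assuming Hadoop 1.x property names (dfs.name.dir points at the /nn directory we just created):

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>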
→ Set up the core-site.xml file on the master:
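A minimal sketch of this file; port 9001 is an assumption (9000 is also common), and 0.0.0.0 makes the NameNode listen on all interfaces:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>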
In core-site.xml we have given the IP 0.0.0.0 so that anyone can connect, but to keep the cluster private we have added firewall rules (security group rules) to the AWS master node instance.
→ The firewall rules on the master node allow in only the DataNodes and the client node.
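For example, one such rule can be added with the AWS CLI; sg-xxxxxxxx here is a hypothetical security-group ID, and the /32 CIDR admits only DataNode1's private IP on the NameNode port assumed above:

> aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol tcp --port 9001 --cidr 172.31.39.85/32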
→ Format the NameNode directory:
> hadoop namenode -format
→ Start the NameNode:
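The NameNode daemon can be started with the same hadoop-daemon.sh script we use for the DataNodes below:

> hadoop-daemon.sh start namenode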
Step 2: Setting up the slave nodes:
Slave nodes are also called DataNodes. For this Hadoop cluster we set up three DataNodes.
We make a directory on each DataNode so that every DataNode can contribute its storage:
- datanode 1 → /dn1
- datanode 2 → /dn2
- datanode 3 → /dn3
Now we have to configure hdfs-site.xml and core-site.xml on all the DataNodes:
hdfs-site.xml file in DataNode1 →
hdfs-site.xml file in DataNode2 →
hdfs-site.xml file in DataNode3 →
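The three files differ only in the value of dfs.data.dir; a sketch for DataNode1, again assuming Hadoop 1.x property names:

<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn1</value> <!-- /dn2 on DataNode2, /dn3 on DataNode3 -->
  </property>
</configuration>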
core-site.xml setup in all DataNodes:
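Every DataNode points at the master over its private IP; the port must match whatever the master's core-site.xml uses (9001 was our assumption):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://172.31.32.16:9001</value>
  </property>
</configuration>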
→ Start each DataNode with the command:
> hadoop-daemon.sh start datanode
Step 3: Check the status of the cluster
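The status can be checked from the master (or the client) with the dfsadmin report, which lists every live DataNode and the storage it contributes:

> hadoop dfsadmin -report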
Step 4: Setting up the client node:
Now the client only has to configure core-site.xml, pointing it at the NameNode.
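The client's core-site.xml looks the same as the DataNodes' one, again assuming the master's private IP and port 9001:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://172.31.32.16:9001</value>
  </property>
</configuration>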
Now the client will upload a file:
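For example (file.txt is just a placeholder name for whatever the client uploads):

> hadoop fs -put file.txt /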
Step 5: Run the tcpdump command on all the DataNodes. It lets us observe how the file write is performed.
Note: tcpdump is not pre-installed on an AWS EC2 instance. We have to install it with:
> yum install tcpdump
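On each DataNode we then capture the HDFS data-transfer traffic; 50010 is Hadoop's default DataNode transfer port, and eth0 is assumed to be the instance's network interface:

> tcpdump -n -i eth0 tcp port 50010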
Running tcpdump on all the DataNodes, we observed that the data is written by the client directly to a DataNode, not to the master. The client writes the data to only one DataNode; that DataNode then copies the data to another DataNode, and so on, replicating the data serially through the remaining DataNodes.
→ In our case the client first writes the data to DataNode3:
→ DataNode3 then writes the data to DataNode1:
→ DataNode1 receiving the data from DataNode3:
→ DataNode1 then writes the data to DataNode2:
→ DataNode2 receiving the data from DataNode1:
Block Diagram:
Team Details:
- Ganesh Kumar Kansara (Team Leader) (NameNode)
- Vinay Pasi (Datanode1) (Client)
- Laveena Jethani (Datanode2)
- Tejashwini Kottha (Datanode3)