How the Write Operation Is Executed in the Hadoop Distributed File System

Laveena Jethani
4 min read · Oct 9, 2020


HDFS

Description:

Master →

Public IP → 54.56.155.165

Private IP → 172.31.32.16

Datanode1 →

Public IP → 13.232.54.129

Private IP → 172.31.39.85

Datanode2 →

Public IP → 100.26.156.190

Private IP → 172.31.49.174

Datanode3 →

Public IP → 52.66.51.125

Private IP → 172.31.11.231

Client →

Public IP → 13.235.24.36

Private IP → 172.31.35.22

To set up the Hadoop cluster, we follow this procedure:

Step 1: Setting up the master node

The master is also called the NameNode in HDFS. To set up the NameNode, we have to configure hdfs-site.xml and core-site.xml.

→ On the NameNode we have to create a central storage directory; all the DataNodes contribute their storage to the cluster, and the NameNode keeps track of it through this directory. So we create a directory named /nn on the NameNode.
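
For example, on the master:

> mkdir /nn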

→ Set up the hdfs-site.xml file on the master:

Master hdfs-site.xml
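
As a minimal sketch, the master's hdfs-site.xml points dfs.name.dir at the /nn directory (the property name assumes Hadoop 1.x; Hadoop 2 and later call it dfs.namenode.name.dir):

<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>/nn</value>
    </property>
</configuration>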

→ Set up the core-site.xml file on the master:

Master core-site.xml
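
Roughly, the master's core-site.xml sets fs.default.name (the Hadoop 1.x name for the default filesystem URI) to listen on 0.0.0.0; the port 9001 below is an assumption, use whatever port you chose:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://0.0.0.0:9001</value>
    </property>
</configuration>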

In core-site.xml we have given the IP 0.0.0.0 so that anyone can connect, but to keep the cluster private we have added firewall rules on the AWS master node instance.

→ Firewall rules on the master node allow only the DataNodes and the client node:

Firewall rule in master node
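
We added the rules from the AWS console, but the same thing can be done with the AWS CLI; one such rule, allowing datanode1's private IP, would look roughly like this (the security-group ID is hypothetical and the port must match the one in core-site.xml):

> aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 9001 --cidr 172.31.39.85/32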

→ Format the NameNode directory:

> hadoop namenode -format

→ Start the NameNode with the command:
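
> hadoop-daemon.sh start namenode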

Step 2: Setting up the slave nodes

Slave nodes are also called DataNodes. For this Hadoop cluster we set up three DataNodes.

We create a directory on every DataNode so that each DataNode can contribute its storage; the directories and the commands to create them are shown below.

  • datanode 1 → /dn1
  • datanode 2 → /dn2
  • datanode 3 → /dn3
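
For example:

> mkdir /dn1    # on datanode 1
> mkdir /dn2    # on datanode 2
> mkdir /dn3    # on datanode 3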

Now we have to configure hdfs-site.xml and core-site.xml on all the DataNodes:

hdfs-site.xml file in datanode 1 →

hdfs-site.xml file in datanode 2 →

hdfs-site.xml file in datanode 3 →
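
As a sketch, datanode 1's hdfs-site.xml points dfs.data.dir at /dn1 (Hadoop 1.x property name; newer versions use dfs.datanode.data.dir). Datanode 2 and datanode 3 differ only in the directory (/dn2 and /dn3):

<configuration>
    <property>
        <name>dfs.data.dir</name>
        <value>/dn1</value>
    </property>
</configuration>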

core-site.xml setup on all the DataNodes:
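
On every DataNode, core-site.xml points to the NameNode. Here we use the master's private IP since all the instances are in the same VPC; the port 9001 is an assumption and must match the master's core-site.xml:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://172.31.32.16:9001</value>
    </property>
</configuration>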

Start the DataNode with the command:

> hadoop-daemon.sh start datanode

Step 3: Check the status of the cluster
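
The status can be checked from the master or the client with the admin report, which lists the live DataNodes and the storage each one contributes (in Hadoop 2 and later the equivalent is hdfs dfsadmin -report):

> hadoop dfsadmin -report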

Step 4: Set up the client node

Now the client only has to set up core-site.xml.

Client core-site.xml
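
The client's core-site.xml is the same sketch as on the DataNodes: it only needs the NameNode's address (the master's private IP here, port 9001 assumed):

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://172.31.32.16:9001</value>
    </property>
</configuration>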

Now the client uploads a file:

client file upload
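
For example, an upload and a quick check that the file reached HDFS (the file name is just an illustration):

> hadoop fs -put mydata.txt /
> hadoop fs -ls /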

Step 5: Run the tcpdump command on all DataNodes. It helps us see how the file write is performed.

Note: tcpdump is not pre-installed on the AWS EC2 instances. We have to install it with:

> yum install tcpdump
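
Then capture the DataNode data-transfer traffic. 50010 is the default DataNode transfer port in Hadoop 1.x and eth0 is the usual interface name on these EC2 instances; adjust both if your setup differs:

> tcpdump -i eth0 tcp port 50010 -n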

As we run the tcpdump command on all the DataNodes, we observe that the data is written by the client directly to the DataNodes, not to the master. The client writes the data to only one DataNode; that DataNode then copies the data to another DataNode, and so on, so the data is copied serially through the remaining DataNodes; this is HDFS's replication pipeline.

→ In our case, the client first writes the data to datanode3:

client writing data on datanode3

→ datanode3 then writes the data to datanode1:

datanode3 copying data to datanode1

→ datanode1 receiving data from datanode3:

datanode1 receiving data from datanode3

→ datanode1 then writes the data to datanode2:

datanode1 copying data to datanode2

→ datanode2 receives the data from datanode1:

datanode2 receiving data from datanode1

Block Diagram:

Team Details:

  1. Ganesh Kumar Kansara (Team Leader) (Namenode)
  2. Vinay Pasi (Datanode1) (Client)
  3. Laveena Jethani (Datanode2)
  4. Tejashwini Kottha (Datanode3)
