Graceful decommissioning of a DataNode is the process in which the data on the decommissioning node is moved to other live nodes and the node is then removed from the cluster. To enable graceful decommissioning, set the dfs.hosts.exclude property in hdfs-site.xml to the path of an excludes file and restart the NameNode.

Configure NameNode
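
A minimal hdfs-site.xml fragment might look like the following. It assumes the excludes file lives at /etc/hadoop/conf/excludes, the path used by the commands later in this guide; adjust the value if your file is stored elsewhere.

```xml
<!-- Sketch only: points the NameNode at the excludes file. -->
<!-- The path is an assumption taken from the commands in this guide. -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/excludes</value>
</property>
```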


Once decommissioning is enabled, create the file that dfs.hosts.exclude points to and add the hostnames of the affected nodes to it, one host per line.

echo > /etc/hadoop/conf/excludes
echo >> /etc/hadoop/conf/excludes
echo >> /etc/hadoop/conf/excludes
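
As an illustration of the one-host-per-line layout, here is a small sketch of a helper that writes a list of hostnames to an excludes file. The function name write_excludes and the hostnames dn1.example.com and dn2.example.com are hypothetical and only serve as placeholders. Note that the helper overwrites the file instead of appending to it.

```shell
# Hypothetical helper: write one hostname per line to an excludes file.
# Usage: write_excludes FILE HOST...
# The target file is overwritten, not appended to.
write_excludes() {
    file=$1
    shift
    printf '%s\n' "$@" > "$file"
}

# Example call with placeholder hostnames (not real nodes):
# write_excludes /etc/hadoop/conf/excludes dn1.example.com dn2.example.com
```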

After the hostnames have been added to the excludes file, run the following command.

hdfs dfsadmin -refreshNodes

Monitor the process

Once the refresh completes, decommissioning starts and all blocks of the affected node are marked as under-replicated. You can monitor the decommissioning process with OddEye or the NameNode UI.
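
If you prefer the command line, the output of hdfs dfsadmin -report can also be checked. The sketch below assumes the report prints a "Hostname:" line followed by a "Decommission Status :" line for each DataNode, which is the usual layout but may differ between Hadoop versions. The function name decommission_status is hypothetical.

```shell
# Hypothetical helper: read `hdfs dfsadmin -report` output on stdin and
# print "<hostname><TAB><decommission status>" for every DataNode.
# Assumes each node block contains "Hostname: ..." before
# "Decommission Status : ...".
decommission_status() {
    awk -F': *' '
        /^Hostname:/            { host = $2 }
        /^Decommission Status/  { print host "\t" $2 }
    '
}

# Example call against a live cluster:
# hdfs dfsadmin -report | decommission_status
```

A node that is still moving blocks is typically reported as "Decommission in progress", and as "Decommissioned" once it is safe to remove.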

It is also very informative to create a chart of the cumulative traffic of the DataNodes, like the one below, and monitor the traffic of all nodes.

This chart clearly shows that the decommissioning process started at 10:38 and is now actively syncing data between nodes.

Be Aware

Decommissioning is a resource-intensive operation. You should do it during off-peak hours and carefully monitor the process.

Stop Decommissioning

If you see that the system load of the DataNodes is getting higher than the servers can handle, or you have any other reason to stop the decommissioning process:

Clear the excludes file:

cat /dev/null > /etc/hadoop/conf/excludes

Refresh the node list on the NameNode:

hdfs dfsadmin -refreshNodes

If you are running NameNode in HA mode, you need to keep the excludes file in sync on both NameNodes before running the refresh.

The easiest way is to copy the excludes file to the other NameNode after making changes.

scp /etc/hadoop/conf/excludes
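
The copy step could be wrapped in a small helper like the sketch below. The function name sync_excludes, the RUN variable, and the peer hostname nn2.example.com are all hypothetical. The helper assumes the excludes file lives at the same path on both NameNodes and that SSH access between them is already set up.

```shell
# Hypothetical helper: copy the excludes file to the same path on the
# other NameNode. Set RUN=echo to preview the command without running it.
# Usage: sync_excludes FILE PEER_HOST
sync_excludes() {
    file=$1
    peer=$2
    ${RUN:-} scp "$file" "$peer:$file"
}

# Example calls with a placeholder peer hostname:
# RUN=echo sync_excludes /etc/hadoop/conf/excludes nn2.example.com
# sync_excludes /etc/hadoop/conf/excludes nn2.example.com
```

Running it once with RUN=echo first makes it easy to confirm the destination before anything is copied.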