Hadoop & HBase

Hadoop

Hadoop HDFS

The agent uses Hadoop's own JMX-to-JSON interface to get statistics from the HDFS NameNode and DataNodes. All Hadoop ecosystem checks use the same config file: hadoop.ini.
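As a rough illustration of what the JMX-to-JSON interface returns, the sketch below indexes the beans of a /jmx response by name. The sample payload is hand-written and its values are made up; the helper name index_beans is hypothetical and is not part of the agent.

```python
import json


def index_beans(jmx_payload):
    """Return a dict mapping each bean's "name" to the bean's attributes."""
    document = json.loads(jmx_payload)
    return {bean["name"]: bean for bean in document.get("beans", [])}


# Hand-written sample shaped like a /jmx response; all values are made up.
SAMPLE_PAYLOAD = """
{"beans": [
  {"name": "Hadoop:service=NameNode,name=FSNamesystem",
   "CapacityTotal": 1000, "CapacityUsed": 250},
  {"name": "java.lang:type=Memory",
   "HeapMemoryUsage": {"committed": 64, "init": 32, "max": 128, "used": 48}}
]}
"""

beans = index_beans(SAMPLE_PAYLOAD)
filesystem = beans["Hadoop:service=NameNode,name=FSNamesystem"]
heap = beans["java.lang:type=Memory"]["HeapMemoryUsage"]
print(filesystem["CapacityUsed"], heap["used"])
```

Indexing by bean name keeps later lookups simple, because one response carries both Hadoop-specific beans and generic JVM beans.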

HDFS NameNode

NameNode

Install

cd ${OE_AGENT_HOME}/checks_enabled
ln -s ../checks_available/check_hadoop_namenode.py ./

In a real cluster installation the HDFS NameNode does not listen on the loopback interface, so make sure you put the correct NameNode IP address in the config file.

Configure

[Hadoop-NameNode]
jmx: http://${NAMENODE_IP}:50070/jmx
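Because the NameNode is normally reached over a cluster-facing address, it can help to sanity-check the configured URL before restarting the agent. The following is a minimal sketch, assuming hadoop.ini can be read with Python's configparser; the helper names and the 10.0.0.5 address are hypothetical.

```python
import configparser
import ipaddress
from urllib.parse import urlsplit


def is_loopback(host):
    """True for loopback IP literals and for the name "localhost"."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:  # not an IP literal, so treat it as a hostname
        return host == "localhost"


def namenode_jmx_url(ini_text):
    """Return the NameNode jmx URL, rejecting loopback addresses."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    url = parser["Hadoop-NameNode"]["jmx"]
    host = urlsplit(url).hostname
    if host is None or is_loopback(host):
        raise ValueError("NameNode jmx URL should use the cluster-facing IP: " + url)
    return url


# 10.0.0.5 is a hypothetical NameNode address used only for illustration.
EXAMPLE_INI = """
[Hadoop-NameNode]
jmx: http://10.0.0.5:50070/jmx
"""
print(namenode_jmx_url(EXAMPLE_INI))
```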

Restart

${OE_AGENT_HOME}/oddeye.sh restart

Provides

Name Description Type Unit
namenode_addblockops Add block operations per second rate OPS
namenode_blockcapacity HDFS Block Capacity gauge Integer
namenode_blockstotal HDFS Total blocks gauge Integer
namenode_capacityremaining HDFS remaining free space gauge Bytes
namenode_capacitytotal HDFS Total capacity gauge Bytes
namenode_capacityused HDFS Used space gauge Bytes
namenode_corruptblocks HDFS corrupt blocks gauge Integer
namenode_createfileops Create file operations on the NameNode rate OPS
namenode_deletefileops Delete file operations on the NameNode rate OPS
namenode_fileinfoops NameNode File information requests rate OPS
namenode_filesdeleted NameNode deleted files rate OPS
namenode_filesrenamed NameNode rename file operations rate OPS
namenode_filestotal Amount of files in HDFS gauge Integer
namenode_getblocklocations NameNode Get Block Location operations rate OPS
namenode_getlistingops NameNode Get Listing operations rate OPS
namenode_heap_committed Java Heap memory committed gauge Bytes
namenode_heap_init Java Heap memory init gauge Bytes
namenode_heap_max Java Heap memory max gauge Bytes
namenode_heap_used Java Heap memory used gauge Bytes
namenode_lastgc_duration Duration of the last garbage collection gauge Milliseconds
namenode_missingblocks NameNode missing blocks gauge Integer
namenode_nondfsusedspace Non HDFS disk space usage gauge Bytes
namenode_nonheap_committed Java Non Heap memory committed gauge Bytes
namenode_nonheap_init Java Non Heap memory init gauge Bytes
namenode_nonheap_max Java Non Heap memory max gauge Bytes
namenode_nonheap_used Java Non Heap memory used gauge Bytes
namenode_numdeaddatanodes Number of dead DataNodes in cluster gauge Integer
namenode_numdecomdeaddatanodes Number of decommissioned dead DataNodes in cluster gauge Integer
namenode_numdecomlivedatanodes Number of decommissioned live DataNodes in cluster gauge Integer
namenode_numdecommissioningdatanodes Number of decommissioning DataNodes in cluster gauge Integer
namenode_numlivedatanodes Number of live DataNodes in cluster gauge Integer
namenode_numstaledatanodes Number of stale DataNodes in cluster gauge Integer
namenode_numstalestorages Number of stale storages in cluster gauge Integer
namenode_openfiledescriptorcount NameNode process open file descriptors count gauge Integer
namenode_pendingreplicationblocks Number of blocks pending replication in HDFS gauge Integer
namenode_percentremaining HDFS Storage space remaining gauge Percent
namenode_receivedbytes NameNode received bytes gauge Bytes
namenode_scheduledreplicationblocks Amount of blocks scheduled for replication gauge Integer
namenode_sentbytes NameNode sent bytes gauge Bytes
namenode_transactionsnumops NameNode transactions count gauge Integer
namenode_underreplicatedblocks HDFS under replicated blocks count gauge Integer
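The Type column separates gauges, which are point-in-time values, from rates, which are expressed per second. The source does not show how the agent derives rates, so the following is only a sketch of the usual approach: divide the change in a cumulative counter by the time between two samples. The function name and the sample numbers are hypothetical.

```python
def per_second_rate(previous, current):
    """Compute a per-second rate from two (timestamp_seconds, counter) samples.

    Returns None when the samples cannot produce a meaningful rate, for
    example when the counter went backwards after a service restart.
    """
    prev_time, prev_value = previous
    cur_time, cur_value = current
    elapsed = cur_time - prev_time
    delta = cur_value - prev_value
    if elapsed <= 0 or delta < 0:
        return None
    return delta / elapsed


# 600 operations counted over a 60 second interval.
print(per_second_rate((0.0, 1000), (60.0, 1600)))
```

Returning None on a counter reset avoids reporting a large negative spike when the monitored daemon restarts.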

HDFS DataNode

The default DataNode configuration is appropriate for most installations and does not require any changes.

Install

cd ${OE_AGENT_HOME}/checks_enabled
ln -s ../checks_available/check_hadoop_datanode.py ./

Configure

Usually the HDFS DataNode binds to 0.0.0.0:50075, so no extra configuration is needed.

If your setup differs, change 127.0.0.1 to the IP address that matches your DataNode's bind address.

[Hadoop-Datanode]
jmx: http://127.0.0.1:50075/jmx
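To confirm that the configured address actually matches the DataNode's bind address, one option is to probe the endpoint and check that it answers with a JSON document containing a "beans" key. This is a sketch under that assumption, not the agent's own logic; the function names are hypothetical, and the fetch parameter exists only so the probe can be exercised without a running DataNode.

```python
import json
import urllib.request


def http_get(url, timeout=5.0):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def jmx_is_reachable(url, fetch=http_get):
    """True when the URL answers with a JSON object that has a "beans" key."""
    try:
        payload = json.loads(fetch(url))
    except (OSError, ValueError):  # connection problems or a non-JSON body
        return False
    return isinstance(payload, dict) and "beans" in payload


# Offline demonstration with a stand-in fetch function.
print(jmx_is_reachable("http://127.0.0.1:50075/jmx", fetch=lambda url: '{"beans": []}'))
```

Against a live DataNode the call would simply be jmx_is_reachable("http://127.0.0.1:50075/jmx"), which uses the real HTTP fetch.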

Restart

${OE_AGENT_HOME}/oddeye.sh restart

Provides

Name Description Type Unit
datanode_bytesread DataNode read bytes per second rate Bytes
datanode_byteswritten DataNode write bytes per second rate Bytes
datanode_capacity Disk space on current DataNode gauge Bytes
datanode_dfsused Current DataNode’s used disk space gauge Bytes
datanode_du_percent Current DataNode's disk usage in percent gauge Percent
datanode_heap_committed DataNode JVM heap committed gauge Bytes
datanode_heap_init DataNode JVM heap init gauge Bytes
datanode_heap_max DataNode JVM Heap max gauge Bytes
datanode_heap_used DataNode JVM heap used gauge Bytes
datanode_lastgc_duration Duration of last garbage collection gauge Milliseconds
datanode_nonheap_committed DataNode JVM non heap committed gauge Bytes
datanode_nonheap_init DataNode JVM non Heap init gauge Bytes
datanode_nonheap_max DataNode JVM non heap max gauge Bytes
datanode_nonheap_used DataNode JVM non heap used gauge Bytes
datanode_openfiles DataNode daemon's open file descriptors count gauge Integer
datanode_space_remaining Disk space remaining on current DataNode gauge Bytes
datanode_totalreadtime Read operations time on current DataNode rate Milliseconds
datanode_totalwritetime Write operations time on current DataNode rate Milliseconds

HBase Master

HMaster

Install

cd ${OE_AGENT_HOME}/checks_enabled
ln -s ../checks_available/check_hbase_master.py ./

Configure

Usually the HBase Master binds to 0.0.0.0:60010, so no extra configuration is needed.

If your setup differs, change 127.0.0.1 to the IP address that matches your HBase Master's bind address.

[HBase-Master]
jmx: http://127.0.0.1:60010/jmx

Restart

${OE_AGENT_HOME}/oddeye.sh restart

Provides

Name Description Type Unit
hmaster_exceptions General Exceptions counter Integer
hmaster_exceptions_failedsanitycheck Failed Sanity Check Exception counter Integer
hmaster_exceptions_multiresponsetoolarge Multi Response Too Large Exception counter Integer
hmaster_exceptions_outoforderscannernext Out Of Order Scanner Next Exception counter Integer
hmaster_exceptions_regionmoved Region Moved Exception counter Integer
hmaster_exceptions_regiontoobusy Region Too Busy Exception counter Integer
hmaster_exceptions_unknownscanner Unknown Scanner Exception counter Integer
hmaster_heap_committed HBase Master JVM heap committed gauge Bytes
hmaster_heap_init HBase Master JVM heap init gauge Bytes
hmaster_heap_max HBase Master JVM heap max gauge Bytes
hmaster_heap_used HBase Master JVM heap used gauge Bytes
hmaster_heap_{parnew/g1_young}_lastgcinfo Duration of the last G1 Young or ParNew garbage collection gauge Milliseconds
hmaster_heap_{cms/g1_old}_lastgcInfo Duration of the last G1 Old or CMS garbage collection gauge Milliseconds
hmaster_node_averageload HBase cluster's load average gauge Integer
hmaster_node_clusterrequests HBase cluster-wide requests per second rate OPS
hmaster_node_gccount HBase Master garbage collections count counter Integer
hmaster_node_gctimemillis HBase Master garbage collection pause time gauge Milliseconds
hmaster_node_numdeadregionservers Number of dead Region Servers gauge Integer
hmaster_node_numregionservers Number of Region Servers gauge Integer
hmaster_node_ritcount The number of regions in transition gauge Integer
hmaster_node_ritcountoverthreshold The number of regions that have been in transition longer than a threshold time gauge Integer
hmaster_node_ritoldestage The age of the longest region in transition gauge Milliseconds
hregion_node_memstoresize Size in bytes of Memstore gauge Bytes
hregion_node_regioncount Number of regions running on monitored RegionServer gauge Integer
hregion_node_storefilesize Size in bytes of StoreFile gauge Bytes
hregion_node_storefilecount Number of Store Files on monitored RegionServer gauge Integer
hregion_node_hlogfilecount Number of HLog Files on monitored RegionServer gauge Integer
hregion_node_hlogfilesize Size in bytes of HLog Files gauge Bytes
hregion_node_percentfileslocal HDFS data locality percentage gauge Percent
hregion_node_blockcounthitpercent Percentage of Block Cache hits gauge Percent
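The two lastgcinfo rows in the table carry a collector name in the metric name, so which of them appears depends on the garbage collector the JVM runs. The sketch below shows one way such names could be derived from the standard java.lang GarbageCollector beans; the mapping table, the function name, and the sample beans are assumptions made for illustration, not the agent's code.

```python
# Standard JVM collector names mapped to the suffixes used in the metric names.
GC_SUFFIXES = {
    "ParNew": "parnew",
    "ConcurrentMarkSweep": "cms",
    "G1 Young Generation": "g1_young",
    "G1 Old Generation": "g1_old",
}
BEAN_PREFIX = "java.lang:type=GarbageCollector,name="


def last_gc_durations(beans, metric_prefix="hmaster_heap"):
    """Build {metric_name: duration_ms} from GarbageCollector beans."""
    metrics = {}
    for bean in beans:
        name = bean.get("name", "")
        if not name.startswith(BEAN_PREFIX):
            continue
        suffix = GC_SUFFIXES.get(name[len(BEAN_PREFIX):])
        last_gc = bean.get("LastGcInfo")
        # LastGcInfo is empty until the collector has run at least once.
        if suffix is None or not last_gc:
            continue
        metrics[f"{metric_prefix}_{suffix}_lastgcinfo"] = last_gc["duration"]
    return metrics


# Made-up sample: a G1 JVM whose old generation has not been collected yet.
SAMPLE_BEANS = [
    {"name": "java.lang:type=GarbageCollector,name=G1 Young Generation",
     "LastGcInfo": {"duration": 12}},
    {"name": "java.lang:type=GarbageCollector,name=G1 Old Generation",
     "LastGcInfo": None},
]
print(last_gc_durations(SAMPLE_BEANS))
```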

HBase RegionServer

Install

cd ${OE_AGENT_HOME}/checks_enabled
ln -s ../checks_available/check_hbase_regionserver.py ./

Configure

Usually the HBase RegionServer binds to 0.0.0.0:60030, so no extra configuration is needed.

If your setup differs, change 127.0.0.1 to the IP address that matches your HBase RegionServer's bind address.

[HBase-Region]
jmx: http://127.0.0.1:60030/jmx
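When inspecting the endpoint by hand, the full /jmx response can be large. Hadoop's JMX JSON servlet accepts a qry parameter that limits the response to matching beans, and HBase serves /jmx in the same way, though that is worth verifying on your version. The sketch below builds such a URL; the helper name is hypothetical and the bean name is an example to be checked against your own /jmx output.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_bean_query(jmx_url, bean_name):
    """Return jmx_url with a qry parameter selecting a single bean."""
    parts = urlsplit(jmx_url)
    query = urlencode({"qry": bean_name})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


# Example bean name; confirm it against the unfiltered /jmx output first.
REGIONSERVER_BEAN = "Hadoop:service=HBase,name=RegionServer,sub=Server"

filtered_url = with_bean_query("http://127.0.0.1:60030/jmx", REGIONSERVER_BEAN)
print(filtered_url)
# Decoding the query string gives back the original bean name.
print(parse_qs(urlsplit(filtered_url).query)["qry"][0])
```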

Restart

${OE_AGENT_HOME}/oddeye.sh restart

Provides

Name Description Type Unit
hregion_heap_cms_lastgcinfo Duration of previous CMS garbage collection gauge Milliseconds
hregion_heap_g1_old_lastgcinfo Duration of previous G1 old generation garbage collection gauge Milliseconds
hregion_heap_parnew_lastgcinfo Duration of previous ParNew garbage collection gauge Milliseconds
hregion_heap_g1_young_lastgcinfo Duration of previous G1 young generation garbage collection gauge Milliseconds
hregion_heap_committed HBase RegionServer JVM heap committed gauge Bytes
hregion_heap_init HBase RegionServer JVM heap init gauge Bytes
hregion_heap_max HBase RegionServer JVM heap max gauge Bytes
hregion_heap_used HBase RegionServer JVM heap used gauge Bytes
hregion_node_blockcachecount Block cache items count gauge Integer
hregion_node_blockcacheevictioncount Block cache evictions count gauge Integer
hregion_node_blockcacheexpresshitpercent Block cache express hits count rate OPS
hregion_node_blockcachefreesize Block cache free size in bytes gauge Bytes
hregion_node_blockcachehitcount Block cache hits count per second rate OPS
hregion_node_blockcachemisscount Block cache misses count per second rate OPS
hregion_node_blockcachesize Block cache size in bytes gauge Bytes
hregion_node_blockcounthitpercent Block cache hits percent gauge Percent
hregion_node_compactedcellscount Minor compacted cells count per second rate OPS
hregion_node_compactedcellssize Minor compacted bytes rate Bytes
hregion_node_majorcompactedcellscount Major compacted cells count per second rate OPS
hregion_node_majorcompactedcellssize Major compacted bytes rate Bytes
hregion_node_gctimemillis Garbage collection time accumulated during the last check interval rate Milliseconds
hregion_node_gccount Completed garbage collections counter Integer
hregion_node_openfiledescriptorcount Open file descriptors held by the RegionServer daemon gauge Integer
hregion_node_delete_num_ops Number of delete operations performed by current RegionServer per second rate OPS
hregion_node_flushtime_num_ops Number of flush operations performed by current RegionServer per second rate OPS
hregion_node_mutate_num_ops Number of mutate operations performed by current RegionServer per second rate OPS
hregion_node_readrequestcount Number of read operations performed by current RegionServer per second rate OPS
hregion_node_slowappendcount Number of slow append operations performed by current RegionServer per second rate OPS
hregion_node_slowdeletecount Number of slow delete operations performed by current RegionServer per second rate OPS
hregion_node_slowgetcount Number of slow get operations performed by current RegionServer per second rate OPS
hregion_node_slowincrementcount Number of slow increment operations performed by current RegionServer per second rate OPS
hregion_node_slowputcount Number of slow put operations performed by current RegionServer per second rate OPS
hregion_node_totalrequestcount Total requests per second executed on current RegionServer rate OPS
hregion_node_writerequestcount Writes per second executed on current RegionServer rate OPS

Hedged Reads (If enabled)

Name Description Type Unit
hregion_node_hedgedreads Number of Hedged Read operations started counter Integer
hregion_node_hedgedreadwins Number of Hedged Read operations that returned values faster than normal reads counter Integer
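The two counters are most informative when read together: the share of hedged reads that beat the normal read gives a rough sense of whether hedging pays off. The sketch below computes that share; the function name and the sample numbers are hypothetical, and the ratio is a derived value, not something the check is stated to report.

```python
def hedged_read_win_ratio(hedged_reads, hedged_read_wins):
    """Fraction of started hedged reads that finished before the normal read.

    Returns None when no hedged reads have been started yet.
    """
    if hedged_reads <= 0:
        return None
    return hedged_read_wins / hedged_reads


# Made-up counter values: 200 hedged reads started, 50 of them won.
print(hedged_read_win_ratio(200, 50))
```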

HBase Thrift & REST

Install

cd ${OE_AGENT_HOME}/checks_enabled
ln -s ../checks_available/check_hbase_rest.py ./
ln -s ../checks_available/check_hbase_thrift.py ./

Configure

The default config parameters are suitable for most HBase installations. If your services run on a different IP address or port, change the parameters below to match your installation.

[HBase-Thrift]
jmx : http://127.0.0.1:9095/jmx
[HBase-Rest]
jmx : http://127.0.0.1:8085/jmx
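Since every check reads its own section of the same file, a copy-and-paste slip can leave two services pointing at one endpoint. The following is a minimal sketch that lists jmx URLs shared by more than one section, assuming the file can be read with Python's configparser; the function name is hypothetical and the sample file deliberately contains such a slip.

```python
import configparser
from collections import defaultdict


def duplicated_endpoints(ini_text):
    """Return {jmx_url: [section, ...]} for URLs used by several sections."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    sections_by_url = defaultdict(list)
    for section in parser.sections():
        url = parser.get(section, "jmx", fallback=None)
        if url:
            sections_by_url[url].append(section)
    return {url: names for url, names in sections_by_url.items() if len(names) > 1}


# Hypothetical file in which two sections were given the same URL by mistake.
SAMPLE_INI = """
[HBase-Thrift]
jmx : http://127.0.0.1:9095/jmx
[HBase-Rest]
jmx : http://127.0.0.1:9095/jmx
[HBase-Region]
jmx: http://127.0.0.1:60030/jmx
"""
print(duplicated_endpoints(SAMPLE_INI))
```

An empty result means every section has its own endpoint.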

HBase REST Provides

Name Description Type Unit
hrest_daemonthreadcount Running Daemon threads count gauge Integer
hrest_faileddelete Failed Deletes per second rate OPS
hrest_failedget Failed Gets per second rate OPS
hrest_failedput Failed Puts per second rate OPS
hrest_failedscancount Failed Scans per second rate OPS
hrest_heap_committed HBase REST server's JVM heap committed gauge Bytes
hrest_heap_max HBase REST server's JVM heap max gauge Bytes
hrest_heap_used HBase REST server's JVM heap used gauge Bytes
hrest_nonheap_committed HBase REST server's JVM non heap committed gauge Bytes
hrest_nonheap_max HBase REST server's JVM non heap max gauge Bytes
hrest_nonheap_used HBase REST server's JVM non heap used gauge Bytes
hrest_pausetimewithgc_90th_percentile JVM pause time with garbage collection, 90th percentile gauge Milliseconds
hrest_pausetimewithgc_99th_percentile JVM pause time with garbage collection, 99th percentile gauge Milliseconds
hrest_pausetimewithoutgc_90th_percentile JVM pause time without garbage collection, 90th percentile gauge Milliseconds
hrest_pausetimewithoutgc_99th_percentile JVM pause time without garbage collection, 99th percentile gauge Milliseconds
hrest_peakthreadcount Peak running Daemon threads count gauge Integer
hrest_requests Total requests count executed on this REST gateway counter Integer
hrest_successfuldelete Successful Delete requests count executed on this REST gateway counter Integer
hrest_successfulget Successful get requests count executed on this REST gateway counter Integer
hrest_successfulput Successful Put requests count executed on this REST gateway counter Integer
hrest_successfulscancount Successful Scan requests count executed on this REST gateway counter Integer
hrest_threadcount REST gateway’s running threads count gauge Integer
hrest_totalstartedthreadcount REST gateway’s total threads count counter Integer

HBase Thrift Provides

Name Description Type Unit
hthrift_batchget Batch gets per second rate OPS
hthrift_batchmutate Batch mutates per second rate OPS
hthrift_callqueuelen Length of Thrift queue gauge Integer
hthrift_(cms,g1_old)_lastgcinfo Duration of last CMS, G1 Old gen garbage collection current Milliseconds
hthrift_daemonthreadcount Running Daemon threads count gauge Integer
hthrift_heap_committed JVM heap committed gauge Bytes
hthrift_heap_init JVM heap Init gauge Bytes
hthrift_heap_max JVM heap Max gauge Bytes
hthrift_heap_used JVM heap Used gauge Bytes
hthrift_parnew_lastgcinfo Duration of last ParNew, G1 Young gen garbage collection current Milliseconds
hthrift_pausetimewithgc_90th_percentile 90th percentile of JVM pause time with GC current Milliseconds
hthrift_pausetimewithgc_99th_percentile 99th percentile of JVM pause time with GC current Milliseconds
hthrift_pausetimewithoutgc_90th_percentile 90th percentile of JVM pause time without GC current Milliseconds
hthrift_pausetimewithoutgc_99th_percentile 99th percentile of JVM pause time without GC current Milliseconds
hthrift_peakthreadcount Thrift gateway’s peak running threads count gauge Integer
hthrift_slowthriftcall Slow calls rate OPS
hthrift_threadcount Thrift gateway’s running threads count gauge Integer
hthrift_thriftcall Number of Thrift calls rate OPS
hthrift_timeinqueue_num_ops Amount of time objects spent in the Thrift queue rate OPS
hthrift_totalstartedthreadcount Thrift gateway’s total threads count gauge Integer