Hedged Reads is a relatively new feature, introduced in Hadoop 2.4.
It's a simple but quite effective way to speed up heavy, large read requests on HBase. The idea is to read data from another block replica when reading from the main replica takes longer than desired. In that case a parallel thread is started against another replica of the block, so two reads for the same data run in parallel against HDFS. Data is taken from whichever read responds first, and the outstanding read is cancelled. This feature is not meant to solve systematic problems; it is for situations where some reads occasionally take a long time. The benefits of this approach are:
- Reads on secondary replica
- Strongly consistent
- Works at HDFS level
By default Hedged Reads are disabled, and they can be used only when HFiles are stored in HDFS.
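The idea described above can be sketched in a few lines of Python. This is only a toy illustration of the hedging pattern, not how the HDFS client is actually implemented: `read_replica`, `hedged_read`, the replica names, and the simulated delays are all hypothetical.

```python
import concurrent.futures as cf
import time


def read_replica(name, delay_s):
    """Hypothetical stand-in for reading a block from one replica."""
    time.sleep(delay_s)
    return f"data from {name}"


def hedged_read(replicas, threshold_s, pool):
    """Read from the primary replica, hedging to a second one if it is slow.

    replicas: list of (name, simulated_delay_s); the first entry is the primary.
    """
    primary, *others = replicas
    futures = [pool.submit(read_replica, *primary)]
    # Give the primary read up to threshold_s seconds to finish.
    done, _ = cf.wait(futures, timeout=threshold_s)
    if not done:
        if others:
            # Primary is slow: start a parallel read on another replica.
            futures.append(pool.submit(read_replica, *others[0]))
        # Take whichever read finishes first.
        done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    winner = next(iter(done))
    for f in futures:
        if f is not winner:
            # Best effort: cancel() cannot stop a call that is already running.
            f.cancel()
    return winner.result()


if __name__ == "__main__":
    with cf.ThreadPoolExecutor(max_workers=20) as pool:
        # The primary is slow (500 ms), so the read is hedged after 50 ms.
        print(hedged_read([("replica-1", 0.5), ("replica-2", 0.01)], 0.05, pool))
```

The pool size and the threshold in this sketch play the same roles as the two configuration options discussed next.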
The following is an example of enabling and configuring Hedged Reads in hbase-site.xml. There are two main options to enable hedged reads:
- dfs.client.hedged.read.threadpool.size
- dfs.client.hedged.read.threshold.millis
Both have quite intuitive names. The first setting controls the number of threads dedicated to Hedged Reads, and the second is the time threshold in milliseconds after which a read is considered for hedging.
In this example the client will use up to 20 threads to read data from another replica of a block.
<property> <name>dfs.client.hedged.read.threadpool.size</name> <value>20</value> </property>
The following sets 10 ms as the threshold after which a read request becomes eligible for a Hedged Read.
<property> <name>dfs.client.hedged.read.threshold.millis</name> <value>10</value> </property>
Monitoring the Performance of Hedged Reads
The following metrics for monitoring Hedged Reads are emitted by Hadoop at http://$REGIONSERVER:$PORT/jmx.
- hedgedReadOps : The number of hedged reads that have occurred
- hedgeReadOpsWin : The number of times the hedged read returned faster than the original read
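As a rough way to put these two counters to use, the sketch below fetches the JMX document and computes how often the hedged read won. It assumes the endpoint returns JSON with a top-level "beans" array and that both counters appear in the same bean; the function names and the example host and port are hypothetical, so adjust them to your cluster.

```python
import json
import urllib.request


def fetch_jmx(url):
    """Download and parse the JSON document served by the /jmx endpoint."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def hedged_read_stats(jmx_doc, ops_key="hedgedReadOps", wins_key="hedgeReadOpsWin"):
    """Return (ops, wins, win_ratio) from the first bean exposing both counters.

    Returns None when no bean contains both keys, for example when
    Hedged Reads are disabled or the metric names differ in your version.
    """
    for bean in jmx_doc.get("beans", []):
        if ops_key in bean and wins_key in bean:
            ops, wins = bean[ops_key], bean[wins_key]
            ratio = wins / ops if ops else 0.0
            return ops, wins, ratio
    return None


# Example usage (hypothetical host and port):
# stats = hedged_read_stats(fetch_jmx("http://regionserver.example.com:16030/jmx"))
# if stats:
#     print("hedged reads: %d, wins: %d, win ratio: %.2f" % stats)
```

A low win ratio suggests hedged reads are being triggered without paying off, in which case the threshold is probably set too low. A high number of hedged reads overall may indicate that reads are often slow for other reasons.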
Of course, I recommend OddEye to monitor this and all other aspects of Hadoop/HBase. Our agent will automatically detect whether Hedged Reads are enabled and will create the following metrics: