Created_at :Sep 2011
Performance testing / Benchmarking an HBase cluster
So you have set up a new HBase cluster and want to 'take it for a spin'. Here is how, without writing a lot of code of your own.
Before We Start
I like to have the hbase command available in my PATH, so I put the following in my ~/.bashrc file:
export HBASE_HOME=/hadoop/hbase
export PATH=$PATH:$HBASE_HOME/bin
A) hbase PerformanceEvaluation
class : org.apache.hadoop.hbase.PerformanceEvaluation
jar : hbase-*-tests.jar
This is a handy class that ships with the distribution. It can run reads and writes against HBase. By default it spawns a MapReduce job to do the reads / writes in parallel. There is also an option (--nomapred) to run the operations in threads instead of MapReduce.
Let's find out the usage:
# hbase org.apache.hadoop.hbase.PerformanceEvaluation
Usage: java org.apache.hadoop.hbase.PerformanceEvaluation \
[--miniCluster] [--nomapred] [--rows=ROWS] <command> <nclients>
....
[snipped]
...
So let's run a randomWrite test:
# time hbase org.apache.hadoop.hbase.PerformanceEvaluation randomWrite 5
- we are running 5 clients. By default, this runs in MapReduce mode
- each client inserts about 1 million rows (the default), about 1 GB in size (1000 bytes per row). So the total data size is about 5 GB (5 x 1)
- typically there will be 10 maps per client, so we will see 50 (5 x 10) map tasks
Once the test is complete, it prints a summary:
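The sizing in the bullets above can be sketched as a small calculation. The defaults below (about 1 million rows per client, 1000 bytes per row, 10 maps per client) are taken from the text and are assumptions, not values read from your HBase version; `estimate_run` is a hypothetical helper name.

```python
# Defaults as described in the text above; verify them against your HBase version.
ROWS_PER_CLIENT = 1_000_000   # "about 1 million rows" per client
BYTES_PER_ROW = 1000          # 1000 bytes per row
MAPS_PER_CLIENT = 10          # "typically" 10 map tasks per client


def estimate_run(nclients,
                 rows_per_client=ROWS_PER_CLIENT,
                 bytes_per_row=BYTES_PER_ROW,
                 maps_per_client=MAPS_PER_CLIENT):
    """Estimate row count, data size and map task count for a PerformanceEvaluation run."""
    total_rows = nclients * rows_per_client
    return {
        "rows": total_rows,
        "gigabytes": total_rows * bytes_per_row / 1e9,
        "map_tasks": nclients * maps_per_client,
    }


estimate = estimate_run(5)
print(estimate)
```

This only helps you decide how many clients to ask for. The actual row count reported by the tool can differ slightly from the estimate.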
... <output clipped>
....
Hbase Performance Evaluation
Row count=5242850
Elapsed Time in millisconds = 1789049
.....
real 3m21.829s
user 0m2.944s
sys 0m0.232s
I prefer to look at the elapsed REAL time (measured with the unix 'time' command), and then do this calculation:
5 million rows = 5242850
total time = 3m 21 sec = 201 secs
write throughput
= 5242850 rows / 201 seconds = 26083.8 rows / sec
= 5 GB data / 201 seconds = 5 * 1000 MB / 201 sec = 24.88 MB / sec
insert time = 201 seconds / 5242850 rows = 0.038 ms / row
This should give you a good idea of the cluster throughput.
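The same arithmetic as a minimal Python sketch, using the numbers from the run above. The function name `write_throughput` is made up for illustration, and the data size is the rounded 5 GB figure from the text, not a measured value.

```python
def write_throughput(rows, seconds, total_megabytes):
    """Return rows/sec, MB/sec and ms/row for a finished run."""
    return {
        "rows_per_sec": rows / seconds,
        "mb_per_sec": total_megabytes / seconds,
        "ms_per_row": seconds * 1000.0 / rows,
    }


# Numbers from the randomWrite run: 5242850 rows, 201 seconds, about 5 GB.
stats = write_throughput(rows=5242850, seconds=201, total_megabytes=5 * 1000)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```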
Now, let's do a READ benchmark:
# time hbase org.apache.hadoop.hbase.PerformanceEvaluation randomRead 5
You can calculate the read throughput the same way.
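To avoid converting the 'time' output by hand, a small helper can pull the wall-clock seconds out of the 'real' line. This is a sketch that assumes the bash-style format shown earlier ('real 3m21.829s'). Since no read output is shown in this article, the sample below reuses the timing from the write run purely as a placeholder.

```python
import re


def real_seconds(time_output):
    """Extract wall-clock seconds from bash `time` output such as 'real 3m21.829s'."""
    match = re.search(r"real\s+(?:(\d+)m)?([\d.]+)s", time_output)
    if match is None:
        raise ValueError("no 'real' line found in time output")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def rows_per_second(rows, time_output):
    """Throughput in rows/sec from a row count and the captured `time` output."""
    return rows / real_seconds(time_output)


# Placeholder input: the timing printed by the write run above.
sample = "real 3m21.829s\nuser 0m2.944s\nsys 0m0.232s"
print(real_seconds(sample))
print(rows_per_second(5242850, sample))
```

Substitute the row count and timing from your own randomRead run to get the read figure.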
B) YCSB
YCSB (Yahoo! Cloud Serving Benchmark) is a performance testing tool released by Yahoo. It has an HBase mode that we will use.
First, read the excellent tutorial by Lars George on using YCSB with HBase.
Then follow his instructions for setting up HBase and YCSB (I won't repeat them here).
YCSB ships with a few 'workloads'. I am going to run 'workloada', which is a mix of reads and writes (50% / 50%).
Step 1) setting up the workload:
java -cp build/ycsb.jar:db/hbase/lib/* com.yahoo.ycsb.Client -load -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p columnfamily=family -p recordcount=10000000 -threads 10 -s > load.dat
- -load : we are loading the data
- -P workloads/workloada : we are using workloada
- -p recordcount=10000000 : 10 million rows
- -threads 10 : use 10 threads to parallelize inserts
- -s : print progress on stderr (console) every 10 secs
- > load.dat : save the output into this file
Examine the file 'load.dat'. Here are the first few lines:
YCSB Client 0.1
Command line: -load -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p columnfamily=family -p recordcount=10000000 -threads 10 -s
[OVERALL], RunTime(ms), 786364.0
[OVERALL], Throughput(ops/sec), 12716.757125199018
[INSERT], Operations, 10000000
[INSERT], AverageLatency(ms), 0.5551727
[INSERT], MinLatency(ms), 0
[INSERT], MaxLatency(ms), 34580
[INSERT], 95thPercentileLatency(ms), 0
[INSERT], 99thPercentileLatency(ms), 1
[INSERT], Return=0, 10000000
[INSERT], 0, 9897989
[INSERT], 1, 99298
The important numbers are on the [OVERALL] lines. One interesting stat is the throughput: how many ops were performed each second (about 12,717 here). You can also see the runtime in ms (~786 secs).
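The summary lines can also be read programmatically. The sketch below assumes the three-field 'section, metric, value' layout seen in the output above. `parse_ycsb` is a hypothetical helper, not part of YCSB. It skips the numeric histogram lines and cross-checks the reported throughput against operations divided by runtime.

```python
def parse_ycsb(lines):
    """Return {section: {metric: value}} for summary lines, skipping histogram buckets."""
    result = {}
    for line in lines:
        if not line.startswith("["):
            continue  # header lines such as "YCSB Client 0.1"
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 3:
            continue
        section, metric, value = parts
        if metric.isdigit():
            continue  # histogram bucket, e.g. "[INSERT], 0, 9897989"
        result.setdefault(section.strip("[]"), {})[metric] = float(value)
    return result


# Lines copied from the load.dat output shown above.
load_lines = [
    "YCSB Client 0.1",
    "[OVERALL], RunTime(ms), 786364.0",
    "[OVERALL], Throughput(ops/sec), 12716.757125199018",
    "[INSERT], Operations, 10000000",
    "[INSERT], AverageLatency(ms), 0.5551727",
    "[INSERT], 0, 9897989",
]
summary = parse_ycsb(load_lines)
runtime_sec = summary["OVERALL"]["RunTime(ms)"] / 1000.0
recomputed = summary["INSERT"]["Operations"] / runtime_sec
print(runtime_sec, recomputed)
```

In practice you would pass the lines of 'load.dat' itself, for example `parse_ycsb(open("load.dat"))`.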
Step 2) running the workload
The previous step set up the workload. Now let's run it.
java -cp build/ycsb.jar:db/hbase/lib/* com.yahoo.ycsb.Client -t -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p columnfamily=family -p operationcount=10000000 -s -threads 10 > a.dat
The differences from the load step are:
- -t : transaction mode (read/write)
- -p operationcount : specifies how many ops to run
Here are the first few lines of 'a.dat':
YCSB Client 0.1
Command line: -t -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p columnfamily=family -p operationcount=10000000 -threads 10 -s
[OVERALL], RunTime(ms), 2060800.0
[OVERALL], Throughput(ops/sec), 4852.484472049689
[UPDATE], Operations, 5002015
[UPDATE], AverageLatency(ms), 0.6575520065413638
[UPDATE], MinLatency(ms), 0
[UPDATE], MaxLatency(ms), 28364
[UPDATE], 95thPercentileLatency(ms), 0
[UPDATE], 99thPercentileLatency(ms), 0
[UPDATE], Return=0, 5002015
[UPDATE], 0, 4986514
[UPDATE], 1, 15075
[UPDATE], 2, 0
[UPDATE], 3, 2
....
....[snip]
....
[READ], Operations, 4997985
[READ], AverageLatency(ms), 3.3133978993534394
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 2868
[READ], 95thPercentileLatency(ms), 13
[READ], 99thPercentileLatency(ms), 24
[READ], Return=0, 4997985
[READ], 0, 333453
[READ], 1, 1866771
[READ], 2, 1197919
Here is how to read it:
- overall details are printed at the top
- then the UPDATE stats are shown
- many lines of the UPDATE latency histogram follow (latency in ms, number of operations)
- scroll down further (or search for READ) to find the READ stats
- we can see the average read latency is 3.31 ms
- The percentiles are interesting too. We can satisfy 95% of read requests within 13 ms. Pretty good. Almost as fast as an RDBMS
So in this tutorial I have demonstrated some quick ways of running performance evaluations on an HBase cluster.
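The histogram lines can be turned into cumulative shares to see how the percentile figures come about. This sketch assumes each line is a (latency in ms, operation count) bucket, which is how the output above appears to be laid out. Only the three READ buckets shown in this article are used, so the result covers latencies up to the 2 ms bucket, not the full distribution.

```python
def cumulative_share(buckets, total_ops):
    """buckets: (latency_ms, count) pairs in ascending latency order.

    Returns (latency_ms, fraction of all operations at or below that bucket).
    """
    running = 0
    shares = []
    for latency_ms, count in buckets:
        running += count
        shares.append((latency_ms, running / total_ops))
    return shares


# READ buckets and operation count copied from the output above.
read_buckets = [(0, 333453), (1, 1866771), (2, 1197919)]
read_shares = cumulative_share(read_buckets, total_ops=4997985)
for latency_ms, share in read_shares:
    print(f"up to bucket {latency_ms} ms: {share:.1%}")
```

Feeding in the full list of buckets from 'a.dat' and looking for the first bucket whose share reaches 0.95 should land on the reported 95th percentile.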