12 Sep / 2011

Performance testing an HBase cluster

So you have set up a new HBase cluster and want to ‘take it for a spin’.  Here is how, without writing a lot of code yourself.

A) hbase PerformanceEvaluation

class : org.apache.hadoop.hbase.PerformanceEvaluation
jar : hbase-*-tests.jar

This is a handy class that comes with the distribution.  It can do reads and writes against HBase, spawning a MapReduce job to run the operations in parallel.  There is also an option to run the operations in threads instead of MapReduce.

Let's find out the usage:

# hbase org.apache.hadoop.hbase.PerformanceEvaluation

Usage: java org.apache.hadoop.hbase.PerformanceEvaluation \
  [--miniCluster] [--nomapred] [--rows=ROWS] <command> <nclients>
….
[snipped]
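
Two of those flags are worth noting before we run anything: --nomapred runs the clients as threads inside a single JVM instead of launching a MapReduce job, and --rows overrides the 1 million rows each client writes by default.  A smaller, thread-based smoke test might look like this (flag names taken from the usage above; the row count is just an example):

# hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred --rows=100000 randomWrite 5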

So let's run a randomWrite test:

# time hbase org.apache.hadoop.hbase.PerformanceEvaluation  randomWrite 5

  • We are running 5 clients.  By default, this runs in MapReduce mode.
  • Each client inserts 1 million rows (the default), about 1 GB of data (1,000 bytes per row), so the total data size is 5 GB (5 x 1 GB).
  • Typically there are 10 maps per client, so we will see 50 (5 x 10) map tasks.

You can watch the progress on the console and also in the JobTracker web UI (http://job_tracker:50030).

Once this test is complete, it will print out summaries:

… <output clipped>
….
Hbase Performance Evaluation
Row count=5242850
Elapsed Time in milliseconds = 1789049
…..

real    3m21.829s
user    0m2.944s
sys     0m0.232s

I actually like to look at the elapsed REAL time (which I measure using the Unix ‘time’ command), and then do this calculation:

rows written = 5242850 (~5 million)
total time = 3 min 21 sec = 201 seconds

write throughput
= 5242850 rows / 201 seconds = 26083.8 rows / sec
= 5 GB of data / 201 seconds = 5 * 1000 MB / 201 sec = 24.87 MB / sec
insert time = 201 seconds / 5242850 rows = 0.038 ms / row
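
If you run this often, the arithmetic is easy to script; here is a minimal sketch using bc (plug in your own row count and elapsed seconds):

ROWS=5242850      # "Row count" from the summary output
SECS=201          # elapsed real time from the unix 'time' command
echo "scale=1; $ROWS / $SECS" | bc          # rows per second
echo "scale=2; 5 * 1000 / $SECS" | bc       # MB per second (5 GB written)
echo "scale=3; $SECS * 1000 / $ROWS" | bc   # milliseconds per row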

This should give you a good idea of the cluster throughput.

Now, let's do a READ benchmark:

# time hbase org.apache.hadoop.hbase.PerformanceEvaluation  randomRead 5

and you can calculate the read throughput the same way.

B) YCSB

YCSB is a performance testing tool released by Yahoo.  It has an HBase mode, which we will use.

First, read the excellent tutorial by Lars George on using YCSB with HBase, and follow his instructions for setting up HBase and YCSB (I won't repeat them here).
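
One step from that setup is worth calling out because the load command below depends on it: the YCSB HBase client writes into an existing table (the default table name is 'usertable', changeable with -p table=...), so create it first with the column family you plan to pass via -p columnfamily.  Something like this in the hbase shell should do:

# hbase shell
> create 'usertable', 'family'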

YCSB ships with a few ‘workloads’.  I am going to run ‘workloada’, which is a 50/50 mix of reads and writes.

Step 1) loading the workload data:
java -cp build/ycsb.jar:db/hbase/lib/* com.yahoo.ycsb.Client -load -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p columnfamily=family -p recordcount=10000000  -threads 10 -s > load.dat

  • -load : we are loading the data
  • -P workloads/workloada : we are using workloada
  • -p recordcount=10000000 : 10 million rows
  • -threads 10 : use 10 threads to parallelize the inserts
  • -s : print progress to stderr (the console) every 10 seconds
  • > load.dat : save the client's output into this file

Examine the file ‘load.dat’.  Here are the first few lines:

YCSB Client 0.1
Command line: -load -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p columnfamily=family -p recordcount=10000000 -threads 10 -s
[OVERALL], RunTime(ms), 786364.0
[OVERALL], Throughput(ops/sec), 12716.757125199018
[INSERT], Operations, 10000000
[INSERT], AverageLatency(ms), 0.5551727
[INSERT], MinLatency(ms), 0
[INSERT], MaxLatency(ms), 34580
[INSERT], 95thPercentileLatency(ms), 0
[INSERT], 99thPercentileLatency(ms), 1
[INSERT], Return=0, 10000000
[INSERT], 0, 9897989
[INSERT], 1, 99298

The important numbers are in the [OVERALL] lines.  One interesting stat is how many ops were performed each second (the throughput, about 12,716 ops/sec here).  You can also see the runtime in ms (~786 seconds).
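
Before running the transaction phase, it is worth a quick sanity check that the rows actually landed in HBase.  From the hbase shell (assuming the default 'usertable'), a LIMITed scan comes back quickly, while a full count walks all 10 million rows:

# hbase shell
> scan 'usertable', {LIMIT => 10}
> count 'usertable'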

Step 2) running the workload
The previous step loaded the data.  Now let's run the workload.

java -cp build/ycsb.jar:db/hbase/lib/* com.yahoo.ycsb.Client -t -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p columnfamily=family -p operationcount=10000000 -s -threads 10 > a.dat

The differences from the load command are:

  • -t : transaction mode (mixed reads and writes)
  • -p operationcount : specifies how many operations to run
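
As an aside, the other bundled workloads can be run the same way by pointing -P at a different file; for example workloadb (a read-heavy mix, roughly 95% reads / 5% updates) would look something like this, writing to its own output file:

java -cp build/ycsb.jar:db/hbase/lib/* com.yahoo.ycsb.Client -t -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadb -p columnfamily=family -p operationcount=10000000 -s -threads 10 > b.dat

For now, though, let's stay with workloada.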

Now let's examine a.dat:

YCSB Client 0.1
Command line: -t -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p columnfamily=family -p operationcount=10000000 -threads 10 -s
[OVERALL], RunTime(ms), 2060800.0
[OVERALL], Throughput(ops/sec), 4852.484472049689
[UPDATE], Operations, 5002015
[UPDATE], AverageLatency(ms), 0.6575520065413638
[UPDATE], MinLatency(ms), 0
[UPDATE], MaxLatency(ms), 28364
[UPDATE], 95thPercentileLatency(ms), 0
[UPDATE], 99thPercentileLatency(ms), 0
[UPDATE], Return=0, 5002015
[UPDATE], 0, 4986514
[UPDATE], 1, 15075
[UPDATE], 2, 0
[UPDATE], 3, 2
….
….[snip]
….
[READ], Operations, 4997985
[READ], AverageLatency(ms), 3.3133978993534394
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 2868
[READ], 95thPercentileLatency(ms), 13
[READ], 99thPercentileLatency(ms), 24
[READ], Return=0, 4997985
[READ], 0, 333453
[READ], 1, 1866771
[READ], 2, 1197919

Here is how to read it:

  • Overall details are printed at the top
  • then the UPDATE stats are shown
  • followed by many lines of latency histogram buckets (millisecond bucket, count) for UPDATE
  • scroll down further (or search for READ) to find the READ stats
    • we can see the average latency is about 3.31 ms
    • The percentiles are interesting too.  We can satisfy 95% of requests within 13 ms.  Pretty good; almost as fast as an RDBMS.
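
These output files get long because YCSB prints one histogram line per latency bucket, so a quick grep is handy for pulling out just the headline numbers from a.dat (or load.dat):

# grep OVERALL a.dat
# grep Latency a.dat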

So in this tutorial I have demonstrated some quick ways of running performance evaluations on an HBase cluster.

Sujee Maniyam
Sujee is a founder and principal at Elephant Scale, where he provides consulting and training on Big Data technologies.

6 Comments:


  • By vamshi 18 Sep 2011

    Hi, this post is very good. But I have one basic doubt: where does the data inserted by the clients end up in HBase? How can we see that inserted data in HBase? One more thing: during the randomRead/randomWrite runs, the NameNode, TaskTracker and JobTracker web UIs are not showing anything, but I can see the results and progress on the console.
    Can you help me check the above-mentioned things?
    Thank you

    • By Sujee Maniyam 20 Sep 2011

      1) Using the hbase shell you can see the data in a table.
      On a terminal, bring up the hbase shell:
      # hbase shell
      > scan 'table'
      this will print out all entries in the table.

      If you just want to see a few, then use LIMIT:
      > scan 'table', {LIMIT => 10}
      see the first 10 rows

      2) To monitor activity, use the HBase UI (http://hmaster:60010)
      you will see requests for each region server

  • By minsbrz 09 May 2012

    Hi. This is very great performance to me.
    Can I see the configuration of Hadoop & HBase?
    We have been testing these tools, but our test results are not the same,
    so we want to see the detailed options.
    Please give me advice. Thanks for reading.

    • By minsbrz 14 May 2012

      I have been testing HBase for performance like you,
      but the result was not the same as yours. How can I do that?
      My test result is about 4000 TPS on 4 clusters,
      and memory-only writes are 10000 TPS.
      Could you tell me the details of your xml properties and the write status (to memory or to disk: flush, compaction and so on)?
      Thanks in advance for your advice.

  • By bonsonnoise 17 Jul 2015

    Hi,
    What is the configuration of your cluster? How many servers? Where do the RegionServers, the NameNode and the JobTracker run (on the same server)?
    I would like to compare my results with yours under the same conditions.
    Thank you.

  • By vmwalla 16 Jun 2016

    Sujee,
    Thanks for sharing. Is it possible to run this test and point it to a specific namespace? I am on a shared cluster and in HBase limited to a namespace that our admin has created.
