Excellent tutorial Sujee!
With the filter, one should also use scan.setCaching to speed up the job.
J-D
Hi Sujee!
Excellent post indeed, thank you for taking the time! Also please consider linking it into the HBase Wiki, so that others can find it easily (if you have not done so already) – simply register and edit the appropriate page. Much appreciated!
Lars
Hey Sujee!! Really nice and helpful post, thanks a lot!!! Though I have a little doubt: I am still learning Hadoop MapReduce and I was wondering why I can’t see the jobs I am executing on my localhost:50030?? I tried to execute it from Eclipse using the cluster I set up, but I got an error that said
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
Any idea??? Thanks a lot in advance (=
Renato
@Renato
Your Hadoop installation doesn’t know about the HBase classes.
Check ‘step 0’ under the ‘Hbase setup’ section. Or you can follow this:
http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/mapreduce/package-summary.html
Once you modify your file, make sure to DISTRIBUTE the hadoop-env.sh across the cluster.
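To see quickly whether the JVM you launch from can load the HBase classes at all, a small check like the one below may help. It is only a sketch: the class name `ClasspathCheck` and the method `isLoadable` are made up for illustration, and it only inspects the classpath of the JVM it runs in (your Eclipse launch or the `hadoop` client), not the task JVMs on the cluster.

```java
// Hypothetical diagnostic, not part of the tutorial code.
final class ClasspathCheck {

    // Returns true if the named class can be found by this class's loader.
    // The class is located but not initialized.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The two classes reported as missing in the comments on this page.
        String[] names = {
            "org.apache.hadoop.hbase.HBaseConfiguration",
            "org.codehaus.jackson.map.JsonMappingException"
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "MISSING"));
        }
        // Shows which jars and directories this JVM was started with.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

If a class shows up as MISSING when launched from Eclipse but as found when launched through the `hadoop` command, the Eclipse build path is probably the part that needs the extra jars.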
Hi I tried this:
http://wiki.apache.org/hadoop/Hbase/MapReduce
with no success. Though I am able to execute my job regularly from the command line, and as a Java program. Any ideas or suggestions are highly appreciated. Thanks in advance.
Renato M.
@Renato,
the end goal is to be able to run it from the command line using
hadoop jar your.jar your.mr.class.name
so it will run on all cluster machines.
So if you can do this, that is pretty good.
Not sure why you aren’t able to run it within Eclipse…
Hi,
I really like this post, and it has helped me in my project. Thanks for the same.
I am now motivated to make an HBase admin UI…
Do ping me if you want to help or get involved.
thanks once again
Worked perfectly. This tutorial was so helpful. Thank you so much.
Even the official Apache tutorial in the Hadoop docs uses the deprecated APIs, so it’s nice to see an up-to-date tutorial like this one.
Hi, Sujee,
thanks for your tutorial. It is the only one I found that does what I need: read from HBase in MR and write back in MR! It works and uses the latest API. What more can a person want?
Cheers,
Mark
By the way, your timestamp is not unique, because a few records are created with the same timestamp (or my computer is too fast). You get this:
\x00\x00\x00\x02\x00 column=details:page, timestamp=1296845269865, value=/a.htm
\x00\x07i l
\x00\x00\x00\x02\x00 column=details:page, timestamp=1296845269865, value=/a.htm
\x00\x07j l
@Mark,
the row key = random_user_id + incrementing_counter
so it will be unique.
Timestamps are created automatically by HBase for versioning cell data. I think they have millisecond accuracy for now, so it is possible to get a few entries with the same timestamp. That is okay in most cases (in this case too).
glad you found the article useful
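The point about row keys can be sketched in plain Java. This is an illustration only, not the importer code from the article: it assumes the user id and the counter are each written as a 4-byte big-endian int, which is what the 8-byte keys in the shell output above suggest, and the names `RowKeyDemo` and `rowKey` are made up.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper showing why two rows stay distinct
// even when HBase stamps them with the same millisecond.
final class RowKeyDemo {

    // Row key = 4-byte user id followed by 4-byte counter (big-endian).
    static byte[] rowKey(int userId, int counter) {
        return ByteBuffer.allocate(8).putInt(userId).putInt(counter).array();
    }

    public static void main(String[] args) {
        long ts = System.currentTimeMillis(); // pretend both records share this timestamp
        byte[] k1 = rowKey(2, 1897);          // ends in 0x07 0x69 ('i')
        byte[] k2 = rowKey(2, 1898);          // ends in 0x07 0x6a ('j')
        System.out.println("shared timestamp: " + ts);
        System.out.println("keys equal? " + Arrays.equals(k1, k2));
    }
}
```

Because the counter part differs, the two keys differ, and each record lands in its own row regardless of the cell timestamp.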
Thanks for great tutorials !
Dear Sujee,
Thanks for your great tutorial. One may need to download google-collect-1.0-rc1.jar and jsr305.jar
I am glad that I found your tutorial too.
Hi Sujee,
Really nice post, thanks a lot!!! It’s really helpful for understanding job creation and configuration. But I am getting an exception while executing through Eclipse. I am using the HBase 0.20.6 APIs. HDFS is in a cluster (multiple machines).
Exception in thread "main" java.lang.NoClassDefFoundError: org/codehaus/jackson/map/JsonMappingException
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:478)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:476)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:464)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:495)
at com.exilant.hbase.FreqCounter1.main(FreqCounter1.java:81)
Caused by: java.lang.ClassNotFoundException: org.codehaus.jackson.map.JsonMappingException
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:330)
at java.lang.ClassLoader.loadClass(ClassLoader.java:254)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:399)
… 8 more
any suggestion will be a great help for me.
Thanks
Swagat
@swagat,
I am guessing Eclipse cannot find some Hadoop / HBase jars. Try running the program as a MapReduce job from the command line (use the hadoop command to submit it). That might fix the issue.
Can you please guide me on how to use HBase in Java programs? I tried running your example, but it gives this error:
11/05/02 17:44:40 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:60000. Already tried 0 time(s).
11/05/02 17:44:42 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:60000. Already tried 1 time(s).
11/05/02 17:44:44 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:60000. Already tried 2 time(s).
11/05/02 17:44:46 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:60000. Already tried 3 time(s).
11/05/02 17:44:48 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:60000. Already tried 4 time(s).
11/05/02 17:44:50 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:60000. Already tried 5 time(s).
11/05/02 17:44:52 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:60000. Already tried 6 time(s).
11/05/02 17:44:54 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:60000. Already tried 7 time(s).
11/05/02 17:44:56 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:60000. Already tried 8 time(s).
11/05/02 17:44:58 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:60000. Already tried 9 time(s).
11/05/02 17:44:59 INFO client.HConnectionManager$TableServers: Attempt 0 of 10 failed with . Retrying after sleep of 2000
Please let me know where I am going wrong, and please suggest a simple way to use HBase. I just want to use HBase for simple Java programming.
I tried your example, and these are the errors I am getting
The project cannot be built until build path errors are resolved hbase_mapreduce Unknown Java Problem
Unbound classpath variable: ‘C:cygwin/usr/local/hbase-0.90.2/conf’ in project ‘hbase_mapreduce’ hbase_mapreduce Build path Build Path Problem
Unbound classpath variable: ‘C:cygwin/usr/local/hbase-0.90.2/hbase-0.20.3.jar’ in project ‘hbase_mapreduce’ hbase_mapreduce Build path Build Path Problem
Unbound classpath variable: ‘C:cygwin/usr/local/hbase-0.90.2/lib/commons-logging-1.0.4.jar’ in project ‘hbase_mapreduce’ hbase_mapreduce Build path Build Path Problem
Unbound classpath variable: ‘C:cygwin/usr/local/hbase-0.90.2/lib/hadoop-0.20.1-hdfs127-core.jar’ in project ‘hbase_mapreduce’ hbase_mapreduce Build path Build Path Problem
Unbound classpath variable: ‘C:cygwin/usr/local/hbase-0.90.2/lib/log4j-1.2.15.jar’ in project ‘hbase_mapreduce’ hbase_mapreduce Build path Build Path Problem
Unbound classpath variable: ‘C:cygwin/usr/local/hbase-0.90.2/lib/zookeeper-3.2.2.jar’ in project ‘hbase_mapreduce’ hbase_mapreduce Build path Build Path Problem
Please let me know where I am going wrong
This is an awesome tutorial. Gave me a good feel for what HBase brings to Hadoop. Good stuff!
Thank you Sujee, Really useful.
It’s working fine. But if I check the table “access_logs”, it is present on only one of the machines (cluster of 3).
How can I make it distributed?
Thank you Sujee, Really useful.
I did this in a cluster which contains 3 Region Servers.
It’s working fine. But if I check the table “access_logs”, it is present on only one of the Region Servers.
How can I make it distributed?
@Anil
the table will split as it exceeds the ‘region max size’ (default 256M, I think).
Hi Sujee,
Thanks for great tutorial!!
I am able to run the MapReduce application using both HDFS and HBase.
But now I want to use in-memory data as input to the MapReduce application.
e.g. – I have a few data maps in memory and want to use those instead of an HBase table as input.
Could you please guide me on this??
A timely reply will be appreciated very much.
Thanks in advance!!
@Amarjeet
Not sure what you mean by using in-memory data as MapReduce input. Are you talking about doing ‘data joins’?
If that is the case, then you might try this:
- load your xref data into a memory cache like memcached (this can be done in the driver code of the MapReduce job)
- in your mapper, access memcached to do the xref lookup
Hi Sujee,
By in-memory data I don’t mean ‘DATA JOINS’.
I mean:
Let’s take this example in your blog -
you are reading input from the HBase table ‘access_logs’, then processing it and putting your output in the ‘summary’ table.
What I want is:
to take the input from some Map or List (present in memory only, or created by myself in code).
I hope you understand now.
Hi Sujee ,
really very good doc …..
thanks
Thanks, very good tut!
I am getting the error below. I saw in most blogs that it was resolved, but with no clear details on how it got solved. Any help on this is really appreciated.
Below is the error on running the HBase MapReduce program:
12/02/20 02:19:22 INFO mapred.JobClient: Task Id : attempt_201202182211_0032_m_000000_2, Status : FAILED
Error: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:819)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:864)
at org.apache.hadoop.mapreduce.JobContext.getCombinerClass(JobContext.java:207)
at org.apache.hadoop.mapred.Task$CombinerRunner.create(Task.java:1307)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:980)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:673)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)
@Manu
you need to add the HBase jar files to your HADOOP_CLASSPATH.
Edit $HADOOP_HOME/conf/hadoop-env.sh and change the line
# Extra Java CLASSPATH elements. add hbase jars
export HADOOP_CLASSPATH=/hadoop/hbase/hbase-0.20.3.jar:/hadoop/hbase/hbase-0.20.3-test.jar:/hadoop/hbase/conf:/hadoop/hbase/lib/zookeeper-3.2.2.j
I am also having almost the same error:
java.lang.RuntimeException: java.lang.ClassNotFoundException: hbase_mapred1.FreqCounter$mapper1
at org.apache.hadoop.conf.Configuration.getClass……….
I have also checked my jar file using
jar tf freqcounter.jar
It contains hbase_mapred1/FreqCounter$mapper1.class
Please help me….
I have also set my hadoop-env.sh in hadoop/conf.
And one more thing:
when I look at HBase at http://localhost:60010/master.jsp I only see one regionserver, i.e. the master.
When I start HBase without stopping it, the regionservers on the slaves start again, whereas on the master it says the regionserver is running as process XXXX and to stop it first….
Do you have any idea why that is??
Please help….!!!
Thanks in advance
Hi,
I have created a table in HBase with 12 columns in each row, and each column has 8 qualifiers. When I try to read a complete row, it returns the correct value for 1:1 in row 1 but returns null for 1:2.
It reads all the columns correctly from 2 to 10….
Please help me solve this problem.
I am using this code for reading…. It is inside a for loop that runs from 1 to 10:
train[0][i] = Double.parseDouble(Bytes.toString(r.getValue(Bytes.toBytes(Integer.toString(i)), Bytes.toBytes("1"))));
train[1][i] = Double.parseDouble(Bytes.toString(r.getValue(Bytes.toBytes(Integer.toString(i)), Bytes.toBytes("2"))));
train[2][i] = Double.parseDouble(Bytes.toString(r.getValue(Bytes.toBytes(Integer.toString(i)), Bytes.toBytes("3"))));
train[3][i] = Double.parseDouble(Bytes.toString(r.getValue(Bytes.toBytes(Integer.toString(i)), Bytes.toBytes("4"))));
train[4][i] = Double.parseDouble(Bytes.toString(r.getValue(Bytes.toBytes(Integer.toString(i)), Bytes.toBytes("5"))));
train[5][i] = Double.parseDouble(Bytes.toString(r.getValue(Bytes.toBytes(Integer.toString(i)), Bytes.toBytes("6"))));
train[6][i] = Double.parseDouble(Bytes.toString(r.getValue(Bytes.toBytes(Integer.toString(i)), Bytes.toBytes("7"))));
train[7][i] = Double.parseDouble(Bytes.toString(r.getValue(Bytes.toBytes(Integer.toString(i)), Bytes.toBytes("8"))));
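One thing that may help while tracking this down: `Result.getValue` returns null when the requested family:qualifier cell is not present in the row, and passing that straight into `Double.parseDouble` fails instead of telling you which cell was missing. The sketch below is a plain-Java guard for that case, not a fix for the missing data itself. The names `CellParser` and `parseOrDefault` are made up, and it assumes the values were stored as text, as the code above implies.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: converts the raw bytes of one cell into a double,
// falling back to a default when the cell is absent or not numeric.
final class CellParser {

    // 'raw' stands for whatever r.getValue(family, qualifier) returned;
    // it is null when the row has no such cell.
    static double parseOrDefault(byte[] raw, double fallback) {
        if (raw == null || raw.length == 0) {
            return fallback;
        }
        try {
            return Double.parseDouble(new String(raw, StandardCharsets.UTF_8).trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        byte[] present = "3.5".getBytes(StandardCharsets.UTF_8);
        System.out.println(parseOrDefault(present, Double.NaN)); // parsed value
        System.out.println(parseOrDefault(null, Double.NaN));    // missing cell
    }
}
```

Using `Double.NaN` as the fallback makes missing cells easy to spot in the `train` array afterwards, so you can check in the HBase shell whether those cells were ever written.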
Awesome tutorial!! Works like a charm! Loved the step by step approach…Great one for beginners..
Hi,
Can you please explain how you created a composite row key?
I have been trying to create an HBase table with a composite row key that has a number and a timestamp. The Put constructor you have used takes in two byte values, thereby converting the timestamp to a byte value as well. While I am able to do this, I’m unable to perform any operation on the timestamp or the number individually, as the entire row key is inserted as one entity. Could you please guide me further on how a partial key scan may be performed?
Thanks in advance.
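A possible way to think about this, sketched in plain Java: if both parts of the key have a fixed width, each part can be read back from its byte offset, and a partial key scan becomes a range scan from the prefix (inclusive start row) to the prefix plus one (exclusive stop row). Everything here is an assumption for illustration: the layout (4-byte number followed by 8-byte timestamp), the names `CompositeKey`, `encode`, `startRow`, and `stopRow`, and non-negative numbers so that byte order matches numeric order.

```java
import java.nio.ByteBuffer;

// Hypothetical composite key: [4-byte number][8-byte timestamp], big-endian.
final class CompositeKey {

    static byte[] encode(int number, long timestamp) {
        return ByteBuffer.allocate(12).putInt(number).putLong(timestamp).array();
    }

    // Fixed widths let each part be decoded from its offset.
    static int numberOf(byte[] key) {
        return ByteBuffer.wrap(key).getInt(0);
    }

    static long timestampOf(byte[] key) {
        return ByteBuffer.wrap(key).getLong(4);
    }

    // Inclusive start row for "all rows whose key begins with this number".
    static byte[] startRow(int number) {
        return ByteBuffer.allocate(4).putInt(number).array();
    }

    // Exclusive stop row: the prefix incremented by one, treated as an
    // unsigned big-endian value. An all-0xFF prefix has no successor,
    // so an empty array is returned, meaning "scan to the end of the table".
    static byte[] stopRow(int number) {
        byte[] stop = startRow(number);
        for (int i = stop.length - 1; i >= 0; i--) {
            if (++stop[i] != 0) {
                return stop; // no carry into the next byte
            }
        }
        return new byte[0];
    }

    public static void main(String[] args) {
        byte[] key = encode(42, 1296845269865L);
        System.out.println(numberOf(key) + " / " + timestampOf(key));
    }
}
```

The two arrays from `startRow` and `stopRow` would then be handed to the scan as its start and stop rows; the timestamp part can be decoded from each returned row key with `timestampOf`.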
Hey
Can you help me out a little more on the map function? I understood the basics of MapReduce but can’t see how to use it when doing more complex processing.
How do we actually access the individual columns in a map function? I know that a map call receives a row. How do we break it down so that we can compare the values of two or more columns, which may be in different column families?
For example, a User has a visitor and I want to know if that visitor is one of his friends?
Please help. I just started learning HBase and am having a problem visualizing how the data appears in a MapReduce job.
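One way to picture it: each call to the map function gets one row, and a row is essentially a set of family:qualifier → value cells. The sketch below models that with nested maps in plain Java so it runs without a cluster; in a real mapper the same lookups would go through `Result.getValue(family, qualifier)`. The family and qualifier names (`visits:visitor`, `social:friends`), the comma-separated friends list, and the class name `VisitorCheck` are all invented for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the per-row logic of a map function.
final class VisitorCheck {

    // 'row' maps family -> (qualifier -> value), mimicking one HBase row.
    static boolean visitorIsFriend(Map<String, Map<String, String>> row) {
        // Read one column from each of two different families.
        String visitor = row
            .getOrDefault("visits", Collections.<String, String>emptyMap())
            .get("visitor");
        String friends = row
            .getOrDefault("social", Collections.<String, String>emptyMap())
            .get("friends");
        if (visitor == null || friends == null) {
            return false; // one of the cells is missing in this row
        }
        Set<String> friendSet = new HashSet<>(Arrays.asList(friends.split(",")));
        return friendSet.contains(visitor);
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> row = new HashMap<>();
        row.put("visits", Collections.singletonMap("visitor", "bob"));
        row.put("social", Collections.singletonMap("friends", "alice,bob,carol"));
        System.out.println("visitor is a friend? " + visitorIsFriend(row));
    }
}
```

The comparison itself is ordinary Java once the two values are pulled out of the row; the mapper would then emit whatever key/value pair the reducer needs.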
Hi sujee,
Wonderful doc which helped me a lot.
I would like some suggestions on the following:
1) How to display the output of the reduce function in a JTable.
2) I have a jar file; when it is executed, a text box is displayed. How can the data entered in the text box be used to get the output for that data?
Please guide me, it’s urgent.
Thanks & Regards,
V.Ramanjaneylu.
Perfect!
It works. I’ve used your program in the following environment:
Hadoop 1.0.2
HBase 0.92.1
I observed a few changes:
--HBaseConfiguration hbaseConfig = new HBaseConfiguration();
++Configuration conf = HBaseConfiguration.create();
--String columns = "details"; // comma separated
--scan.addColumns(columns);
++scan.addFamily(Bytes.toBytes("details"));
Need to replace the -- statements with the ++ statements in your program.
Thanks a lot.
The HBase API changed a bit recently. Thanks for pointing this out. I will update the code.
regards
Sujee Maniyam (admin)