Archive for the ‘tech’ Category

[tech] Hadoop Gotchas

Monday, April 21st, 2008

I just started tinkering with Hadoop, the (Java) distributed computing platform from Apache. Even though it is a pretty nice platform, I wasted a lot of time chasing trivial, silly issues. Here they are, in case someone else finds them useful.

Hadoop version: 0.16.3

A) The dreaded ‘Port Out of Range’ exception in ‘NameNode’

2008-04-20 19:27:34,241 ERROR org.apache.hadoop.dfs.NameNode: java.lang.IllegalArgumentException: port out of range:-1
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:118)
at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:65)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1180)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:53)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1191)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:148)
at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:122)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:94)
at org.apache.hadoop.fs.Trash.<init>(Trash.java:63)
at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:134)
at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:176)
at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:162)
at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:846)
at org.apache.hadoop.dfs.NameNode.main(NameNode.java:855)

2008-04-20 19:27:34,243 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9000: starting
2008-04-20 19:27:34,243 INFO org.apache.hadoop.dfs.NameNode: SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down NameNode at sanfrancisco/127.0.1.1

This issue caused me so much aggravation. The culprit turned out to be my hostnames.

/etc/hosts

#hadoop
192.168.0.12 hadoop_master
192.168.0.2 hadoop_slave_1

hadoop-site.xml:

<property>
<name>fs.default.name</name>
<value>hadoop_master:9000</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>hadoop_master:9001</value>
</property>

Hostnames with underscores (‘hadoop_master’ or ‘hadoop_slave_1’) cause this weird error. Underscores are not legal characters in hostnames (RFC 952 / RFC 1123), which is the most likely reason.

Solution: after replacing all instances of

hadoop_master --> master

hadoop_slave_1 --> slave

everything just worked!

Alternatively, you could use IP addresses instead of hostnames.
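
A possible explanation for the ‘port out of range:-1’ error in (A), offered as a guess rather than a confirmed diagnosis: if the address ends up being parsed with java.net.URI, a host containing an underscore is not accepted as a server-based authority, so both the host and the port come back empty. The small sketch below shows that behaviour. The class and method names (HostnamePortCheck, hostOf, portOf) are made up for this illustration, and the ‘hdfs://’ prefix is an assumption about how Hadoop turns ‘hadoop_master:9000’ into a URI.

```java
import java.net.URI;

// Hypothetical helper class, not part of Hadoop: it only shows how
// java.net.URI treats hostnames that contain an underscore.
class HostnamePortCheck {

    // Returns the host that java.net.URI extracts, or null if it finds none.
    static String hostOf(String address) {
        return URI.create(address).getHost();
    }

    // Returns the port that java.net.URI extracts, or -1 if it finds none.
    static int portOf(String address) {
        return URI.create(address).getPort();
    }

    public static void main(String[] args) {
        // Assumption: the hdfs:// scheme is added in front of the configured value.
        String[] addresses = {"hdfs://hadoop_master:9000", "hdfs://master:9000"};
        for (String address : addresses) {
            System.out.println(address + " -> host=" + hostOf(address)
                    + ", port=" + portOf(address));
        }
    }
}
```

With a standard JDK, the underscore name yields host=null and port=-1, while ‘master’ yields host=master and port=9000. A port of -1 is exactly the value that InetSocketAddress rejects in the stack trace above.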

B) java.io.IOException: Incompatible namespaceIDs

At least this issue, and a workaround for it, is somewhat easier to find than the previous one.

Go to the end of this tutorial: http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)

See also: http://issues.apache.org/jira/browse/HADOOP-1212

*Whew*

I reverted to Hadoop v0.15.3, and the namespace issue went away.

C) Map/Reduce tasks dying out

I kicked off a simple wordcount across two machines; my reduce tasks would die and the job would hang forever. Trawling through the logs revealed that this was caused by some really weird reverse hostname lookups by the Hadoop (or java.net) framework.

For example, my IP address (192.168.0.10) was resolved as ‘somehost.comcast.net’. After tweaking my ‘/etc/hosts’, things worked like a charm!
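
To see what a reverse lookup returns on a given machine before starting a job, something like the sketch below may help. It uses the standard java.net.InetAddress API; whether Hadoop performs exactly this call internally is an assumption on my part, and the class and method names (ReverseLookupCheck, reverseLookup) are made up for this illustration.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper class, not part of Hadoop: prints the name that
// the JVM gets back when it reverse-resolves an IP address.
class ReverseLookupCheck {

    // Returns the canonical hostname for an IP literal. If the reverse
    // lookup fails, the JDK falls back to the textual IP address.
    static String reverseLookup(String ipLiteral) {
        try {
            InetAddress address = InetAddress.getByName(ipLiteral);
            return address.getCanonicalHostName();
        } catch (UnknownHostException e) {
            // Only thrown when the argument is not a valid address or name.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass the node's IP as the first argument, e.g. 192.168.0.10.
        String ip = args.length > 0 ? args[0] : "127.0.0.1";
        System.out.println(ip + " resolves back to " + reverseLookup(ip));
    }
}
```

Run it on each node with that node's own IP address. If the name printed is not the one the other nodes know it by, the name resolution setup (‘/etc/hosts’ in my case) is the place to look.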

Ahh… why go to the bother of doing a reverse IP lookup? Just use the IP address, please.

These issues were trivial / silly, but time-consuming nonetheless.


Looks like Hadoop - the Elephant - can scale mountains, but slips on a banana peel :-)

[tech] Geocode data for US Zips and Cities

Tuesday, March 4th, 2008

I have some geocode data (GPS coordinates) for US ZIP codes and cities.

* Read the article *

[tech] Setting default browser for Thunderbird in Kubuntu

Monday, February 4th, 2008

Thunderbird is a great mail application, but this problem nagged me enough to warrant a post!

I run Kubuntu, the KDE-based flavor of Ubuntu. Thunderbird is a GNOME application and doesn’t seem to pick up the default browser set in the KDE control panel or in my bash profile (BROWSER=/usr/bin/firefox). I tried this (http://www.knoppix.net/forum/viewtopic.php?t=20315), but no luck.

Here is one way of fixing it:

1) Install gnome-control-center:
sudo apt-get install gnome-control-center
or do it through the Adept Manager UI.
This will pull in a bunch of GNOME libraries; that is okay.

2) Once the install is done, launch ‘gnome-control-center’ and go to Preferred Applications. Select Firefox as the browser and choose ‘Open in Tab’.

browser.png

[tech] Sansa Playlist Creator for Linux

Monday, January 28th, 2008

Here is a bash script I came up with to create playlists for the Sansa music player on Linux.

** read more **

[tech] Using Xmlbeans to process Epcis events

Monday, May 21st, 2007

This article walks through examples of processing EPCIS XML events in Java using the XMLBeans library.

*Go to Article*