Archive for November, 2009

Improve performance on small Hadoop clusters

Monday, November 30th, 2009

Hadoop is designed to run on huge clusters containing several hundred machines. But some people just don’t need such a big cluster and can still benefit from HDFS and MapReduce on a smaller scale.

We managed to improve the performance of our 10-node test cluster by almost 100% by adjusting the heartbeat intervals. The namenode and the jobtracker use heartbeats to communicate with their workers (datanodes and tasktrackers).
We concentrate on jobtracker heartbeats here. To reliably manage huge clusters, the minimum interval is 3 seconds, and for every 10 nodes the interval is increased by a second. A tasktracker reports finished tasks and is handed new ones in response to a heartbeat, so if you have lots of fast-running map or reduce tasks, the waiting time between heartbeats becomes a noticeable overhead.

What we did was patch Hadoop to lower the minimum heartbeat interval to 500ms and the increment to 10ms per node. This way our MapReduce jobs ran almost twice as fast. If you want to try it, you could take a look at our GitHub branch (view commit). Please note that the Git branch contains our adapted version of Hadoop, so use it only for testing purposes.
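To get a feeling for the numbers, here is a small Python sketch that compares the stock settings with the patched ones. It is not Hadoop's actual code: it assumes the interval is simply the larger of the minimum and the cluster size times a per-node increment, and it reads "one second every 10 nodes" as 100ms per node. The function name and the two settings dictionaries are made up for illustration.

```python
import math


def heartbeat_interval_ms(cluster_size, min_interval_ms, increment_ms_per_node):
    """Heartbeat interval in milliseconds for a cluster of the given size.

    Assumption: the interval grows linearly with the number of nodes
    but never drops below the configured minimum.
    """
    scaled = math.ceil(cluster_size * increment_ms_per_node)
    return max(min_interval_ms, scaled)


# Values taken from the post; the per-node reading of the stock
# increment (1 s per 10 nodes = 100 ms per node) is an assumption.
STOCK = {"min_interval_ms": 3000, "increment_ms_per_node": 100}
PATCHED = {"min_interval_ms": 500, "increment_ms_per_node": 10}

for nodes in (10, 50, 100):
    stock = heartbeat_interval_ms(nodes, **STOCK)
    patched = heartbeat_interval_ms(nodes, **PATCHED)
    print(f"{nodes:4d} nodes: stock {stock} ms, patched {patched} ms")
```

Under these assumptions a 10-node cluster drops from a 3000ms interval to 500ms, which matters most when the individual tasks finish in a few seconds or less.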

There is a fix (HADOOP-5784) in the upcoming version 0.21, which allows you to lower the heartbeat increment per node.

“Internet slow” on Ubuntu Karmic Koala (9.10)

Sunday, November 8th, 2009

“Internet slow” actually means “DNS slow”. After upgrading to Ubuntu 9.10 I experienced a strange and very annoying lag in DNS resolution. Running dig in a shell worked like a charm, but Firefox, Synaptic and everything else hung at DNS resolution.

To make a long story short (you have probably read a lot of forum threads about this): our Karmic Koala uses IPv6 for DNS queries and falls back to IPv4 only if this fails. A lot of home routers do not support IPv6 DNS queries. D’oh!
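If you want to check whether your machine is affected, a rough way is to time name lookups per address family. The following Python sketch does this with the standard socket.getaddrinfo call; the helper name time_lookup and the host example.org are placeholders chosen for illustration, and local caching can hide the effect on repeated runs.

```python
import socket
import time


def time_lookup(host, family, resolver=socket.getaddrinfo):
    """Return (seconds, number_of_results) for a single name lookup.

    A failed lookup counts as zero results; the elapsed time is still
    reported, because the delay is what we are interested in.
    """
    start = time.monotonic()
    try:
        results = resolver(host, None, family)
    except socket.gaierror:
        results = []
    return time.monotonic() - start, len(results)


if __name__ == "__main__":
    host = "example.org"  # placeholder, use any host name you like
    families = (
        ("IPv4 only", socket.AF_INET),
        ("IPv6 only", socket.AF_INET6),
        ("unspecified", socket.AF_UNSPEC),
    )
    for label, family in families:
        seconds, count = time_lookup(host, family)
        print(f"{label:12s} {seconds:6.2f} s, {count} result(s)")
```

If the IPv4-only lookup returns immediately while the other two take several seconds, the IPv6 queries are the likely culprit.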

Resolutions:

1. Firefox only: Disable IPv6 lookups by typing “about:config” into your location bar, then search for ipv6 and set network.dns.disableIPv6 to true by double-clicking on the line.

2. Disable IPv6 entirely: If you do not need IPv6 support (I don’t), you can disable it completely and everything is up to speed again. How do I do this?