Increasing the Performance of Hadoop Unit Tests
Monday, March 30th, 2009
Adding a large number of unit tests to our application, which uses Hadoop and its Map-Reduce engine, significantly increased our integration build time. Hadoop ships with a LocalJobRunner, which is used by default, so you do not have to set up a complete cluster just to run some unit tests. This is great! The problem is that it still produces a lot of overhead: ramping up and tearing down a single job can take up to a few seconds. With a few hundred Map-Reduce jobs in your unit test suite, that adds up quickly (for example, 300 jobs at 3 seconds each is already 15 minutes) and will definitely drag you away from the ideal “10-minute integration build feedback” you are always striving for.
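If a cluster configuration (for example a hadoop-site.xml pointing at a real JobTracker) ends up on the test classpath, jobs may be submitted to that cluster instead of the LocalJobRunner. A minimal sketch for pinning local mode explicitly, assuming the Hadoop 0.19/0.20-era key names mapred.job.tracker and fs.default.name (newer releases use mapreduce.framework.name and fs.defaultFS instead). LocalModeConfig and pinLocalMode are hypothetical names; the setter is passed in so the sketch compiles without Hadoop, and in a real test you would pass jobConf::set.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical helper: forces a job configuration into local (LocalJobRunner) mode. */
public final class LocalModeConfig {

    private LocalModeConfig() {}

    /**
     * Pins the job to the in-process LocalJobRunner and the local file system.
     * Assumption: Hadoop 0.19/0.20-era configuration key names.
     *
     * @param set a setter such as jobConf::set
     */
    public static void pinLocalMode(BiConsumer<String, String> set) {
        set.accept("mapred.job.tracker", "local"); // "local" selects the LocalJobRunner
        set.accept("fs.default.name", "file:///"); // use the local file system for input/output
    }

    public static void main(String[] args) {
        // A plain map stands in for a Hadoop JobConf so the sketch runs without Hadoop.
        Map<String, String> conf = new HashMap<>();
        pinLocalMode(conf::put);
        System.out.println(conf.get("mapred.job.tracker") + " " + conf.get("fs.default.name"));
    }
}
```

Since both values are the defaults anyway, this only matters when some other configuration on the classpath overrides them.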
I cannot provide a complete solution to this “problem” (hey, it is still great to be able to run Map-Reduce jobs locally!), but the following configuration parameters cut the execution time of our tests in half:
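<configuration>
  <!-- Fraction of io.sort.mb reserved for per-record accounting data (default 0.05) -->
  <property>
    <name>io.sort.record.percent</name>
    <value>0.01</value>
  </property>
  <!-- Size of the map-side sort buffer in MB (default 100) -->
  <property>
    <name>io.sort.mb</name>
    <value>1</value>
  </property>
  <!-- Minimum number of spill files needed for the combiner to run when merging spills (default 3) -->
  <property>
    <name>min.num.spills.for.combine</name>
    <value>0</value>
  </property>
</configuration>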
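The same three settings can also be applied in code, which keeps them out of any shared configuration file. A sketch, assuming your tests build their own JobConf; SmallJobTuning and apply are hypothetical names, and the setter is injected so the example runs without Hadoop on the classpath.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical helper: the small-job tuning settings from the XML fragment, applied in code. */
public final class SmallJobTuning {

    /** Property name -> value, in the same order as the XML fragment. */
    public static final Map<String, String> SETTINGS = createSettings();

    private SmallJobTuning() {}

    private static Map<String, String> createSettings() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("io.sort.record.percent", "0.01");
        settings.put("io.sort.mb", "1");
        settings.put("min.num.spills.for.combine", "0");
        return Collections.unmodifiableMap(settings);
    }

    /** Applies every setting through the given setter, e.g. jobConf::set. */
    public static void apply(BiConsumer<String, String> set) {
        SETTINGS.forEach(set);
    }

    public static void main(String[] args) {
        // A plain map stands in for a Hadoop JobConf so the sketch runs without Hadoop.
        Map<String, String> conf = new LinkedHashMap<>();
        apply(conf::put);
        conf.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

In a test you would call SmallJobTuning.apply(jobConf::set) before submitting the job. Later Hadoop releases renamed several of these keys (io.sort.mb, for instance, became mapreduce.task.io.sort.mb), so check the names against the version you use.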
Of course, these settings are only suitable for small jobs with little input: with a 1 MB sort buffer, a job with more map output would spill to disk constantly.
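Why do such small values help, and why only for small jobs? A rough model: as far as I can tell from the 0.19/0.20-era MapTask, each map task allocates its whole io.sort.mb buffer up front and reserves io.sort.record.percent of it for 16 bytes of accounting data per record. The sketch below reproduces that arithmetic; the 16-byte figure and the buffer layout are assumptions about that version (later releases reworked this buffer), and SortBufferModel is a hypothetical name.

```java
/**
 * Simplified model of the map-side sort buffer sizing, assuming the Hadoop 0.19/0.20-era
 * layout: a single buffer of io.sort.mb megabytes, of which io.sort.record.percent is
 * reserved for per-record accounting and the rest holds serialized key/value data.
 */
public final class SortBufferModel {

    private static final int RECORD_ACCOUNTING_BYTES = 16; // assumed overhead per record

    private SortBufferModel() {}

    /** Bytes left for serialized key/value data. */
    public static long dataBufferBytes(int ioSortMb, double ioSortRecordPercent) {
        long total = (long) ioSortMb << 20;
        return total - accountingBytes(total, ioSortRecordPercent);
    }

    /** Number of records the accounting area can track before the buffer must spill. */
    public static long recordSlots(int ioSortMb, double ioSortRecordPercent) {
        long total = (long) ioSortMb << 20;
        return accountingBytes(total, ioSortRecordPercent) / RECORD_ACCOUNTING_BYTES;
    }

    // Share of the buffer reserved for accounting, rounded down to whole records.
    private static long accountingBytes(long total, double percent) {
        long bytes = (long) (total * percent);
        return bytes - bytes % RECORD_ACCOUNTING_BYTES;
    }

    public static void main(String[] args) {
        System.out.printf("defaults: %d data bytes, %d record slots%n",
                dataBufferBytes(100, 0.05), recordSlots(100, 0.05));
        System.out.printf("tuned:    %d data bytes, %d record slots%n",
                dataBufferBytes(1, 0.01), recordSlots(1, 0.01));
    }
}
```

With the defaults, the model gives 99614720 bytes (95 MB) of data buffer and 327680 record slots per map task; with the tuned values it gives 1038096 bytes and 655 record slots. Skipping a 95 MB allocation for every map task plausibly accounts for part of the saved ramp-up time, while at most 655 buffered records means a job with more map output would spill very frequently.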
I’m always glad to hear about better ways to reduce the overhead even further!