Increasing Performance of Hadoop-Unit-Tests
Adding a lot of unit tests for our application, which uses Hadoop and its Map-Reduce engine, significantly increased our integration build time. Hadoop comes with a LocalJobRunner, which is used by default, so you do not have to set up a complete cluster in order to run some unit tests. This is great! But the problem is that it still produces a lot of overhead: ramp-up and tear-down of a single job can take up to a few seconds. With a few hundred Map-Reduce jobs in your unit test base, that will definitely drag you away from the ideal “10 minute integration build feedback” you are always striving for.
I cannot provide a complete solution to this “problem” (hey, it is still great to be able to run Map-Reduce jobs locally!), but the following configuration parameters cut the execution time of our tests in half:
These settings are only feasible for small jobs with little input, of course.
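To make the idea concrete, here is a minimal sketch of what a set of test-only overrides could look like. The property names (mapred.job.tracker, fs.default.name, io.sort.mb, mapred.reduce.tasks) are classic Hadoop configuration keys, but the selection and the values are illustrative assumptions, not a verified list of the settings that produced the speed-up described above. The class name FastLocalTestConfig and its methods are hypothetical helpers; the code needs no Hadoop dependency and only renders the overrides as a Hadoop-style configuration file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: collects test-only overrides for small local jobs. */
public class FastLocalTestConfig {

    /** Illustrative overrides; names are classic Hadoop keys, values are assumptions. */
    static Map<String, String> overrides() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapred.job.tracker", "local");  // run jobs in-process via the LocalJobRunner
        props.put("fs.default.name", "file:///");  // use the local file system instead of HDFS
        props.put("io.sort.mb", "10");             // small sort buffer (assumed value), enough for tiny inputs
        props.put("mapred.reduce.tasks", "1");     // a single reducer is enough for test data
        return props;
    }

    /** Renders the overrides in the <configuration>/<property> layout Hadoop reads. */
    static String toConfigurationXml(Map<String, String> props) {
        // No XML escaping is done here; the keys and values above are plain ASCII.
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property><name>").append(e.getKey())
              .append("</name><value>").append(e.getValue())
              .append("</value></property>\n");
        }
        sb.append("</configuration>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toConfigurationXml(overrides()));
    }
}
```

In an actual test you would more likely apply the same pairs directly to the job configuration with conf.set(name, value) in a shared test base class, so that production configuration files stay untouched.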
I’m always glad to hear of better solutions to decrease the overhead even more!