pero on anything

Increasing Performance of Hadoop-Unit-Tests

Adding a lot of unit tests to our application, which uses Hadoop and its Map-Reduce engine, significantly increased our integration build time. Hadoop comes with a LocalJobRunner that is used by default, so you do not have to set up a complete cluster just to run some unit tests. This is great! But the problem is that it still produces a lot of overhead: ramp-up and tear-down of a single job can take up to a few seconds. With a few hundred Map-Reduce jobs in your unit test base, that will definitely drag you away from the ideal “10 minute integration build feedback” you are always striving for. ;)

I cannot provide a complete solution to this “problem” (hey, it is still great to be able to run Map-Reduce jobs locally!), but the following configuration parameters cut the execution time of our tests in half:

io.sort.record.percent = 0.01
io.sort.mb = 1
min.num.spills.for.combine = 0

These settings are only feasible for small jobs with little input, of course: with a sort buffer of just 1 MB, any job of realistic size would spill to disk constantly.
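To keep these overrides out of every single test, they can live in one small helper. The following is only a sketch: `FastLocalJobSettings` and its methods are hypothetical names, not part of Hadoop, and it assumes your tests build their jobs from a Hadoop `Configuration` (or `JobConf`) that you can pass a setter for. The helper itself uses only the JDK so that it compiles without Hadoop on the classpath.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical test helper (not a Hadoop class): keeps the three
// overrides from this post in one place.
final class FastLocalJobSettings {

    private static final Map<String, String> OVERRIDES;

    static {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("io.sort.record.percent", "0.01");
        m.put("io.sort.mb", "1");
        m.put("min.num.spills.for.combine", "0");
        OVERRIDES = Collections.unmodifiableMap(m);
    }

    private FastLocalJobSettings() {
    }

    // Read-only view of the overrides, in the order listed above.
    static Map<String, String> overrides() {
        return OVERRIDES;
    }

    // Hands every name/value pair to the given setter. Taking a
    // BiConsumer instead of a Hadoop type is an assumption made here
    // to keep the sketch free of Hadoop dependencies.
    static void applyTo(BiConsumer<String, String> setter) {
        OVERRIDES.forEach(setter);
    }

    // Prints the settings, just to make the sketch runnable on its own.
    public static void main(String[] args) {
        applyTo((name, value) -> System.out.println(name + " = " + value));
    }
}
```

In a test's set-up code this would be wired up with something like `FastLocalJobSettings.applyTo(conf::set)`, where `conf` is the `org.apache.hadoop.conf.Configuration` the job is created from. Only do this for the test configuration, never for the one used in production.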

I’m always glad to hear of better solutions to decrease the overhead even more!
