January 22, 2016

Micro Benchmarking with JMH

Developing software and algorithms usually becomes more difficult as the amount of data to be handled grows. Processing thousands of records on modern hardware is no problem at all. However, when millions or hundreds of millions of records need to be processed, it can be quite difficult to meet the expected runtime performance. A batch job that is scheduled to run every hour, for example, has to finish within that hour. So how can we efficiently improve runtime, even in large-data contexts? Vertical scaling has its limits, and horizontal scaling can take more effort than optimizing the performance of the batch job itself. As an alternative, let’s look at micro benchmarking as a means to optimize specific parts of a program.

Let’s assume we have two sorted lists of integers, A and B, and we want all integers that are in the first list but not in the second one. In set-theoretic terms, we want the difference A \ B (blue area in the figure below).
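
To make the problem concrete, here is a minimal sketch of one way to compute A \ B in a single pass over both lists. It is only an illustration and not necessarily the implementation that gets benchmarked later; the class name SortedListDifference and the method name difference are hypothetical, and the code assumes both lists are sorted in ascending order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; the names are not taken from the original post.
class SortedListDifference {

    // Returns every element of a that does not occur in b.
    // Assumption: both lists are sorted in ascending order.
    static List<Integer> difference(List<Integer> a, List<Integer> b) {
        List<Integer> result = new ArrayList<>();
        int j = 0; // current position in b, never moves backwards
        for (int i = 0; i < a.size(); i++) {
            int x = a.get(i);
            // Skip all elements of b that are smaller than x.
            while (j < b.size() && b.get(j) < x) {
                j++;
            }
            // x belongs to A \ B if b is exhausted or b's current element differs from x.
            if (j == b.size() || b.get(j) != x) {
                result.add(x);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, 3, 5, 8);
        List<Integer> b = Arrays.asList(2, 5, 9);
        System.out.println(difference(a, b)); // prints [1, 3, 8]
    }
}
```

Because both inputs are sorted, each list is traversed at most once, so the work grows linearly with the combined size of A and B. Whether this actually beats other approaches on real data is exactly the kind of question a micro benchmark can answer.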