May 24, 2017

Reactive Java Performance Comparison

This blog post covers the basic concepts of reactive programming and compares the performance of a standard Spring MVC web application, with Cassandra as its data layer, to that of a reactive one.

Reactive programming has become a popular topic in recent years. With the upcoming Spring 5 release, it is finally emerging into the mainstream. Reactive programming is expected to improve the performance of I/O-heavy applications. A standard Spring MVC application typically runs on a servlet stack in a server such as Jetty or Tomcat. Each request is bound to a thread, which passes through the servlet container into the user code and stays bound to the request until the data has been fetched and the response written. Normally, the majority of these operations are I/O-bound. The main disadvantage of this model is that the allocated resources (threads) sit idle while they wait for these I/O operations to complete, and that waiting can account for most of a thread's time. In the meantime, these threads cannot handle any further requests. With reactive programming, the application code is no longer in charge of resource allocation. Instead, it provides callbacks that are invoked when resources (e.g., new data from the database) become available.
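The difference between the two models can be sketched in plain Java. This is only an illustration under assumptions, not the code measured in the post: `slowQuery` is a hypothetical stand-in for a database call, and `CompletableFuture` from the JDK is used in place of a reactive library. Note that the simulated query still occupies a pool thread while it sleeps; a real reactive stack relies on non-blocking I/O so that no thread waits at all.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BlockingVsCallback {

    // Hypothetical stand-in for a slow database query (not a real Cassandra call).
    static String slowQuery(String key) {
        try {
            TimeUnit.MILLISECONDS.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "row-for-" + key;
    }

    // Thread-per-request style: the calling thread waits until the data arrives.
    static String handleBlocking(String key) {
        return "response:" + slowQuery(key);
    }

    // Callback style: the caller gets a future back immediately and registers
    // a function that runs once the data is available.
    static CompletableFuture<String> handleAsync(String key, ExecutorService ioPool) {
        return CompletableFuture
                .supplyAsync(() -> slowQuery(key), ioPool)
                .thenApply(row -> "response:" + row);
    }

    public static void main(String[] args) {
        ExecutorService ioPool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(handleBlocking("a"));

            CompletableFuture<String> pending = handleAsync("b", ioPool);
            // The calling thread is free to do other work here.
            System.out.println(pending.join());
        } finally {
            ioPool.shutdown();
        }
    }
}
```

In the callback version, the request-handling thread returns as soon as the future is created, which is the property a reactive web stack exploits to serve many requests with few threads.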

May 15, 2017

From Batch to Stream Processing

At willhaben, a lot of data is passed around between components. At peak times, we handle over 100,000 events every minute. Storing this amount of data does not create bottlenecks on modern hardware, but carrying out proper (real-time) analysis of data on this scale can become a challenge.

These events add up to terabytes, and getting real-time statistics from them can be really valuable for monitoring either IT infrastructure or business processes. Big Data comes to mind, a buzzword that covers a lot of topics. One concept often seen in this context is stream processing. This is a very extensive topic, so this post will only talk about some of the ideas behind streams and stream processing and will not go into detail. There are several frameworks for stream processing, such as Apache Spark, Apache Flink, and Kafka Streams. Each of them has its own advantages and disadvantages, and a comparison between them is beyond the scope of this post. Here we will focus on Kafka Streams, as it is the simplest one and covers our use cases. We will first cover the basic concepts and then look at the processing pipeline implemented at willhaben.
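To give a rough feel for the idea before any framework comes into play, here is a minimal plain-Java sketch of counting events per one-minute window as they arrive, instead of storing everything and analyzing it later in a batch. It does not use the Kafka Streams API and is not the pipeline used at willhaben; the class name `MinuteWindowCounter` and the sample timestamps are assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public class MinuteWindowCounter {

    // Window start (epoch millis, rounded down to the minute) -> events seen so far.
    final Map<Long, Long> countsPerMinute = new TreeMap<>();

    // Called once per incoming event; the statistic is updated immediately.
    void onEvent(long timestampMillis) {
        long windowStart = (timestampMillis / 60_000L) * 60_000L;
        countsPerMinute.merge(windowStart, 1L, Long::sum);
    }

    // Current count for the window that starts at the given time.
    long countFor(long windowStartMillis) {
        return countsPerMinute.getOrDefault(windowStartMillis, 0L);
    }

    public static void main(String[] args) {
        MinuteWindowCounter counter = new MinuteWindowCounter();
        long[] timestamps = {1_000L, 30_000L, 59_999L, 60_000L, 125_000L};
        for (long ts : timestamps) {
            counter.onEvent(ts);
        }
        System.out.println(counter.countsPerMinute); // {0=3, 60000=1, 120000=1}
    }
}
```

A stream processing framework adds what this sketch leaves out, such as distributing the work across machines, recovering state after failures, and handling events that arrive late or out of order.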