September 28, 2016

Parallel Streams

The Streams API introduced in Java 8 brought a functional style of programming to Java, building on the new lambda expressions and method references. One of the most popular features of streams is that they support lazy evaluation. Another is their ability to process elements concurrently using multiple threads. In this article, we will examine the main points to consider when using a parallel stream.

Creating a parallel stream is really simple: you just mark your stream as parallel and the operations are executed in parallel automatically. There is no need to write any threading code to make it happen. There are two ways to create a parallel stream: you can either convert an existing stream by calling its parallel() method, or call the parallelStream() default method that was added to the Collection interface, as below:
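 import java.util.Arrays;
 import java.util.stream.Stream;

 // 1. Convert an existing stream into a parallel one
 Stream<String> stream = Stream.of("cat", "dog");
 Stream<String> parallelStream = stream.parallel();
 // 2. Ask a Collection for a parallel stream directly
 Stream<String> parallelStream2 = Arrays.asList("cat", "dog").parallelStream();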
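If you want to check that both approaches really give you a parallel stream, you can ask the stream itself through isParallel(). The sketch below is a minimal, self-contained version of the snippet above. The class name ParallelStreamCreation and the summing helper are hypothetical additions for illustration. The sum is split across the common fork-join pool by the library, with no threading code on our side.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical demo class; the name is illustrative only.
class ParallelStreamCreation {

    // Way 1: turn an existing stream into a parallel one.
    static boolean fromExistingStream() {
        Stream<String> parallelStream = Stream.of("cat", "dog").parallel();
        return parallelStream.isParallel();
    }

    // Way 2: ask a Collection for a parallel stream directly.
    static boolean fromCollection() {
        List<String> animals = Arrays.asList("cat", "dog");
        return animals.parallelStream().isParallel();
    }

    // No threading code needed: the library splits the range across
    // worker threads of the common fork-join pool.
    static long parallelSum(int upTo) {
        return IntStream.rangeClosed(1, upTo).parallel().asLongStream().sum();
    }

    public static void main(String[] args) {
        System.out.println(fromExistingStream()); // true
        System.out.println(fromCollection());     // true
        System.out.println(parallelSum(1_000));   // 500500
    }
}
```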


As you can see, creating a parallel stream is the easy part. The challenging part is using it correctly. Making better use of multi-core processors is often cited as a key motivation behind Java 8 features like lambdas and streams. However, there is no guarantee that using a parallel stream will improve the performance of your system. You should consider using parallel streams if:

  • The pipeline operations are thread-safe: you are not modifying any shared/global data, your lambda expressions are stateless and the stream operations can be executed independently
  • You are iterating through the stream items 
  • Your stream is relatively big, since there is some overhead in allocating and building up the parallel processing structures
  • All tasks submitted to the common fork-join pool will finish in a reasonable time without getting stuck. All parallel streams use the common fork-join thread pool, so if you submit a long-running or blocking task, you can tie up the threads in the pool and slow down every other parallel stream in the application. [1]
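To illustrate the first condition, here is a minimal sketch contrasting a lambda that mutates shared state with a stateless pipeline. The class name StatelessPipeline and the squares() helper are hypothetical. The unsafe variant is kept in a comment only, because running it could lose elements or throw an exception, so its outcome is not reproducible.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical example class contrasting shared mutable state with a
// stateless pipeline.
class StatelessPipeline {

    // Anti-pattern (not called here): several worker threads would call
    // ArrayList.add concurrently, and ArrayList is not thread-safe, so
    // elements can be lost or an exception can be thrown.
    //
    //   List<Integer> shared = new ArrayList<>();
    //   IntStream.range(0, n).parallel().forEach(shared::add);

    // Safer: the lambda is stateless and the collector builds
    // per-thread partial lists that the library merges for us.
    static List<Integer> squares(int n) {
        return IntStream.range(0, n)
                .parallel()
                .map(i -> i * i)          // no shared state touched
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> result = squares(10_000);
        System.out.println(result.size());   // 10000
        System.out.println(result.get(99));  // 9801
    }
}
```

Because Collectors.toList() is not a concurrent collector, the partial lists are combined in encounter order, so the result keeps the same order as a serial run.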

Performing order-based operations

You have to be careful with stream operations that depend on order. For example, you can be sure that operations like min, max and sorted produce the same results on a parallel stream as on a serial one. The findAny operation, on the other hand, is explicitly nondeterministic and can return different results when processed in parallel. Its counterpart findFirst still returns the first element in encounter order on an ordered stream, but honoring that order can make it more expensive in parallel.
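A small sketch of these guarantees on an ordered source might look like this. The class name FindFirstVersusFindAny and the sample numbers are hypothetical. Only the findAny() result is allowed to vary between runs; the other three are fixed by the stream contract.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical demo class; the sample data is arbitrary.
class FindFirstVersusFindAny {

    static final List<Integer> NUMBERS = Arrays.asList(5, 3, 9, 1, 7, 2, 8);

    // findFirst respects encounter order, even in parallel.
    static Optional<Integer> first() {
        return NUMBERS.parallelStream().findFirst();
    }

    // findAny may return whichever element a worker thread reaches first.
    static Optional<Integer> any() {
        return NUMBERS.parallelStream().findAny();
    }

    // min and sorted give the same answer serially and in parallel.
    static Optional<Integer> min() {
        return NUMBERS.parallelStream().min(Integer::compare);
    }

    static List<Integer> sorted() {
        return NUMBERS.parallelStream().sorted().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(first());  // Optional[5] on every run
        System.out.println(any());    // some element of NUMBERS; may vary
        System.out.println(min());    // Optional[1]
        System.out.println(sorted()); // [1, 2, 3, 5, 7, 8, 9]
    }
}
```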

If you really care about the order in which a stream processes its elements, you can use a variant of the forEach() operation called forEachOrdered(). When used with a parallel stream, this method forces the stream to process the items in the encounter order defined by its source, at the cost of some performance. In the next example, you can be sure that the output will always be "1 2 3 4 5":
 import java.util.stream.IntStream;
 IntStream.of(1, 2, 3, 4, 5).parallel().forEachOrdered(i -> System.out.print(i + " "));
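To see the difference from plain forEach() without relying on console output, one option is to record the visiting order in a thread-safe queue. This is a minimal sketch; the class name OrderedTraversal and the visitOrder() helper are hypothetical. A ConcurrentLinkedQueue is used because forEach() may invoke the action from several threads at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.IntStream;

// Hypothetical demo class.
class OrderedTraversal {

    // Records the order in which the terminal action sees each element.
    static List<Integer> visitOrder(boolean ordered) {
        Queue<Integer> seen = new ConcurrentLinkedQueue<>();
        IntStream stream = IntStream.rangeClosed(1, 20).parallel();
        if (ordered) {
            stream.forEachOrdered(seen::add); // encounter order guaranteed
        } else {
            stream.forEach(seen::add);        // any order, possibly concurrent
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        System.out.println(visitOrder(true));  // always 1..20 in order
        System.out.println(visitOrder(false)); // same elements, order may vary
    }
}
```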

If you don't care about the order, you can create an unordered stream from a "default" ordered stream to improve the performance when using a parallel stream:
 import java.util.stream.IntStream;
 IntStream parallelUnorderedStream = IntStream.of(1, 2, 3, 4, 5).unordered().parallel();
 //do other operations on the stream  

The unordered() method does not actually reorder the items in the stream; it simply tells the stream pipeline that it may ignore the encounter order of the elements. This can improve the performance of certain operations like distinct(), skip(), or limit().
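As a sketch of where this helps, consider distinct(). On an ordered stream it must keep the first occurrence of each value in encounter order, which is relatively expensive to guarantee in parallel; after unordered() it may keep any occurrence. The class name UnorderedDistinct and the remainder example are hypothetical, and the result is collected into a TreeSet only so that the printed output is deterministic.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class.
class UnorderedDistinct {

    // distinct() on an unordered parallel stream is free to keep any
    // occurrence of a duplicate, instead of the first in encounter order.
    static Set<Integer> distinctRemainders(int n, int modulus) {
        return IntStream.range(0, n)
                .unordered()
                .parallel()
                .map(i -> i % modulus)
                .distinct()
                .boxed()
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        System.out.println(distinctRemainders(100_000, 7)); // [0, 1, 2, 3, 4, 5, 6]
    }
}
```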

Performing reduction operations

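The Collectors class contains two families of methods that usually perform better with parallel streams than their non-concurrent counterparts, toMap() and groupingBy(). The non-concurrent versions merge the per-thread maps by key, which is computationally expensive [2]: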
  • Collectors.toConcurrentMap()
  • Collectors.groupingByConcurrent()
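When you collect a parallel stream into a map, consider using these collectors instead. Note that a concurrent reduction only takes place if the stream is parallel, the collector has the Collector.Characteristics.CONCURRENT characteristic, and either the stream is unordered or the collector has the Collector.Characteristics.UNORDERED characteristic [2].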
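A minimal sketch of both collectors is shown below. The class name ConcurrentCollectors and the word list are hypothetical. The order of words inside each group is not guaranteed by groupingByConcurrent(), so the example only relies on which words end up in a group, not on their order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class; the word list is arbitrary.
class ConcurrentCollectors {

    static final List<String> WORDS =
            Arrays.asList("cat", "dog", "horse", "cow", "sheep", "goat");

    // Group words by length into one shared ConcurrentMap instead of
    // building per-thread maps and merging them by key.
    static ConcurrentMap<Integer, List<String>> byLength() {
        return WORDS.parallelStream()
                .collect(Collectors.groupingByConcurrent(String::length));
    }

    // Map each word to its length; the words are distinct, so no merge
    // function is needed for duplicate keys.
    static ConcurrentMap<String, Integer> lengths() {
        return WORDS.parallelStream()
                .collect(Collectors.toConcurrentMap(Function.identity(), String::length));
    }

    public static void main(String[] args) {
        System.out.println(byLength().get(3).size()); // 3 (cat, dog, cow in some order)
        System.out.println(lengths().get("horse"));   // 5
    }
}
```

Both collectors report the CONCURRENT and UNORDERED characteristics, so the conditions above are met even though a List's parallelStream() is ordered.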

Conclusion

There is no guarantee that Java 8 parallel streams will make your programs run faster. In fact, they can even make your programs run slower. For parallel streams to work effectively, programmers really have to think twice and understand what they are doing.


[1]: https://dzone.com/articles/think-twice-using-java-8
[2]: https://docs.oracle.com/javase/tutorial/collections/streams/parallelism.html
