Apache Spark

GCP Dataproc and Apache Spark tuning

8 minute read

Dataproc is a fully managed and highly scalable Google Cloud Platform service for running Apache Spark. However, “managed” does not relieve you of the prop...

Spark and Kafka integration patterns - part 2

21 minute read

In the world beyond batch, streaming data processing is the future of big data. Regardless of the streaming framework used for data processing, tight integrati...

Spark and Kafka integration patterns - part 1

less than 1 minute read

I published a post on the allegro.tech blog about how to integrate Spark Streaming and Kafka. In the blog post you will find how to avoid java.io.NotSerializableExc...

Spark and Spark Streaming unit testing

11 minute read

When you develop a distributed system, it is crucial to make it easy to test. Execute tests in a controlled environment, ideally from your IDE. Long develop-...