Recent Posts

Apache Beam Summit 2022

15 minute read

Last week I virtually attended Apache Beam Summit 2022 held in Austin, Texas. The event concentrates around Apache Beam and the runners like Dataflow, Flink...

GCP Dataproc and Apache Spark tuning

8 minute read

Dataproc is a fully managed and highly scalable Google Cloud Platform service for running Apache Spark. However, “managed” does not relieve you from the prop...

GCP Cloud Composer 1.x tuning

16 minute read

I would love to only develop streaming pipelines but in reality some of them are still batch oriented. Today you will learn how to properly configure Google ...

Stream processing - part 2

23 minute read

This is the second part of the stream processing blog post series. In the first part I presented aggregations in a fixed, non-overlapping windows. Now you wi...

Stream processing - part 1

17 minute read

This is the very first part of the stream processing blog post series. From the series you will learn how to develop and test stateful streaming data pipelin...

Kafka Streams DSL vs processor API

29 minute read

Kafka Streams is a Java library for building real-time, highly scalable, fault-tolerant, distributed applications. The library is fully integrated with Kaf...

Apache BigData Europe

11 minute read

Last week I attended Apache Big Data Europe held in Sevilla, Spain. The event concentrates around big data projects under Apache Foundation umbrella. Below...