Apache Kafka and the Rise of Stream Processing

Guozhang Wang

About the Talk

In the past few years, Apache Kafka has emerged as the world's most popular backbone for real-time data streaming platforms. In this talk, we introduce Kafka Streams, the latest addition to the Apache Kafka project: a new stream processing library natively integrated with Kafka.

Kafka Streams has a very low barrier to entry, is easy to operate, and offers a natural DSL for writing stream processing applications. As such, it is the most convenient yet scalable option for analyzing, transforming, or otherwise processing data that is backed by Kafka. We will give the audience an overview of Kafka Streams, including its design and API, typical use cases, code examples, and an outlook on its roadmap. We will also compare Kafka Streams' lightweight library approach with heavier, framework-based tools such as Spark Streaming or Storm, which require you to understand and operate a separate infrastructure for processing real-time data in Kafka.

This talk will examine some of the distinctive features of stateful stream processing engines, the relevance of using a distributed log to share data between services, and how tables and streams can be joined and operated upon efficiently. These topics will be laid over a set of common microservice use cases. Finally, we’ll reflect on the importance of exactly-once processing and its relevance to ensuring correctness as the web of interactions inevitably grows over time.

Guozhang Wang

Guozhang is an engineer at Confluent, building a stream data platform on top of Apache Kafka. He received his PhD from the Cornell University database group, where he worked on scaling iterative data-driven applications. Prior to Confluent, Guozhang was a senior software engineer at LinkedIn, developing and maintaining its backbone streaming infrastructure on Apache Kafka and Apache Samza.