Kafka Streams: A Production Guide to Joins, Aggregations, and Stateful Processing

Written by Sugam Sharma | Co-Founder & CIO

Published on Jun 18, 2026

Apache Kafka

Technology

TL;DR

Building real-time data pipelines is often more complex than it appears at first. A simple stream processing application may start with a few Kafka topics, joins, and aggregations. As data volumes grow and production workloads increase, teams must deal with late-arriving events, state management, partitioning strategies, windowing logic, schema evolution, and operational overhead. What begins as a straightforward stream processing task can quickly become difficult to maintain and scale.

The challenge is that real-world events rarely arrive in a perfectly ordered sequence. A customer action, a payment event, a backend update, and a device telemetry record may all be generated by different systems at different times. The business value emerges only when these events are correlated, enriched, and aggregated into meaningful outcomes.

In this guide, we'll explore how Kafka Streams handles joins and aggregations, the key concepts behind stateful stream processing, and the common challenges teams face when implementing these patterns in production.

We'll also look at how Condense simplifies stream processing through its no-code and low-code development capabilities. Instead of writing and maintaining complex stream processing logic from scratch, teams can use prebuilt connectors, visual workflow builders, reusable transforms, and an integrated development environment to create real-time pipelines faster. Condense abstracts much of the operational complexity while still providing the flexibility to build custom stream processing applications when required, allowing teams to focus on business logic rather than infrastructure and framework management.

Understanding the Stream Abstraction

Before discussing joins and aggregations, it's important to understand the two data models Kafka Streams works with: KStream and KTable.

At first glance, a Kafka topic can look like a database table. In reality, it's closer to a continuous log of events. Every record represents something that happened at a specific point in time.

Consider an e-commerce application:

Order Created
Payment Processed
Order Shipped
Order Delivered

Each of these records is an event. Together, they tell the story of an order moving through the system. This is where Kafka Streams introduces two different ways of viewing data.

KStream: A Stream of Events

A KStream represents an ongoing sequence of events. Every record matters, even if multiple records share the same key. If a customer places three orders, Kafka Streams treats them as three separate events because all three actions occurred.

In other words, a KStream answers the question: "What happened?"

KTable: The Current State

A KTable represents the latest known state for a key. Imagine a customer's loyalty tier changes over time:

Customer123 → Bronze
Customer123 → Silver
Customer123 → Gold

A KStream would contain all three events, whereas a KTable would represent only the latest state, i.e., Customer123 → Gold.

Instead of asking what happened, a KTable answers: "What is true right now?"
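The difference between the two views can be sketched with plain Java collections. This is an illustration only, not Kafka Streams code: the class name StreamVersusTable and its methods are hypothetical, and each event is modeled as a two-element array of key and value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: plain collections standing in for KStream and KTable semantics. */
final class StreamVersusTable {

    /** Stream view: every event is kept, in arrival order. */
    static List<String[]> asStream(String[][] events) {
        List<String[]> log = new ArrayList<>();
        for (String[] event : events) {
            log.add(event);
        }
        return log;
    }

    /** Table view: a later value for a key overwrites the earlier one. */
    static Map<String, String> asTable(String[][] events) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] event : events) {
            latest.put(event[0], event[1]); // event[0] = key, event[1] = value
        }
        return latest;
    }
}
```

Feeding the three loyalty-tier events into both methods leaves three entries in the stream view and a single entry in the table view.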

Why This Matters

Many real-time applications need both perspectives. Some operations require analyzing individual events as they occur. Others require looking up the latest state of an entity. Kafka Streams provides KStreams and KTables to support both patterns. Understanding the difference is essential because joins, aggregations, and stateful processing all behave differently depending on which abstraction you're working with.

Stream-Stream Joins: Matching Events Across Streams

A stream-stream join combines events from two KStreams when they share the same key and occur within a defined time window.

Consider an order processing system:

The orders stream contains order creation events.
The payments stream contains payment confirmation events.

A stream-stream join can correlate these events to create a complete view of an order.

KStream<String, Order> orders = builder.stream("orders"); 
KStream<String, Payment> payments = builder.stream("payments"); 
 
JoinWindows window = JoinWindows.ofTimeDifferenceWithNoGrace( 
   Duration.ofMinutes(10) 
); 
 
KStream<String, EnrichedOrder> joined = orders.join( 
   payments, 
   (order, payment) -> new EnrichedOrder(order, payment), 
   window, 
   StreamJoined.with( 
       Serdes.String(), 
       orderSerde, 
       paymentSerde 
   ) 
);

The important concept here is the window.

Unlike a database join, Kafka Streams cannot wait forever for matching records because streams are continuously receiving new events. A window defines how far apart the timestamps of two related events can be before Kafka Streams considers them unrelated. For example, if an order is timestamped 10:00 AM and the payment is timestamped 10:05 AM, a 10-minute window produces a match. If the payment is timestamped 10:15 AM, the events fall outside the window and are not joined.
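The matching rule can be expressed as a small timestamp comparison. The sketch below is a simplified model rather than the library's implementation: JoinWindowCheck and withinWindow are hypothetical names, and the check assumes a symmetric window in which either side may come first.

```java
import java.time.Duration;

/** Simplified model of the time condition in a stream-stream join. */
final class JoinWindowCheck {

    /** Two records can join when their timestamps differ by at most maxDifference. */
    static boolean withinWindow(long leftTsMs, long rightTsMs, Duration maxDifference) {
        return Math.abs(leftTsMs - rightTsMs) <= maxDifference.toMillis();
    }
}
```

Because the comparison uses the absolute difference, a payment timestamped five minutes before its order would also match.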

Join Types

Kafka Streams supports three stream-stream join types:

Inner Join: Produces a result only when matching records exist on both sides within the configured window.
Left Join: Always emits records from the left stream. If no matching record exists on the right, the right-side value is returned as null.
Outer Join: Produces results for records from either stream, returning null for the side that does not have a match.

Choosing the Right Window Size

Window size is one of the most important design decisions in a Kafka Streams application.

A window that is too small can cause valid matches to be missed because related events arrive later than expected. A window that is too large increases the amount of state Kafka Streams must maintain, which can impact storage requirements and processing efficiency. The best approach is to base window sizes on actual business and operational behavior rather than arbitrary values. Measure how long related events typically take to arrive and add a reasonable buffer for network delays, retries, and consumer lag.

Stream-Table Joins: Enriching Events with Reference Data

While stream-stream joins correlate events from two continuously changing streams, stream-table joins are typically used to enrich incoming events with the latest known state. Consider a retail application where order events arrive continuously, but customer information changes relatively infrequently.

A KStream can contain incoming orders:

KStream<String, Order> orders = builder.stream("orders"); 
 

A KTable can contain the latest customer profile information: 

KTable<String, Customer> customers = builder.table("customers"); 
 

Joining the two allows every order to be enriched with the most recent customer data: 

KStream<String, EnrichedOrder> enrichedOrders = orders.join( 
   customers, 
   (order, customer) -> new EnrichedOrder(order, customer) 
);

Unlike stream-stream joins, stream-table joins do not require a time window. When an order arrives, Kafka Streams simply looks up the current value for the matching key in the KTable and produces the enriched result. This pattern is commonly used for:

Customer profile enrichment
Product catalog lookups
Device metadata enrichment
Fleet and vehicle information enrichment
User preference and account lookups

Because the KTable maintains only the latest state for each key, stream-table joins are generally more efficient than stream-stream joins and require less state management.

Co-Partitioning Requirements

Kafka Streams performs joins and aggregations locally. Each task processes only the partitions assigned to it. For a join to work correctly, records with the same key must be routed to the same partition on both topics. This requirement is known as co-partitioning.

For example, if an order event for Order123 is written to partition 2 of the orders topic, the corresponding payment event for Order123 must also be written to partition 2 of the payments topic.

For Kafka Streams joins to work correctly:

Both topics must have the same number of partitions.
Both topics must use the same partitioning strategy.
Records with the same key must be routed to the same partition.

If these conditions are not met, matching records may never reach the same stream task, resulting in incomplete or incorrect join results.
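To see why the partition count matters, consider a toy partitioner. The sketch below is an assumption-laden stand-in: PartitionSketch and its methods are hypothetical, and it hashes with String.hashCode(), whereas Kafka's default producer partitioner hashes the serialized key bytes with murmur2. The principle is the same: the partition is the hash modulo the partition count.

```java
/** Toy model of key-based partitioning; not Kafka's actual partitioner. */
final class PartitionSketch {

    /** Maps a key to a partition as a non-negative hash modulo the partition count. */
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    /** True when the key lands on the same partition number in both topics. */
    static boolean sameTask(String key, int leftPartitions, int rightPartitions) {
        return partitionFor(key, leftPartitions) == partitionFor(key, rightPartitions);
    }
}
```

With equal partition counts and the same hash function, sameTask is true for every key. With different counts, the modulo yields different results for many keys, which is exactly the situation in which a join silently misses matches.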

Verifying Topic Configuration

Before deploying a topology, verify partition alignment across all joined topics:

kafka-topics.sh --describe --topic orders --bootstrap-server localhost:9092

kafka-topics.sh --describe --topic payments --bootstrap-server localhost:9092

Production Consideration

Changing the partition count of an existing topic can alter key-to-partition mapping and affect downstream joins. For this reason, partitioning strategy should be considered early in the design process rather than after a pipeline reaches production scale.

Implementing Windowed Aggregations

Joins help correlate related events. Aggregations help summarize them. A common requirement in stream processing is to calculate metrics continuously as events arrive. Examples include:

Orders per hour
Revenue generated per day
Active users in the last 15 minutes
Vehicle trips completed per region
Sensor readings aggregated by device

Unlike traditional batch processing, Kafka Streams performs these calculations incrementally as new events are received.

Counting Events Within a Time Window

The following example counts the number of orders per key in five-minute windows:

KTable<Windowed<String>, Long> orderCounts = 
   orders 
       .groupByKey() 
       .windowedBy(TimeWindows.ofSizeWithNoGrace( 
           Duration.ofMinutes(5) 
       )) 
       .count();

In this example:

Records are grouped by key.
Events are collected into 5-minute windows.
Kafka Streams continuously updates the count as new events arrive.

Rather than waiting for a batch job to run, results are available in near real time.

Types of Aggregation Windows

Kafka Streams supports several windowing strategies.

Tumbling Windows

Fixed-size, non-overlapping windows. For example:

10:00 - 10:05
10:05 - 10:10
10:10 - 10:15

Each event belongs to exactly one window. These are commonly used for reporting and dashboards.
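Assigning an event to its tumbling window is simple arithmetic. The following sketch assumes non-negative epoch-millisecond timestamps and windows aligned to the epoch; TumblingWindow and its methods are hypothetical helper names.

```java
/** Computes the tumbling window that contains a given timestamp. */
final class TumblingWindow {

    /** Inclusive start of the window: the timestamp rounded down to a multiple of the size. */
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    /** Exclusive end of the window. */
    static long windowEnd(long timestampMs, long sizeMs) {
        return windowStart(timestampMs, sizeMs) + sizeMs;
    }
}
```

An event exactly on a boundary, such as 10:05:00, belongs to the window that starts at 10:05 because the end of each window is exclusive.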

Hopping Windows

Fixed-size windows that overlap. For example, a 10-minute window that advances every 5 minutes:

10:00 - 10:10
10:05 - 10:15
10:10 - 10:20

A single event can contribute to multiple windows. These are useful for rolling metrics and trend analysis.
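The overlap can be made concrete by listing every window that contains a given timestamp. This is a minimal sketch under the same assumptions as before (non-negative epoch-millisecond timestamps, windows aligned to the epoch); HoppingWindows and windowStartsFor are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Lists the hopping windows that contain a given timestamp. */
final class HoppingWindows {

    /** Returns the start of each window [start, start + size) that contains the timestamp. */
    static List<Long> windowStartsFor(long timestampMs, long sizeMs, long advanceMs) {
        List<Long> starts = new ArrayList<>();
        // Earliest aligned start whose window can still contain the timestamp.
        long start = Math.max(0L, timestampMs - sizeMs + advanceMs) / advanceMs * advanceMs;
        while (start <= timestampMs) {
            starts.add(start);
            start += advanceMs;
        }
        return starts;
    }
}
```

With a 10-minute size and a 5-minute advance, an event at minute 7 falls into the windows starting at minute 0 and minute 5, so it contributes to two aggregates.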

Session Windows

Windows based on periods of activity separated by inactivity gaps. For example, multiple user actions occurring within a short period can be grouped into a single session.

These are commonly used for user behavior analytics and clickstream processing.
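Session boundaries depend on the data rather than on the clock. The sketch below groups timestamps that are already sorted in ascending order; it ignores out-of-order arrival and session merging, which the real implementation has to handle. SessionGrouping and sessions are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Groups ascending timestamps into sessions separated by inactivity gaps. */
final class SessionGrouping {

    /** A gap larger than inactivityGapMs between consecutive events starts a new session. */
    static List<List<Long>> sessions(List<Long> sortedTimestampsMs, long inactivityGapMs) {
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long ts : sortedTimestampsMs) {
            boolean gapExceeded = !current.isEmpty()
                && ts - current.get(current.size() - 1) > inactivityGapMs;
            if (gapExceeded) {
                result.add(current);
                current = new ArrayList<>();
            }
            current.add(ts);
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }
}
```

With a five-minute gap, clicks at minutes 0, 1, and 2 form one session, and clicks at minutes 10 and 11 form a second one.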

Common Aggregation Functions

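Kafka Streams supports several aggregation operations. Only count() is a dedicated DSL method; sums, averages, minimums, maximums, and custom aggregations are expressed through reduce() or aggregate():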

Count
Sum
Average
Minimum
Maximum
Custom aggregations

For example, calculating total revenue:

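// Sums order amounts per key in 5-minute tumbling windows.
orders
   .groupByKey()
   .windowedBy(TimeWindows.ofSizeWithNoGrace(
       Duration.ofMinutes(5)
   ))
   .aggregate(
       () -> 0.0,                       // initial total for each window
       (key, order, total) ->
           total + order.getAmount(),
       // The aggregate is a Double, so its serde is set explicitly
       // instead of relying on the configured default value serde.
       Materialized.with(Serdes.String(), Serdes.Double())
   );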

Windowed aggregations form the foundation of many real-time analytics applications, enabling organizations to derive insights from streaming data as events occur.
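An average cannot be updated from the previous average alone, so a custom aggregation usually carries both a count and a sum. The accumulator below is a plain-Java sketch of that idea; RunningAverage is a hypothetical class, and using it as the aggregate value in a real topology would additionally require a Serde for it.

```java
/** Accumulator for an incrementally maintained average. */
final class RunningAverage {
    private long count;
    private double sum;

    /** Folds one value into the accumulator and returns it, mirroring an aggregator step. */
    RunningAverage add(double value) {
        count++;
        sum += value;
        return this;
    }

    /** Current mean, or 0.0 when nothing has been added yet. */
    double mean() {
        return count == 0 ? 0.0 : sum / count;
    }
}
```

Each incoming record costs one addition and one increment, regardless of how many records the window already contains.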

Emitting Final Results with Suppression

By default, Kafka Streams emits an updated aggregation result every time the aggregate changes. For example, if an order count increases from 100 to 101, Kafka Streams immediately emits the updated value. As more events arrive, additional updates continue to be emitted.

While this provides real-time visibility, it can generate a large number of intermediate updates. In high-volume workloads, these updates can overwhelm downstream databases, dashboards, or APIs that only need the final result. Kafka Streams provides suppression to buffer intermediate updates and emit only the final aggregation result once a window has fully closed.

KTable<Windowed<String>, Long> finalCounts = orders 
   .groupByKey() 
   .windowedBy( 
       TimeWindows.ofSizeAndGrace( 
           Duration.ofMinutes(5), 
           Duration.ofSeconds(30) 
       ) 
   ) 
   .count() 
   .suppress( 
       Suppressed.untilWindowCloses( 
           Suppressed.BufferConfig.unbounded() 
       ) 
   );

In this example, Kafka Streams waits until the five-minute window and its grace period have completely elapsed before emitting the final count. This approach reduces downstream write amplification and is particularly useful for reporting systems, dashboards, and external databases where intermediate updates provide little value.

Trade-Offs of Suppression

The primary trade-off is latency. Without suppression, consumers receive updates immediately as events arrive. With suppression enabled, results are not available until the window closes.

Another consideration is buffer management. Suppressed results are held in memory until they can be emitted. In high-cardinality workloads, an unbounded buffer can consume significant resources.

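For production deployments, consider bounding the buffer with maxBytes() or maxRecords() to prevent excessive memory usage. Because untilWindowCloses() requires a strict buffer, a bounded configuration must be combined with shutDownWhenFull(), which stops the application when the limit is reached rather than emitting a result early. Suppression is most valuable when consumers care about the final aggregation result rather than every intermediate state change.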
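A bounded variant of the suppression example might look like the fragment below. It needs the kafka-streams library on the classpath. The 64 MB limit, the BoundedSuppression class, and the String value type are illustrative assumptions, not recommendations.

```java
import java.time.Duration;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.Suppressed.BufferConfig;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

final class BoundedSuppression {

    /** Final per-window counts with a memory-bounded suppression buffer. */
    static KTable<Windowed<String>, Long> finalCounts(KStream<String, String> orders) {
        return orders
            .groupByKey()
            .windowedBy(TimeWindows.ofSizeAndGrace(
                Duration.ofMinutes(5), Duration.ofSeconds(30)))
            .count()
            .suppress(Suppressed.untilWindowCloses(
                // Assumed limit: stop the application if buffered results exceed 64 MB.
                BufferConfig.maxBytes(64L * 1024 * 1024).shutDownWhenFull()));
    }
}
```

The design choice here is to fail loudly instead of emitting partial results. If early emission is acceptable, Suppressed.untilTimeLimit() with emitEarlyWhenFull() is the alternative, at the cost of no longer guaranteeing a single final result per window.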

Stateful Operations and State Stores

Many stream processing operations require Kafka Streams to remember information between events. For example:

A stream-stream join must remember records from both streams until matching events arrive.
A windowed count must remember the current count for each window.
A session aggregation must track activity across multiple events.

These are known as stateful operations because the application maintains state as it processes data.

What Is a State Store?

A state store is a local database maintained by Kafka Streams. Whenever a join, aggregation, or lookup operation requires historical context, Kafka Streams stores the necessary information in a state store instead of keeping everything in memory.

By default, Kafka Streams uses RocksDB, an embedded key-value database optimized for high-throughput workloads.

How State Stores Support Stream Joins

When a stream-stream join is executed, Kafka Streams maintains local state stores for both streams. As new events arrive, Kafka Streams stores them temporarily and checks whether a matching record already exists on the other side of the join within the configured window.

For example, if an order event arrives first, Kafka Streams stores it in a local state store and waits for a matching payment event. If the payment arrives within the configured window, the join result is emitted immediately. If no matching event arrives before the window expires, the stored record is eventually removed.

This mechanism allows Kafka Streams to correlate events that arrive at different times while maintaining high processing throughput.

How State Stores Support Aggregations

Consider a five-minute order count. As events arrive, Kafka Streams continuously updates the aggregation result:

10:00 - 10:05 → 150 Orders
10:05 - 10:10 → 212 Orders
10:10 - 10:15 → 189 Orders

Instead of recalculating the count from scratch, Kafka Streams updates the existing value stored in the state store. This allows aggregations to scale efficiently even when processing millions of events.

Fault Tolerance and Recovery

A common question is:
"What happens if the application crashes?"

Kafka Streams addresses this by writing state changes to internal changelog topics in Kafka. If an instance fails, a new instance can rebuild its state by replaying records from these changelog topics. This mechanism provides durability while allowing state to remain local for fast access during normal operation.

Why State Management Matters

Stateful processing enables powerful capabilities such as:

Event correlation
Windowed aggregations
Session analytics
Real-time enrichment
Fraud detection
Operational monitoring

However, state also introduces operational challenges. As data volumes increase, state stores grow in size, recovery times increase, and resource utilization becomes more important. Understanding how Kafka Streams manages state is essential for building reliable real-time applications at scale.

Production Considerations and Error Handling

Handling Late Data and Stream Time

In production environments, events rarely arrive in perfect order. Network delays, retries, consumer lag, and temporary outages can cause records to arrive minutes or even hours after they were originally generated. If these scenarios are not accounted for, valid events may be excluded from joins and aggregations.

Using Grace Periods

Kafka Streams allows applications to accept late-arriving events through grace periods.

JoinWindows window = JoinWindows 
   .ofTimeDifferenceAndGrace( 
       Duration.ofMinutes(10), 
       Duration.ofMinutes(2) 
   );

In this example, matching events are accepted for up to two additional minutes after the ten-minute join window closes. Events arriving after the grace period are discarded. While grace periods improve result accuracy, they also increase the amount of state Kafka Streams must retain. Larger grace periods can lead to higher memory, disk, and recovery costs, particularly for high-volume workloads.

Understanding Stream Time

One of the most commonly misunderstood concepts in Kafka Streams is time itself. By default, Kafka Streams operates on event time, taken from record timestamps, rather than the system clock. Stream time is the highest record timestamp the application has observed so far, so it advances only when new records are processed. For example, if no new records arrive for twenty minutes, stream time does not advance, and windows may remain open longer than expected. This behavior allows Kafka Streams to correctly process out-of-order events while preserving event-time accuracy.
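The behavior of stream time can be modeled in a few lines. This is a simplified sketch, not the library's implementation: StreamTimeModel is a hypothetical class, it tracks a single global value (Kafka Streams tracks stream time per task), and it treats a window as closed once stream time reaches the window end plus the grace period.

```java
/** Simplified model of stream time and window closing. */
final class StreamTimeModel {
    private long streamTimeMs = Long.MIN_VALUE;

    /** Stream time only moves forward, and only when a record is observed. */
    long observe(long recordTimestampMs) {
        streamTimeMs = Math.max(streamTimeMs, recordTimestampMs);
        return streamTimeMs;
    }

    /** A window is closed once stream time reaches its end plus the grace period. */
    boolean isClosed(long windowEndMs, long graceMs) {
        return streamTimeMs >= windowEndMs + graceMs;
    }
}
```

An out-of-order record with an older timestamp leaves stream time unchanged, and a quiet topic leaves windows open indefinitely, because nothing in the model consults the wall clock.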

Timestamp Extractors

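Kafka Streams determines event time using a timestamp extractor. The default extractor, FailOnInvalidTimestamp, uses the timestamp carried in each Kafka record's metadata (set by the producer, or by the broker when the topic is configured for log append time) and fails if that timestamp is invalid. To take the timestamp from a field inside the event payload instead, supply a custom TimestampExtractor. The default can be configured explicitly: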

Properties props = new Properties(); 
 
props.put( 
   StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, 
   FailOnInvalidTimestamp.class 
);

For most production workloads, event timestamps provide more reliable results than processing-time timestamps.
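When the event time lives inside the payload, a custom extractor can read it. The sketch below needs the kafka-streams library on the classpath. HasEventTime and PayloadTimestampExtractor are hypothetical names, and the fallback to the record's own timestamp is one possible policy among several.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

/** Hypothetical payload contract: a value that carries its own event time. */
interface HasEventTime {
    long eventTimeMs();
}

/** Reads event time from the payload, falling back to the record timestamp. */
public class PayloadTimestampExtractor implements TimestampExtractor {

    @Override
    public long extract(ConsumerRecord<Object, Object> record, long partitionTime) {
        Object value = record.value();
        if (value instanceof HasEventTime) {
            long eventTime = ((HasEventTime) value).eventTimeMs();
            if (eventTime >= 0) {
                return eventTime;
            }
        }
        // Assumed policy: use the timestamp from the record metadata otherwise.
        return record.timestamp();
    }
}
```

The class would be registered through StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG in place of FailOnInvalidTimestamp.class.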

Production Best Practice

Avoid relying on processing time or wall-clock time for joins and aggregations whenever possible. Business events rarely arrive at the exact moment they are generated. Designing pipelines around event time, appropriate window sizes, and carefully chosen grace periods helps ensure accurate and predictable stream processing behavior.

Managing State Store Growth

Stateful stream processing always comes with a cost: storage. Every join window, aggregation, and session window requires Kafka Streams to retain state. As event volumes increase, state stores can grow significantly, affecting disk usage, recovery times, and application performance.

For example, a stream-stream join must temporarily store records from both streams until the join window expires. Similarly, windowed aggregations must retain intermediate results until the window closes and any grace period has elapsed.

The impact becomes more noticeable as:

Window sizes increase
Grace periods increase
Event throughput grows
Key cardinality increases

A common mistake is configuring large windows without understanding the resulting state footprint. While a larger window may improve matching accuracy, it also requires Kafka Streams to retain more records for longer periods.

Best Practices

To keep state stores manageable:

Use the smallest window size that satisfies business requirements.
Avoid excessive grace periods.
Monitor state store disk utilization.
Review key cardinality and partitioning strategies.
Remove unused aggregations and materialized views.

Remember that state stores are not temporary implementation details. They are a critical part of Kafka Streams architecture and should be monitored like any other production datastore.

Exactly-Once Processing and Duplicate Events

In distributed systems, duplicate events are inevitable. Network retries, producer failures, consumer restarts, and application crashes can all result in the same event being processed more than once. Without proper safeguards, duplicates can lead to incorrect aggregation results, inflated metrics, and duplicate downstream records. Consider a revenue aggregation pipeline. If an order event worth $100 is processed twice, the total revenue becomes inaccurate. Kafka Streams addresses reprocessing after failures through Exactly-Once Semantics (EOS). Note that EOS covers the read-process-write cycle inside Kafka; if an upstream system writes the same business event as two separate records, the application still needs its own deduplication logic.

Properties props = new Properties(); 
 
props.put( 
   StreamsConfig.PROCESSING_GUARANTEE_CONFIG, 
   StreamsConfig.EXACTLY_ONCE_V2 
);

With exactly-once processing enabled, Kafka Streams coordinates state updates, record processing, and output writes within a transactional boundary.

This helps ensure that:

Each input record's effect on state and output is applied once, even during failures.
State stores remain consistent.
Aggregations produce accurate results.
Downstream consumers that read with isolation.level=read_committed avoid duplicate updates.

When to Use Exactly-Once Processing

Exactly-once processing is particularly important for:

Financial transactions
Billing systems
Revenue calculations
Inventory management
Compliance reporting

For less critical workloads, some teams choose at-least-once processing to maximize throughput and reduce operational overhead. The right choice depends on whether occasional duplicate processing is acceptable for the business use case.

Production Consideration

Exactly-once guarantees improve correctness but introduce additional coordination and transactional overhead. Before enabling exactly-once processing, validate the performance impact under realistic production workloads and ensure the benefits outweigh the added complexity.

Monitoring and Observability

Building a Kafka Streams application is only half the challenge. Operating it reliably in production requires continuous visibility into the health of your streams, state stores, and processing infrastructure. A stream processing application can remain running while silently producing incorrect results due to consumer lag, late-arriving events, serialization failures, or state store issues. Monitoring helps identify these problems before they impact downstream systems.

Key Metrics to Track

At a minimum, teams should monitor:

Consumer lag
Processing latency
Records processed per second
State store size
Changelog topic growth
Failed deserializations
Dropped records
Rebalance frequency

These metrics provide early warning signs that a streaming application is struggling to keep up with incoming workloads.

Monitoring State Stores

State stores are critical for joins, aggregations, and other stateful operations. As window sizes, grace periods, and event volumes increase, state stores can grow significantly. Excessive growth may increase disk usage, slow recovery times, and impact application performance.

Regularly monitor:

State store size
Restore duration during failover
Changelog topic throughput
Disk utilization

Treat state stores as production data infrastructure, not temporary application caches.

Detecting Late Events

A sudden increase in late-arriving records often indicates upstream issues such as:

Network instability
Producer delays
Consumer backpressure
Infrastructure bottlenecks

Tracking late-event rates can help teams identify and resolve operational issues before they affect business outcomes.

Watching Rebalances

Consumer group rebalances can temporarily pause processing while Kafka Streams reassigns partitions and restores local state. Frequent rebalances may indicate:

Insufficient resources
Unstable application instances
Excessive scaling activity
Large state stores

Monitoring rebalance frequency and recovery time helps ensure stable stream processing performance.

Production Best Practice

The most successful Kafka Streams deployments are not necessarily the ones with the most complex joins or aggregations. They are the ones that consistently maintain visibility into application health, state management, and processing behavior under real-world conditions.

Serialization and Data Contracts

Every record crossing a Kafka topic boundary must be serialized before it is written and deserialized when it is consumed. As streaming applications grow, serialization becomes more than a technical implementation detail: it becomes a contract between producers and consumers. A mismatch between the data being produced and the schema expected by consumers can result in deserialization failures, dropped records, or application crashes.

Using Schema Registry

For production workloads, it is recommended to use schema-based serialization formats such as Avro or Protobuf along with a Schema Registry.

props.put( 
   AbstractKafkaSchemaSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, 
   "http://schema-registry:8081" 
); 
 
props.put( 
   StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, 
   SpecificAvroSerde.class 
);

Schema Registry helps enforce compatibility rules and ensures producers and consumers agree on the structure of data being exchanged.

Schema Evolution

As applications evolve, schemas inevitably change. New fields may be added, existing fields may become optional, and data models may be refined over time. Schema evolution allows these changes to occur without breaking downstream consumers, provided compatibility rules are followed.

Production Best Practice

Avoid relying on Java's built-in serialization for Kafka messages. It creates tight coupling between applications, makes schema evolution difficult, and often leads to compatibility issues during upgrades. Using Avro, Protobuf, or JSON Schema with a Schema Registry provides a safer and more maintainable approach for long-term stream processing applications.

KTable Bootstrap and Recovery

KTables are backed by local state stores, typically powered by RocksDB. When an instance starts without usable local state, for example on a new host or after its state directory is lost, Kafka Streams rebuilds the stores by replaying records from the underlying changelog topics. For small datasets this process is usually fast, but large reference tables can significantly increase application startup and recovery times.

Reducing Recovery Time

Use compacted topics whenever possible. Since Kafka retains only the latest value for each key, Kafka Streams has fewer records to replay during startup. You can also enable standby replicas so that a warm copy of the state store is maintained on another instance.

props.put( 
   StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 
   1 
);

During a failure, standby replicas reduce recovery time because much of the required state has already been restored.

Topology Design Best Practices

As Kafka Streams applications grow, topology design becomes just as important as the business logic itself.

Keep Topologies Simple

Every join, aggregation, repartition, and state store introduces additional processing overhead. When multiple approaches can solve the same problem, prefer the simpler topology.

Avoid Unnecessary Repartitions

Operations such as groupBy() and selectKey() may trigger repartitioning. Repartitioning creates internal Kafka topics and increases network and storage overhead. Whenever possible, design producers to emit records using the keys required by downstream processing.

Name State Stores Explicitly

Avoid relying on automatically generated state store names. Explicit names make monitoring, debugging, and topology evolution significantly easier.

KTable<String, Long> orderCounts = orders
   .groupByKey()
   .count(
       Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("order-count-by-user")
   );

Test Topologies Early

Kafka Streams provides TopologyTestDriver for validating stream processing logic without deploying a Kafka cluster. Testing joins, aggregations, and windowing behavior before production deployment helps catch issues early and improves confidence in topology changes.
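A minimal test of a counting topology might look like the sketch below. It assumes the kafka-streams and kafka-streams-test-utils libraries are on the classpath. The topic names, the OrderCountTopologyCheck class, and the String payloads are placeholders chosen for illustration.

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

final class OrderCountTopologyCheck {

    /** Pipes three orders through a per-key count and returns the latest count per key. */
    static Map<String, Long> run() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("orders", Consumed.with(Serdes.String(), Serdes.String()))
            .groupByKey()
            .count()
            .toStream()
            .to("order-counts", Produced.with(Serdes.String(), Serdes.Long()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-count-test");
        // No broker is contacted; the driver runs the topology in-process.
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:9092");

        try (TopologyTestDriver driver = new TopologyTestDriver(builder.build(), props)) {
            TestInputTopic<String, String> input = driver.createInputTopic(
                "orders", Serdes.String().serializer(), Serdes.String().serializer());
            TestOutputTopic<String, Long> output = driver.createOutputTopic(
                "order-counts", Serdes.String().deserializer(), Serdes.Long().deserializer());

            input.pipeInput("user-1", "order-a");
            input.pipeInput("user-1", "order-b");
            input.pipeInput("user-2", "order-c");

            return output.readKeyValuesToMap();
        }
    }
}
```

A unit test would then assert on the returned map, which is expected to hold a count of 2 for user-1 and 1 for user-2. TestInputTopic also accepts explicit record timestamps, which makes it possible to exercise windowing and grace-period behavior deterministically.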

Simplifying Kafka Stream Processing with Condense

Kafka Streams provides a powerful framework for building real-time applications, but operating these workloads in production often requires significant engineering effort. Teams must manage joins, windowing strategies, state stores, partition alignment, schema evolution, monitoring, and infrastructure scaling while ensuring reliability and performance.

As streaming workloads grow, the operational complexity can quickly outweigh the effort spent writing business logic. Condense simplifies this process by providing a unified platform for building, deploying, and operating real-time data pipelines on top of proven streaming technologies such as Apache Kafka.

Using Condense, teams can implement common stream processing patterns including:

Stream-stream joins
Stream-table enrichments
Windowed aggregations
Event correlation
Real-time transformations
Stateful processing workflows

Instead of manually configuring every component, developers can leverage visual workflows, reusable transforms, prebuilt connectors, and low-code utilities to accelerate development. For advanced use cases, Condense also provides an integrated development environment that enables teams to build, test, and deploy custom streaming applications while benefiting from managed infrastructure and operational tooling.

Condense also helps simplify production operations by providing:

Managed Kafka deployments
Built-in observability and monitoring
Stream processing lifecycle management
Automated scaling and infrastructure management
State store and resource management
Enterprise-grade security and governance

By reducing the operational burden associated with large-scale stream processing systems, Condense enables engineering teams to focus on delivering business outcomes rather than managing streaming infrastructure.

Conclusion

Kafka stream joins and aggregations are fundamental building blocks for modern real-time applications. From correlating events across streams to enriching records with reference data and computing real-time metrics, these patterns enable organizations to transform raw event streams into actionable insights.

However, building reliable streaming applications requires more than writing join and aggregation logic. Teams must also address state management, late-arriving events, partitioning strategies, fault tolerance, observability, and operational scalability. Understanding these concepts is essential for successfully implementing Kafka stateful stream processing in production environments.

Whether you're building real-time analytics, IoT platforms, fraud detection systems, customer personalization engines, or operational dashboards, mastering these patterns will help you design streaming applications that remain scalable, resilient, and accurate as data volumes grow.

Frequently Asked Questions (FAQs)

Understanding the Stream Abstraction

Before discussing joins and aggregations, it's important to understand the two data models Kafka Streams works with: KStream and KTable.

At first glance, a Kafka topic can look like a database table. In reality, it's closer to a continuous log of events. Every record represents something that happened at a specific point in time.

Consider an e-commerce application:

Order Created
Payment Processed
Order Shipped
Order Delivered

Each of these records is an event. Together, they tell the story of an order moving through the system. This is where Kafka Streams introduces two different ways of viewing data.

KStream: A Stream of Events

In other words, a KStream answers the question: "What happened?"

KTable: The Current State

A KTable represents the latest known state for a key. Imagine a customer's loyalty tier changes over time:

Customer123 → Bronze
Customer123 → Silver
Customer123 → Gold

A KStream would contain all three events, whereas a KTable would represent only the latest state, i.e., Customer123 → Gold.

Instead of asking what happened, a KTable answers: "What is true right now?"
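The difference between the two views can be sketched in plain Java. This is only an illustration of the semantics and does not use the Kafka Streams API; the class name `StreamVsTableSketch` and its helpers are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; this does not use the Kafka Streams API.
final class StreamVsTableSketch {

    // The loyalty-tier changes from the example above, as {key, value} pairs.
    static final String[][] TIER_CHANGES = {
        {"Customer123", "Bronze"},
        {"Customer123", "Silver"},
        {"Customer123", "Gold"},
    };

    // Stream view: every event is kept, in arrival order.
    static List<String[]> asStream(String[][] events) {
        List<String[]> log = new ArrayList<>();
        for (String[] event : events) {
            log.add(event);
        }
        return log;
    }

    // Table view: each new value for a key overwrites the previous one.
    static Map<String, String> asTable(String[][] events) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] event : events) {
            latest.put(event[0], event[1]);
        }
        return latest;
    }
}
```

Reading the same three events as a stream yields three entries, while reading them as a table yields a single entry for Customer123 holding Gold.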

Why This Matters

Stream-Stream Joins: Matching Events Across Streams

A stream-stream join combines events from two KStreams when they share the same key and occur within a defined time window.

Consider an order processing system:

The orders stream contains order creation events.
The payments stream contains payment confirmation events.

A stream-stream join can correlate these events to create a complete view of an order.

KStream<String, Order> orders = builder.stream("orders"); 
KStream<String, Payment> payments = builder.stream("payments"); 
 
JoinWindows window = JoinWindows.ofTimeDifferenceWithNoGrace( 
   Duration.ofMinutes(10) 
); 
 
KStream<String, EnrichedOrder> joined = orders.join( 
   payments, 
   (order, payment) -> new EnrichedOrder(order, payment), 
   window, 
   StreamJoined.with( 
       Serdes.String(), 
       orderSerde, 
       paymentSerde 
   ) 
);

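The important concept here is the window: with a 10-minute time difference and no grace period, an order and a payment are joined only if they share a key and their timestamps are at most 10 minutes apart, in either order.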

Join Types

Kafka Streams supports three stream-stream join types:

Inner Join: Produces a result only when matching records exist on both sides within the configured window.
Left Join: Always emits records from the left stream. If no matching record exists on the right, the right-side value is returned as null.
Outer Join: Produces results for records from either stream, returning null for the side that does not have a match.
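To make the matching rule concrete, here is a simplified batch sketch of inner and left join semantics in plain Java. It is not the Kafka Streams implementation: the `Event` type and method names are hypothetical, and a real left join emits its unmatched results only after the window has closed, whereas this sketch looks at all records at once. An outer join would additionally emit unmatched right-side records in the same way.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of join semantics; not the Kafka Streams API.
final class WindowedJoinSketch {

    // Hypothetical event: a key, an event-time timestamp in ms, and a payload.
    static final class Event {
        final String key;
        final long timestampMs;
        final String value;

        Event(String key, long timestampMs, String value) {
            this.key = key;
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    // Two events match when keys are equal and timestamps differ by at most windowMs.
    static boolean matches(Event left, Event right, long windowMs) {
        return left.key.equals(right.key)
            && Math.abs(left.timestampMs - right.timestampMs) <= windowMs;
    }

    // Inner join: emit a result only for matching pairs.
    static List<String> innerJoin(List<Event> left, List<Event> right, long windowMs) {
        List<String> out = new ArrayList<>();
        for (Event l : left) {
            for (Event r : right) {
                if (matches(l, r, windowMs)) {
                    out.add(l.value + "+" + r.value);
                }
            }
        }
        return out;
    }

    // Left join: every left event appears; unmatched ones pair with null.
    static List<String> leftJoin(List<Event> left, List<Event> right, long windowMs) {
        List<String> out = new ArrayList<>();
        for (Event l : left) {
            boolean matched = false;
            for (Event r : right) {
                if (matches(l, r, windowMs)) {
                    out.add(l.value + "+" + r.value);
                    matched = true;
                }
            }
            if (!matched) {
                out.add(l.value + "+null");
            }
        }
        return out;
    }
}
```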

Choosing the Right Window Size

Window size is one of the most important design decisions in a Kafka Streams application.

Stream-Table Joins: Enriching Events with Reference Data

A KStream can contain incoming orders:

KStream<String, Order> orders = builder.stream("orders"); 
 

A KTable can contain the latest customer profile information: 

KTable<String, Customer> customers = builder.table("customers"); 
 

Joining the two allows every order to be enriched with the most recent customer data, provided the stream and the table are keyed by the same customer identifier:

KStream<String, EnrichedOrder> enrichedOrders = orders.join( 
   customers, 
   (order, customer) -> new EnrichedOrder(order, customer) 
);

Common use cases for stream-table joins include:

Customer profile enrichment
Product catalog lookups
Device metadata enrichment
Fleet and vehicle information enrichment
User preference and account lookups

Because the KTable maintains only the latest state for each key, stream-table joins are generally more efficient than stream-stream joins and require less state management.

Co-Partitioning Requirements

For example, if an order event for Order123 is written to partition 2 of the orders topic, the corresponding payment event for Order123 must also be written to partition 2 of the payments topic.

For Kafka Streams joins to work correctly:

Both topics must have the same number of partitions.
Both topics must use the same partitioning strategy.
Records with the same key must be routed to the same partition.

If these conditions are not met, matching records may never reach the same stream task, resulting in incomplete or incorrect join results.
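The effect of mismatched partition counts can be shown with a small plain-Java sketch. The hash used here is a stand-in chosen for illustration (Kafka's default partitioner hashes the serialized key with murmur2), and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Plain-Java illustration of key-based partitioning; not Kafka's partitioner.
final class CoPartitioningSketch {

    // Stand-in for a key partitioner: hash the key bytes, then take the
    // remainder by the partition count. Arrays.hashCode is an assumption
    // made for simplicity, not the hash Kafka uses.
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions);
    }

    // Counts keys that would land on different partition numbers in two topics.
    static int mismatchedKeys(String[] keys, int leftPartitions, int rightPartitions) {
        int mismatches = 0;
        for (String key : keys) {
            if (partitionFor(key, leftPartitions) != partitionFor(key, rightPartitions)) {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```

With equal partition counts and the same hashing, every key maps to the same partition number on both topics. With, say, 6 and 8 partitions, some keys map to different partition numbers, which is how matching records end up in different stream tasks.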

Verifying Topic Configuration

Before deploying a topology, verify partition alignment across all joined topics:

kafka-topics.sh --describe --topic orders --bootstrap-server localhost:9092

kafka-topics.sh --describe --topic payments --bootstrap-server localhost:9092

Production Consideration

Implementing Windowed Aggregations

Joins help correlate related events. Aggregations help summarize them. A common requirement in stream processing is to calculate metrics continuously as events arrive. Examples include:

Orders per hour
Revenue generated per day
Active users in the last 15 minutes
Vehicle trips completed per region
Sensor readings aggregated by device

Unlike traditional batch processing, Kafka Streams performs these calculations incrementally as new events are received.

Counting Events Within a Time Window

The following example counts orders per key in five-minute windows:

KTable<Windowed<String>, Long> orderCounts = 
   orders 
       .groupByKey() 
       .windowedBy(TimeWindows.ofSizeWithNoGrace( 
           Duration.ofMinutes(5) 
       )) 
       .count();

In this example:

Records are grouped by key.
Events are collected into 5-minute windows.
Kafka Streams continuously updates the count as new events arrive.

Rather than waiting for a batch job to run, results are available in near real time.

Types of Aggregation Windows

Kafka Streams supports several windowing strategies.

Tumbling Windows

Fixed-size, non-overlapping windows. For example:

10:00 - 10:05
10:05 - 10:10
10:10 - 10:15

Each event belongs to exactly one window. These are commonly used for reporting and dashboards.

Hopping Windows

Fixed-size windows that overlap. For example, a 10-minute window that advances every 5 minutes:

10:00 - 10:10
10:05 - 10:15
10:10 - 10:20

A single event can contribute to multiple windows. These are useful for rolling metrics and trend analysis.
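The window assignment for tumbling and hopping windows comes down to simple arithmetic on the event timestamp. The following plain-Java sketch assumes non-negative epoch-millisecond timestamps and windows aligned to the epoch; the class and method names are hypothetical and it does not use the Kafka Streams API.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of window assignment; not the Kafka Streams API.
final class WindowAssignmentSketch {

    // Tumbling: each timestamp falls into exactly one window [start, start + size).
    static long tumblingWindowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    // Hopping: return the start of every window [start, start + size)
    // that contains the timestamp, where starts are multiples of advanceMs.
    static List<Long> hoppingWindowStarts(long timestampMs, long sizeMs, long advanceMs) {
        List<Long> starts = new ArrayList<>();
        // Earliest aligned start whose window can still contain the timestamp.
        long first = Math.max(0, timestampMs - sizeMs + advanceMs) / advanceMs * advanceMs;
        for (long start = first; start <= timestampMs; start += advanceMs) {
            starts.add(start);
        }
        return starts;
    }
}
```

For an event seven minutes past the hour, a 5-minute tumbling window assigns it to the window starting at minute 5, while a 10-minute window advancing every 5 minutes assigns it to the windows starting at minutes 0 and 5.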

Session Windows

Windows based on periods of activity separated by inactivity gaps. For example, multiple user actions occurring within a short period can be grouped into a single session.

These are commonly used for user behavior analytics and clickstream processing.
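As a rough illustration of how an inactivity gap splits activity into sessions, the plain-Java sketch below groups the sorted timestamps of a single key. It assumes in-order input; Kafka Streams also merges sessions when out-of-order records arrive, which this sketch does not attempt. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of session grouping; not the Kafka Streams API.
final class SessionWindowSketch {

    // Returns sessions as {start, end} pairs. Timestamps must be sorted
    // ascending and belong to a single key.
    static List<long[]> sessions(long[] sortedTimestampsMs, long inactivityGapMs) {
        List<long[]> result = new ArrayList<>();
        for (long ts : sortedTimestampsMs) {
            long[] current = result.isEmpty() ? null : result.get(result.size() - 1);
            if (current != null && ts - current[1] <= inactivityGapMs) {
                current[1] = ts;                   // still active: extend the session
            } else {
                result.add(new long[] {ts, ts});   // gap exceeded: start a new session
            }
        }
        return result;
    }
}
```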

Common Aggregation Functions

Kafka Streams provides count, reduce, and aggregate operators, which together can express the common aggregation operations:

Count
Sum
Average
Minimum
Maximum
Custom aggregations

For example, calculating total revenue:

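orders 
   .groupByKey() 
   .windowedBy(TimeWindows.ofSizeWithNoGrace( 
       Duration.ofMinutes(5) 
   )) 
   .aggregate( 
       () -> 0.0,                       // initial total for each new window 
       (key, order, total) -> 
           total + order.getAmount(),   // add each order's amount to the running total 
       // explicit serdes, because the Double total may not match the default value serde 
       Materialized.with(Serdes.String(), Serdes.Double()) 
   );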

Windowed aggregations form the foundation of many real-time analytics applications, enabling organizations to derive insights from streaming data as events occur.
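An average is a typical custom aggregation because it cannot be updated from the previous average alone; the aggregate has to carry both a sum and a count. The plain-Java sketch below shows that accumulator shape, mirroring the initializer and aggregator roles used above. The class and method names are hypothetical, and in a real topology the accumulator would also need a serde.

```java
// Plain-Java illustration of an average accumulator; not the Kafka Streams API.
final class AverageAggregateSketch {

    // Immutable accumulator holding the running sum and count.
    static final class SumAndCount {
        final double sum;
        final long count;

        SumAndCount(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        // The average is derived on read; an empty accumulator yields 0.0.
        double average() {
            return count == 0 ? 0.0 : sum / count;
        }
    }

    // Plays the role of the initializer: the empty aggregate.
    static SumAndCount initial() {
        return new SumAndCount(0.0, 0L);
    }

    // Plays the role of the aggregator: fold one amount into the aggregate.
    static SumAndCount add(SumAndCount acc, double amount) {
        return new SumAndCount(acc.sum + amount, acc.count + 1);
    }
}
```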

Emitting Final Results with Suppression

KTable<Windowed<String>, Long> finalCounts = orders 
   .groupByKey() 
   .windowedBy( 
       TimeWindows.ofSizeAndGrace( 
           Duration.ofMinutes(5), 
           Duration.ofSeconds(30) 
       ) 
   ) 
   .count() 
   .suppress( 
       Suppressed.untilWindowCloses( 
           Suppressed.BufferConfig.unbounded() 
       ) 
   );

Trade-Offs of Suppression

The primary trade-off is latency. Without suppression, consumers receive updates immediately as events arrive. With suppression enabled, results are not available until the window closes.

Another consideration is buffer management. Suppressed results are held in memory until they can be emitted. In high-cardinality workloads, an unbounded buffer can consume significant resources.
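The latency trade-off can be seen in a simplified model of suppression for a single key: counts are buffered per window and released only once stream time reaches the window end plus the grace period. This is a plain-Java sketch under those assumptions, not the Kafka Streams implementation, and the class name and output format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of "emit only when the window closes" for one key.
final class SuppressionSketch {
    private final long windowSizeMs;
    private final long graceMs;
    // Buffered counts by window start; this is the memory the buffer consumes.
    private final TreeMap<Long, Long> bufferedCounts = new TreeMap<>();
    private long streamTimeMs = -1L;

    SuppressionSketch(long windowSizeMs, long graceMs) {
        this.windowSizeMs = windowSizeMs;
        this.graceMs = graceMs;
    }

    // Processes one record and returns final results ("windowStart=count")
    // for every window that has closed as a result.
    List<String> onRecord(long timestampMs) {
        streamTimeMs = Math.max(streamTimeMs, timestampMs);
        long windowStart = timestampMs - (timestampMs % windowSizeMs);
        // Records for windows that are already closed are ignored.
        if (windowStart + windowSizeMs + graceMs > streamTimeMs) {
            bufferedCounts.merge(windowStart, 1L, Long::sum);
        }
        List<String> emitted = new ArrayList<>();
        while (!bufferedCounts.isEmpty()
                && bufferedCounts.firstKey() + windowSizeMs + graceMs <= streamTimeMs) {
            Map.Entry<Long, Long> closed = bufferedCounts.pollFirstEntry();
            emitted.add(closed.getKey() + "=" + closed.getValue());
        }
        return emitted;
    }
}
```

Nothing is emitted for a window until a later record advances stream time past its close, which is why suppressed results lag behind unsuppressed updates and why the buffer grows with the number of open windows and keys.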

Stateful Operations and State Stores

Many stream processing operations require Kafka Streams to remember information between events. For example:

A stream-stream join must remember records from both streams until matching events arrive.
A windowed count must remember the current count for each window.
A session aggregation must track activity across multiple events.

These are known as stateful operations because the application maintains state as it processes data.

What Is a State Store?

By default, Kafka Streams uses RocksDB, an embedded key-value database optimized for high-throughput workloads.

How State Stores Support Stream Joins

For example, if an order event arrives first, Kafka Streams stores it in a local state store and waits for a matching payment event. If the payment arrives within the configured window, the join result is emitted immediately. If no matching event arrives before the window expires, the stored record is eventually removed.

This mechanism allows Kafka Streams to correlate events that arrive at different times while maintaining high processing throughput.

How State Stores Support Aggregations

Consider a five-minute order count. As events arrive, Kafka Streams continuously updates the aggregation result:

10:00 - 10:05 → 150 Orders
10:05 - 10:10 → 212 Orders
10:10 - 10:15 → 189 Orders

Fault Tolerance and Recovery

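A common question is:
"What happens if the application crashes?"

Kafka Streams backs each state store with a changelog topic in Kafka, so a restarted or reassigned task can rebuild its local state by replaying that topic.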

Why State Management Matters

Stateful processing enables powerful capabilities such as:

Event correlation
Windowed aggregations
Session analytics
Real-time enrichment
Fraud detection
Operational monitoring

Production Considerations and Error Handling

Handling Late Data and Stream Time

Using Grace Periods

Kafka Streams allows applications to accept late-arriving events through grace periods.

JoinWindows window = JoinWindows 
   .ofTimeDifferenceAndGrace( 
       Duration.ofMinutes(10), 
       Duration.ofMinutes(2) 
   );
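How a grace period interacts with stream time can be sketched in plain Java for the windowed-aggregation case. The model assumes tumbling windows aligned to the epoch and treats stream time as the highest timestamp observed so far; the class and method names are hypothetical and this is not the Kafka Streams implementation.

```java
// Plain-Java model of stream time and grace periods; not the Kafka Streams API.
final class GracePeriodSketch {
    private long streamTimeMs = -1L;

    // Stream time only moves forward: it is the maximum timestamp seen so far.
    long observe(long timestampMs) {
        streamTimeMs = Math.max(streamTimeMs, timestampMs);
        return streamTimeMs;
    }

    // A record is still accepted while its window end plus the grace period
    // lies after the current stream time; otherwise the window has closed.
    boolean accepts(long timestampMs, long windowSizeMs, long graceMs) {
        long windowEnd = timestampMs - (timestampMs % windowSizeMs) + windowSizeMs;
        return windowEnd + graceMs > streamTimeMs;
    }
}
```

With 5-minute windows and a 2-minute grace period, once stream time has reached minute 11 a record stamped at minute 4 is rejected because its window closed at minute 7, while a record stamped at minute 9.5 is still accepted because its window stays open until minute 12.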

Understanding Stream Time

Timestamp Extractors

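Kafka Streams determines event time using a timestamp extractor. The default extractor, FailOnInvalidTimestamp, uses the timestamp carried in each Kafka record's metadata (set by the producer or the broker) and fails when a record has an invalid timestamp: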

Properties props = new Properties(); 
 
props.put( 
   StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, 
   FailOnInvalidTimestamp.class 
);

For most production workloads, event timestamps provide more reliable results than processing-time timestamps.

Production Best Practice

Managing State Store Growth

For example, a stream-stream join must temporarily store records from both streams until the join window expires. Similarly, windowed aggregations must retain intermediate results until the window closes and any grace period has elapsed.

The impact becomes more noticeable as:

Window sizes increase
Grace periods increase
Event throughput grows
Key cardinality increases
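A back-of-envelope estimate helps show how these factors multiply. The plain-Java sketch below is a rough approximation only: it assumes every record is retained for the full retention period and ignores storage-engine overhead, compression, and changelog replication. The class name, method name, and the figures in the usage note are hypothetical.

```java
// Rough sizing arithmetic; an approximation, not a measurement.
final class StateSizeEstimate {

    // Approximate bytes retained for one side of a join or one windowed store:
    // arrival rate x how long each record is kept x average record size.
    static long estimateBytes(long recordsPerSecond, long retentionSeconds, long avgRecordBytes) {
        long retainedRecords = Math.multiplyExact(recordsPerSecond, retentionSeconds);
        return Math.multiplyExact(retainedRecords, avgRecordBytes);
    }
}
```

For example, 1,000 records per second retained for 720 seconds (a 10-minute window plus a 2-minute grace period) at 500 bytes each comes to 360,000,000 bytes, roughly 360 MB. Doubling the window or the throughput doubles the estimate.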

Best Practices

To keep state stores manageable:

Use the smallest window size that satisfies business requirements.
Avoid excessive grace periods.
Monitor state store disk utilization.
Review key cardinality and partitioning strategies.
Remove unused aggregations and materialized views.

Remember that state stores are not temporary implementation details. They are a critical part of Kafka Streams architecture and should be monitored like any other production datastore.

Exactly-Once Processing and Duplicate Events

Properties props = new Properties(); 
 
props.put( 
   StreamsConfig.PROCESSING_GUARANTEE_CONFIG, 
   StreamsConfig.EXACTLY_ONCE_V2 
);

With exactly-once processing enabled, Kafka Streams coordinates state updates, record processing, and output writes within a transactional boundary.

This helps ensure that:

Each record's effects on state and output are committed exactly once, even during failures.
State stores remain consistent.
Aggregations produce accurate results.
Downstream consumers avoid duplicate updates.
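Exactly-once processing covers the read-process-write cycle inside Kafka. It does not remove duplicates that an upstream system wrote as separate records, so those are usually filtered by the application. The plain-Java sketch below shows one possible approach, assuming each event carries a unique ID and that duplicates arrive within a bounded retention period; the class and method names are hypothetical, and in a Kafka Streams application the seen-ID map would live in a state store.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration of ID-based deduplication; not the Kafka Streams API.
final class DeduplicationSketch {
    // Event ID -> timestamp at which the ID was first seen.
    private final Map<String, Long> firstSeenMs = new HashMap<>();
    private final long retentionMs;

    DeduplicationSketch(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    // Returns true the first time an ID is seen within the retention period.
    boolean isFirstOccurrence(String eventId, long timestampMs) {
        // Forget IDs older than the retention period so the map stays bounded.
        firstSeenMs.values().removeIf(seenAt -> timestampMs - seenAt > retentionMs);
        return firstSeenMs.putIfAbsent(eventId, timestampMs) == null;
    }
}
```

The retention period is a trade-off: a longer period catches duplicates that arrive further apart but keeps more IDs in state.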

When to Use Exactly-Once Processing

Exactly-once processing is particularly important for:

Financial transactions
Billing systems
Revenue calculations
Inventory management
Compliance reporting

Production Consideration

Monitoring and Observability

Key Metrics to Track

At a minimum, teams should monitor:

Consumer lag
Processing latency
Records processed per second
State store size
Changelog topic growth
Failed deserializations
Dropped records
Rebalance frequency

These metrics provide early warning signs that a streaming application is struggling to keep up with incoming workloads.
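Consumer lag, the first metric in the list, is simple arithmetic once the offsets are known: for each partition, it is the distance between the latest offset in the log and the offset the application has committed. The plain-Java sketch below assumes the offsets have already been collected into arrays indexed by partition; the class and method names are hypothetical.

```java
// Plain-Java illustration of the lag calculation; offsets are supplied by the caller.
final class ConsumerLagSketch {

    // Sums per-partition lag: log end offset minus committed offset.
    static long totalLag(long[] endOffsets, long[] committedOffsets) {
        if (endOffsets.length != committedOffsets.length) {
            throw new IllegalArgumentException("offset arrays must cover the same partitions");
        }
        long lag = 0L;
        for (int partition = 0; partition < endOffsets.length; partition++) {
            // Guard against a committed offset read slightly after the end offset.
            lag += Math.max(0L, endOffsets[partition] - committedOffsets[partition]);
        }
        return lag;
    }
}
```

A lag that keeps growing over time, rather than its absolute value at any one moment, is the clearer sign that the application cannot keep up.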

Monitoring State Stores

Regularly monitor:

State store size
Restore duration during failover
Changelog topic throughput
Disk utilization

Treat state stores as production data infrastructure, not temporary application caches.

Detecting Late Events

A sudden increase in late-arriving records often indicates upstream issues such as:

Network instability
Producer delays
Consumer backpressure
Infrastructure bottlenecks

Tracking late-event rates can help teams identify and resolve operational issues before they affect business outcomes.

Watching Rebalances

Consumer group rebalances can temporarily pause processing while Kafka Streams reassigns partitions and restores local state. Frequent rebalances may indicate:

Insufficient resources
Unstable application instances
Excessive scaling activity
Large state stores

Monitoring rebalance frequency and recovery time helps ensure stable stream processing performance.

Production Best Practice

Serialization and Data Contracts

Using Schema Registry

For production workloads, it is recommended to use schema-based serialization formats such as Avro or Protobuf along with a Schema Registry.

props.put( 
   AbstractKafkaSchemaSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, 
   "http://schema-registry:8081" 
); 
 
props.put( 
   StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, 
   SpecificAvroSerde.class 
);

Schema Registry helps enforce compatibility rules and ensures producers and consumers agree on the structure of data being exchanged.

Schema Evolution

Production Best Practice

KTable Bootstrap and Recovery

Reducing Recovery Time

props.put( 
   StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 
   1 
);

During a failure, standby replicas reduce recovery time because much of the required state has already been restored.

Topology Design Best Practices

As Kafka Streams applications grow, topology design becomes just as important as the business logic itself.

Keep Topologies Simple

Every join, aggregation, repartition, and state store introduces additional processing overhead. When multiple approaches can solve the same problem, prefer the simpler topology.

Avoid Unnecessary Repartitions

Frequently Asked Questions (FAQs)

Kafka stream joins and aggregations are stateful stream processing operations that allow applications to correlate events, enrich records with reference data, and compute real-time metrics. Joins combine data from multiple streams or tables, while aggregations continuously calculate values such as counts, sums, averages, and other business metrics as events arrive.

A stream-stream join correlates events from two KStreams that occur within a defined time window. A stream-table join enriches events from a KStream using the latest state stored in a KTable. Stream-stream joins require windowing, whereas stream-table joins perform lookups against the current state and do not require a time window.

State stores allow Kafka Streams to maintain information between events. They are used to support joins, aggregations, windowing operations, and real-time enrichment. Without state stores, Kafka Streams would not be able to perform stateful stream processing efficiently.

Kafka Streams uses event time and configurable grace periods to process late-arriving events. A grace period allows records to be accepted for a defined duration after a window's end time has passed, helping improve accuracy when events arrive out of order.

Kafka Streams supports Tumbling Windows, Hopping Windows, and Session Windows. Tumbling windows are fixed and non-overlapping, hopping windows overlap and provide rolling calculations, while session windows group events based on periods of activity separated by inactivity gaps.

Condense provides no-code and low-code capabilities that allow teams to implement stream joins, aggregations, enrichments, and event correlation workflows without building every component from scratch. Developers can use visual workflows, reusable transforms, and prebuilt connectors to accelerate development.

Yes. Condense supports stateful stream processing patterns such as stream-stream joins, stream-table enrichments, windowed aggregations, event correlation, and real-time transformations. It also provides operational tooling to help manage these workloads at scale.

Condense enables teams to build real-time analytics pipelines using windowed aggregations, event processing workflows, and streaming transformations. These capabilities help organizations generate real-time insights from continuously arriving data.

Condense simplifies infrastructure management through managed Kafka deployments, built-in observability, monitoring, lifecycle management, and automated operational workflows. This allows engineering teams to focus more on business logic and less on infrastructure management.

Condense combines managed Kafka, low-code stream processing, prebuilt connectors, reusable transforms, and integrated development tools in a single platform. Organizations can build, deploy, monitor, and scale real-time applications faster while maintaining the flexibility required for complex enterprise workloads.
