<!--
Copyright 2025 Alexander Alten (novatechflow), NovaTechflow (novatechflow.com).
This project is supported and financed by Scalytics, Inc. (www.scalytics.io).

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# What is KafScale?

KafScale is a **Kafka-protocol compatible streaming platform** that separates compute from storage. Unlike traditional Kafka, KafScale uses **stateless brokers** and stores all data in **S3-compatible object storage**, making it simpler to operate and more cost-effective for many use cases.

KafScale is Kafka protocol compatible for producers and consumers (see claim: **KS-COMP-001**).

Note: Kafka transactions are not supported (see claim: **KS-LIMIT-001**).

## Key Characteristics

- **Kafka-Compatible**: Uses the standard Kafka wire protocol, so your existing Kafka clients work without modification (see claim: **KS-COMP-001**)
- **Stateless Brokers**: Brokers are ephemeral and can be scaled up or down without data movement (see claim: **KS-ARCH-001**)
- **S3-Backed Storage**: All log segments are stored in S3 (or S3-compatible storage like MinIO)
- **etcd Metadata**: Topic configuration and consumer offsets are stored in etcd
- **Cloud-Native**: Designed for Kubernetes, but can run anywhere with Docker
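
The first point is worth making concrete: a stock Kafka client needs no KafScale-specific library or settings to connect. A minimal sketch, assuming an illustrative broker address (`localhost:9092`; the local demo in Chapter 2 uses a different port) and a hypothetical class name:

```java
import java.util.Properties;

public class StandardClientConfig {
    // A plain Kafka producer configuration -- nothing KafScale-specific here.
    // Only bootstrap.servers changes: it points at a KafScale broker instead
    // of a traditional Kafka broker.
    static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        producerProps("localhost:9092")
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The Compatibility section later in this chapter covers the one producer setting KafScale does require you to change.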

## How KafScale Differs from Traditional Kafka

| Aspect | Traditional Kafka | KafScale |
|--------|------------------|----------|
| **Storage** | Local broker disks | S3 object storage |
| **Broker State** | Stateful (stores data) | Stateless (data in S3) |
| **Scaling** | Complex rebalancing | Simple pod scaling |
| **Durability** | Replication across brokers | S3 durability (11 9's) |
| **Recovery** | Rebuild from replicas | Read from S3 |
| **Cost Model** | Provision for peak + replicas | Pay for actual storage used |

## When to Use KafScale

### ✅ Good Use Cases

- **Development and Testing**: Quick setup without complex infrastructure
- **Cost-Sensitive Workloads**: Reduce storage costs by using S3 instead of provisioned disks
- **Cloud-Native Deployments**: Leverage Kubernetes for scaling and orchestration
- **Replay-Heavy Workloads**: S3 storage makes long-term retention affordable
- **Event Sourcing**: Durable, immutable event logs with cost-effective storage

### ❌ Not Suitable For

- **Transactional Workloads**: KafScale does not support exactly-once semantics or transactions (see claim: **KS-LIMIT-001**)
- **Log Compaction**: Compacted topics are not supported
- **Ultra-Low Latency**: S3 storage adds latency compared to local disks (estimated 10-50ms additional overhead based on network and S3 response times)
- **High-Throughput Single Partition**: Traditional Kafka may be faster for very high throughput on a single partition

## Architecture Overview

Here's how KafScale works at a high level:

```
┌─────────────────────┐
│   Spring Boot App   │
│  (Kafka Producer/   │
│      Consumer)      │
└──────────┬──────────┘
           │ Kafka Protocol (port 9092)
           ▼
┌─────────────────────┐
│   KafScale Broker   │
│     (Stateless)     │
└────┬───────────┬────┘
     │           │
     ▼           ▼
┌─────────┐ ┌─────────┐
│  etcd   │ │   S3    │
│ Metadata│ │  Data   │
└─────────┘ └─────────┘
```

### Components

1. **Your Application**: Uses standard Kafka client libraries (no changes needed!)
2. **KafScale Broker**: Handles Kafka protocol requests, buffers data in memory, writes to S3
3. **etcd**: Stores topic metadata, partition assignments, and consumer offsets
4. **S3 Storage**: Stores immutable log segments (MinIO for local development)

## What Makes KafScale Different

Traditional Kafka was designed when durable storage meant local disks. Brokers had to own both compute and data, requiring complex replication and rebalancing.

**Cloud object storage changes this entirely.**

With S3 providing durability, availability, and cost efficiency, brokers no longer need to be stateful. KafScale embraces this by making brokers simple protocol endpoints that read and write to S3.

This means:
- **No replication overhead**: S3 handles durability
- **No rebalancing**: Brokers are interchangeable
- **No disk management**: Storage is elastic and managed
- **Simpler operations**: Restart brokers without data movement

## Compatibility

KafScale implements the core Kafka APIs:

- ✅ **Produce** (API Key 0)
- ✅ **Fetch** (API Key 1)
- ✅ **Consumer Groups** (JoinGroup, SyncGroup, Heartbeat, etc.)
- ✅ **Offset Management** (OffsetCommit, OffsetFetch)
- ✅ **Topic Management** (CreateTopics, DeleteTopics)
- ✅ **Metadata** (topic/broker discovery)
- ✅ **Protocol compatibility**: Compatible with Kafka clients 2.x and 3.x (3.4.x or older recommended for best compatibility)
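
For orientation, each API above corresponds to a fixed numeric key in the Kafka wire protocol. A small lookup sketch (the key numbers come from the Kafka protocol specification; the class name is illustrative):

```java
import java.util.Map;

public class KafkaApiKeyNames {
    // Kafka wire-protocol API keys for the core APIs listed above.
    static final Map<Integer, String> CORE_APIS = Map.ofEntries(
            Map.entry(0, "Produce"),
            Map.entry(1, "Fetch"),
            Map.entry(3, "Metadata"),
            Map.entry(8, "OffsetCommit"),
            Map.entry(9, "OffsetFetch"),
            Map.entry(11, "JoinGroup"),
            Map.entry(12, "Heartbeat"),
            Map.entry(14, "SyncGroup"),
            Map.entry(19, "CreateTopics"),
            Map.entry(20, "DeleteTopics"));

    public static void main(String[] args) {
        CORE_APIS.forEach((key, name) -> System.out.println(key + " -> " + name));
    }
}
```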

### Client Compatibility Notes

KafScale is compatible with standard Kafka clients, but stricter schema validation in newer clients (3.5+) may cause issues with certain requests (e.g., `ProduceResponse` schema mismatches).

**Recommendations:**
- **Java/Spring Boot**: Use Spring Boot 3.1.x (Kafka client 3.4.x) or ensure your client version is < 3.5.0 if you encounter protocol errors.
- **Idempotence**: Always set `enable.idempotence=false` in your producers. KafScale does not support the `InitProducerId` API or the sequence number validation required for idempotent producers.
- **Transactions**: Keep `isolation.level=read_uncommitted` (the Kafka default), since transactions are not supported.
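
Putting the recommendations together, here is a configuration sketch with the KafScale-required settings. The broker address and group id are illustrative; the serializer classes are the standard Kafka ones:

```java
import java.util.Properties;

public class KafScaleClientSettings {
    // Producer settings: idempotence must be off because KafScale does not
    // implement the InitProducerId API.
    static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("enable.idempotence", "false"); // required by KafScale
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    // Consumer settings: read_uncommitted is already the Kafka default, but
    // it is stated explicitly here because transactions are unsupported.
    static Properties consumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        props.put("isolation.level", "read_uncommitted");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps("localhost:9092"));
        System.out.println(consumerProps("localhost:9092", "demo-group"));
    }
}
```

Pass these `Properties` straight to `KafkaProducer` and `KafkaConsumer` as you would with any Kafka cluster.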

**Not supported** (by design):
- ❌ Transactions and exactly-once semantics (see claim: **KS-LIMIT-001**)
- ❌ Log compaction
- ❌ Kafka Streams applications that rely on transactions or exactly-once semantics (stateless Streams processing without these features may work)
- ❌ Flexible versions in some RPCs (may cause `recordErrors` serialization issues in newer clients)

For stream processing, use external engines like [Apache Flink](https://flink.apache.org), [Apache Spark Streaming](https://spark.apache.org/streaming/), or [Apache Wayang](https://wayang.apache.org).

## What You Should Know Now

Before moving to the next chapter, ensure you can answer these questions:

- [ ] What makes KafScale different from traditional Kafka? (Hint: stateless brokers, S3 storage)
- [ ] When should you use KafScale vs traditional Kafka?
- [ ] What are the key limitations? (Hint: transactions, compaction, latency)
- [ ] What does "Kafka protocol compatible" mean for your existing clients?
- [ ] What configuration changes are required for producers? (Hint: `enable.idempotence`)

If you're unsure about any of these, review the relevant sections above before continuing.

## Ready to Get Started?

Now that you understand what KafScale is and when to use it, let's get it running on your local machine!

**Next**: [Quick Start with Docker](02-quick-start.md) →
<!--
Copyright 2025 Alexander Alten (novatechflow), NovaTechflow (novatechflow.com).
This project is supported and financed by Scalytics, Inc. (www.scalytics.io).

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Quick Start: Local Demo + E10

This section guides you through the local demo using `make demo`. It runs the broker and console as local Go processes and uses a small MinIO helper container for S3-compatible storage. No Docker Compose is used.

## Overview

We'll run a complete KafScale demo stack:

- **Embedded etcd**: Metadata storage (started by the demo)
- **MinIO**: S3-compatible object storage (helper container)
- **KafScale Broker**: Stateless broker (local process)
- **KafScale Console**: UI for inspecting the cluster

## Step 1: Clone the Repository

```bash
git clone https://github.com/novatechflow/kafscale.git
cd kafscale
```

## Step 2: Build and Run the Local Demo

We provide a `Makefile` that automates building the demo components and starting the stack.

This command starts the local demo:

```bash
make demo
```

> **Note:** The demo starts local Go processes and a MinIO helper container. The first run may take a few minutes.

### System Check

Once running, you should see logs streaming. You can verify the MinIO helper container in a separate terminal:

```bash
docker ps
```

You should see:
- `kafscale-minio` (ports 9000, 9001)

## Step 3: Run the E10 Java Client Demo

**Estimated time**: 5-10 minutes (first-time build may take longer)

In another terminal:

```bash
cd examples/E10_java-kafka-client-demo
mvn clean package exec:java
```

What it does:
- Connects to `localhost:39092`
- Creates a topic `demo-topic-1`
- Produces 25 messages
- Consumes 5 messages
- Prints cluster metadata
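
Based on that behavior, the demo's client configuration presumably resembles the fragment below. This is an assumption for orientation only; check the E10 source for the exact values, and note the group id shown is hypothetical:

```properties
bootstrap.servers=localhost:39092
enable.idempotence=false
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer
group.id=demo-group
```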

**Verify success**:
You should see:
- ✅ "Sent message: key=key-0 value=message-0 partition=0 offset=0"
- ✅ "Received message: key=key-0 value=message-0 partition=0 offset=0"
- ✅ "Successfully consumed 5 messages."

If you see connection errors, check [Troubleshooting](05-troubleshooting.md).

## Step 4: Managing the Demo

### Stopping the Demo Cleanly
Press `Ctrl+C` in the terminal running `make demo` to stop the local broker and console. Wait for the demo process to exit before starting another run.

If you see ports still in use (39092/39093/39094), run:
```bash
make stop-containers
```
This stops the MinIO helper and frees broker ports.

### Accessing Interfaces

- **KafScale Console**: [http://localhost:48080/ui](http://localhost:48080/ui) (local demo uses port 48080)
- **MinIO Console**: [http://localhost:9001](http://localhost:9001) (User/Pass: `minioadmin`)
- **Prometheus Metrics**: [http://localhost:39093/metrics](http://localhost:39093/metrics)

> **Note**: The platform demo (Chapter 4) serves the console on port 8080 rather than 48080; the local demo uses 48080 to avoid conflicts with commonly used development ports.

## Troubleshooting

If `make demo` fails, check:
1. **Ports**: Ensure ports `39092`, `39093`, `39094`, `48080`, `9000` are free.
2. **Docker resources**: Ensure Docker has enough memory for the MinIO helper (recommended: 2GB+).

## What You Should Know Now

Before moving to the next chapter, verify you can:

- [ ] Start the local demo with `make demo`
- [ ] Run the E10 Java client demo successfully
- [ ] Verify messages were produced and consumed
- [ ] Access the KafScale Console UI
- [ ] Stop the demo cleanly with Ctrl+C and `make stop-containers`

**Checkpoint**: If E10 produced and consumed messages successfully, you're ready to proceed!

## Next Steps

Next, we'll configure a Spring Boot application and run the platform demo on kind (E20).

**Next**: [Spring Boot Configuration](03-spring-boot-configuration.md) →