Skip to content

feat: add Avro and Protobuf format support to Kafka source (ETL-1197)#70

Merged
PabloPardoGarcia merged 1 commit into
pablo/etl-1186-add-ee-client-to-sdkfrom
pablo/etl-1197-sdk-add-support-for-avro-and-protobuf
Jun 16, 2026
Merged

feat: add Avro and Protobuf format support to Kafka source (ETL-1197)#70
PabloPardoGarcia merged 1 commit into
pablo/etl-1186-add-ee-client-to-sdkfrom
pablo/etl-1197-sdk-add-support-for-avro-and-protobuf

Conversation

@PabloPardoGarcia

Copy link
Copy Markdown
Member

What

Implements ETL-1197: adds format (json/avro/protobuf) and a unified schema to the Kafka source, matching the agreed backend contract.

Stacked on #68 (ETL-1186) — base is pablo/etl-1186-add-ee-client-to-sdk, which already includes the merged DLQ reprocessing work (#69). The diff here is just the avro/protobuf model.

# json (top-level schema_fields, or schema.fields)
{"type": "kafka", ..., "format": "json", "schema_fields": [{"name": "id", "type": "string"}]}
# avro
{"type": "kafka", ..., "format": "avro", "schema": {"avsc": {"type": "record", "name": "Event", "fields": [...]}}}
# protobuf
{"type": "kafka", ..., "format": "protobuf", "schema": {"proto": "syntax = \"proto3\"; ...", "message": "Event"}}

Model

  • KafkaSchema groups all schema config: fields (json), avsc (Avro record), proto + message (protobuf), and read-only parsed_fields (returned by the backend on GET).
  • AvroSchema enforces the structure GlassFlow needs to map to ClickHouse: a record with a name and non-empty fields; extra Avro keys are preserved. Full Avro correctness is left to the backend.
  • Replaces the speculative tagged-format foundation (formats.py + the format registry) from the EE-client branch; the source-type registry is unchanged.

Edition compatibility

Input accepted Serialized as
json top-level schema_fields or schema.fields top-level schema_fields (works on OSS + EE)
avro schema.avsc schema.avsc
protobuf schema.proto + schema.message schema.proto + schema.message

parsed_fields is parsed from GET responses and exposed as source.source_schema.parsed_fields, but is read-only: not used in any SDK logic and never sent back on create/edit. A schema_fields read property keeps the dedup/join validators and existing callers working.

Error detail

APIError now carries the structured details, and the create/edit path composes details.error (e.g. an Avro/Protobuf schema compilation error) into PipelineInvalidConfigurationError instead of the generic message.

Validation

avro requires schema.avsc; protobuf requires schema.proto + schema.message; avsc/proto/message with a non-avro/protobuf format is rejected.

Testing

255 passed, ruff clean. Unit-tested across all input forms, round-trips, parsed_fields handling, and the error-detail surfacing.

Not yet live-tested: the unified schema contract isn't in the backend main yet, so end-to-end verification against staging is pending the backend release.

🤖 Generated with Claude Code

…TL-1197)

Add `format` (json/avro/protobuf) and a unified `schema` to the Kafka source,
matching the agreed backend contract.

- KafkaSchema groups all schema config: `fields` (json), `avsc` (Avro record),
  `proto` + `message` (protobuf), and read-only `parsed_fields` (returned by the
  backend on GET, never sent back and not used in SDK logic). AvroSchema enforces
  the structure GlassFlow needs to map to ClickHouse: a `record` with a `name`
  and non-empty `fields`; extra Avro keys are preserved. Full Avro correctness is
  left to the backend.
- Compatibility across editions: input accepts both a top-level `schema_fields`
  (Open Source, and Enterprise for POST compat) and the unified `schema` object
  (Enterprise). On output, json serializes to top-level `schema_fields` (works on
  both editions) and avro/protobuf to the `schema` object. A `schema_fields` read
  property keeps the dedup/join validators and existing callers working.
- Replace the speculative tagged-format foundation (formats.py + the format
  registry); the source-type registry is unchanged.

Surface backend validation detail: APIError now carries the structured
`details`, and the create/edit path composes details.error (e.g. a proto/avro
compilation error) into PipelineInvalidConfigurationError.

avro/protobuf are Enterprise features enforced by the backend; the SDK models
them on the shared OSS source so OSS clients can also round-trip such configs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@PabloPardoGarcia PabloPardoGarcia merged commit 5c69272 into pablo/etl-1186-add-ee-client-to-sdk Jun 16, 2026
@PabloPardoGarcia PabloPardoGarcia deleted the pablo/etl-1197-sdk-add-support-for-avro-and-protobuf branch June 16, 2026 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant