Dov/antithesis poc upstream #36512

Draft
DAlperin wants to merge 30 commits into MaterializeInc:main from DAlperin:dov/antithesis-poc-upstream

@DAlperin
Member

Remove these sections if your commit already has a good description!

Motivation

Why does this change exist? Link to a GitHub issue, design doc, Slack
thread, or explain the problem in a sentence or two. A reviewer who has
no context should understand why after reading this section.

If this implements or addresses an existing issue, it's enough to link to that:
Closes
Fixes
etc.

Description

What does this PR actually do? Focus on the approach and any non-obvious
decisions. The diff shows the code --- use this space to explain what the
diff can't tell a reviewer.

Verification

How do you know this change is correct? Describe new or existing automated
tests, or manual steps you took.

@github-actions
Contributor

github-actions Bot commented May 11, 2026

Thank you for your submission! We really appreciate it. Like many source-available projects, we require that you sign our Contributor License Agreement (CLA) before we can accept your contribution.

You can sign the CLA by posting a comment with the message below.


I have read the Contributor License Agreement (CLA) and I hereby sign the CLA.


1 out of 2 committers have signed the CLA.
✅ [DAlperin](https://github.com/DAlperin)
@mitchwagner-antithesis
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@DAlperin force-pushed the dov/antithesis-poc-upstream branch from 754deec to d4373eb on May 11, 2026 19:21
DAlperin added 13 commits May 11, 2026 16:10
…older .env

mzbuild's _build_locked runs `git clean -ffdX <image_path>` before each
build, which wipes any gitignored file in the build context — including
the .env we generate. Two fixes:

1. publish:false on antithesis-config so the standard ci.test.build flow
   skips it entirely on regular nightly builds (where .env never exists).
   Only build-antithesis.sh / push-antithesis.py builds this image, and
   they write .env first.

2. Commit a placeholder .env so the file is tracked (survives git clean)
   and participates in mzbuild's fingerprint computation. build-antithesis.sh
   overwrites it with real registry refs before the build runs, so the
   fingerprint reflects the overwritten content on each build.
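
To illustrate why tracking the placeholder matters, here is a minimal sketch of a content-based fingerprint over a build context. The `fingerprint` and `demo` functions and the registry ref are hypothetical and are not mzbuild's actual implementation; the sketch only shows that overwriting a tracked file changes a hash computed over file contents.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(context: Path) -> str:
    """Hash every file in the build context by relative path and content.

    Assumption: a simple SHA-1 over sorted files stands in for whatever
    mzbuild really does when it fingerprints an image.
    """
    digest = hashlib.sha1()
    for path in sorted(p for p in context.rglob("*") if p.is_file()):
        digest.update(path.relative_to(context).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def demo() -> tuple[str, str]:
    """Return the fingerprint before and after overwriting the placeholder."""
    with tempfile.TemporaryDirectory() as tmp:
        context = Path(tmp)
        env = context / ".env"
        # Committed placeholder content (hypothetical).
        env.write_text("IMAGE=placeholder\n")
        before = fingerprint(context)
        # What a build script might write just before the build (hypothetical ref).
        env.write_text("IMAGE=registry.example/materialized:abc123\n")
        after = fingerprint(context)
        return before, after
```

Calling `demo()` yields two different hex digests, which is the behavior the commit message relies on: each build that writes different registry refs gets a different fingerprint.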
Add 16 Antithesis properties for Kafka source ingestion (NONE + UPSERT
envelopes) to the scratchbook, plus the workload-side implementation of
upsert-key-reflects-latest-value.

Scratchbook additions:
  - sut-analysis Appendix A: kafka source pipeline detail
  - existing-assertions: enumerated SUT-side panic/assert sites that are
    candidates for Antithesis SDK instrumentation
  - property-catalog Category 7: 16 new Kafka/UPSERT properties
  - property-relationships clusters 7-10 plus cross-cluster connections
  - 16 per-property evidence files
  - evaluation/synthesis.md: four-lens review

Workload:
  - parallel_driver_upsert_latest_value.py: produces upserts+tombstones
    with deterministic randomness, requests a quiet period, polls
    mz_source_statistics for catchup, and asserts per-key value match
    (two always() assertions + one sometimes() liveness anchor).
  - helper_pg / helper_kafka / helper_quiet / helper_random /
    helper_source_stats / helper_upsert_source: shared utilities for
    subsequent Kafka source properties.
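
The core of the latest-value check can be sketched as a small in-memory model. This is a sketch under assumptions, not the driver's actual code: `generate_ops`, `expected_state`, and `mismatches` are hypothetical names, and the 20% tombstone rate is arbitrary. In the real driver, the comparison would run only after the source has caught up, and its result would feed the `always()` assertions.

```python
import random
from typing import Optional

Op = tuple[str, Optional[str]]  # (key, value); a value of None is a tombstone


def generate_ops(seed: int, n_ops: int, n_keys: int) -> list[Op]:
    """Produce a reproducible sequence of upserts and tombstones."""
    rng = random.Random(seed)  # seeded, so the same seed replays the same ops
    ops: list[Op] = []
    for i in range(n_ops):
        key = f"k{rng.randrange(n_keys)}"
        value = None if rng.random() < 0.2 else f"v{i}"
        ops.append((key, value))
    return ops


def expected_state(ops: list[Op]) -> dict[str, str]:
    """Apply UPSERT semantics: last write wins, tombstones delete the key."""
    state: dict[str, str] = {}
    for key, value in ops:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


def mismatches(expected: dict[str, str], observed: dict[str, str]) -> list[str]:
    """Return keys whose observed value differs from the model, sorted."""
    keys = expected.keys() | observed.keys()
    return sorted(k for k in keys if expected.get(k) != observed.get(k))
```

Here `observed` would be built from the rows returned by querying the source; an empty list from `mismatches` is the condition the per-key assertion would check.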
… catalog-recovery-consistency workload driver
…imeouts; remove dead upsert.rs (classic) antithesis asserts
DAlperin added 2 commits May 12, 2026 01:11
…are RocksDB lock

When I added clusterd2 in 4366c9e, both clusterds inherited the
DEFAULT_MZ_VOLUMES list, which uses a single named volume scratch:/scratch.
Docker named volumes are shared across containers by name, so the two
clusterds mounted the same /scratch and contended for RocksDB locks at
/scratch/storage/upsert/<id>/<worker>/LOCK.

This wedged clusterd1: it could never open its upsert RocksDB
("Resource temporarily unavailable" on the LOCK file), entered
Stalled health with "Failed to rehydrate state", broadcast
suspend-and-restart, and looped retry-fail-suspend-restart for the
entire run. The continuous restart loop drove the upsert
feedback-driven snapshot replay path in ways that produced visibly
wrong durable state for the source, which is exactly what the
upsert-state-rehydrates-correctly assertions caught in the
2026-05-12 05:39 UTC Antithesis report.
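
The failure mode can be reproduced in miniature with an exclusive, non-blocking file lock. This is an illustrative sketch only: it uses `fcntl.flock` as a stand-in, which may not be the locking call RocksDB itself uses, and it runs on Unix-like systems only. The function name is hypothetical.

```python
import fcntl
import os
import tempfile


def second_open_is_blocked() -> bool:
    """Return True if a second holder cannot take the same LOCK file."""
    with tempfile.TemporaryDirectory() as scratch:
        lock_path = os.path.join(scratch, "LOCK")
        # Two independent opens model two containers sharing one volume.
        first = open(lock_path, "w")
        second = open(lock_path, "w")
        try:
            fcntl.flock(first, fcntl.LOCK_EX | fcntl.LOCK_NB)
            try:
                fcntl.flock(second, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except BlockingIOError:
                # EAGAIN / EWOULDBLOCK: "Resource temporarily unavailable"
                return True
            return False
        finally:
            first.close()
            second.close()
```

As long as the first holder keeps the lock, every retry by the second holder fails the same way, which matches the retry loop described above.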

Fix: give each clusterd its own per-instance named volume for /scratch.
The other volumes stay shared because they don't take exclusive locks.
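
A minimal sketch of the per-instance naming, assuming a hypothetical `volumes_for` helper and made-up shared volume names (the real list lives in DEFAULT_MZ_VOLUMES and is not reproduced here):

```python
# Hypothetical shared volumes; these stand in for the non-scratch entries.
SHARED_VOLUMES = ["mzdata:/mzdata", "tmp:/share/tmp"]


def volumes_for(service_name: str) -> list[str]:
    """Shared volumes plus a scratch volume named after the service."""
    return [*SHARED_VOLUMES, f"scratch_{service_name}:/scratch"]
```

With this shape, `volumes_for("clusterd1")` and `volumes_for("clusterd2")` both mount /scratch, but from differently named volumes, so neither sees the other's LOCK file.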

Also patch export-compose.py to auto-declare any service-referenced
named volume at the top level — Composition only auto-declares
DEFAULT_MZ_VOLUMES, so without this the custom names broke
`docker compose config`.
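
The auto-declaration step can be sketched as a pass over the compose document. This is a sketch under assumptions, not the actual export-compose.py change: `declare_named_volumes` is a hypothetical name, only the short `source:target` volume syntax is handled, and a source is treated as a bind mount when it starts with `/`, `.`, or `~`.

```python
def declare_named_volumes(compose: dict) -> dict:
    """Return a copy of `compose` with every referenced named volume declared."""
    declared = dict(compose.get("volumes") or {})
    for service in compose.get("services", {}).values():
        for mount in service.get("volumes", []):
            if not isinstance(mount, str):
                continue  # long (mapping) syntax is out of scope for this sketch
            source, sep, _ = mount.partition(":")
            if not sep:
                continue  # anonymous volume such as "/data"
            if source.startswith(("/", ".", "~")):
                continue  # host path, i.e. a bind mount
            # Keep any existing declaration; otherwise declare with defaults.
            declared.setdefault(source, {})
    return {**compose, "volumes": declared}
```

Existing top-level declarations are preserved, so volumes that were already declared keep whatever options they had.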
2 participants