Skip to content

Conversation

@cloutiertyler
Copy link
Contributor

@cloutiertyler cloutiertyler commented Jan 23, 2026

Description of Changes

This PR translates all of our Python smoketests into Rust tests which can be run from cargo run

Motivation

The purpose of this fivefold:

  1. All developers on the team are familiar with Rust
  2. It simplifies our devops because we can drop Python as a dependency to run the tests
  3. You can now run all tests in the repo through the single cargo test interface
  4. Because we use the SpacetimeDbGuard and cargo test/cargo nextest we can easily parallelize the smoke tests
  5. The smoketests can now use machinery imported from SpacetimeDB crates (e.g. bsatn etc.)

IMPORTANT NOTE!

There are several ways to implement the smoke tests in Rust (none are great):

  1. A separate xtask specifically for the smoke tests
    • This doesn't solve the problem of the CLI tests which also use the guard crate
    • Idiosyncratic way to run the smoke tests as opposed to cargo test
    • Does NOT resolve the cargo within cargo problem because we still have to build the test modules with cargo
  2. A build.rs script in guard which first builds the executables as a compile step for compiling guard
    • Deadlocks on a cargo lock file conflict (Outer cargo compiles guard → runs build.rs, inner cargo tries to acquire the build directory lock, outer cargo holds the directory lock, deadlock)
    • If you fix the deadlock by using different target dirs, it still looks stuck on building guard because it's actually compiling all of spacetimedb-standalone and spacetimedb-cli.
    • Still technically runs cargo inside of cargo.
  3. Add spacetimedb-cli and spacetimedb-standalone as an artifact dependency of the guard crate
  4. Compile the executables at runtime during the tests themselves where the first test takes a lock while the executables are building using cargo within cargo
    • Makes the tests look like they're taking a long time when they're just waiting for the build to complete
    • Requires relatively complex locking machinery across binaries/tests/processes
  5. A two step solution where the developer has to build the binaries before calling the smoke tests
    • Very error prone

None of these are good. xtask is not bad, but doesn't enable us to run other integration tests in other crates (e.g. the CLI)

(3) is the correct solution and has the best user experience, but it requires nightly and I don't want to introduce that for all of our tests.

I have chosen to do a combination of (1) and (4). You will now run the smoketests with cargo smoketest. If you run cargo test --all (or use guard) without doing cargo smoketest it will fall back to (4) which compiles the executables at runtime. Running cargo build is the only way to ensure that the executables are not stale because of the internal fingerprint checking. Everything else is fragile not robust.

NOTE! There is no way to avoid cargo within cargo and have the smoke tests be run as cargo tests because the modules under test must be compiled with cargo.

API and ABI breaking changes

Note that this is a BREAKING CHANGE to cargo test --all. The smoketests are now part of cargo test --all unless you specifically exclude them.

Expected complexity level and risk

3, this is partially AI translated. We need to carefully review to ensure the semantics have not regressed.

Testing

Use OnceLock to build spacetimedb-cli and spacetimedb-standalone once
per test process, then run the pre-built binary directly instead of
using `cargo run`. This avoids repeated cargo overhead and ensures
consistent binary reuse across parallel tests.
Create `crates/smoketests/` to translate Python smoketests to Rust:

- Add `Smoketest` struct with builder pattern for test setup
- Implement CLI helpers: `spacetime_cmd()`, `call()`, `sql()`, `logs()`, etc.
- Translate `smoketests/tests/sql.py` → `tests/sql.rs`
- Translate `smoketests/tests/call.py` → `tests/call.rs`
- Reuse `ensure_binaries_built()` from guard crate (now public)

Also fix Windows process cleanup in `SpacetimeDbGuard`:
- Use `taskkill /F /T /PID` to kill entire process tree
- Prevents orphaned `spacetimedb-standalone.exe` processes
- Translate 4 more Python smoketests to Rust: auto_inc, describe,
  module_nested_op, panic
- Simplify the call API by removing the generic call<T: Serialize>
  method and renaming call_raw to call, since CLI args are strings
- Remove unused serde dependency
Translate additional Python smoketests to Rust:
- dml.rs: DML subscription tests
- filtering.rs: Unique/non-unique index filtering tests
- namespaces.rs: C# code generation namespace tests
- add_remove_index.rs: Index add/remove with subscription tests
- schedule_reducer.rs: Scheduled reducer tests

Infrastructure improvements:
- Add subscribe_background() and SubscriptionHandle for proper
  background subscription semantics matching Python tests
- Add spacetime_local() for commands that don't need --server flag
- Add timing instrumentation for debugging test performance
- Separate build and publish timing in lib.rs to identify bottlenecks
- Use --bin-path to skip redundant rebuild during publish
- Add DEVELOP.md explaining cargo-nextest for faster test runs

Timing breakdown per test:
- WASM build: ~12s (75%)
- Server publish: ~2s (12%)
- Server spawn: ~2s (12%)

cargo-nextest runs all test binaries in parallel, reducing total
runtime from ~265s to ~160s (40% faster).
@cloutiertyler cloutiertyler changed the title Tyler/translate smoketests Translate smoketests from Python to Rust Jan 23, 2026
Translate from Python smoketests:
- detect_wasm_bindgen.rs: Tests build rejects wasm_bindgen and getrandom (2 tests)
- default_module_clippy.rs: Tests default module passes clippy
- delete_database.rs: Tests deleting database stops scheduled reducers
- fail_initial_publish.rs: Tests failed publish doesn't corrupt control DB
- modules.rs: Tests module update lifecycle and breaking changes (2 tests)

Also adds spacetime_build() method to Smoketest for testing build failures.

Total: 16 test files translated, 32 tests
Clear CARGO* environment variables (except CARGO_HOME) when spawning
child cargo build processes. When running under `cargo test`, cargo
sets env vars like CARGO_ENCODED_RUSTFLAGS that differ from a normal
build, causing child cargo processes to think they need to recompile.

This reduces single-test runtime from ~45s to ~18s by avoiding
redundant rebuilds of spacetimedb-standalone and spacetimedb-cli.
Add test translations for:
- connect_disconnect_from_cli.rs - client connection callbacks
- domains.rs - database rename functionality
- client_connection_errors.rs - client_connected error handling
- confirmed_reads.rs - --confirmed flag for subscriptions/SQL
- create_project.rs - spacetime init command

Also fix subscription race condition by waiting for initial update
before returning from subscribe_background_*, matching Python behavior.
Translate tests for:
- views.rs: st_view_* system tables, namespace collisions, SQL views
- auto_migration.rs: schema changes, add table migration
@cloutiertyler cloutiertyler force-pushed the tyler/translate-smoketests branch from bda7652 to bea9069 Compare January 23, 2026 07:06
Add new_identity() method to support multi-identity tests.

Translate tests for:
- rls.rs: Row-level security filter tests
- energy.rs: Energy balance endpoint test
- permissions.rs: Private tables, lifecycle reducers, delete protection
Translate tests for:
- new_user_flow.rs: Basic publish/call/SQL workflow
- servers.rs: Server add/list/edit commands
Copy link
Collaborator

@jdetter jdetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like these changes so far, I'm excited to talk about this more later today 👍

fn test_add_then_remove_index() {
let mut test = Smoketest::builder().module_code(MODULE_CODE).autopublish(false).build();

let name = format!("test-db-{}", std::process::id());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would have been an issue if we were to parallelize the smoketests but now we're running 1 spacetime server per test right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually kind of weird now that we're appending the process ID to the name - why not just use a static name like "test-db"?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the same questions. I do feel like the old random-string method was the most trivially "this should be fine" approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jdetter one reason to not use a static name is running the tests against a remote server. Now you can run the tests multiple times without conflicting database names.

cloutiertyler and others added 12 commits January 23, 2026 11:42
- Add --config-path to spacetime_local() for test isolation
- Fix new_identity() to not pass server arg to logout (matches Python)
- Insert --server flag before -- separator in spacetime_cmd()
- Update servers.rs to use spacetime_local() for local-only commands
- Simplify test files by removing redundant publish_module() calls

All 56 smoketests now pass.
Translate smoketests/tests/quickstart.py to Rust. This test validates
that the quickstart documentation is correct by extracting code from
markdown docs and running it.

- Add parse_quickstart() to parse code blocks from markdown with CRLF handling
- Add have_pnpm() to check for pnpm availability
- Implement QuickstartTest with support for Rust, C#, and TypeScript servers
- Rust test passes; C#/TypeScript skip gracefully if dependencies unavailable
inserts.as_array().unwrap().iter().any(|r| r["name"] == "Cindy"),
"Expected Cindy in second update: {:?}",
second
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically, the original tests also check things like the id of the entity, and that deletes is empty. unsure if there's an easy way to do a full structural comparison.

#[test]
fn test_spacetimedb_ns_csharp() {
let test = Smoketest::builder().module_code(MODULE_CODE).autopublish(false).build();
let _test = Smoketest::builder()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity, why _test?

Signed-off-by: Tyler Cloutier <cloutiertyler@users.noreply.github.com>
pub fn person(ctx: &ViewContext) -> Option<Person> {
None
}
"#;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a precompiled module for this

pub fn person(ctx: &ViewContext) -> Option<ABC> {
None
}
"#;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a precompiled module for this

----+-------
2 | 2"#,
);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're missing several tests from the python smoketests/tests/views.py

output.contains("testnet.spacetimedb.com"),
"Expected host in output: {}",
output
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the python version checks the host and protocol

testnet_re.is_match(&servers),
"Expected testnet in server list: {}",
servers
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the python version also tests for localhost

output.contains("fingerprint") || output.contains("Fingerprint"),
"Expected fingerprint message: {}",
output
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the python version tests fingerprinting of ip address and the presence/absence of -y as well. that could maybe be a separate test.

Signed-off-by: Zeke Foppa <196249+bfops@users.noreply.github.com>
invoked_count,
logs
);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're missing the procedure tests from The Original Python

invoked_count, 1,
"Expected scheduled reducer to run exactly once, but it ran {} times. Logs: {:?}",
invoked_count, logs
);
Copy link
Collaborator

@bfops bfops Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Original Python also has:

        row_entry = {
            "prev": TIMESTAMP_ZERO,
            "scheduled_id": 2,
            "sched_at": {"Time": TIMESTAMP_ZERO},
        }
        # subscription should have 2 updates, first for row insert in scheduled table and second for row deletion.
        self.assertEqual(
            sub(),
            [
                {"scheduled_table": {"deletes": [], "inserts": [row_entry]}},
                {"scheduled_table": {"deletes": [row_entry], "inserts": []}},
            ],
        )

invoked_count,
logs
);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Original Python also has this at the end:

        # scheduling repeated reducer again just to get 2nd subscription update.
        self.call("schedule_reducer")

        repeated_row_entry = {
            "prev": TIMESTAMP_ZERO,
            "scheduled_id": 1,
            "sched_at": {"Interval": {"__time_duration_micros__": 100000}},
        }
        row_entry = {
            "prev": TIMESTAMP_ZERO,
            "scheduled_id": 2,
            "sched_at": {"Time": TIMESTAMP_ZERO},
        }

        # subscription should have 2 updates and should not have any deletes
        self.assertEqual(
            sub(),
            [
                {"scheduled_table": {"deletes": [], "inserts": [repeated_row_entry]}},
                {"scheduled_table": {"deletes": [], "inserts": [row_entry]}},
            ],
        )

result.contains("yay!") && result.contains("hello"),
"Expected both 'yay!' and 'hello' in table, got: {}",
result
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the original code did a background subscription and then checked it after the calls. I'm not sure whether that was important to what was being tested (i.e. if there was a specific worry about volatile_nonatomic_schedule_immediate wasn't updating subscriptions properly)

"Expected book ISBN in AFTER logs: {:?}",
logs
);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we're missing the AddTableColumns class equivalents from The Original Python

"Expected Robert in logs: {:?}",
logs
);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't print the log messages that The Original Python prints, unsure if intentional

test.publish_module_clear(false).unwrap();

// Add new data with updated schema
test.call("add_person", &["Husserl", "Student"]).unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Original Python does this after this line:

        # If subscription, we should get 4 rows corresponding to 4 reducer calls (including before and after update)
        sub = sub();
        self.assertEqual(len(sub), 4)

#[test]
fn test_failures() {
if !have_psql() {
eprintln!("Skipping test_failures: psql not available");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this fail rather than quietly-ish skipping?

);

// Connection fails with invalid token - we can't easily test this without
// modifying the token, so skip this part
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this case seems worth testing

(1 row)"#,
);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're missing a test_sql_conn equivalent to The Original Python

#[test]
fn test_build_csharp_module() {
if !have_dotnet() {
eprintln!("Skipping test_build_csharp_module: dotnet 8.0+ not available");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if this failed instead of skipping, forcing the user to pass a --skip-dotnet or similar like The Original Python did. That way, we can tell the difference between the "tests not failing because not run" and "tests not failing because behavior is correct" cases.

// CLI is pre-built by artifact dependencies during compilation
let cli_path = ensure_binaries_built();

// Install wasi-experimental workload
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python did this before this step:

            run_cmd("dotnet", "nuget", "locals", "all", "--clear", cwd=bindings, capture_stderr=True)

unsure if the change is intentional

.unwrap()
.strip_suffix(".wasm")
.unwrap()
.replace('_', "-");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: perhaps we filter out lines 87-98 into a function, and then call that function in the test?

///
/// This checks if the modules workspace target directory exists and contains
/// at least one WASM file.
pub fn precompiled_modules_available() -> bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this used?


- name: Fail if Python smoketests were modified
run: |
PYTHON_SMOKETEST_CHANGES=$(git diff --name-only origin/${{ github.base_ref }} HEAD -- 'smoketests/**.py')
Copy link
Collaborator

@bfops bfops Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heads up, I realized that this will start failing in this PR if anyone merges changes into master that touch the smoketests, even if you don't merge master into this PR. I don't know that that's a bad thing, just wanted to mention it before it's confusing.

if we wanted only-changed-in-this-PR logic, it would have to be:

MERGE_BASE=$(git merge-base origin/${{ github.base_ref }} HEAD)
PYTHON_SMOKETEST_CHANGES="$(git diff --name-only $MERGE_BASE HEAD -- 'smoketests/**.py')"

also we probably want to make this check required just before merging - I suggest leaving this comment unresolved until then.

for (key, _) in env::vars() {
let should_remove = (key.starts_with("CARGO") && key != "CARGO_HOME" && key != "CARGO_TARGET_DIR")
|| key.starts_with("RUST")
|| key == "__CARGO_FIX_YOLO";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

// Set remote server environment variable if specified
if let Some(ref server_url) = server {
cmd.env("SPACETIME_REMOTE_SERVER", server_url);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this could be moved out of the if entirely

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants