xmtp_mls: Client::join_group_by_external_invite (atomic external-commit join)#3674
xmtp_mls: Client::join_group_by_external_invite (atomic external-commit join)#3674tylerhawkes wants to merge 1 commit into
Conversation
|
Claude finished @tylerhawkes's task —— View job Code Review
Review Date: 2026-06-10 06:49 UTC I've reviewed the implementation focusing on issues not already identified by macroscopeapp. Here are additional concerns: Security & Correctness1. The Recommendation: Add cryptographic verification of the GroupInfo before use, or confirm that 2. Missing Error Detail for Key Package Fetch Failures 🟡 Medium When Recommendation: Some(Err(e)) => {
tracing::warn!("Key package verification failed for installation {}: {}", hex::encode(id), e);
missing += 1;
}3.
4. The 5. Hardcoded The external commit builder uses Recommendation: Either use a stricter predicate or add a comment explaining why unconditional acceptance is safe in this context. Code Quality6. Missing Tracing for Critical Failure Paths 🔵 Observability The function has good tracing at the entry point (line 212), but critical error paths like Recommendation: Add 7. Test Coverage Gap
are strongly recommended before merge. Previously Identified Issues (Summary)For completeness, macroscopeapp already identified:
These should be addressed alongside the new issues above. |
ApprovabilityVerdict: Needs human review 4 blocking correctness issues found. This PR introduces a substantial new feature (753 lines) implementing QR-invite external commit join flow with complex MLS protocol interactions and network operations. Additionally, there are multiple unresolved review comments including 2 high-severity issues (publishing before validation, hardcoded sync cursor) that warrant human review. No code changes detected at You can customize Macroscope's approvability policy. Learn more. |
ffaaf90 to
c12079b
Compare
df41a55 to
dab038e
Compare
| // End-to-end test coverage for this entry point lives in the QR-invite | ||
| // integration test (T-1, libxmtp-integration-test). The join path |
There was a problem hiding this comment.
🟡 Medium client/external_invite.rs:749
persist_joined_group discards the Result from stored_group.store_or_ignore(&context.db()) with let _ = , so database errors are silently swallowed and the function returns Ok(()) even when persistence fails. This leaves MLS state persisted (via openmls finalize) but the libxmtp StoredGroup row missing, causing the group to be invisible to libxmtp APIs. Consider propagating the error by returning store_or_ignore(...)? or handling the Err case explicitly.
- let _ = stored_group.store_or_ignore(&context.db());
+ stored_group.store_or_ignore(&context.db()).map_err(|e| {
+ ClientError::Generic(format!(
+ "join_group_by_external_invite: store_or_ignore failed: {e}"
+ ))
+ })?;
Ok(())🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/xmtp_mls/src/client/external_invite.rs around lines 749-750:
`persist_joined_group` discards the `Result` from `stored_group.store_or_ignore(&context.db())` with `let _ = `, so database errors are silently swallowed and the function returns `Ok(())` even when persistence fails. This leaves MLS state persisted (via openmls finalize) but the libxmtp `StoredGroup` row missing, causing the group to be invisible to libxmtp APIs. Consider propagating the error by returning `store_or_ignore(...)?` or handling the `Err` case explicitly.
Evidence trail:
crates/xmtp_mls/src/client/external_invite.rs lines 728-751 (REVIEWED_COMMIT): `persist_joined_group` function discards `Result` on line 749 with `let _ =` and returns `Ok(())` on line 750. Caller at line 485 uses `?` operator expecting error propagation.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## tyler/qr-invite-l10-create-invite #3674 +/- ##
====================================================================
Coverage ? 83.82%
====================================================================
Files ? 411
Lines ? 60484
Branches ? 0
====================================================================
Hits ? 50700
Misses ? 9784
Partials ? 0 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
dab038e to
f0d63d4
Compare
5ebf785 to
e8978a2
Compare
f0d63d4 to
a6ec61d
Compare
e8ceed3 to
57e7e30
Compare
a6ec61d to
b5b9a53
Compare
| let welcome_bytes = welcome_msg | ||
| .tls_serialize_detached() | ||
| .map_err(JoinByExternalInviteError::GroupInfoTlsDecode)?; | ||
| let welcome_metadata_bytes = WelcomeMetadata { message_cursor: 0 }.encode_to_vec(); |
There was a problem hiding this comment.
🟠 High client/external_invite.rs:527
WelcomeMetadata { message_cursor: 0 } is hard-coded for HPKE-wrapped Welcomes to other installations, but process_new_welcome uses message_cursor to set the recipient's sync cursor. Each co-resident installation will store the membership-change message at cursor 0 instead of the actual commit position, causing subsequent syncs to replay history or mis-order messages. Consider passing the real cursor value from the external commit's publication response.
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @crates/xmtp_mls/src/client/external_invite.rs around line 527:
`WelcomeMetadata { message_cursor: 0 }` is hard-coded for HPKE-wrapped Welcomes to other installations, but `process_new_welcome` uses `message_cursor` to set the recipient's sync cursor. Each co-resident installation will store the membership-change message at cursor 0 instead of the actual commit position, causing subsequent syncs to replay history or mis-order messages. Consider passing the real cursor value from the external commit's publication response.
| } | ||
| if policy.external_group_id.as_slice() != payload_external_group_id.as_slice() { | ||
| return Err(JoinByExternalInviteError::ExternalGroupIdMismatch.into()); | ||
| } |
There was a problem hiding this comment.
🟢 Low client/external_invite.rs:582
compute_refreshed_blob_expiry uses now_ns captured at the start of the function, so expire_in_ns policies produce a TTL shortened by the full join latency. When that latency exceeds the relative expiry window, the returned refreshed_encrypted_group_info blob is already expired. Consider capturing a fresh now_ns just before the expiry computation so the TTL reflects the actual remaining validity window.
- let refreshed_expires_at_ns = compute_refreshed_blob_expiry(&policy, now);🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @crates/xmtp_mls/src/client/external_invite.rs around line 582:
`compute_refreshed_blob_expiry` uses `now_ns` captured at the start of the function, so `expire_in_ns` policies produce a TTL shortened by the full join latency. When that latency exceeds the relative expiry window, the returned `refreshed_encrypted_group_info` blob is already expired. Consider capturing a fresh `now_ns` just before the expiry computation so the TTL reflects the actual remaining validity window.
| let (mls_group, bundle) = builder | ||
| .build(provider.rand(), provider.crypto(), signer, |_| true) | ||
| .map_err(|e| { | ||
| JoinByExternalInviteError::ExternalCommitBuilderFailed(format!("build: {e}")) | ||
| })? | ||
| .finalize(&provider) | ||
| .map_err(|e| { | ||
| JoinByExternalInviteError::ExternalCommitBuilderFailed(format!("finalize: {e}")) | ||
| })?; |
There was a problem hiding this comment.
🟡 Medium client/external_invite.rs:461
If send_group_messages fails, the method returns PublishError after builder.finalize() has already persisted the new MLS group state locally and persist_joined_group has created the libxmtp wrapper row. The client is left thinking it joined the group at the new epoch even though the external commit never reached XMTP delivery, so the operation is not safely retryable and the local state diverges from the rest of the group. Consider reordering so that publishing happens before local persistence, or implementing cleanup/rollback logic on publish failure.
- let (mls_group, bundle) = builder
- .build(provider.rand(), provider.crypto(), signer, |_| true)
- .map_err(|e| {
- JoinByExternalInviteError::ExternalCommitBuilderFailed(format!("build: {e}"))
- })?
- .finalize(&provider)
- .map_err(|e| {
- JoinByExternalInviteError::ExternalCommitBuilderFailed(format!("finalize: {e}"))
- })?;
+ let (commit, bundle) = builder
+ .build(provider.rand(), provider.crypto(), signer, |_| true)
+ .map_err(|e| {
+ JoinByExternalInviteError::ExternalCommitBuilderFailed(format!("build: {e}"))
+ })?;🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @crates/xmtp_mls/src/client/external_invite.rs around lines 461-469:
If `send_group_messages` fails, the method returns `PublishError` after `builder.finalize()` has already persisted the new MLS group state locally and `persist_joined_group` has created the libxmtp wrapper row. The client is left thinking it joined the group at the new epoch even though the external commit never reached XMTP delivery, so the operation is not safely retryable and the local state diverges from the rest of the group. Consider reordering so that publishing happens before local persistence, or implementing cleanup/rollback logic on publish failure.
| created_at_ns, | ||
| ); | ||
|
|
||
| // 8. Publish the PublicMessage external commit to XMTP delivery. |
There was a problem hiding this comment.
🟠 High client/external_invite.rs:493
join_group_by_external_invite publishes the external commit and device welcomes (lines 503-561) before verifying that the joined group's EXTERNAL_COMMIT_POLICY.external_group_id matches the QR payload (lines 580-582). If the invite service returns a GroupInfo for the wrong group encrypted under the same symmetric key, the client joins and publishes to that wrong group before ExternalGroupIdMismatch is returned. Move the external_group_id validation (steps 10a-10b) before the publish steps (steps 8-9) so the mismatch is caught before any network side effects occur.
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @crates/xmtp_mls/src/client/external_invite.rs around line 493:
`join_group_by_external_invite` publishes the external commit and device welcomes (lines 503-561) before verifying that the joined group's `EXTERNAL_COMMIT_POLICY.external_group_id` matches the QR payload (lines 580-582). If the invite service returns a `GroupInfo` for the wrong group encrypted under the same symmetric key, the client joins and publishes to that wrong group before `ExternalGroupIdMismatch` is returned. Move the external_group_id validation (steps 10a-10b) before the publish steps (steps 8-9) so the mismatch is caught before any network side effects occur.
57e7e30 to
88884e2
Compare
88884e2 to
47d1f52
Compare
b5b9a53 to
3056397
Compare
| PostCommitGroupInfoEncrypt(#[source] EncryptedGroupInfoError), | ||
| } | ||
|
|
||
| impl From<JoinByExternalInviteError> for ClientError { |
There was a problem hiding this comment.
🟢 Low client/external_invite.rs:170
The From<JoinByExternalInviteError> implementation discards all structured error variants and collapses them into ClientError::Generic, so bindings/node and bindings/wasm format errors via error_code() always report only Generic instead of the specific failure type (malformed invite, stale QR, missing installations, etc.). SDK callers lose the ability to distinguish these failure modes programmatically. Consider implementing a non-generic ClientError variant that preserves the structured error information.
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @crates/xmtp_mls/src/client/external_invite.rs around line 170:
The `From<JoinByExternalInviteError>` implementation discards all structured error variants and collapses them into `ClientError::Generic`, so `bindings/node` and `bindings/wasm` format errors via `error_code()` always report only `Generic` instead of the specific failure type (malformed invite, stale QR, missing installations, etc.). SDK callers lose the ability to distinguish these failure modes programmatically. Consider implementing a non-generic `ClientError` variant that preserves the structured error information.
…ves) + adapt invite::payload Regenerates xmtp_proto from xmtp/proto#336 (typed SymmetricKey / GroupStateHash / ServicePointer newtypes, ExternalCommitPolicyV1 max_uses + refresh_pointers, EncryptedGroupInfoBlobV1 AAD + effective-expiry semantics, GroupMembershipEntry.V1 admitted_via_external_group_id). Consuming changes kept to what main already contains: - invite::payload rewritten for the typed shape: symmetric_key is a SymmetricKey submessage (MissingSymmetricKey + material-length checks), service_pointer is an optional ServicePointer (absent = application-resolved; present-but-empty oneof fails closed; https_url parsed and scheme-checked via url). New https_service_pointer / opaque_service_pointer / validate_service_pointer helpers. - GroupMembershipEntry.V1 construction sites gain the new admitted_via_external_group_id field, empty everywhere: the migrator's synthesized entries are all Welcome/legacy members (absent is permanently correct), and the membership-update rewrite path documents that tag preservation MUST land together with the validator's write-once enforcement in the external-commit stack — nothing on main can set the tag yet, so empty is correct today. The rest of the QR-invite stack (#3671 -> #3666 -> #3667 -> #3668 -> #3673 -> #3674) rebases onto this and picks up the new fields layer by layer.
…ves) + adapt invite::payload Regenerates xmtp_proto from xmtp/proto#336 (typed SymmetricKey / GroupStateHash / ServicePointer newtypes, ExternalCommitPolicyV1 max_uses + refresh_pointers, EncryptedGroupInfoBlobV1 AAD + effective-expiry semantics, GroupMembershipEntry.V1 admitted_via_external_group_id). Consuming changes kept to what main already contains: - invite::payload rewritten for the typed shape: symmetric_key is a SymmetricKey submessage (MissingSymmetricKey + material-length checks), service_pointer is an optional ServicePointer (absent = application-resolved; present-but-empty oneof fails closed; https_url parsed and scheme-checked via url). New https_service_pointer / opaque_service_pointer / validate_service_pointer helpers. - GroupMembershipEntry.V1 construction sites gain the new admitted_via_external_group_id field, empty everywhere: the migrator's synthesized entries are all Welcome/legacy members (absent is permanently correct), and the membership-update rewrite path documents that tag preservation MUST land together with the validator's write-once enforcement in the external-commit stack — nothing on main can set the tag yet, so empty is correct today. The rest of the QR-invite stack (#3671 -> #3666 -> #3667 -> #3668 -> #3673 -> #3674) rebases onto this and picks up the new fields layer by layer.
Summary
Adds the join half of the QR-invite flow.
Client::join_group_by_external_invite(invite_payload_bytes, encrypted_blob_bytes) -> JoinByExternalInviteOutput<Context>group_id_hashmatchessha256(decrypted.group_id)(defense-in-depth against a malicious service swapping ciphertexts).MlsGroupand the serialized refreshed blob.Stack
Test plan
Note
Add
Client::join_group_by_external_invitefor atomic external-commit group joiningclient/external_invite.rsmodule implementingClient::join_group_by_external_invite, which performs an atomic MLS external commit to join a group from a QR-style invite payload.ExternalInvitePayload, parsesGroupInfo, fetches key packages for co-resident installations, and builds an external commit with inlineAddproposals and anAppDataUpdaterecording the new member.StoredGrouprow withPendingmembership, publishes the commit with an HMAC binding, sends HPKE-wrapped Welcome messages to other installations, validates post-join policy, and returns a re-encryptedGroupInfoblob for upload back to the invite service.JoinByExternalInviteErrorfor structured error reporting across all failure modes in the join flow.📊 Macroscope summarized 3056397. 1 file reviewed, 0 issues evaluated, 0 issues filtered, 0 comments posted
🗂️ Filtered Issues
No issues evaluated.