feat: add POST /admin-api/tee/fleet-join endpoint#2091
Conversation
Fleet sidecar calls this on localhost to trigger the full TEE HA join flow: 1. Creates a new identity for the fleet node 2. Subscribes to the group gossip topic 3. Generates TDX attestation with group_id as nonce and SHA256(public_key) bound to report_data 4. Broadcasts TeeAttestationAnnounce on the group topic 5. Returns status with public_key and attestation info Also adds NodeClient::publish_on_group() for publishing arbitrary payloads on group gossip topics. Made-with: Cursor
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 2 agents | Quality score: 44% | Review time: 138.4s
🟡 2 warnings, 💡 3 suggestions, 📝 1 nitpicks. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-48692531
|
|
||
| if let Err(err) = state.node_client.subscribe_group(group_id_bytes).await { | ||
| error!(error=?err, "Failed to subscribe to group topic"); | ||
| return ApiError { |
There was a problem hiding this comment.
🟡 Resource leak: subscription not cleaned up on subsequent failures
If generate_attestation or the broadcast fails after subscribe_group succeeds, the group subscription is never cleaned up, leaking network resources on repeated failed calls.
Suggested fix:
Add cleanup logic (unsubscribe_group) in error paths after line 62, or use a guard/RAII pattern to ensure cleanup on early returns.
| .publish_on_group(group_id_bytes, payload) | ||
| .await | ||
| { | ||
| Ok(_) => { |
There was a problem hiding this comment.
🟡 Mock attestation status exposed in API response
Returning is_mock in the response reveals whether the TEE is running in mock/test mode, which helps attackers identify non-production or weakly protected deployments.
Suggested fix:
Remove `is_mock` from the API response; if needed for debugging, only include it when a debug flag is explicitly enabled.
| error!(error=?err, "Failed to create identity for fleet join"); | ||
| return ApiError { | ||
| status_code: StatusCode::INTERNAL_SERVER_ERROR, | ||
| message: format!("Failed to create identity: {err}"), |
There was a problem hiding this comment.
💡 Internal error details exposed to clients
Error messages like format!("Failed to create identity: {err}") leak internal implementation details that could aid attackers in understanding the system.
Suggested fix:
Return generic error messages to clients while logging the detailed error internally (already done via `error!` macro).
| } | ||
| }; | ||
|
|
||
| let our_public_key = match state.ctx_client.new_identity() { |
There was a problem hiding this comment.
💡 Orphaned identity on partial failure
If any step after new_identity() fails, the created identity is never used; if identities are persisted, this accumulates orphaned entries over time.
Suggested fix:
Consider deferring identity creation until after subscription succeeds, or document that orphaned identities are acceptable/garbage-collected.
| use crate::admin::handlers::validation::ValidatedJson; | ||
| use crate::admin::service::{ApiError, ApiResponse}; | ||
| use crate::AdminState; | ||
|
|
There was a problem hiding this comment.
💡 No visible rate limiting for resource-intensive endpoint
This endpoint creates identities, generates TDX attestations, and broadcasts to the network—all potentially expensive operations with no visible rate limiting to prevent abuse.
Suggested fix:
Consider adding rate limiting middleware for this endpoint to prevent resource exhaustion attacks.
| pub group_id: String, | ||
| } | ||
|
|
||
| impl Validate for FleetJoinRequest { |
There was a problem hiding this comment.
📝 Nit: hex::decode called twice for the same input
Validation calls hex::decode to check validity, then the handler decodes again; the allocation is redundant.
Suggested fix:
Consider storing the decoded bytes in a validated wrapper type, or accept the duplication as acceptable for clarity.
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 2 agents | Quality score: 47% | Review time: 85.4s
🟡 2 warnings, 💡 2 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-38e0ad35
| } | ||
| }; | ||
|
|
||
| info!(%our_public_key, "Created identity for fleet TEE node"); |
There was a problem hiding this comment.
🟡 Resource leak: subscription not cleaned up on subsequent failures
If subscribe_group succeeds but generate_attestation, serialization, or publish_on_group fails, the group subscription remains active without cleanup, accumulating orphan subscriptions over repeated failed attempts.
Suggested fix:
Wrap operations in a transaction-like pattern: if any step after subscribe_group fails, call unsubscribe_group before returning the error.
| } | ||
| .into_response(); | ||
| } | ||
| }; |
There was a problem hiding this comment.
🟡 Identity persisted even when join ultimately fails
A new identity is created via new_identity() before subscribing and broadcasting; if subsequent steps fail, this identity may persist as orphaned state that cannot be retried or cleaned up.
Suggested fix:
Consider delaying identity creation until after subscription succeeds, or implement cleanup logic for the identity on failure.
| .publish_on_group(group_id_bytes, payload) | ||
| .await | ||
| { | ||
| Ok(_) => { |
There was a problem hiding this comment.
💡 Response leaks attestation mode (mock vs production)
Returning is_mock in the API response reveals whether the system runs with mock attestations (development) or real TDX hardware, which could inform attackers about the deployment environment.
Suggested fix:
Consider removing `is_mock` from the response payload, or restrict it to debug builds only.
| Err(err) => { | ||
| error!(error=?err, "Failed to create identity for fleet join"); | ||
| return ApiError { | ||
| status_code: StatusCode::INTERNAL_SERVER_ERROR, |
There was a problem hiding this comment.
💡 Verbose error messages may leak internal details
Error responses include raw internal error strings (e.g., format!("Failed to create identity: {err}")) which could expose implementation details to attackers.
Suggested fix:
Return generic error messages to clients and log detailed errors server-side only.
99b06df to
95c14ad
Compare
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 2 agents | Quality score: 52% | Review time: 148.4s
🟡 2 warnings. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-02f02a44
| }; | ||
|
|
||
| info!(%our_public_key, "Created identity for fleet TEE node"); | ||
|
|
There was a problem hiding this comment.
🟡 Subscription not cleaned up on subsequent failures
If any operation fails after subscribe_group succeeds (attestation generation, serialization, or publish), the node remains subscribed to the group topic but the join flow is incomplete, leaving orphaned state.
Suggested fix:
Call `unsubscribe_group` in the error paths after line 60, or restructure to only subscribe after all other operations succeed (attestation/serialization can be done before subscribing).
| .into_response(); | ||
| } | ||
| }; | ||
|
|
There was a problem hiding this comment.
🟡 Mock attestations accepted unlike /tee/attest endpoint
This endpoint accepts and broadcasts mock attestations (is_mock=true), while /tee/attest rejects them with NOT_IMPLEMENTED. Broadcasting mock attestations to the network could allow spoofed TEE nodes in non-TDX environments.
Suggested fix:
Add the same mock rejection logic as attest.rs: if attestation.is_mock { return ApiError with NOT_IMPLEMENTED }
E2E Rust Apps FailedOne or more E2E workflows (e2e-kv-store, xcall-example) failed after retries. Please check the workflow logs for more details. |
Made-with: Cursor
95c14ad to
fa08590
Compare
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 2 agents | Quality score: 44% | Review time: 178.1s
🟡 3 warnings, 💡 2 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-436597b5
| }; | ||
|
|
||
| info!(%our_public_key, "Created identity for fleet TEE node"); | ||
|
|
There was a problem hiding this comment.
🟡 Resource leak: subscription not cleaned up on failure
If attestation generation, serialization, or broadcast fails after subscribing to the group topic (line 51), the subscription is never cleaned up.
Suggested fix:
Wrap the post-subscription logic in a pattern that calls `unsubscribe_group` on any error path, similar to `broadcast_specialized_node_invite` cleanup pattern in client.rs.
| use crate::admin::service::{ApiError, ApiResponse}; | ||
| use crate::AdminState; | ||
|
|
||
| pub async fn handler( |
There was a problem hiding this comment.
🟡 No authorization check for group membership
The handler allows any caller to join any group_id without verifying the caller is authorized to join that specific group, potentially allowing unauthorized nodes to inject themselves into TEE groups.
Suggested fix:
Add authorization check to verify the requesting node/sidecar is permitted to join the specified group_id before creating identity and broadcasting attestation.
| Ok(()) | ||
| } | ||
|
|
||
| pub async fn publish_on_group(&self, group_id: [u8; 32], payload: Vec<u8>) -> eyre::Result<()> { |
There was a problem hiding this comment.
🟡 Unrestricted group topic publishing
The new publish_on_group method allows publishing arbitrary payloads to any group topic without validating the caller has membership or permission for that group.
Suggested fix:
Add a check that the caller is a member of the group before allowing publish, or document that callers must perform this check.
| .into_response(); | ||
| } | ||
| }; | ||
|
|
There was a problem hiding this comment.
💡 Identity created but not cleaned up on subsequent failures
A new identity is persisted via new_identity() but if any later step fails (subscribe, attestation, broadcast), the orphan identity remains stored.
Suggested fix:
Consider deferring identity creation until after other validations succeed, or document that orphan identities are acceptable and will be garbage collected.
|
|
||
| match state | ||
| .node_client | ||
| .publish_on_group(group_id_bytes, payload) |
There was a problem hiding this comment.
💡 Information disclosure via is_mock field in response
Exposing is_mock in the API response reveals whether the system is running with real TDX attestation or mock mode, which could help attackers assess the security posture.
Suggested fix:
Consider removing `is_mock` from the response or only including it in debug builds.
Made-with: Cursor
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 2 agents | Quality score: 33% | Review time: 151.7s
🔴 1 critical, 🟡 3 warnings, 💡 1 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-e9553ad6
| use crate::admin::service::{ApiError, ApiResponse}; | ||
| use crate::AdminState; | ||
|
|
||
| pub async fn handler( |
There was a problem hiding this comment.
🔴 Unauthenticated endpoint creates identities and broadcasts to P2P network
The /admin-api/tee/fleet-join endpoint is registered under public_routes (no session auth layer), allowing any network-reachable client to create identities and broadcast attestation announcements, potentially enabling DoS or network spam attacks.
Suggested fix:
Move `fleet_join` route to `protected_routes` in service.rs, or add explicit authentication check in the handler. If localhost-only access is intended, add middleware to verify requests originate from 127.0.0.1.
| info!(%our_public_key, "Created identity for fleet TEE node"); | ||
|
|
||
| if let Err(err) = state.node_client.subscribe_group(group_id_bytes).await { | ||
| error!(error=?err, "Failed to subscribe to group topic"); |
There was a problem hiding this comment.
🟡 Missing cleanup on subsequent failures after subscribe_group
If generate_attestation or publish_on_group fails after subscribe_group succeeds, the node remains subscribed to the group topic but returns an error—leaving dangling subscription state.
Suggested fix:
Unsubscribe from the group topic in error paths after line 53, or wrap in a scope that cleans up on failure (e.g., using a guard or explicit unsubscribe calls in each error branch).
| } | ||
| }; | ||
|
|
||
| let our_public_key = match state.ctx_client.new_identity() { |
There was a problem hiding this comment.
🟡 No rate limiting on identity creation
Each request to this endpoint creates a new cryptographic identity via new_identity(). Without rate limiting, an attacker can exhaust storage or memory by repeatedly calling this endpoint.
Suggested fix:
Add rate limiting middleware (e.g., tower-governor or custom limiter) or track pending join attempts to prevent resource exhaustion.
| }; | ||
|
|
||
| let our_public_key = match state.ctx_client.new_identity() { | ||
| Ok(pk) => pk, |
There was a problem hiding this comment.
🟡 Created identity not cleaned up on failure
A new identity is created at line 39-48 and persisted, but if any subsequent step fails (subscribe, attestation, publish), the orphaned identity remains stored with no cleanup.
Suggested fix:
Consider either deferring identity creation until all preconditions succeed, or implementing rollback logic to remove the identity on failure.
| ValidatedJson(req): ValidatedJson<FleetJoinRequest>, | ||
| ) -> impl IntoResponse { | ||
| info!(group_id=%req.group_id, "Fleet join: subscribing to group and broadcasting attestation"); | ||
|
|
There was a problem hiding this comment.
💡 Logging user-controlled group_id without sanitization
The group_id from the request is logged directly; while hex-validated, extremely long invalid inputs before validation could affect log integrity.
Suggested fix:
Move the info! log after the hex validation succeeds, or truncate the logged value to a safe length.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
- Wire tee::protected_service() into protected_routes (endpoint was unreachable) - Replace hardcoded 2s sleep with mesh_peer_count polling in publish_on_group - Reorder operations: generate attestation + serialize before subscribing - Reject mock attestations (match /tee/attest behavior) - Use random nonce instead of static group_id for freshness binding - Use generic error messages to avoid leaking internals - Remove is_mock from API response - Unsubscribe from group on publish failure Made-with: Cursor
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 2 agents | Quality score: 54% | Review time: 136.8s
🟡 1 warnings, 💡 1 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-f06c7f17
| // Create identity first (before subscribe, so we can clean up if needed) | ||
| let our_public_key = match state.ctx_client.new_identity() { | ||
| Ok(pk) => pk, | ||
| Err(err) => { |
There was a problem hiding this comment.
🟡 Identity resource leak on error paths
If attestation generation, payload serialization, or group subscription fails after new_identity() succeeds, the created identity is never cleaned up.
Suggested fix:
Track created identity and clean it up in error paths, or use a guard pattern that removes the identity on drop unless explicitly committed.
| Ok(()) | ||
| } | ||
|
|
||
| pub async fn publish_on_group(&self, group_id: [u8; 32], payload: Vec<u8>) -> eyre::Result<()> { |
There was a problem hiding this comment.
💡 No cancellation or backpressure for 10-second polling wait
The polling loop can block the async handler for up to 10 seconds with no cancellation mechanism, which may cause HTTP client timeouts on slow mesh peer discovery.
Suggested fix:
Consider accepting an optional timeout parameter or using `tokio::select!` with a cancellation token for better responsiveness.
There was a problem hiding this comment.
🤖 AI Code Reviewer
Reviewed by 2 agents | Quality score: 51% | Review time: 125.9s
🟡 1 warnings, 💡 3 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-20eff9da
| // Create identity first (before subscribe, so we can clean up if needed) | ||
| let our_public_key = match state.ctx_client.new_identity() { | ||
| Ok(pk) => pk, | ||
| Err(err) => { |
There was a problem hiding this comment.
🟡 Identity not cleaned up on subsequent failures
If attestation generation, serialization, subscribe, or publish fails after new_identity() succeeds, the created identity is leaked and never removed.
Suggested fix:
Add cleanup logic to delete the identity on error paths, or document that orphaned identities are acceptable and garbage-collected elsewhere.
| status_code: StatusCode::INTERNAL_SERVER_ERROR, | ||
| message: "Failed to subscribe to group".to_owned(), | ||
| } | ||
| .into_response(); |
There was a problem hiding this comment.
💡 Publish proceeds with zero mesh peers after timeout
After the 10-second wait, publishing proceeds even with no mesh peers, meaning the TeeAttestationAnnounce may be silently lost while returning success to the caller.
Suggested fix:
Consider returning an error or a warning status in the response when no mesh peers were available, so the caller knows the announcement may not have been received.
|
|
||
| const MAX_WAIT: std::time::Duration = std::time::Duration::from_secs(10); | ||
| const POLL_INTERVAL: std::time::Duration = std::time::Duration::from_millis(500); | ||
|
|
There was a problem hiding this comment.
💡 Repeated TopicHash clone in polling loop
topic.clone() allocates on each loop iteration (up to 20 times); consider using a reference if the API allows, or accepting the overhead given the infrequent calls.
Suggested fix:
If mesh_peer_count can accept &TopicHash, change to borrow; otherwise document this is acceptable given the 500ms interval.
| // and if it fails we haven't leaked a subscription | ||
| let pk_hash: [u8; 32] = Sha256::digest(&*our_public_key).into(); | ||
| let nonce: [u8; 32] = rand::random(); | ||
| let report_data = build_report_data(&nonce, Some(&pk_hash)); |
There was a problem hiding this comment.
💡 Consider using OsRng for attestation nonce
For cryptographic nonces in TEE attestation, using OsRng directly (via getrandom or rand::rngs::OsRng) is preferred over rand::random() to avoid any theoretical state exposure from the userspace CSPRNG.
Suggested fix:
Replace `rand::random()` with `use rand::RngCore; let mut nonce = [0u8; 32]; rand::rngs::OsRng.fill_bytes(&mut nonce);`

Summary
POST /admin-api/tee/fleet-join { groupId }that triggers the full TEE HA join flow from the fleet node sideTeeAttestationAnnounceNodeClient::publish_on_group()for publishing on group gossip topicsTest plan
Made with Cursor
Note
Medium Risk
Touches TEE attestation generation and group gossip publishing, so failures or timing/peer-detection behavior could impact fleet join reliability. The endpoint is protected, but still introduces new security-sensitive flow around identity + attestation broadcast.
Overview
Adds a new protected admin endpoint
POST /admin-api/tee/fleet-jointhat takes agroupId, creates a fresh identity, generates a non-mock TDX attestation bound to the identity (via SHA-256 hash), subscribes to the group gossip topic, and broadcasts aTeeAttestationAnnouncemessage (borsh-encoded) to the group; on broadcast failure it unsubscribes.Extends
NodeClientwithpublish_on_group()which publishes togroup/<id>topics after waiting up to 10s for at least one mesh peer (then publishes anyway), and wires the newtee::protected_service()router into the authenticated admin routes. Workspace version is bumped to0.10.1-rc.16, and server dependencies are updated to includeborsh,sha2, andcalimero-network-primitives.Written by Cursor Bugbot for commit fd6231e. This will update automatically on new commits. Configure here.