fix: S3 blobstore multi-tenant, GCS ACL, blob CID, video embeds #166

Open
rabble wants to merge 1 commit into blacksky-algorithms:main from divinevideo:pr/blobstore-fixes

rabble commented Mar 29, 2026

Summary

  • S3 multi-tenant bucket: Separate bucket name (PDS_BLOBSTORE_S3_BUCKET env var) from the DID-based path prefix. Falls back to using the DID as the bucket name for backwards compatibility with single-tenant setups.
  • GCS ACL compatibility: Detect storage.googleapis.com in AWS_ENDPOINT and skip per-object ACLs, since Google Cloud Storage uses uniform bucket-level access control.
  • S3 copy_source fix: Use bucket/key format directly instead of requiring a separate AWS_ENDPOINT_BUCKET env var.
  • Blob CID raw codec: Use the correct raw codec 0x55 for blob CIDs. The previous value 0x77 is an obsolete/incorrect codec that causes CID mismatches with the ATProto reference implementation.
  • Firehose commit ops: Always build and emit commit ops for all writes, not just those with swap_cid set. This fixes missing firehose events when using putRecord without swap.
  • Video embed blobs: Walk Ipld::List and Ipld::Map variants in find_blob_refs so video embed blobs are discovered when records come from JSON deserialization. Add video/mp4 and text/vtt blob constraints for app.bsky.embed.video.
  • PLC client: Stop double-encoding DID URLs (DIDs don't need URI component encoding for PLC directory requests).
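The three S3-related changes above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `BlobLocation`, `blob_location`, `skip_object_acl`, and `copy_source` are hypothetical names, and the `blocks/` key layout is an assumption made for the example. `configured_bucket` stands in for the value of PDS_BLOBSTORE_S3_BUCKET and `endpoint` for AWS_ENDPOINT.

```rust
/// Where a blob lives: the bucket plus the object key inside it.
/// (Hypothetical type, used only for this sketch.)
#[derive(Debug, PartialEq)]
struct BlobLocation {
    bucket: String,
    key: String,
}

/// Resolve the bucket and key for a blob.
/// `configured_bucket` stands in for PDS_BLOBSTORE_S3_BUCKET.
/// The `blocks/` layout is an assumption, not taken from the PR.
fn blob_location(configured_bucket: Option<&str>, did: &str, cid: &str) -> BlobLocation {
    match configured_bucket.filter(|b| !b.is_empty()) {
        // Multi-tenant: one shared bucket, the DID becomes a path prefix.
        Some(bucket) => BlobLocation {
            bucket: bucket.to_string(),
            key: format!("{}/blocks/{}", did, cid),
        },
        // Single-tenant fallback: the DID itself is the bucket name.
        None => BlobLocation {
            bucket: did.to_string(),
            key: format!("blocks/{}", cid),
        },
    }
}

/// True when the endpoint points at Google Cloud Storage, in which case
/// per-object ACLs are skipped (uniform bucket-level access).
fn skip_object_acl(endpoint: Option<&str>) -> bool {
    endpoint.map_or(false, |e| e.contains("storage.googleapis.com"))
}

/// Build the copy source in `bucket/key` form, with no extra env var.
fn copy_source(loc: &BlobLocation) -> String {
    format!("{}/{}", loc.bucket, loc.key)
}
```

With a shared bucket configured, `blob_location(Some("shared-bucket"), "did:plc:alice", cid)` keeps every actor in one bucket under its own DID prefix. With `None`, the same call falls back to the DID as the bucket name.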

Test plan

  • Verify multi-tenant S3 setup works with PDS_BLOBSTORE_S3_BUCKET set to a shared bucket name
  • Verify single-tenant setups still work without PDS_BLOBSTORE_S3_BUCKET (falls back to DID)
  • Verify blob uploads work with GCS endpoint (no ACL errors)
  • Verify blob CIDs match reference implementation (codec 0x55)
  • Verify video posts create proper blob references
  • Run cargo test -p rsky-pds for new unit tests on blob ref discovery and constraints
  • Run cargo test -p rsky-common for CID codec tests
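For the codec check in the test plan, the sketch below shows what a CIDv1 with the raw codec looks like at the byte level. It assumes the caller already has a SHA-256 digest of the blob (hashing is left out so that the example needs only the standard library). The function names are hypothetical and not taken from rsky-common.

```rust
const CID_VERSION_1: u8 = 0x01;
const RAW_CODEC: u8 = 0x55; // multicodec "raw", the value this PR switches to
const SHA2_256: u8 = 0x12; // multihash code for sha2-256
const SHA2_256_LEN: u8 = 0x20; // digest length: 32 bytes

/// Binary CIDv1: version, codec, then the multihash (code, length, digest).
/// All four header values are below 0x80, so each varint is a single byte.
fn raw_cid_bytes(digest: &[u8; 32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + digest.len());
    out.extend_from_slice(&[CID_VERSION_1, RAW_CODEC, SHA2_256, SHA2_256_LEN]);
    out.extend_from_slice(digest);
    out
}

/// Lowercase RFC 4648 base32 without padding.
fn base32_lower(data: &[u8]) -> String {
    const ALPHABET: &[u8; 32] = b"abcdefghijklmnopqrstuvwxyz234567";
    let mut out = String::new();
    let mut buffer: u32 = 0; // holds bits not yet emitted
    let mut bits: u32 = 0; // number of valid bits in `buffer`
    for &byte in data {
        buffer = (buffer << 8) | u32::from(byte);
        bits += 8;
        while bits >= 5 {
            bits -= 5;
            out.push(ALPHABET[((buffer >> bits) & 0x1f) as usize] as char);
        }
        // Drop the bits that were already emitted.
        buffer &= (1u32 << bits) - 1;
    }
    if bits > 0 {
        // Left-align the remaining bits in a final 5-bit group.
        out.push(ALPHABET[((buffer << (5 - bits)) & 0x1f) as usize] as char);
    }
    out
}

/// String form: multibase prefix `b` followed by the base32 of the CID bytes.
fn cid_string(digest: &[u8; 32]) -> String {
    format!("b{}", base32_lower(&raw_cid_bytes(digest)))
}
```

Because the four header bytes are fixed, every CID built this way starts with `bafkrei`, which gives a quick visual check that a blob CID carries the raw codec.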

🤖 Generated with Claude Code

…, and video embed support

- Separate S3 bucket name (PDS_BLOBSTORE_S3_BUCKET) from DID path prefix,
  fixing multi-tenant deployments where all actors share one bucket
- Skip object ACLs when endpoint is storage.googleapis.com (GCS uses
  uniform bucket-level access, not per-object ACLs)
- Fix copy_source format to use bucket/key instead of requiring
  AWS_ENDPOINT_BUCKET env var
- Use correct raw codec 0x55 for blob CIDs (was incorrectly using 0x77)
- Always emit firehose commit ops for all writes, not just those with
  swap_cid (fixes missing events on putRecord)
- Add video/mp4 and text/vtt blob constraints for app.bsky.embed.video
- Walk Ipld::List and Ipld::Map variants in find_blob_refs so video
  embed blobs are discovered when records are deserialized from JSON
- Stop double-encoding DID URLs in PLC client requests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
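
The blob reference walk described in the commit message can be sketched as below. This is an illustration under stated assumptions, not the `find_blob_refs` in rsky-pds: the `Ipld` enum is a simplified stand-in for the real type, `BlobRef`, `get_str`, `as_blob_ref`, and `mime_allowed` are hypothetical, and the blob shape (`$type`, `ref.$link`, `mimeType`) follows the JSON form of ATProto blob references.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for an IPLD value (assumption, not the real type).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Ipld {
    Null,
    Integer(i128),
    String(String),
    List(Vec<Ipld>),
    Map(BTreeMap<String, Ipld>),
}

#[derive(Debug, PartialEq)]
struct BlobRef {
    cid: String,
    mime_type: String,
}

/// Read a string field from a map, if present.
fn get_str<'a>(map: &'a BTreeMap<String, Ipld>, key: &str) -> Option<&'a str> {
    match map.get(key) {
        Some(Ipld::String(s)) => Some(s.as_str()),
        _ => None,
    }
}

/// Interpret a map as a JSON-form blob reference:
/// {"$type": "blob", "ref": {"$link": "<cid>"}, "mimeType": "..."}.
fn as_blob_ref(map: &BTreeMap<String, Ipld>) -> Option<BlobRef> {
    if get_str(map, "$type")? != "blob" {
        return None;
    }
    let cid = match map.get("ref")? {
        Ipld::Map(inner) => get_str(inner, "$link")?,
        _ => return None,
    };
    Some(BlobRef {
        cid: cid.to_string(),
        mime_type: get_str(map, "mimeType")?.to_string(),
    })
}

/// Recursively walk lists and maps, collecting every blob reference.
fn find_blob_refs(value: &Ipld, out: &mut Vec<BlobRef>) {
    match value {
        Ipld::List(items) => {
            for item in items {
                find_blob_refs(item, out);
            }
        }
        Ipld::Map(map) => {
            if let Some(blob) = as_blob_ref(map) {
                out.push(blob);
                return;
            }
            for child in map.values() {
                find_blob_refs(child, out);
            }
        }
        // Scalars cannot contain blob references.
        _ => {}
    }
}

/// Check a blob's MIME type against an accept list such as
/// ["video/mp4"] for the video or ["text/vtt"] for caption files.
fn mime_allowed(accept: &[&str], mime_type: &str) -> bool {
    accept.iter().any(|a| *a == mime_type)
}
```

Walking both `List` and `Map` is what lets a blob nested under `embed.video`, or inside a list of captions, be found no matter how deep it sits in the record.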