Add fsspec-backed remote storage support for bucket URLs #134
Merged
A new `aai_cli/remotefs.py` (mirroring `youtube.py`'s lazy-import shape) fetches audio from any fsspec-addressable store. fsspec core ships with the CLI; protocol backends (s3fs, gcsfs, adlfs, …) stay user-installed extras, and a missing backend is surfaced via fsspec's own install hint as a clean `CLIError`.

- Single files: a bucket URL is downloaded to a temp dir and uploaded, like the YouTube path. `resolve_audio_source` gains an opt-in `allow_remote` flag, so stream/agent keep rejecting bucket URLs as missing files.
- Batch mode: a remote glob (`s3://bucket/calls/*.mp3`) or trailing-slash folder (filtered by `AUDIO_EXTENSIONS`, recursive) expands like its local equivalent; sidecar naming now treats any bucket URL like a web URL (slug + hash in cwd).
- `--show-code` rejects bucket URLs up front instead of emitting a script the SDK can't run.

Tests drive the real fsspec code paths via its in-process `memory://` filesystem (shared `memory_fs` fixture), so pytest-socket stays armed.

https://claude.ai/code/session_01KKPpFcWPk3tyqVRBApdgcj
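The opt-in `allow_remote` behaviour described above can be illustrated with a minimal, standard-library-only sketch. This is not the code from `aai_cli/client.py`: the `CLIError` class, the `WEB_SCHEMES` set, the `_scheme` helper, and the error wording are all assumptions made for illustration; only the function name `resolve_audio_source` and the `allow_remote` flag come from the description.

```python
from pathlib import Path
from urllib.parse import urlsplit


class CLIError(Exception):
    """Hypothetical stand-in for the CLI's user-facing error type."""


# Assumption: plain web URLs are handled separately from bucket URLs.
WEB_SCHEMES = {"http", "https"}


def _scheme(source: str) -> str:
    """Return the lowercased URL scheme, or "" for anything without '://'."""
    return urlsplit(source).scheme.lower() if "://" in source else ""


def resolve_audio_source(source: str, *, allow_remote: bool = False) -> str:
    """Validate a source string; bucket URLs pass only when allow_remote is set."""
    scheme = _scheme(source)
    if scheme in WEB_SCHEMES:
        return source  # web URL: passed through unchanged
    if scheme:
        # Any other scheme (s3://, gs://, memory://, ...) counts as a bucket URL.
        if allow_remote:
            return source
        # Callers that did not opt in see the same error as for a missing file.
        raise CLIError(f"File not found: {source}")
    if not Path(source).is_file():
        raise CLIError(f"File not found: {source}")
    return source
```

Under this sketch, a caller such as the transcribe command would pass `allow_remote=True`, while stream/agent callers would keep the default and continue to reject bucket URLs.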
…ay-la380g # Conflicts: # pyproject.toml
…ay-la380g # Conflicts: # .importlinter
Summary
Adds support for transcribing audio from fsspec-addressable remote storage (S3, Google Cloud Storage, Azure Blob Storage, SFTP, etc.). Users can now pass bucket URLs like `s3://bucket/calls/*.mp3` or `gs://bucket/calls/` to `assembly transcribe` for both single-file and batch operations.

Key Changes

- New `remotefs` module (`aai_cli/remotefs.py`): Provides a clean abstraction over fsspec for downloading remote files and expanding globs/folder listings. Normalizes backend exceptions into user-friendly `CLIError` messages with install hints when optional protocol backends (s3fs, gcsfs, etc.) are missing.
- Batch mode expansion for remote URLs (`aai_cli/transcribe_batch.py`):
  - `expand_sources()` now recognizes bucket URLs and delegates to `_remote_sources()`
  - Handles glob patterns (`s3://bucket/*.mp3`) and folder scans with a trailing slash (`s3://bucket/calls/`)
- Single-file remote transcription (`aai_cli/transcribe_exec.py`):
  - `run_transcription()` downloads remote files to a temporary directory before uploading to the API
  - `check_source_exists()` validates that remote URLs exist before transcription starts
  - Passes `allow_remote=True` to `resolve_audio_source()` to permit bucket URLs
- Source validation updates (`aai_cli/client.py`):
  - `resolve_audio_source()` gains an `allow_remote` parameter to opt in to bucket URL support
- Help text and examples (`aai_cli/commands/transcribe.py`):
  - Example: `assembly transcribe "s3://bucket/calls/*.mp3"`
- Comprehensive test coverage (`tests/test_remotefs.py`, `tests/test_transcribe_batch_sources.py`, `tests/test_transcribe.py`, `tests/test_source_validation.py`, `tests/test_transcribe_show_code.py`):
  - Uses fsspec's in-process `memory://` filesystem to exercise real glob/find/download code paths offline
  - Verifies that `--show-code` rejects bucket URLs (generated SDK code can't fetch them)
- Dependencies (`pyproject.toml`):
  - Adds `fsspec>=2026.4.0` as a core dependency (protocol backends like s3fs remain optional user installs)

Implementation Details

- Remote files are downloaded into temporary directories with an `aai-remote-` prefix to avoid polluting the working directory
- Backend exceptions are normalized into `CLIError` messages; missing backends surface fsspec's own install hints
- The `memory_fs` pytest fixture resets fsspec's global state between tests to prevent cross-test pollution

https://claude.ai/code/session_01KKPpFcWPk3tyqVRBApdgcj
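The temp-directory and error-normalization behaviour noted under Implementation Details could look roughly like the sketch below. It is an illustration under stated assumptions, not the contents of `aai_cli/remotefs.py`: `download_to_temp`, the `fetch` callable, the `CLIError` class, and the message wording are hypothetical. The `fetch` hook stands in for whatever storage call performs the copy, so the sketch runs with the standard library alone; only the `aai-remote-` prefix comes from the description.

```python
import shutil
import tempfile
from pathlib import Path, PurePosixPath
from typing import Callable


class CLIError(Exception):
    """Hypothetical stand-in for the CLI's user-facing error type."""


def download_to_temp(url: str, fetch: Callable[[str, str], None]) -> Path:
    """Copy `url` into a fresh temp dir and return the local file path.

    `fetch(remote_url, local_path)` is an assumed hook that performs the
    actual transfer. The caller owns the returned directory and should
    remove it once the upload has finished.
    """
    tmp_dir = Path(tempfile.mkdtemp(prefix="aai-remote-"))
    # Keep the remote file name so the extension survives the download.
    name = PurePosixPath(url.split("://", 1)[-1]).name or "audio"
    local_path = tmp_dir / name
    try:
        fetch(url, str(local_path))
    except (ImportError, OSError) as exc:
        shutil.rmtree(tmp_dir, ignore_errors=True)  # no stray dirs on failure
        if isinstance(exc, FileNotFoundError):
            message = f"Remote file not found: {url}"
        elif isinstance(exc, ImportError):
            # A missing protocol backend is reported as an ImportError whose
            # text carries the install hint, so it is passed through as-is.
            message = str(exc)
        else:
            message = f"Could not download {url}: {exc}"
        raise CLIError(message) from exc
    return local_path
```

Injecting `fetch` keeps the sketch testable offline; in a real module that role would presumably be filled by an fsspec filesystem call rather than a parameter.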