Add --url flag to speak command for reading web pages aloud by alexkroman · Pull Request #201 · AssemblyAI/cli

alexkroman · 2026-06-16T23:05:31Z

Adds support for fetching and narrating web pages via a new --url flag on the assembly speak command. The main article text is extracted from the page with boilerplate (nav, footers, comments, sidebars) stripped before being passed to text-to-speech.

Changes

New webpage module (aai_cli/core/webpage.py): Fetches HTML via httpx and extracts readable article text using trafilatura. Validates URLs and maps network errors and extraction failures to the CLI's error types.
- fetch_article(url) — main entry point; validates http(s) URLs and raises UsageError for non-web URLs or pages with no readable text
- _fetch_html(url) — fetches with browser-like User-Agent and redirect following; maps network/HTTP errors to APIError
- _extract(html) — strips boilerplate and extracts title using trafilatura (imported lazily to keep CLI startup fast)
- Article dataclass — frozen to prevent accidental mutation of fetched content
Updated speak command (aai_cli/commands/speak/__init__.py):
- Added --url option with help text and example
- Updated docstring to document the new web page input mode
- Added a --url example to the examples section
Updated speak execution (aai_cli/commands/speak/_exec.py):
- Added url field to SpeakOptions
- New _resolve_input() function enforces mutual exclusivity between --url and the text argument/stdin using the mutually_exclusive() helper
- Calls webpage.fetch_article() when --url is provided
Comprehensive test suite (tests/test_webpage.py):
- Tests immutability of Article dataclass
- Tests HTML fetching with browser UA, redirect following, and error handling (404, connection errors)
- Tests boilerplate extraction (nav, footers, comments stripped; title extracted)
- Tests URL validation and readable text validation
- Uses httpx.MockTransport for hermetic testing without real network calls
Integration tests (tests/test_speak.py):
- Tests --url fetches and narrates extracted article text
- Tests mutual exclusivity of --url and text argument
Dependencies: Added trafilatura>=2.1.0 to pyproject.toml
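
The URL validation and frozen Article described for the webpage module can be sketched with the standard library alone. This is a minimal illustration, not the code in aai_cli/core/webpage.py: `validate_web_url` and `build_article` are hypothetical names, `UsageError` is a local stand-in for the CLI's error type, and the field types are assumed from the commit message's "text/title/url".

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional
from urllib.parse import urlsplit


class UsageError(Exception):
    """Hypothetical stand-in for the CLI's UsageError."""


@dataclass(frozen=True)
class Article:
    # frozen=True makes attribute assignment raise FrozenInstanceError.
    text: str
    title: Optional[str]
    url: str


def validate_web_url(url: str) -> str:
    """Return url unchanged if it is http(s) with a host, else raise UsageError."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise UsageError(f"Not a web URL: {url!r}. Use an http:// or https:// address.")
    return url


def build_article(url: str, text: Optional[str], title: Optional[str]) -> Article:
    """Reject pages with no readable text, then wrap the result in an Article."""
    validate_web_url(url)
    if not text or not text.strip():
        raise UsageError("No readable article text found on that page.")
    return Article(text=text.strip(), title=title, url=url)
```

Here `build_article` takes the place of whatever joins the fetch and extraction steps; the point is only that both rejection cases surface as the same user-facing error type.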

Implementation notes

- trafilatura is imported lazily in _extract() to avoid slowing CLI startup
- Network timeouts are capped at 30 seconds to prevent TTS runs from hanging
- A browser-like User-Agent is sent to avoid stub/block pages from sites that reject unknown clients
- All network/HTTP errors are mapped to APIError; extraction failures and invalid URLs are mapped to UsageError with helpful suggestions
- Mutual exclusivity validation reuses the existing mutually_exclusive() helper from the errors module
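
The input-resolution rule (--url cannot be combined with the text argument or stdin) might look roughly like the sketch below. It does not use the project's mutually_exclusive() helper, whose signature is not shown in this PR; `resolve_input`, its parameters, the injected `fetch` callable, and the text-before-stdin precedence are all assumptions made for illustration.

```python
from typing import Callable, Optional


class UsageError(Exception):
    """Hypothetical stand-in for the CLI's UsageError."""


def resolve_input(
    text: Optional[str],
    url: Optional[str],
    stdin_text: Optional[str],
    fetch: Callable[[str], str],
) -> str:
    """Pick the single input source and return the text to synthesize."""
    if url is not None:
        # --url must be the only source; reject any competing input up front.
        if text is not None or stdin_text is not None:
            raise UsageError("--url cannot be combined with a text argument or stdin.")
        return fetch(url)
    if text is not None:
        # Assumed precedence: an explicit argument wins over piped stdin.
        return text
    if stdin_text is not None:
        return stdin_text
    raise UsageError("Provide text, pipe stdin, or pass --url.")
```

Passing the fetcher in as a callable keeps the rule testable without network access, in the same spirit as the hermetic tests listed above.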

https://claude.ai/code/session_01KHf2ttdfNjEwMHvZSMi2HU

Bundle trafilatura as a content adapter for `speak`: --url fetches a web page (httpx, the project's pinned client) and trafilatura strips the boilerplate (nav, sidebars, footers, comment threads) down to the readable article body, so text-to-speech narrates the piece rather than the page chrome.

- core/webpage.py: fetch_article(url) -> Article (text/title/url), with a lazy trafilatura import to keep it off CLI startup; non-http URLs and pages with no extractable text raise UsageError, fetch failures APIError.
- speak: new --url option, mutually exclusive with the text argument and stdin; resolves to the extracted text before synthesis.

trafilatura ships prebuilt wheels (lxml included), so it adds no source-compile step to Homebrew bottling.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01KHf2ttdfNjEwMHvZSMi2HU

aikido-pr-checks · 2026-06-16T23:05:56Z

+            response.raise_for_status()
+            return response.text
+    except httpx.HTTPError as exc:
+        raise APIError(f"Couldn't fetch {url}: {exc}") from exc


APIError includes the raw user-provided URL and exception in its message (f"Couldn't fetch {url}: {exc}"). Avoid embedding unsanitized URLs in error text; sanitize or redact before including in messages.

Details

✨ AI Reasoning
The exception handler constructs an APIError embedding the requested URL and the HTTP exception (f"Couldn't fetch {url}: {exc}"). If these errors are logged or displayed, the raw URL (and possibly sensitive query strings) will be exposed and may allow log injection via crafted input.

🔧 How do I fix it?
Keep sensitive data such as emails, passwords, and tokens out of logs. When logging values tied to a user, prefer a safe identifier like a user ID over the raw input, and strip line breaks from any user-provided text you do log.
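
One possible way to act on this suggestion is to redact the URL before it reaches an error message. The sketch below is not part of the PR: `redact_url` is a hypothetical helper that keeps only scheme, host, port, and path, and drops credentials, query string, fragment, and control characters.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_url(url: str) -> str:
    """Return a log-safe form of url, or a placeholder if it cannot be parsed."""
    # Drop non-printable characters (CR, LF, tabs, ...) so a crafted URL
    # cannot inject extra lines into logs or terminal output.
    cleaned = "".join(ch for ch in url if ch.isprintable())
    try:
        parts = urlsplit(cleaned)
        host = parts.hostname or ""
        if ":" in host:
            # hostname strips the brackets from IPv6 literals; restore them.
            host = f"[{host}]"
        if parts.port is not None:
            host = f"{host}:{parts.port}"
    except ValueError:
        # urlsplit and .port raise ValueError on malformed hosts or ports.
        return "<invalid URL>"
    # Rebuild without userinfo, query, or fragment.
    return urlunsplit((parts.scheme, host, parts.path, "", ""))
```

With a helper like this, the handler could raise `APIError(f"Couldn't fetch {redact_url(url)}")`. Note that the interpolated `{exc}` may itself contain the full URL, since httpx status errors typically include it in their message, so it would need the same treatment or be replaced by the status code alone.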


alexkroman enabled auto-merge June 16, 2026 23:05

aikido-pr-checks Bot reviewed Jun 16, 2026

Merge remote-tracking branch 'origin/main' into claude/dreamy-archime…

9b832a3

…des-vv0vr6 # Conflicts: # aai_cli/commands/speak/_exec.py # pyproject.toml # tests/test_speak.py

alexkroman added this pull request to the merge queue Jun 16, 2026

Merged via the queue into main with commit e53dcbf Jun 16, 2026
19 checks passed

alexkroman deleted the claude/dreamy-archimedes-vv0vr6 branch June 16, 2026 23:27

alexkroman merged 2 commits into main from claude/dreamy-archimedes-vv0vr6
