fix: migrate rquest to wreq + engineering review hardening (17 fixes) #6

Open
Zireael wants to merge 7 commits into paperfoot:master from Zireael:fix/rquest-to-wreq-migration
Zireael commented Apr 23, 2026

Summary

Migrates from rquest to stable wreq (v5) and applies 17 engineering-review fixes covering reliability bugs, DRY refactoring, dead-code cleanup, cache eviction, and test coverage.

Migration

  • rquest → wreq: Switches HTTP client from rquest (unstable, BoringSSL conflicts) to wreq (stable v5, OpenSSL-compatible). Resolves the BoringSSL vs OpenSSL linking conflict that blocked builds on some platforms.

Bug Fixes (7)

| # | Fix | Files |
| --- | --- | --- |
| .4 | browserless::extract_text_simple now skips <script>/<style> content | providers/browserless.rs |
| .5 | Extract/Scrape chain uses shared deadline instead of per-call timeout, preventing budget overflow | engine.rs |
| .6 | stealth provider maps HTTP errors as SearchError::Api instead of SearchError::Config | providers/stealth.rs |
| .8 | finalize_response() wired into execute_search return path (result_count was 0 until run() fixed it) | engine.rs |
| .13 | retry_request .when() now also matches SearchError::Wreq errors | providers/mod.rs |
| .14 | Cross-platform home_dir(): resolves $HOME on Unix, %USERPROFILE% on Windows | config.rs, cache.rs, logging.rs, cli.rs |
| .16 | Cache disk write failures now log tracing::warn! instead of silent let _ = | cache.rs |
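
The shared-deadline fix (.5) can be illustrated with a minimal, synchronous sketch that uses only the standard library. The names `Deadline`, `remaining`, and `run_chain` are hypothetical and are not taken from engine.rs, which presumably applies the same idea to async provider calls:

```rust
use std::time::{Duration, Instant};

/// Hypothetical shared budget: one absolute deadline for the whole chain,
/// instead of a fresh timeout for every call.
#[derive(Clone, Copy)]
struct Deadline {
    at: Instant,
}

impl Deadline {
    /// Starts the clock now and expires after `budget`.
    fn after(budget: Duration) -> Self {
        Deadline { at: Instant::now() + budget }
    }

    /// Time left before the deadline, or `None` once it has passed.
    fn remaining(&self) -> Option<Duration> {
        let left = self.at.saturating_duration_since(Instant::now());
        if left.is_zero() { None } else { Some(left) }
    }
}

/// Tries each step in order, handing it only the budget that is still left.
/// Stops at the first success, or as soon as the budget is exhausted.
fn run_chain<T>(deadline: Deadline, steps: &[fn(Duration) -> Option<T>]) -> Option<T> {
    for step in steps {
        let left = deadline.remaining()?; // budget spent: give up on the chain
        if let Some(out) = step(left) {
            return Some(out);
        }
    }
    None
}
```

Because every step receives `remaining()` rather than a fixed per-call timeout, a slow first provider shrinks the budget of the ones after it, so the chain as a whole cannot start a new step once the configured total has elapsed.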

DRY Refactoring (5)

| # | Extraction | From → To | Copies Removed |
| --- | --- | --- | --- |
| .9 | augment_query() | brave.rs, serper.rs, you.rs → providers/mod.rs | 3 |
| .10 | map_freshness() | brave.rs, you.rs → types.rs | 2 |
| .17 | extract_title() | stealth.rs, browserless.rs → providers/mod.rs | 2 |
| .11 | epoch_days_to_date() | logging.rs, xai.rs → utils.rs (new) | 2 |
| .2 | try_provider() / try_provider_remaining() | inlined boilerplate → helper functions in engine.rs | ~160 lines |
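
For readers unfamiliar with what a helper like epoch_days_to_date() has to do, here is a minimal sketch of a days-since-epoch conversion using the widely known civil-from-days arithmetic. The signature and the `(year, month, day)` return type are assumptions for illustration; the actual helper in utils.rs may return a formatted string or use a different algorithm:

```rust
/// Converts days since 1970-01-01 into (year, month, day) in the
/// proleptic Gregorian calendar. Hypothetical signature.
fn epoch_days_to_date(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, 0..=146096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // 0..=399
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    // January and February belong to the following civil year.
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}
```

Keeping one copy of this arithmetic in a shared module matters because the leap-year handling is easy to get subtly wrong in duplicated versions.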

Cleanup (2)

  • .12: Removed unused Provider::timeout() trait method + 12 provider impl blocks + 11 unused Duration imports
  • .19: Removed build_providers() call from execute_special — now only constructs needed providers per mode

Enhancements (1)

  • .15: Cache file eviction on startup — scans q_*.json files, removes expired (≥5 min old) and unparseable entries
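
The startup eviction in .15 can be sketched as follows, using only the standard library. This is an illustration under stated assumptions, not the code in cache.rs: the function name `evict_cache_files` is hypothetical, expiry is judged by file modification time (the real cache may instead read a timestamp stored in the entry), and the `is_valid` closure stands in for whatever JSON parsing the project performs:

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Removes `q_*.json` files in `dir` that are at least `ttl` old or that
/// `is_valid` rejects. Returns the number of files removed.
fn evict_cache_files(
    dir: &Path,
    ttl: Duration,
    is_valid: impl Fn(&str) -> bool,
) -> io::Result<usize> {
    let now = SystemTime::now();
    let mut removed = 0;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let name = match path.file_name().and_then(|n| n.to_str()) {
            Some(n) => n,
            None => continue,
        };
        if !(name.starts_with("q_") && name.ends_with(".json")) {
            continue; // leave unrelated files alone
        }
        // Assumption: age comes from mtime. Unreadable metadata counts as
        // expired; an mtime in the future counts as age zero.
        let expired = match fs::metadata(&path).and_then(|m| m.modified()) {
            Ok(mtime) => now.duration_since(mtime).unwrap_or(Duration::ZERO) >= ttl,
            Err(_) => true,
        };
        // Unreadable or rejected contents count as unparseable.
        let unparseable = !fs::read_to_string(&path).map_or(false, |s| is_valid(&s));
        if (expired || unparseable) && fs::remove_file(&path).is_ok() {
            removed += 1;
        }
    }
    Ok(removed)
}
```

With the 5-minute expiry described above, a caller would pass `Duration::from_secs(300)` as `ttl`.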

Test Coverage

59 unit tests + 36 integration tests = 95 total (was 46+36=82)

| Area | Tests Added | What's Covered |
| --- | --- | --- |
| classify.rs | +13 | social, news, academic, scholar, patents, people, extract, similar, images, places, general, priority, SE-focused (12 sub-tests) |
| engine.rs | +5 | normalize_url edge cases (trailing slash, query params, fragments), provider_allowed with/without filter |
| browserless.rs | +4 | extract_text_simple script/style skipping, visible text preservation |
| you.rs | +5 | YouResponse deserialization (hits-only, news-only, empty, optional fields) |
| cache.rs | +8 | should_cache_query_response (success, all_providers_failed, degraded-empty, empty-no-failures), path determinism, mode sensitivity, case insensitivity, q_ prefix |
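
To make the behavior covered by the browserless.rs tests concrete, here is a rough sketch of tag stripping that drops `<script>` and `<style>` bodies while keeping visible text. The function name `extract_text_sketch` is hypothetical and the scanning approach is an assumption; the real extract_text_simple may be structured quite differently:

```rust
/// Strips tags from `html`, dropping everything inside <script> and <style>.
/// Illustrative only: no entity decoding, comments, or CDATA handling.
fn extract_text_sketch(html: &str) -> String {
    // ASCII lowercasing keeps byte offsets identical to `html`.
    let lower = html.to_ascii_lowercase();
    let mut out = String::new();
    let mut i = 0;
    while i < html.len() {
        let Some(rel) = html[i..].find('<') else {
            out.push_str(&html[i..]); // trailing text after the last tag
            break;
        };
        out.push_str(&html[i..i + rel]); // visible text before the tag
        let tag_start = i + rel;
        let Some(tag_len) = html[tag_start..].find('>') else { break };
        let tag_end = tag_start + tag_len + 1;
        i = tag_end;
        // For <script>/<style>, jump past the matching close tag.
        for name in ["script", "style"] {
            let rest = &lower[tag_start + 1..];
            let opens = rest.starts_with(name)
                && rest[name.len()..]
                    .starts_with(|c: char| c == '>' || c == '/' || c.is_ascii_whitespace());
            if opens {
                let close = format!("</{name}");
                i = match lower[tag_end..].find(&close) {
                    Some(pos) => {
                        let after = tag_end + pos;
                        lower[after..].find('>').map_or(html.len(), |g| after + g + 1)
                    }
                    None => html.len(), // unterminated block: drop the rest
                };
            }
        }
        out.push(' '); // treat every tag as a word separator
    }
    // Collapse runs of whitespace into single spaces.
    out.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

Skipping by element name rather than stripping tags alone is what keeps JavaScript and CSS source out of the extracted text, which is the regression the four new tests guard against.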

Verification

$ cargo test      → 95 passed, 0 failed
$ cargo clippy --release → 0 warnings

Zireael added 7 commits April 19, 2026 07:34
…ability

Config, cache, timeout, and rejection-diagnostics hardening:

- config: type-numeric writes for settings.timeout/count (hbq1), legacy
  quoted-numeric coercion (hbq2)
- cache: skip caching all-provider-failed and degraded-empty responses (hbq3)
- engine: unified timeout budget from settings.timeout (hbq5), remove
  special-mode literals (hbq6), provider count clamping for Brave cap (hbq7)
- types/errors: structured providers_failed_detail taxonomy with cause/action/
  signature fields, backward-compatible (hbq4, hbq13, hbq14)
- providers: spawn_blocking extraction offload in stealth/browserless (hbq9),
  Exa NUM_RESULTS_EXCEEDED and Jina Cloudflare-1010/Browserless auth-mode
  rejection classification (hbq13)
- main/logging: env-driven tracing subscriber with quiet default, structured
  reliability events (hbq8)
- README: troubleshooting rejection diagnostics section (hbq15)
- clippy cleanup: unused vars, range pattern, test module ordering
- build: fix backon v1 retry callback (use .notify() on retry future)

The rquest HTTP client crate has been renamed to wreq, and the old
packages will be yanked. This commit migrates all references:

- Cargo.toml: rquest -> wreq v5, rquest-util -> wreq-util v2
- src/errors.rs: SearchError::Rquest -> SearchError::Wreq
- src/providers/stealth.rs: imports and types updated
- src/engine.rs: error variant match updated
- .github/workflows/release.yml: comment updated

Uses wreq v5.3.0 + wreq-util v2.2.6 (both stable), which provide
the same v5 API as rquest — purely a crate rename, no behavior change.

Closes paperfoot#4

Cherry-picked from andrey-golovko/search-cli fix/linux-build branch.
- Remove readability crate (pulled reqwest with native-tls/OpenSSL)
- Replace readability extraction with tl-based title + tag-stripping fallback
- Keep spawn_blocking offload for extraction from reliability hardening PR
- self_update: default-features = false to avoid native-tls

Cherry-picked from mouse-value-add/search-cli feat/you-search-provider.
- New You.com provider with general search and news search
- Freshness mapping, domain include/exclude filters
- Auth, API status, and rate-limit error handling
- Wired into engine routing, config, CLI, and docs

Bug fixes:
- browserless extract_text_simple now skips <script>/<style> content
- Extract/Scrape chain uses shared deadline to prevent timeout overflow
- stealth provider maps HTTP errors as SearchError::Api (not Config)
- finalize_response() wired into execute_search return path
- retry_request .when() now also matches SearchError::Wreq errors
- Cross-platform home_dir() resolves $HOME on Unix, %USERPROFILE% on Windows
- Cache write failures now log warnings instead of silent ignore
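
A minimal sketch of the cross-platform home directory lookup described in the list above, using only the standard library. The helper `home_dir_from` and its closure parameter are hypothetical, introduced here so the lookup can be exercised without touching the real environment; the project's home_dir() may differ in signature and fallback behavior:

```rust
use std::env;
use std::ffi::OsString;
use std::path::PathBuf;

/// Picks the home directory from an environment lookup:
/// HOME on Unix-like targets, USERPROFILE on Windows.
/// An unset or empty variable yields `None`.
fn home_dir_from(lookup: impl Fn(&str) -> Option<OsString>) -> Option<PathBuf> {
    let key = if cfg!(windows) { "USERPROFILE" } else { "HOME" };
    lookup(key)
        .filter(|value| !value.is_empty())
        .map(PathBuf::from)
}

/// Convenience wrapper that reads the real process environment.
fn home_dir() -> Option<PathBuf> {
    home_dir_from(|key| env::var_os(key))
}
```

Returning `Option<PathBuf>` leaves it to each caller (config, cache, logging, CLI) to decide what to do when no home directory can be determined.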

DRY refactoring:
- Shared augment_query() extracted to providers/mod.rs (3 copies removed)
- Shared map_freshness() extracted to types.rs (2 copies removed)
- Shared extract_title() extracted to providers/mod.rs (2 copies removed)
- Shared epoch_days_to_date() extracted to utils.rs (2 copies removed)
- execute_special refactored with try_provider/try_provider_remaining helpers
  (~160 lines of boilerplate eliminated)

Cleanup:
- Removed unused Provider::timeout() trait method + 12 provider impls
- Removed build_providers() call from execute_special (avoids unused instances)

Enhancements:
- Cache file eviction on startup removes expired q_*.json files

Test coverage:
- 13 classify tests (social/news/academic/scholar/patents/people/extract/
  similar/images/places/general/priority + 12 SE-focused)
- 3 engine tests (normalize_url, provider_allowed)
- 4 browserless extract_text_simple tests (script/style skip)
- 5 you.com provider tests (JSON deserialization)
- 8 cache logic tests (should_cache_query_response, path determinism)
- 5 additional normalize_url edge cases

95 tests pass, 0 clippy warnings.
Zireael changed the title from "fix: migrate rquest to wreq (stable v5)" to "fix: migrate rquest to wreq + engineering review hardening (17 fixes)" on Apr 25, 2026