Add whole-internet Web Search platform via server-side GraphQL #34

Merged
Payel-git-ol merged 3 commits into Payel-git-ol:master from konard:issue-33-feb048cfde11
May 31, 2026

Conversation

konard commented May 31, 2026
Contributor

Summary

Integrates a whole-internet Web Search into Tradefast's research, as requested in #33:

  • Additional support on top of the curated sources. Web search is layered onto the existing KnowledgeBaseSearch/news crawl via a new CompositeSearchProvider, so a run pulls in results from the entire Internet, not just the built-in feeds.
  • Modelled on web-agent-master/google-search. A new src/services/web-search.ts mirrors that library's API — googleSearch(query, options) returning { query, results } of { title, link, snippet } — behind a pluggable WebSearchEngine. The default engine scrapes Google with Playwright (as the library does) and falls back to a JavaScript-free DuckDuckGo HTML engine when Chromium isn't installed, exactly like the news crawler's resilient fetcher. It returns an empty list — never throws — on network failure, so collection stays resilient and CI stays offline-safe.
  • "Web Search" added to /serching-platforms. A new web-search research platform appears as its own toggle in the multi-select pop-up (PlatformSelector), enabled by default. It is deliberately not a news SourceGroup, so the strict curated-source contract is untouched.
  • Search runs on the backend. A webSearch(query: String!, limit: Int): [SearchResult!]! GraphQL query is exposed through the same cli → graphql → backend repository path established in #32, "Transferring API requests to the backend" (resolver → repository → facade), with a new SearchResultDto. The CLI calls it through GraphqlTradefastRepository.
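
The layering described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code in src/services/search.ts: the SearchResult fields (title, url, snippet, score), the SearchProvider interface, and the rule of keeping the higher-scoring copy of a duplicate URL are assumptions. The PR itself only states that the composite merges curated and web hits, de-duplicates by URL, sorts by score, and survives a throwing provider.

```typescript
// Assumed shapes; the real SearchResult in Tradefast may carry other fields.
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
  score: number;
}

interface SearchProvider {
  search(query: string, limit: number): Promise<SearchResult[]>;
}

class CompositeSearchProvider implements SearchProvider {
  private readonly providers: SearchProvider[];

  constructor(...providers: SearchProvider[]) {
    this.providers = providers;
  }

  async search(query: string, limit: number): Promise<SearchResult[]> {
    const byUrl = new Map<string, SearchResult>();
    for (const provider of this.providers) {
      let hits: SearchResult[] = [];
      try {
        hits = await provider.search(query, limit);
      } catch {
        // A failing provider contributes nothing; the remaining ones still run.
        continue;
      }
      for (const hit of hits) {
        const seen = byUrl.get(hit.url);
        // Assumption: on a duplicate URL, keep the higher-scoring copy.
        if (!seen || hit.score > seen.score) byUrl.set(hit.url, hit);
      }
    }
    // Highest score first, truncated to the requested limit.
    return Array.from(byUrl.values())
      .sort((a, b) => b.score - a.score)
      .slice(0, limit);
  }
}
```

Querying the providers one after another keeps the sketch short; a real implementation might run them concurrently instead.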

How it works

cli (/serching-platforms toggles web-search)
  → app collects with CompositeSearchProvider(KnowledgeBaseSearch, WebSearchProvider)
  → backend GraphQL `webSearch` query  → repository → facade.search()
      → detectWebSearchEngine(): Playwright-Google  ──fallback──▶  DuckDuckGo HTML
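
The fallback arrow in the flow above could be modelled along these lines. The result shape ({ query, results } of { title, link, snippet }) comes from the PR description; everything else is an assumption made for illustration: the WebSearchEngine method signature, the withFallback helper, and the engine and limit option names are hypothetical. The sketch also falls back whenever the primary engine throws, whereas the PR describes detecting up front whether Chromium is installed.

```typescript
// Result shape as described in the PR: { query, results } of { title, link, snippet }.
interface WebSearchHit {
  title: string;
  link: string;
  snippet: string;
}

interface WebSearchResponse {
  query: string;
  results: WebSearchHit[];
}

// Hypothetical engine contract; the real WebSearchEngine may differ.
interface WebSearchEngine {
  name: string;
  search(query: string, limit: number): Promise<WebSearchHit[]>;
}

// Wraps two engines: use the primary, switch to the fallback if it throws.
function withFallback(primary: WebSearchEngine, fallback: WebSearchEngine): WebSearchEngine {
  return {
    name: `${primary.name}->${fallback.name}`,
    async search(query: string, limit: number): Promise<WebSearchHit[]> {
      try {
        return await primary.search(query, limit);
      } catch {
        return fallback.search(query, limit);
      }
    },
  };
}

// Never throws: any failure collapses to an empty result list.
async function googleSearch(
  query: string,
  options: { engine: WebSearchEngine; limit?: number },
): Promise<WebSearchResponse> {
  const limit = options.limit ?? 10; // default of 10 is an assumption
  try {
    const results = await options.engine.search(query, limit);
    return { query, results: results.slice(0, limit) };
  } catch {
    return { query, results: [] };
  }
}
```

Swallowing the error at the outermost layer is what lets collection continue and keeps tests offline-safe, at the cost of making a broken engine look like a query with no hits.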

Notable implementation detail

The build pipeline (esbuild via tsup/vitest) cannot emit design:paramtypes, which NestJS @Args() reflects at schema-build time — every other field already avoids this by passing explicit @Field(() => …) thunks. The webSearch query's argument metadata is therefore declared by hand (Reflect.defineMetadata, exactly what emitDecoratorMetadata would generate) so the schema builds reflection-free while keeping the explicit () => String / () => Int arg types.
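
To make the metadata point concrete, here is a dependency-free sketch of what "declaring design:paramtypes by hand" amounts to. The defineMetadata and getMetadata functions below are a local stand-in written for this example, not the reflect-metadata package; in the actual project the calls would be Reflect.defineMetadata and Reflect.getMetadata after importing reflect-metadata. ExampleResolver is hypothetical.

```typescript
// Local stand-in for the two reflect-metadata calls involved, so the sketch
// runs without dependencies. It stores values per (target, property, key).
type MetadataByKey = Map<string, unknown>;
const registry = new Map<object, Map<string, MetadataByKey>>();

function defineMetadata(key: string, value: unknown, target: object, property: string): void {
  const byProperty = registry.get(target) ?? new Map<string, MetadataByKey>();
  const byKey = byProperty.get(property) ?? new Map<string, unknown>();
  byKey.set(key, value);
  byProperty.set(property, byKey);
  registry.set(target, byProperty);
}

function getMetadata(key: string, target: object, property: string): unknown {
  return registry.get(target)?.get(property)?.get(key);
}

// Hypothetical resolver; only the method's parameter list matters here.
class ExampleResolver {
  webSearch(query: string, limit?: number): string {
    return `${query}:${limit ?? 10}`;
  }
}

// What emitDecoratorMetadata would have recorded for the method: one runtime
// constructor per parameter, in declaration order (string -> String,
// number -> Number).
defineMetadata('design:paramtypes', [String, Number], ExampleResolver.prototype, 'webSearch');

// A framework reading the metadata at schema-build time now finds an array
// instead of undefined.
const paramTypes = getMetadata('design:paramtypes', ExampleResolver.prototype, 'webSearch') as unknown[];
```

The metadata is keyed on the prototype and method name, which matches the target and property arguments that Reflect.defineMetadata takes.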

Testing

  • tests/web-search.test.ts: googleSearch, WebSearchProvider (decaying scores, empty-list-on-failure), CompositeSearchProvider (layering, de-dup, survives a throwing provider), and parseDuckDuckGoHtml.
  • tests/sources.test.ts: Web Search is selectable, labelled/described, default-enabled, and kept out of the news-group contract.
  • tests/backend.test.ts + tests/graphql-repository.test.ts: the webSearch query resolves and is served over HTTP through the full cli → graphql → backend path.
npm run typecheck   # clean
npm test            # 146 passed (16 files)
npm run build       # ESM build success
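
The "decaying scores" behaviour exercised by tests/web-search.test.ts could look something like the sketch below. The PR does not give the formula, so the geometric decay, the topScore and decay parameters, the toSearchResults name, and the SearchResult fields are all assumptions; only the { title, link, snippet } hit shape comes from the description.

```typescript
// Hit shape from the PR description.
interface WebSearchHit {
  title: string;
  link: string;
  snippet: string;
}

// Assumed target shape for the app's search results.
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
  score: number;
}

// Web hits arrive ranked but unscored, so assign a score that shrinks with
// rank: topScore, topScore * decay, topScore * decay^2, ...
function toSearchResults(hits: WebSearchHit[], topScore = 1, decay = 0.9): SearchResult[] {
  return hits.map((hit, rank) => ({
    title: hit.title,
    url: hit.link,
    snippet: hit.snippet,
    score: topScore * Math.pow(decay, rank),
  }));
}
```

Scoring by rank gives the composite provider something to sort on when web hits are merged with curated results that already carry scores.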

Version bumped 0.6.0 → 0.7.0.

Fixes #33

konard added 2 commits May 31, 2026 08:06
Adding .gitkeep for PR creation (default mode).
This file will be removed when the task is complete.

Issue: Payel-git-ol#33
Integrate a Google web search (modelled on web-agent-master/google-search,
scraped with Playwright + DuckDuckGo HTML fallback) as additional support on
top of the curated news sources, exposed through the backend.

- src/services/web-search.ts: googleSearch() API, pluggable WebSearchEngine
  (Playwright Google / DuckDuckGo HTTP / resilient), WebSearchProvider adapting
  hits to SearchResult, offline/CI-safe (empty list on failure).
- src/services/search.ts: CompositeSearchProvider merges curated + web hits,
  de-duplicates by URL, sorts by score.
- src/cli/sources.ts: 'web-search' ResearchPlatformId (not a SourceGroup),
  selectablePlatformIds/getPlatformLabel/getPlatformDescription/isWebSearchEnabled,
  DEFAULT_ENABLED_PLATFORMS. PlatformSelector + App wire the new toggle.
- pipeline/collector + app/tradefast run web search when the platform is enabled.
- backend: SearchResultDto + webSearch GraphQL query (resolver/repository/facade),
  cli graphql repository + WEB_SEARCH_QUERY; search runs server-side.
- Declare design:paramtypes for the @Args() params since esbuild can't emit it.
- Tests for web-search service, composite provider, selector platforms, and the
  backend/graphql webSearch path. README + version bump to 0.7.0.

Closes Payel-git-ol#33
konard changed the title from "[WIP] Update search" to "Add whole-internet Web Search platform via server-side GraphQL" May 31, 2026
konard marked this pull request as ready for review May 31, 2026 08:28
konard commented May 31, 2026
Contributor Author

Working session summary

CI passed on the latest commit (a36fb1f). Everything is complete.

Summary

Issue #33 is fully resolved. PR #34 is updated, marked ready, and CI is green (typecheck + 146 tests + build all pass on the pushed commit).

All three requirements from the issue are delivered:

  1. web-agent-master/google-search integrated as additional support: src/services/web-search.ts mirrors that library's googleSearch(query, options) → { query, results } API behind a pluggable WebSearchEngine (a Playwright-Google scraper with a DuckDuckGo HTML fallback when Chromium is absent, returning an empty list on failure so runs never break). CompositeSearchProvider layers these whole-internet hits on top of the curated KnowledgeBaseSearch/news sources, de-duplicating by URL.

  2. "Web Search" added to /serching-platforms: a new web-search ResearchPlatformId (deliberately not a news SourceGroup, preserving the strict curated-source contract) appears as its own toggle in the PlatformSelector pop-up, enabled by default.

  3. Search runs on the backend: a webSearch(query: String!, limit: Int): [SearchResult!]! GraphQL query is exposed through the same cli → graphql → backend repository path from PR #32, "Transferring API requests to the backend" (resolver → repository → facade), with a new SearchResultDto.
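
For reference, a client-side request against the webSearch query in item 3 might be built as below. The argument names and types follow the schema quoted there; the selected fields (title, url, snippet), the EXAMPLE_WEB_SEARCH_QUERY constant, and the buildWebSearchRequest helper are assumptions for illustration and are not taken from the repository's WEB_SEARCH_QUERY.

```typescript
// Query document matching webSearch(query: String!, limit: Int).
// The selected fields are assumed; SearchResult may expose different ones.
const EXAMPLE_WEB_SEARCH_QUERY = `
  query WebSearch($query: String!, $limit: Int) {
    webSearch(query: $query, limit: $limit) {
      title
      url
      snippet
    }
  }
`;

interface GraphqlRequestBody {
  query: string;
  variables: { query: string; limit: number | null };
}

// limit is optional in the schema (Int, not Int!), so it may be sent as null.
function buildWebSearchRequest(query: string, limit?: number): GraphqlRequestBody {
  return {
    query: EXAMPLE_WEB_SEARCH_QUERY,
    variables: { query, limit: limit ?? null },
  };
}

// The JSON body a client would POST to the GraphQL endpoint.
const body = JSON.stringify(buildWebSearchRequest('copper futures', 5));
```

Passing the search text as a variable rather than interpolating it into the query string avoids having to escape user input.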

Key fix this session: the 2 failing tests were caused by NestJS @Args() reflecting design:paramtypes at schema-build time — metadata esbuild can't emit. I declared it explicitly via Reflect.defineMetadata(..., [String, Number], TradefastResolver.prototype, 'webSearch') (exactly what emitDecoratorMetadata would generate), keeping the schema build reflection-free while preserving the explicit arg type thunks. All 146 tests now pass.

Also: the README now documents the feature in its highlights, commands, env, and architecture sections, and the version is bumped 0.6.0 → 0.7.0.

PR: #34


This summary was automatically extracted from the AI working session output.

konard commented May 31, 2026
Contributor Author

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost: $10.447576

📊 Context and tokens usage:

Claude Opus 4.8: (3 sub-sessions)

  1. 107.9K / 1M (11%) input tokens, 24.0K / 128K (19%) output tokens
  2. 116.4K / 1M (12%) input tokens, 39.6K / 128K (31%) output tokens
  3. 73.0K / 1M (7%) input tokens, 12.2K / 128K (10%) output tokens

Total: (20.5K new + 302.3K cache writes + 12.2M cache reads) input tokens, 91.9K output tokens, $10.408869 cost

Claude Haiku 4.5:

  • 29.7K / 200K (15%) input tokens, 1.8K / 64K (3%) output tokens

Total: 29.7K input tokens, 1.8K output tokens, $0.038707 cost

🤖 Models used:

  • Tool: Anthropic Claude Code
  • Requested: opus
  • Main model: Claude Opus 4.8 (claude-opus-4-8)
  • Additional models:
    • Claude Haiku 4.5 (claude-haiku-4-5-20251001)

📎 Log file uploaded as Gist (5209KB)


The working session has now ended. Feel free to review the solution draft and add any feedback.

konard commented May 31, 2026
Contributor Author

✅ Ready to merge

This pull request is now ready to be merged:

  • All CI checks have passed
  • No merge conflicts
  • No pending changes

Monitored by hive-mind with --auto-restart-until-mergeable flag

Payel-git-ol merged commit 0286e24 into Payel-git-ol:master May 31, 2026
1 check passed