Add whole-internet Web Search platform via server-side GraphQL#34
Conversation
Adding .gitkeep for PR creation (default mode). This file will be removed when the task is complete. Issue: Payel-git-ol#33
Integrate a Google web search (modelled on web-agent-master/google-search, scraped with Playwright + DuckDuckGo HTML fallback) as additional support on top of the curated news sources, exposed through the backend. - src/services/web-search.ts: googleSearch() API, pluggable WebSearchEngine (Playwright Google / DuckDuckGo HTTP / resilient), WebSearchProvider adapting hits to SearchResult, offline/CI-safe (empty list on failure). - src/services/search.ts: CompositeSearchProvider merges curated + web hits, de-duplicates by URL, sorts by score. - src/cli/sources.ts: 'web-search' ResearchPlatformId (not a SourceGroup), selectablePlatformIds/getPlatformLabel/getPlatformDescription/isWebSearchEnabled, DEFAULT_ENABLED_PLATFORMS. PlatformSelector + App wire the new toggle. - pipeline/collector + app/tradefast run web search when the platform is enabled. - backend: SearchResultDto + webSearch GraphQL query (resolver/repository/facade), cli graphql repository + WEB_SEARCH_QUERY; search runs server-side. - Declare design:paramtypes for the @Args() params since esbuild can't emit it. - Tests for web-search service, composite provider, selector platforms, and the backend/graphql webSearch path. README + version bump to 0.7.0. Closes Payel-git-ol#33
Working session summaryCI passed on the latest commit ( SummaryIssue #33 is fully resolved. PR #34 is updated, marked ready, and CI is green (typecheck + 146 tests + build all pass on the pushed commit). All three requirements from the issue are delivered:
Key fix this session: the 2 failing tests were caused by NestJS Also: README documented across highlights/commands/env/architecture sections, and version bumped 0.6.0 → 0.7.0. PR: #34 This summary was automatically extracted from the AI working session output. |
🤖 Solution Draft LogThis log file contains the complete execution trace of the AI solution draft process. 💰 Cost: $10.447576📊 Context and tokens usage:Claude Opus 4.8: (3 sub-sessions)
Total: (20.5K new + 302.3K cache writes + 12.2M cache reads) input tokens, 91.9K output tokens, $10.408869 cost Claude Haiku 4.5:
Total: 29.7K input tokens, 1.8K output tokens, $0.038707 cost 🤖 Models used:
📎 Log file uploaded as Gist (5209KB)Now working session is ended, feel free to review and add any feedback on the solution draft. |
✅ Ready to mergeThis pull request is now ready to be merged:
Monitored by hive-mind with --auto-restart-until-mergeable flag |
This reverts commit b096db5.
Summary
Integrates a whole-internet Web Search into Tradefast's research, as requested in #33:
KnowledgeBaseSearch/news crawl via a newCompositeSearchProvider, so a run pulls in results from the entire Internet, not just the built-in feeds.web-agent-master/google-search. A newsrc/services/web-search.tsmirrors that library's API —googleSearch(query, options)returning{ query, results }of{ title, link, snippet }— behind a pluggableWebSearchEngine. The default engine scrapes Google with Playwright (as the library does) and falls back to a JavaScript-free DuckDuckGo HTML engine when Chromium isn't installed, exactly like the news crawler's resilient fetcher. It returns an empty list — never throws — on network failure, so collection stays resilient and CI stays offline-safe./serching-platforms. A newweb-searchresearch platform appears as its own toggle in the multi-select pop-up (PlatformSelector), enabled by default. It is deliberately not a newsSourceGroup, so the strict curated-source contract is untouched.webSearch(query: String!, limit: Int): [SearchResult!]!GraphQL query is exposed through the samecli → graphql → backendrepository path established in Transferring API requests to the backend #32 (resolver → repository → facade), with a newSearchResultDto. The CLI calls it throughGraphqlTradefastRepository.How it works
Notable implementation detail
The build pipeline (esbuild via tsup/vitest) cannot emit
design:paramtypes, which NestJS@Args()reflects at schema-build time — every other field already avoids this by passing explicit@Field(() => …)thunks. ThewebSearchquery's argument metadata is therefore declared by hand (Reflect.defineMetadata, exactly whatemitDecoratorMetadatawould generate) so the schema builds reflection-free while keeping the explicit() => String/() => Intarg types.Testing
tests/web-search.test.ts—googleSearch,WebSearchProvider(decaying scores, empty-list-on-failure),CompositeSearchProvider(layering, de-dup, survives a throwing provider), andparseDuckDuckGoHtml.tests/sources.test.ts— Web Search is selectable, labelled/described, default-enabled, and kept out of the news-group contract.tests/backend.test.ts+tests/graphql-repository.test.ts— thewebSearchquery resolves and serves over HTTP through the full cli → graphql → backend path.Version bumped 0.6.0 → 0.7.0.
Fixes #33