fix(baidu-cli): implement filterOrganic, add EvaluateJSON + retry, google-cli fixes#9

Open
jay77721 wants to merge 1 commit into better-world-ai:main from jay77721:main

@jay77721


Fixed 3 bugs in baidu-cli

Bug 1: filterOrganic() is an empty implementation

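  • filterOrganic() was a no-op: every card (AI answers, related searches, Baike entries, etc.) passed through unchanged, so the --all flag had no effect
  • Fix: implemented real filtering logic. It drops empty titles, empty URLs, recommend_list (related-search stubs), and ai_agent_distribute (AI hallucination cards), and keeps www_index (organic results), the sg_kg_ family (Baidu Baike cards), and se_com_default (fallback results)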
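A minimal Go sketch of the filtering rules above. The Result struct and its Tpl field (the card's template name) are hypothetical, as is the choice to drop every card type the PR does not name; the real baidu-cli types and fallback behavior may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Result is a hypothetical shape for one extracted Baidu card; the real
// struct in baidu-cli may use different field names.
type Result struct {
	Title string
	URL   string
	Tpl   string // card template/type, e.g. "www_index" (assumed field)
}

// isOrganic reports whether a card type should be kept. Only the types
// named in the PR are allowlisted; treating every other type as
// non-organic is an assumption of this sketch.
func isOrganic(tpl string) bool {
	switch {
	case tpl == "www_index", tpl == "se_com_default":
		return true
	case strings.HasPrefix(tpl, "sg_kg_"): // Baidu Baike cards
		return true
	default: // includes recommend_list and ai_agent_distribute
		return false
	}
}

// filterOrganic drops cards with an empty (or whitespace-only) title or URL
// and any card whose type is not allowlisted by isOrganic.
func filterOrganic(in []Result) []Result {
	out := make([]Result, 0, len(in))
	for _, r := range in {
		if strings.TrimSpace(r.Title) == "" || strings.TrimSpace(r.URL) == "" {
			continue
		}
		if !isOrganic(r.Tpl) {
			continue
		}
		out = append(out, r)
	}
	return out
}

func main() {
	// "sg_kg_entity" is an illustrative type name, not taken from Baidu.
	cards := []Result{
		{Title: "Go", URL: "https://example.com/a", Tpl: "www_index"},
		{Title: "Go (Baike)", URL: "https://example.com/b", Tpl: "sg_kg_entity"},
		{Title: "Related searches", URL: "", Tpl: "recommend_list"},
		{Title: "AI answer", URL: "https://example.com/c", Tpl: "ai_agent_distribute"},
	}
	for _, r := range filterOrganic(cards) {
		fmt.Println(r.Tpl, r.Title)
	}
}
```

Because the sketch is an allowlist, recommend_list and ai_agent_distribute simply fall into the default branch; a denylist-style variant would instead keep card types it has never seen.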

Bug 2: extractorJS returns a raw JS array

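  • The original code called Evaluate() directly to get the raw JS return value, so results were parsed inconsistently
  • Fix: extractorJS now wraps its result in JSON.stringify() and is parsed through EvaluateJSON(), consistent with google-cli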

Bug 3: missing evaluateWithRetry retry logic

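  • The Baidu version of browser/client.go lacks EvaluateJSON, evaluateWithRetry, isTransientContextError, and related functions
  • After navigation the JS context may not be ready yet (a CDP race), so evaluating immediately fails intermittently
  • Fix: ported the full retry logic from google-cli (three delay tiers: 300ms/700ms/1500ms)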

Fixed 2 bugs in google-cli

Bug 4: Navigate opens a new tab every time

  • newTab: true makes every search open a new tab, piling up empty tabs
  • Fix: changed to newTab: false so the current tab is reused

Bug 5: results with empty URLs are not filtered out

  • Ad/extension entries may have an empty URL, which is useless to CLI consumers
  • Fix: after extraction, drop entries whose URL is empty
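A minimal sketch of the empty-URL filter, using a hypothetical SearchResult type in place of google-cli's real result struct.

```go
package main

import "fmt"

// SearchResult is a hypothetical stand-in for google-cli's result type.
type SearchResult struct {
	Title string
	URL   string
}

// dropEmptyURLs removes entries with an empty URL (e.g. ads or extension
// rows), filtering in place to reuse the backing array.
func dropEmptyURLs(results []SearchResult) []SearchResult {
	kept := results[:0]
	for _, r := range results {
		if r.URL != "" {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	rs := []SearchResult{
		{Title: "Go", URL: "https://go.dev"},
		{Title: "Sponsored", URL: ""},
	}
	fmt.Println(len(dropEmptyURLs(rs)))
}
```

Filtering in place with results[:0] avoids an allocation but overwrites the caller's backing array; allocate a fresh slice if the unfiltered results are still needed afterwards.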

- Implement filterOrganic: drop empty titles, recommend_list, ai_agent_distribute;
  keep www_index, sg_kg_ (Baike), se_com_default
- Change extractorJS to return JSON.stringify for robust EvaluateJSON parsing
- Add EvaluateJSON, evaluateWithRetry, isTransientContextError to browser client
  (mirrors google-cli pattern — handles CDP context-not-ready race)
- fix(google-cli): Navigate uses newTab:false to avoid tab accumulation
- fix(google-cli): drop search results with empty URLs
@RachelXiaolan

Contributor

Review: two issues in the baidu-cli changes

Thanks for the fixes! The google-cli part (Bug 4 + 5) looks correct. However, the baidu-cli part has two issues:

Issue 1: EvaluateJSON will double-unwrap in baidu-cli

In baidu-cli/browser/client.go, the existing Evaluate() method already unwraps the {type, value} envelope:

```go
// current Evaluate() in baidu-cli
func (c *Client) Evaluate(code string) (json.RawMessage, error) {
    raw, err := c.Call("evaluate", ...)
    // ... unwraps {type, value} envelope here ...
    return env.Value, nil  // returns the inner value directly
}
```

But the new EvaluateJSON() calls c.Evaluate() and then tries to unwrap {type, value} again:

```go
// new EvaluateJSON() in this PR
func (c *Client) EvaluateJSON(code string, v any) error {
    raw, err := c.Evaluate(code)  // already unwrapped!
    var env struct {
        Type  string `json:"type"`
        Value string `json:"value"`
    }
    json.Unmarshal(raw, &env)  // this will fail: raw is already the inner value, not the envelope
    // ...
}
```

This works in google-cli because its Evaluate() is a raw pass-through (return c.Call(...)) that does NOT unwrap. baidu-cli's Evaluate() does unwrap, so EvaluateJSON() receives the bare value rather than the envelope.

Fix options:

  1. Make baidu-cli's Evaluate() a raw pass-through like google-cli (remove the unwrapping), then EvaluateJSON() works as-is. Any callers of Evaluate() that relied on unwrapping would need updating.
  2. Or: change EvaluateJSON() to call c.Call("evaluate", ...) directly instead of c.Evaluate(), bypassing the unwrap.
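A sketch of fix option 2, in which EvaluateJSON() goes through the raw call and unwraps the envelope exactly once. The caller interface, the Call(method, params) signature, and the {"code": ...} parameter shape are assumptions standing in for baidu-cli's real Client; fakeCaller exists only to exercise the function.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// caller is a hypothetical stand-in for baidu-cli's Client.Call; the real
// method's parameters may differ.
type caller interface {
	Call(method string, params any) (json.RawMessage, error)
}

// evaluateJSON sketches fix option 2: call "evaluate" directly and unwrap
// the {type, value} envelope exactly once, leaving Evaluate() untouched.
func evaluateJSON(c caller, code string, v any) error {
	raw, err := c.Call("evaluate", map[string]string{"code": code})
	if err != nil {
		return err
	}
	var env struct {
		Type  string `json:"type"`
		Value string `json:"value"`
	}
	if err := json.Unmarshal(raw, &env); err != nil {
		return fmt.Errorf("decode evaluate envelope: %w", err)
	}
	// env.Value holds the JSON.stringify output from the page.
	return json.Unmarshal([]byte(env.Value), v)
}

// fakeCaller returns a canned envelope, standing in for the browser bridge.
type fakeCaller struct{ reply string }

func (f fakeCaller) Call(method string, params any) (json.RawMessage, error) {
	return json.RawMessage(f.reply), nil
}

func main() {
	c := fakeCaller{reply: `{"type":"string","value":"[{\"title\":\"Go\"}]"}`}
	var results []struct {
		Title string `json:"title"`
	}
	if err := evaluateJSON(c, "JSON.stringify([])", &results); err != nil {
		panic(err)
	}
	fmt.Println(results[0].Title)
}
```

Since only EvaluateJSON() changes, existing callers of Evaluate() keep receiving the already-unwrapped value.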

Issue 2: evaluateWithRetry is added but never called

The new evaluateWithRetry() and isTransientContextError() functions in baidu-cli/browser/client.go are dead code: Search() in baidu/search.go calls client.EvaluateJSON() directly without retry:

```go
// baidu/search.go after this PR's changes
if err := client.EvaluateJSON(extractorJS, &results); err != nil {
    return nil, fmt.Errorf("extract results: %w", err)
}
```

If retry logic is intended, Search() should call evaluateWithRetry() instead. If not, these functions should be removed.

@xpzouying

Collaborator

Thanks @jay77721 for the contribution, and special thanks to @RachelXiaolan for the very thorough review above!

A few additional points from me as a maintainer:

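  1. I agree with @RachelXiaolan's analysis: the two baidu-cli problems (the double-unwrap in EvaluateJSON and evaluateWithRetry being dead code) are real and need to be fixed before merging. Her second suggested fix (calling c.Call(...) directly in EvaluateJSON instead of going through c.Evaluate()) is lower risk, because it keeps the existing Evaluate() contract and doesn't require changing any other call sites.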

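  2. Several of the google-cli changes have already landed on main via PR "feat: merge the google/nanobanana/chatgpt-image CLIs into the monorepo" #5 (the monorepo merge). The Navigate(newTab: true), EvaluateJSON, evaluateWithRetry, and isTransientContextError changes you made to google-cli are in fact already on HEAD. After fixing baidu-cli, please rebase onto the latest main so the final diff contains only the baidu-cli changes plus, possibly, the new google-cli URL filtering.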

  3. The filterOrganic implementation itself is good: it drops recommend_list / ai_agent_distribute and keeps www_index / sg_kg_* / se_com_default, which matches exactly what the TODO comment in the stub called for. Nice work on this part 👍

Once it's fixed, push again and we'll do another round of review. Looking forward to merging this PR!

xpzouying added a commit that referenced this pull request May 21, 2026
Recognizing @RachelXiaolan for code contributions (PR #17 — Status() health
check + SilenceUsage for baidu-cli/google-cli) and pull-request reviews
(detailed technical review on PR #9, plus follow-up PR #16).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

3 participants