Skip to content

feat(knowledge): sk knowledge bulk-tag — batch retag entries by query/filter (#722)#728

Merged
magicpro97 merged 6 commits into
mainfrom
feat/i722-bulk-tag-rebased
May 30, 2026
Merged

feat(knowledge): sk knowledge bulk-tag — batch retag entries by query/filter (#722)#728
magicpro97 merged 6 commits into
mainfrom
feat/i722-bulk-tag-rebased

Conversation

@magicpro97
Copy link
Copy Markdown
Owner

Summary

Implements sk knowledge bulk-tag command for batch retagging knowledge entries.

Closes #722

Changes

  • knowledge-health.py: Added cmd_bulk_tag() with selector/mutation pipeline
  • sk.py: Added bulk-tag routing to _GROUPS["knowledge"]
  • test_fixes.py: 28 I722 tests (now correctly placed after sys.exit fix)

Fixes

  • Ruff CI errors fixed (rebased on main b0352a5)
  • I722 tests were dead code (after sys.exit) — fixed

Review

  • Dead code bug found and fixed: 28 I722 tests now actually execute and pass

Copilot AI review requested due to automatic review settings May 30, 2026 08:57
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new sk knowledge bulk-tag subcommand that selects knowledge entries by query/filters and applies batch mutations (dry-run by default, --apply to commit), wiring it through sk.py and adding a corresponding integration-style test block in test_fixes.py.

Changes:

  • Add cmd_bulk_tag() implementation and route "bulk-tag" in knowledge-health.py.
  • Route sk knowledge bulk-tag ... via _GROUPS["knowledge"] in sk.py.
  • Add I722 tests covering selectors, mutations, dry-run vs apply behavior, and basic routing/source checks.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
knowledge-health.py Implements cmd_bulk_tag() and adds main() routing for the bulk-tag subcommand.
sk.py Adds grouped subcommand mapping and dispatch logic for sk knowledge bulk-tag.
test_fixes.py Adds I722 test coverage for the new command and its behaviors.

Comment thread test_fixes.py
Comment on lines 9375 to 9377
except Exception as _e718_br:
# === I722: Knowledge Bulk-Tag ===
test("I718-17: briefing.py badge source check", False, str(_e718_br))
Comment thread knowledge-health.py
Comment on lines +2767 to +2770
db = get_db()
base = "SELECT DISTINCT ke.id FROM knowledge_entries ke"
where: list[str] = ["ke.deleted_at IS NULL"]
params: list = []
@magicpro97 magicpro97 force-pushed the feat/i722-bulk-tag-rebased branch 3 times, most recently from 12f2c05 to 6de35e5 Compare May 30, 2026 11:39
Copy link
Copy Markdown
Owner Author

@magicpro97 magicpro97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — bulk-tag

SQL Injection / Parameterization ✅

All SQL uses ? placeholders throughout. The WHERE clauses are built from hardcoded strings; only values go into params. No f-string SQL, no string interpolation of user input into queries. Clean.

sk.py routing ✅

bulk-tag is correctly registered in _GROUPS["knowledge"] and dispatched with ["bulk-tag"] + sub_rest. Matches the pattern used by pin/unpin/pins.

Test coverage ✅

28 I722 tests cover: selector/mutation validation, dry-run no-op, --apply writes, idempotent --add-tag, combined mutations, --tag join filter, source="bulk-tag" check, sk.py route check, and f-string SQL absence check. Good.


🐛 Bug 1 — LIKE wildcards unescaped in --query (real correctness bug)

File: knowledge-health.py, lines 103–105

where.append("(ke.title LIKE ? OR ke.content LIKE ?)")
like_val = "%" + query_text + "%"  # BUG: % and _ in query_text act as wildcards
params.extend([like_val, like_val])

SQLite LIKE treats % (any chars) and _ (any single char) as metacharacters. They are not escaped before being embedded in the LIKE pattern. Examples of broken behavior:

  • --query "%"like_val = "%%" → matches every non-deleted entry
  • --query "50%" → matches entries starting with "50", not those containing the literal "50%"
  • --query "some_thing"_ matches any single char, so "some-thing", "some.thing" etc. all match

Because bulk-tag is a data-modifying command (mutates wing/room/tags in bulk), an unexpectedly broad selector is a correctness defect — not just cosmetic.

Fix: escape LIKE metacharacters before wrapping with %:

# Escape SQLite LIKE metacharacters in the user-supplied substring
_escaped = query_text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
like_val = "%" + _escaped + "%"
where.append("(ke.title LIKE ? ESCAPE '\\' OR ke.content LIKE ? ESCAPE '\\')")

⚠️ Minor — DB connection not closed in exception path

File: knowledge-health.py, exception handler (after the mutation loop)

except Exception as exc:
    print(f"⚠ bulk-tag failed: {exc}", file=sys.stderr)
    sys.exit(1)   # db.close() never called

If get_db() succeeds but a later operation raises (e.g., db.commit() fails on full disk), db.close() is skipped. In a CLI the OS reclaims the handle on exit, so this is benign in practice, but for WAL-mode databases a reader lock can linger until GC. Recommend adding db.close() before sys.exit(1).


Both issues are in cmd_bulk_tag only. Everything else — SQL parameterization, routing, test coverage — looks correct.

@magicpro97
Copy link
Copy Markdown
Owner Author

Fixed all reviewer issues:

  • LIKE metachar escaping: query_text now escapes \, %, _ with ESCAPE '\\' clause to prevent wildcard injection
  • deleted_at column guard: Now uses PRAGMA table_info to check if column exists before adding it to WHERE clause — works on older DB schemas
  • Connection leak: Added db.close() in the except handler with its own try/except to ensure cleanup on error

Requesting re-review. CI pending.

Linh Ngo and others added 6 commits May 30, 2026 19:08
…/filter (#722)

- knowledge-health.py: cmd_bulk_tag with --query/--wing/--room/--tag selectors, --add-tag/--set-wing/--set-room mutations, dry-run by default, --apply to commit
- sk.py: wire sk knowledge bulk-tag routing
- test_fixes.py: +28 tests (I722)

Closes #722

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…close() on error

- Escape LIKE metacharacters (\, %, _) in --query argument with ESCAPE '\' clause
  to prevent wildcard injection from user input
- Guard deleted_at IS NULL clause with PRAGMA table_info check so the command
  works on older DB schemas that lack the deleted_at column
- Add db.close() in the except handler to prevent connection leaks on error
- Build WHERE clause only when conditions are non-empty (handles edge case of
  no columns/conditions with schema missing deleted_at)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@magicpro97 magicpro97 force-pushed the feat/i722-bulk-tag-rebased branch from 094b031 to 1e6242f Compare May 30, 2026 12:10
@magicpro97 magicpro97 merged commit 23e4a24 into main May 30, 2026
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(knowledge): sk knowledge bulk-tag — batch retag entries by query/filter

2 participants