feat(knowledge): sk knowledge bulk-tag — batch retag entries by query/filter (#722)#728
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new sk knowledge bulk-tag subcommand that selects knowledge entries by query/filters and applies batch mutations (dry-run by default, --apply to commit), wiring it through sk.py and adding a corresponding integration-style test block in test_fixes.py.
Changes:
- Add
cmd_bulk_tag()implementation and route"bulk-tag"inknowledge-health.py. - Route
sk knowledge bulk-tag ...via_GROUPS["knowledge"]insk.py. - Add I722 tests covering selectors, mutations, dry-run vs apply behavior, and basic routing/source checks.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
knowledge-health.py |
Implements cmd_bulk_tag() and adds main() routing for the bulk-tag subcommand. |
sk.py |
Adds grouped subcommand mapping and dispatch logic for sk knowledge bulk-tag. |
test_fixes.py |
Adds I722 test coverage for the new command and its behaviors. |
| except Exception as _e718_br: | ||
| # === I722: Knowledge Bulk-Tag === | ||
| test("I718-17: briefing.py badge source check", False, str(_e718_br)) |
| db = get_db() | ||
| base = "SELECT DISTINCT ke.id FROM knowledge_entries ke" | ||
| where: list[str] = ["ke.deleted_at IS NULL"] | ||
| params: list = [] |
12f2c05 to
6de35e5
Compare
magicpro97
left a comment
There was a problem hiding this comment.
Code Review — bulk-tag
SQL Injection / Parameterization ✅
All SQL uses ? placeholders throughout. The WHERE clauses are built from hardcoded strings; only values go into params. No f-string SQL, no string interpolation of user input into queries. Clean.
sk.py routing ✅
bulk-tag is correctly registered in _GROUPS["knowledge"] and dispatched with ["bulk-tag"] + sub_rest. Matches the pattern used by pin/unpin/pins.
Test coverage ✅
28 I722 tests cover: selector/mutation validation, dry-run no-op, --apply writes, idempotent --add-tag, combined mutations, --tag join filter, source="bulk-tag" check, sk.py route check, and f-string SQL absence check. Good.
🐛 Bug 1 — LIKE wildcards unescaped in --query (real correctness bug)
File: knowledge-health.py, lines 103–105
where.append("(ke.title LIKE ? OR ke.content LIKE ?)")
like_val = "%" + query_text + "%" # BUG: % and _ in query_text act as wildcards
params.extend([like_val, like_val])SQLite LIKE treats % (any chars) and _ (any single char) as metacharacters. They are not escaped before being embedded in the LIKE pattern. Examples of broken behavior:
--query "%"→like_val = "%%"→ matches every non-deleted entry--query "50%"→ matches entries starting with"50", not those containing the literal"50%"--query "some_thing"→_matches any single char, so "some-thing", "some.thing" etc. all match
Because bulk-tag is a data-modifying command (mutates wing/room/tags in bulk), an unexpectedly broad selector is a correctness defect — not just cosmetic.
Fix: escape LIKE metacharacters before wrapping with %:
# Escape SQLite LIKE metacharacters in the user-supplied substring
_escaped = query_text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
like_val = "%" + _escaped + "%"
where.append("(ke.title LIKE ? ESCAPE '\\' OR ke.content LIKE ? ESCAPE '\\')")⚠️ Minor — DB connection not closed in exception path
File: knowledge-health.py, exception handler (after the mutation loop)
except Exception as exc:
print(f"⚠ bulk-tag failed: {exc}", file=sys.stderr)
sys.exit(1) # db.close() never calledIf get_db() succeeds but a later operation raises (e.g., db.commit() fails on full disk), db.close() is skipped. In a CLI the OS reclaims the handle on exit, so this is benign in practice, but for WAL-mode databases a reader lock can linger until GC. Recommend adding db.close() before sys.exit(1).
Both issues are in cmd_bulk_tag only. Everything else — SQL parameterization, routing, test coverage — looks correct.
|
Fixed all reviewer issues:
Requesting re-review. CI pending. |
…/filter (#722) - knowledge-health.py: cmd_bulk_tag with --query/--wing/--room/--tag selectors, --add-tag/--set-wing/--set-room mutations, dry-run by default, --apply to commit - sk.py: wire sk knowledge bulk-tag routing - test_fixes.py: +28 tests (I722) Closes #722 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…close() on error - Escape LIKE metacharacters (\, %, _) in --query argument with ESCAPE '\' clause to prevent wildcard injection from user input - Guard deleted_at IS NULL clause with PRAGMA table_info check so the command works on older DB schemas that lack the deleted_at column - Add db.close() in the except handler to prevent connection leaks on error - Build WHERE clause only when conditions are non-empty (handles edge case of no columns/conditions with schema missing deleted_at) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
094b031 to
1e6242f
Compare
Summary
Implements
sk knowledge bulk-tagcommand for batch retagging knowledge entries.Closes #722
Changes
knowledge-health.py: Addedcmd_bulk_tag()with selector/mutation pipelinesk.py: Addedbulk-tagrouting to_GROUPS["knowledge"]test_fixes.py: 28 I722 tests (now correctly placed after sys.exit fix)Fixes
Review