Skip to content

Feat/packet inspector query tools#30

Open
dipeshwalia wants to merge 5 commits into
good-tools:masterfrom
dipeshwalia:feat/packet-inspector-llm-query-tools
Open

Feat/packet inspector query tools#30
dipeshwalia wants to merge 5 commits into
good-tools:masterfrom
dipeshwalia:feat/packet-inspector-llm-query-tools

Conversation

@dipeshwalia
Copy link
Copy Markdown
Contributor

@dipeshwalia dipeshwalia commented May 12, 2026

Summary by CodeRabbit

  • New Features

    • Extract specific header-field values with optional filtering and row limits.
    • List present fields across frames with counts and metadata.
    • Retrieve a recursive protocol hierarchy with frame/byte aggregates.
  • Improvements

    • Stronger return-code checks and stricter argument handling in APIs.
    • JS/TS wrappers now normalize responses into native arrays and typed shapes.
  • Tests

    • Added tests covering extraction, presence listing, protocol hierarchy, and preference behavior.

Review Change Stack

Avoid duplicate decode-as default reapplication across module traversal and keep PREF_DECODE_AS_RANGE updates merged with protocol defaults to prevent runaway memory growth on Node 20.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 12, 2026

Warning

Rate limit exceeded

@dipeshwalia has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 26 minutes and 8 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9641629f-2f67-4fed-9b53-ff454e95d3a9

📥 Commits

Reviewing files that changed from the base of the PR and between 4e718c5 and 97e1c91.

📒 Files selected for processing (1)
  • lib/Makefile

Walkthrough

This PR adds three new data-extraction APIs to Wiregasm: extractFields to retrieve specific field values from dissected packets per frame, listPresentFields to enumerate available fields with occurrence counts, and protocolHierarchy to aggregate protocol-level statistics. The implementation spans C++, Emscripten bindings, and TypeScript, with fixes to preference-range merging and comprehensive test coverage.

Changes

Extract Fields, List Present Fields & Protocol Hierarchy

Layer / File(s) Summary
Data structures & type contracts
src/types.ts, lib/wiregasm/wiregasm.h
FieldValue, ExtractedRow, PresentField, ProtocolNode, and response wrapper types are defined across TypeScript and C++. DissectSession gains three new method signatures, and WiregasmLib adds VectorString constructor for building string vectors.
C++ core processing logic
lib/wiregasm/lib.cpp, lib/wiregasm/lib.h
wg_session_process_extract_fields, wg_session_process_present_fields, and wg_session_process_protocol_hierarchy walk proto_tree structures, accumulate results with optional filtering and truncation, reset dissector state per frame, and return structured responses. Recursive helpers extract field values, enumerate present fields by abbreviation, and aggregate protocol nodes.
Bindings, session wrappers & preference fixes
lib/wiregasm/bindings.cpp, lib/wiregasm/wiregasm.cpp
Emscripten bindings register value_object types and typed vectors for JavaScript serialization. DissectSession wrapper methods forward calls to C++ processing functions. Decode-as preference handling is fixed to merge new values against the default range instead of the current range, and module callback logic is simplified to skip submodule traversal.
JavaScript wrapper API
src/index.ts
Wiregasm class gains extract_fields, list_present_fields, and protocol_hierarchy methods that normalize C++ responses into JavaScript arrays and objects. Type imports are expanded, preference code comparisons are strengthened to strict checks, and tap/iograph argument handling is improved.
Test coverage
src/index.test.ts
New tests validate extract_fields per-frame output, invalid-filter error handling, and truncation behavior. list_present_fields and protocol_hierarchy results are verified, including recursive protocol node presence. Preference tests are extended to confirm re-setting and default value preservation.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title refers to 'packet inspector query tools', which aligns with the PR's addition of three new query/extraction APIs (extractFields, listPresentFields, protocolHierarchy) that enable querying packet/frame data from dissected captures.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/wiregasm/lib.cpp`:
- Around line 1097-1099: When wtap_seek_read(cfile->provider.wth,
fdata->file_off, &rec, &rec_buf, &err, &err_info) returns false, the loop
currently breaks without setting response.error causing callers to get
incomplete data; modify the failure path in the block around wtap_seek_read (and
the analogous blocks that call wtap_seek_read at the other two locations) to
populate response.error with the available error information (use err and/or
err_info contents or formatted message) and return/exit early instead of
silently breaking so the caller receives the read failure details.
- Around line 1281-1289: The protocol node counts are being incremented multiple
times per single frame because every FT_PROTOCOL hfinfo match increments
protocol->frames and protocol->bytes; fix this by preventing duplicate
increments within the same frame — e.g., add a per-protocol last-seen frame
marker (a field like ProtocolNode::last_seen_frame_id or last_frame_index) and
update it when you first account for that protocol in the current frame; in the
block where you get ProtocolNode* protocol via wg_find_or_create_protocol_node,
only increment protocol->frames and protocol->bytes if
protocol->last_seen_frame_id != current_frame_id (then set last_seen_frame_id =
current_frame_id), or alternatively maintain a per-frame set of counted protocol
pointers and skip duplicates, ensuring the logic uses the same current frame
identifier used elsewhere in this code path.
- Around line 1025-1044: FieldValue.raw is never set, so populate it with the
original string representation: when finfo->value is present call
fvalue_to_string_repr as you do, capture the returned char* into a temporary
std::string (e.g. std::string repr(display)), assign field_value.value = repr
(or move), and also assign field_value.raw = repr before calling wmem_free(NULL,
display); and in the label fallback branch set field_value.raw =
field_value.value (or the label string) so raw is filled for every pushed
FieldValue; update references to field_value, fvalue_to_string_repr, and
values->push_back accordingly.

In `@src/index.ts`:
- Around line 271-274: The code currently constructs a RegExp from untrusted
query (const regex = new RegExp(query)) and tests field.field/field.name, which
fails for inputs like (?i)http and is vulnerable to ReDoS; replace the regex
approach with a case-insensitive substring match: normalize the query to a
string and lower-case it (e.g., const q = String(query).toLowerCase()) and
filter using field.field.toLowerCase().includes(q) ||
field.name.toLowerCase().includes(q) so no RegExp is compiled and matching is
safe and case-insensitive.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fb7fba74-67c0-4e9c-9342-508a2a30bfec

📥 Commits

Reviewing files that changed from the base of the PR and between fa9a0ed and 82e8070.

📒 Files selected for processing (8)
  • lib/wiregasm/bindings.cpp
  • lib/wiregasm/lib.cpp
  • lib/wiregasm/lib.h
  • lib/wiregasm/wiregasm.cpp
  • lib/wiregasm/wiregasm.h
  • src/index.test.ts
  • src/index.ts
  • src/types.ts

Comment thread lib/wiregasm/lib.cpp
Comment on lines +1025 to +1044
FieldValue field_value;
field_value.field = field_it->second;

if (finfo->value) {
char *display = fvalue_to_string_repr(NULL, finfo->value, FTREPR_DISPLAY, finfo->hfinfo->display);
if (display) {
field_value.value = display;
wmem_free(NULL, display);
}
}

if (field_value.value.empty()) {
char label_str[ITEM_LABEL_LENGTH];
label_str[0] = '\0';
proto_item_fill_label(finfo, label_str);
field_value.value = string(label_str);
}

values->push_back(field_value);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

FieldValue.raw is never populated.

Line 1025–1044 fills field/value only. raw remains empty for all results, which makes the exposed response shape misleading.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/wiregasm/lib.cpp` around lines 1025 - 1044, FieldValue.raw is never set,
so populate it with the original string representation: when finfo->value is
present call fvalue_to_string_repr as you do, capture the returned char* into a
temporary std::string (e.g. std::string repr(display)), assign field_value.value
= repr (or move), and also assign field_value.raw = repr before calling
wmem_free(NULL, display); and in the label fallback branch set field_value.raw =
field_value.value (or the label string) so raw is filled for every pushed
FieldValue; update references to field_value, fvalue_to_string_repr, and
values->push_back accordingly.

Comment thread lib/wiregasm/lib.cpp
Comment on lines +1097 to +1099
if (!wtap_seek_read(cfile->provider.wth, fdata->file_off, &rec, &rec_buf, &err, &err_info)) {
break;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Surface read failures instead of returning partial success silently.

On Line 1097, Line 1213, and Line 1319, read failures break the loop but never populate response.error, so callers can receive incomplete data with an empty error string.

Suggested fix
-    if (!wtap_seek_read(cfile->provider.wth, fdata->file_off, &rec, &rec_buf, &err, &err_info)) {
-      break;
-    }
+    if (!wtap_seek_read(cfile->provider.wth, fdata->file_off, &rec, &rec_buf, &err, &err_info)) {
+      response.error = err_info ? string(err_info) : "Failed to read packet data";
+      break;
+    }

Also applies to: 1213-1215, 1319-1321

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/wiregasm/lib.cpp` around lines 1097 - 1099, When
wtap_seek_read(cfile->provider.wth, fdata->file_off, &rec, &rec_buf, &err,
&err_info) returns false, the loop currently breaks without setting
response.error causing callers to get incomplete data; modify the failure path
in the block around wtap_seek_read (and the analogous blocks that call
wtap_seek_read at the other two locations) to populate response.error with the
available error information (use err and/or err_info contents or formatted
message) and return/exit early instead of silently breaking so the caller
receives the read failure details.

Comment thread lib/wiregasm/lib.cpp
Comment thread src/index.ts Outdated
Avoid compiling untrusted present-field queries as regex by using case-insensitive substring matching with (?i) compatibility, and prevent protocol frames/bytes over-counting when a protocol appears multiple times in a single frame.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
lib/wiregasm/lib.cpp (1)

1272-1300: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Per-frame protocol de-duplication still resets across recursion branches.

counted_protocols is recreated on every recursive call. If the same protocol is reached twice under the same parent via different non-protocol subtrees in one frame, frames/bytes can still be incremented twice. Carry the same set through same-level recursion and only start a fresh one when descending into a child protocol's children.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/wiregasm/lib.cpp` around lines 1272 - 1300, The counted_protocols set is
recreated per call causing duplicate counting across different branches at the
same hierarchy level; modify wg_collect_protocol_hierarchy to accept and
propagate a reference (or pointer) to a counted_protocols set for the current
sibling-level so that when iterating siblings you reuse the same set, and only
create/pass a fresh set when recursing into a protocol node's children (i.e.,
when calling wg_collect_protocol_hierarchy for protocol->children); update all
call sites accordingly so counting (protocol->frames and protocol->bytes) is
guarded by insertion into this shared per-level set.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/wiregasm/lib.cpp`:
- Around line 1127-1142: The current loop breaks when response.rows reaches
limit which leaves metadata (matched and response.total_rows) reflecting only
the scanned prefix; instead, continue scanning all packets after hitting limit
but stop appending rows: when response.rows.size() >= limit set
response.truncated = true and continue processing (calling wtap_rec_reset(&rec)
and epan_dissect_reset(&edt) each iteration) so matched and any counters are
still updated for every record, and only suppress pushing additional rows into
response.rows; adjust the logic around the block that currently breaks to skip
push but not break so response.total_rows and matched reflect the full scan
while preserving limit behavior.
- Around line 1172-1183: The occurrences counter is only incremented when the
field wasn't already seen in the current frame because the increment is guarded
by frame_fields->insert(...). Change the logic in the block handling field
sightings (around frame_fields, present_fields, field_name and finfo->hfinfo in
lib.cpp) so that you always increment PresentField.occurrences whenever the
field is encountered: if present_fields does not contain field_name, create a
PresentField and set occurrences = 1; otherwise increment it
(it->second.occurrences++). Keep the existing frame_fields insert to suppress
per-frame duplicates for whatever purpose it serves, but do not gate the
occurrences increment on that insert.

---

Duplicate comments:
In `@lib/wiregasm/lib.cpp`:
- Around line 1272-1300: The counted_protocols set is recreated per call causing
duplicate counting across different branches at the same hierarchy level; modify
wg_collect_protocol_hierarchy to accept and propagate a reference (or pointer)
to a counted_protocols set for the current sibling-level so that when iterating
siblings you reuse the same set, and only create/pass a fresh set when recursing
into a protocol node's children (i.e., when calling
wg_collect_protocol_hierarchy for protocol->children); update all call sites
accordingly so counting (protocol->frames and protocol->bytes) is guarded by
insertion into this shared per-level set.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b16539c8-05d7-4fc9-b992-c31de6ef0625

📥 Commits

Reviewing files that changed from the base of the PR and between 82e8070 and 4e718c5.

📒 Files selected for processing (2)
  • lib/wiregasm/lib.cpp
  • src/index.ts

Comment thread lib/wiregasm/lib.cpp
Comment on lines +1127 to +1142
if (!row.values.empty()) {
response.rows.push_back(row);
if (limit > 0 && response.rows.size() >= limit) {
response.truncated = true;
wtap_rec_reset(&rec);
epan_dissect_reset(&edt);
break;
}
}
}

wtap_rec_reset(&rec);
epan_dissect_reset(&edt);
}

response.total_rows = response.rows.size();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep scanning after hitting limit so the metadata stays accurate.

Breaking here makes matched and total_rows describe only the scanned prefix, not the full result set. With truncated = true, callers will still expect the untruncated counts.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/wiregasm/lib.cpp` around lines 1127 - 1142, The current loop breaks when
response.rows reaches limit which leaves metadata (matched and
response.total_rows) reflecting only the scanned prefix; instead, continue
scanning all packets after hitting limit but stop appending rows: when
response.rows.size() >= limit set response.truncated = true and continue
processing (calling wtap_rec_reset(&rec) and epan_dissect_reset(&edt) each
iteration) so matched and any counters are still updated for every record, and
only suppress pushing additional rows into response.rows; adjust the logic
around the block that currently breaks to skip push but not break so
response.total_rows and matched reflect the full scan while preserving limit
behavior.

Comment thread lib/wiregasm/lib.cpp
Comment on lines +1172 to +1183
if (frame_fields->insert(field_name).second) {
auto it = present_fields->find(field_name);
if (it == present_fields->end()) {
PresentField present;
present.field = field_name;
present.name = finfo->hfinfo->name ? finfo->hfinfo->name : "";
present.type = static_cast<int>(finfo->hfinfo->type);
present.occurrences = 1;
(*present_fields)[field_name] = present;
} else {
it->second.occurrences++;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

occurrences is currently counting frames, not occurrences.

frame_fields->insert(...) suppresses repeated instances inside the same packet, so fields that appear multiple times in one frame still add only 1. That makes the exported occurrences value inaccurate.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/wiregasm/lib.cpp` around lines 1172 - 1183, The occurrences counter is
only incremented when the field wasn't already seen in the current frame because
the increment is guarded by frame_fields->insert(...). Change the logic in the
block handling field sightings (around frame_fields, present_fields, field_name
and finfo->hfinfo in lib.cpp) so that you always increment
PresentField.occurrences whenever the field is encountered: if present_fields
does not contain field_name, create a PresentField and set occurrences = 1;
otherwise increment it (it->second.occurrences++). Keep the existing
frame_fields insert to suppress per-frame duplicates for whatever purpose it
serves, but do not gate the occurrences increment on that insert.

Upstream zlib.net re-published the 1.2.11 tarball with a different hash,
breaking the glib subproject download. Update the expected source_hash
and add a source_fallback_url pointing to the GitHub release mirror.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant