Skip to content

ISSUE-570: Enhance rayhunter-check output#941

Closed
herbenderbler wants to merge 1 commit intoEFForg:mainfrom
herbenderbler:ISSUE-570/check-output
Closed

ISSUE-570: Enhance rayhunter-check output#941
herbenderbler wants to merge 1 commit intoEFForg:mainfrom
herbenderbler:ISSUE-570/check-output

Conversation

@herbenderbler
Copy link
Copy Markdown

@herbenderbler herbenderbler commented Mar 8, 2026

Summary

Implements #570: adds NDJSON output to rayhunter-check that matches the daemon format, with file generation opt-in via --output.

Changes

  • --format json writes one NDJSON file per input (when -o <dir> is set). Same schema as daemon analysis output.
  • --output <dir> controls where files are written; required for --format json and --pcapify.
  • Shared NdjsonWriter in lib used by both daemon and check.
  • doc/reanalyzing.md updated with new options and stderr note.

Checklist

  • Documentation updated (doc/reanalyzing.md)
  • cargo fmt and cargo clippy run
  • CONTRIBUTING.md read
  • AI disclosure box checked

(Optional) Note: One Harness is reused for the entire run; packet numbering in debug output is global.

Pull Request Checklist

  • The Rayhunter team has recently expressed interest in reviewing a PR for this.
    • If not, this PR may be closed due our limited resources and need to prioritize how we spend them.
  • Added or updated any documentation as needed to support the changes in this PR.
  • Code has been linted and run through cargo fmt.
  • If any new functionality has been added, unit tests were also added.
  • CONTRIBUTING.md has been read.

You must check one of:

  • No generative AI (including LLMs) tools were used to create this PR.
  • Generative AI was used to create this PR. I certify that I have read and understand the code, and that all comments and descriptions were authored by myself and are not the product of generative AI.

@herbenderbler herbenderbler marked this pull request as ready for review March 8, 2026 06:05
@herbenderbler herbenderbler marked this pull request as draft March 8, 2026 18:56
@herbenderbler herbenderbler force-pushed the ISSUE-570/check-output branch from 51b9ca7 to ea1437c Compare March 12, 2026 05:47
@herbenderbler herbenderbler marked this pull request as ready for review March 12, 2026 06:32
@herbenderbler herbenderbler force-pushed the ISSUE-570/check-output branch from 2987df5 to c24e3d8 Compare March 12, 2026 06:34
@cooperq cooperq self-assigned this Apr 17, 2026
Comment thread check/src/main.rs Outdated
.expect("failed to create output directory");
}

let mut harness = Harness::new_with_config(&AnalyzerConfig::default());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving Harness::new_with_config up to main() means the same harness gets reused for every file in the WalkDir. packet_num just counts up, so the (packet N) annotations on events in file 2 reference the running count from file 1, not the current file.

The analyzers carry state too. ImsiRequestedAnalyzer (and the other stateful ones) hold state, timeout_counter, and flag across calls with no reset, so a recording that ends mid-transition will bleed that into the first packets of the next file. Events can fire spuriously or get suppressed depending on WalkDir's iteration order.

The daemon still constructs a fresh Harness per file via AnalysisWriter::new, so this is specific to the rayhunter-check directory walk that reanalyzing.md is calling out.

Easiest fix is to put Harness::new_with_config back inside analyze_qmdl/analyze_pcap and pass AnalyzerConfig instead of &mut Harness. The metadata print in main() can stay since it's a one-shot.

Comment thread check/src/main.rs
async fn analyze_pcap(pcap_path: &str, show_skipped: bool) {
let mut harness = Harness::new_with_config(&AnalyzerConfig::default());
let pcap_file = &mut File::open(&pcap_path).await.expect("failed to open file");
fn output_path(output_dir: &Path, input_path: &str, extension: &str) -> PathBuf {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output_path() can clobber its own output in a couple ways when --path is a directory.

  1. file_stem() drops the directory context. a/capture.qmdl and b/capture.qmdl both resolve to <output>/capture.ndjson, and session.qmdl + session.pcap in the same dir collide on session.ndjson.

  2. with_extension replaces the segment after the last dot of the joined filename, not the full extension of the original file. A stem like recording.v2 becomes a PathBuf whose extension is v2, and with_extension("ndjson") turns it into recording.ndjson. Timestamped names do the same thing, 2026-01-02_10.05.00_capture.qmdl comes out as 2026-01-02_10.05.ndjson.

File::create truncates, so later writes silently overwrite earlier ones. Same helper is used for .pcapng at line 249 so pcapify has the same problem.

Two changes needed. Build the filename with output_dir.join(format!("{stem}.{extension}")) instead of with_extension to stop it from eating dots in the stem, and either preserve the relative path under output_dir or detect existing targets so cross-directory collisions don't silently overwrite.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the output filename should be <stem>.<ext> (current intent) or <original_filename>.<ext> (e.g. capture.qmdl.ndjson)?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<original_filename>.<ext> makes more sense here. capture.qmdl becomes capture.qmdl.ndjson, capture.pcap becomes capture.pcap.ndjson, and the session.qmdl + session.pcap collision drops out for free. It also survives dotted stems like 2026-01-02_10.05.00_capture.qmdl without chopping them. Implementation collapses to output_dir.join(format!("{input_name}.{extension}")) where input_name is Path::file_name() of the input, no with_extension needed.

The daemon's <name>.ndjson convention doesn't really transfer since daemon recording names don't have extensions to begin with. rayhunter-check is reading arbitrary user-supplied paths, so naming has to cope with whatever shows up.

Cross-directory collisions (a/capture.qmdl and b/capture.qmdl both writing to capture.qmdl.ndjson) are still a thing. Simplest to check if the target exists and bail rather than invent a suffix scheme, but either works.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 this is the correct take I think.

@herbenderbler herbenderbler force-pushed the ISSUE-570/check-output branch 3 times, most recently from 9ecd300 to 8c6c630 Compare April 22, 2026 23:33
@cooperq
Copy link
Copy Markdown
Collaborator

cooperq commented Apr 23, 2026

I haven't reviewed the code yet but two things on functionality:

  1. ndjson output shouldn't show skipped messages unless --show-skipped is enabled.
  2. ndjson output should still go to stdout, output directory should be optional (and if output is specified it should still output to stdout.
  3. You can fix the tests by merging against the latest main branch.

@herbenderbler herbenderbler force-pushed the ISSUE-570/check-output branch 2 times, most recently from d33f160 to 4789ae6 Compare April 23, 2026 18:16
@herbenderbler
Copy link
Copy Markdown
Author

2. ndjson output should still go to stdout, output directory should be optional (and if output is specified it should still output to stdout.

I need some clarification on this piece. I implemented "write to both stdout and the file" when both --format json and --output are set (so --output adds a durable copy, not a redirect). If you actually meant --output should replace stdout in that case, let me know and I'll toggle that change.

@herbenderbler herbenderbler force-pushed the ISSUE-570/check-output branch 2 times, most recently from e5a8650 to 0a959cc Compare April 23, 2026 18:28
@cooperq
Copy link
Copy Markdown
Collaborator

cooperq commented Apr 23, 2026

Nope, you got it right, that is the functionality I intended. Thanks!

@herbenderbler
Copy link
Copy Markdown
Author

Nope, you got it right, that is the functionality I intended. Thanks!

Awesome! 🎉 Is there anything else I can or should do to land this PR?

--format json streams NDJSON to stdout; --output also writes a file
copy and is still required for --pcapify. Per-input Harness, no
overwrites, --show-skipped opts skipped rows in. NdjsonWriter shared
with the daemon via a new with_writer() over any AsyncWrite.
@herbenderbler herbenderbler force-pushed the ISSUE-570/check-output branch from 0a959cc to 07a7909 Compare April 23, 2026 20:49
@cooperq
Copy link
Copy Markdown
Collaborator

cooperq commented May 1, 2026

output dir for pcapify should default to the same directory as the qmdl file as per the previous behavior.

Please address this final issue, other than that I think this is ready.

@herbenderbler herbenderbler closed this by deleting the head repository May 2, 2026
@herbenderbler
Copy link
Copy Markdown
Author

output dir for pcapify should default to the same directory as the qmdl file as per the previous behavior.

Please address this final issue, other than that I think this is ready.

Done. I accidentally killed this PR, but my fix for this is in #1015. Sorry for the hassle 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants