ISSUE-570: Enhance rayhunter-check output#941
ISSUE-570: Enhance rayhunter-check output#941herbenderbler wants to merge 1 commit intoEFForg:mainfrom
Conversation
51b9ca7 to
ea1437c
Compare
2987df5 to
c24e3d8
Compare
| .expect("failed to create output directory"); | ||
| } | ||
|
|
||
| let mut harness = Harness::new_with_config(&AnalyzerConfig::default()); |
There was a problem hiding this comment.
Moving Harness::new_with_config up to main() means the same harness gets reused for every file in the WalkDir. packet_num just counts up, so the (packet N) annotations on events in file 2 reference the running count from file 1, not the current file.
The analyzers carry state too. ImsiRequestedAnalyzer (and the other stateful ones) hold state, timeout_counter, and flag across calls with no reset, so a recording that ends mid-transition will bleed that into the first packets of the next file. Events can fire spuriously or get suppressed depending on WalkDir's iteration order.
The daemon still constructs a fresh Harness per file via AnalysisWriter::new, so this is specific to the rayhunter-check directory walk that reanalyzing.md is calling out.
Easiest fix is to put Harness::new_with_config back inside analyze_qmdl/analyze_pcap and pass AnalyzerConfig instead of &mut Harness. The metadata print in main() can stay since it's a one-shot.
| async fn analyze_pcap(pcap_path: &str, show_skipped: bool) { | ||
| let mut harness = Harness::new_with_config(&AnalyzerConfig::default()); | ||
| let pcap_file = &mut File::open(&pcap_path).await.expect("failed to open file"); | ||
| fn output_path(output_dir: &Path, input_path: &str, extension: &str) -> PathBuf { |
There was a problem hiding this comment.
output_path() can clobber its own output in a couple ways when --path is a directory.
-
file_stem()drops the directory context.a/capture.qmdlandb/capture.qmdlboth resolve to<output>/capture.ndjson, andsession.qmdl+session.pcapin the same dir collide onsession.ndjson. -
with_extensionreplaces the segment after the last dot of the joined filename, not the full extension of the original file. A stem likerecording.v2becomes a PathBuf whose extension isv2, andwith_extension("ndjson")turns it intorecording.ndjson. Timestamped names do the same thing,2026-01-02_10.05.00_capture.qmdlcomes out as2026-01-02_10.05.ndjson.
File::create truncates, so later writes silently overwrite earlier ones. Same helper is used for .pcapng at line 249 so pcapify has the same problem.
Two changes needed. Build the filename with output_dir.join(format!("{stem}.{extension}")) instead of with_extension to stop it from eating dots in the stem, and either preserve the relative path under output_dir or detect existing targets so cross-directory collisions don't silently overwrite.
There was a problem hiding this comment.
Should the output filename should be <stem>.<ext> (current intent) or <original_filename>.<ext> (e.g. capture.qmdl.ndjson)?
There was a problem hiding this comment.
<original_filename>.<ext> makes more sense here. capture.qmdl becomes capture.qmdl.ndjson, capture.pcap becomes capture.pcap.ndjson, and the session.qmdl + session.pcap collision drops out for free. It also survives dotted stems like 2026-01-02_10.05.00_capture.qmdl without chopping them. Implementation collapses to output_dir.join(format!("{input_name}.{extension}")) where input_name is Path::file_name() of the input, no with_extension needed.
The daemon's <name>.ndjson convention doesn't really transfer since daemon recording names don't have extensions to begin with. rayhunter-check is reading arbitrary user-supplied paths, so naming has to cope with whatever shows up.
Cross-directory collisions (a/capture.qmdl and b/capture.qmdl both writing to capture.qmdl.ndjson) are still a thing. Simplest to check if the target exists and bail rather than invent a suffix scheme, but either works.
There was a problem hiding this comment.
+1 this is the correct take I think.
9ecd300 to
8c6c630
Compare
|
I haven't reviewed the code yet but two things on functionality:
|
d33f160 to
4789ae6
Compare
I need some clarification on this piece. I implemented "write to both stdout and the file" when both --format json and --output are set (so --output adds a durable copy, not a redirect). If you actually meant --output should replace stdout in that case, let me know and I'll toggle that change. |
e5a8650 to
0a959cc
Compare
|
Nope, you got it right, that is the functionality I intended. Thanks! |
Awesome! 🎉 Is there anything else I can or should do to land this PR? |
--format json streams NDJSON to stdout; --output also writes a file copy and is still required for --pcapify. Per-input Harness, no overwrites, --show-skipped opts skipped rows in. NdjsonWriter shared with the daemon via a new with_writer() over any AsyncWrite.
0a959cc to
07a7909
Compare
Please address this final issue, other than that I think this is ready. |
Done. I accidentally killed this PR, but my fix for this is in #1015. Sorry for the hassle 😅 |
Summary
Implements #570: adds NDJSON output to
rayhunter-checkthat matches the daemon format, with file generation opt-in via--output.Changes
--format jsonwrites one NDJSON file per input (when-o <dir>is set). Same schema as daemon analysis output.--output <dir>controls where files are written; required for--format jsonand--pcapify.NdjsonWriterin lib used by both daemon and check.Checklist
cargo fmtandcargo clippyrun(Optional) Note: One
Harnessis reused for the entire run; packet numbering in debug output is global.Pull Request Checklist
cargo fmt.You must check one of: