Last updated: 2026-05-05
Tooling reference for inspecting workflows, validating shipped JSONs, and
correlating runtime artifacts. Symptom-first quality troubleshooting
(why does the video look wrong) lives in docs/guides/debugging_guide.md.
The scripts below are read-only against any workflow JSON; none of them mutates a workflow.
| Script | Purpose | Trigger |
|---|---|---|
| scripts/audit_workflows.py [--verbose] | Health audit across all example_workflows/: sage, batch-encode, sigma chain, resolution, (L-1)%8, preprocess, decoder, F2/F3 symmetry, plus 3 generic audit invariants (cycle / widget shape / link integrity) plus 1 AST test (cond-metadata types). Exits 1 on ERR. | After bulk edits, before commit, in CI. |
| scripts/analyze_workflow_dag.py <wf> --format <ascii\|mermaid\|dot\|json> [--save-run] | Topo-sorted execution order + graph rendering. --save-run lands the artifact under data/runs/${RUN_ID}/dag_<slug>.<ext> (correlates with sibling run logs) or data/runs/dag/dag_<slug>_<ts>.<ext> if RUN_ID unset. | Diagnosing execution order, post-validate cycle suspicion, comparing iteration-state across two workflows. |
| scripts/trace_node_source.py <wf> <id> --include-inputs | Resolve any node to AST-extracted source + wiring. Flags object_patches, captured tensors, bypasses, widget overrides. | Before trusting any widget annotation, before assuming bypass is inert. |
| scripts/sage_telemetry_summary.py --sage-log <path> [--exec-log <path>] | Outside-ComfyUI aggregator. Per-(kernel, mask) median/p90/count + Phase 0 gate verdict. Reads only; does not write. | After a traced render, when comparing kernel routing. |
| scripts/verify_sage_iteration_trace.sh | Diff per-iter sage kernel counts. | Suspecting per-iter kernel-routing drift. |
| scripts/diagnose_overlap_seams.py --latent <path> --iteration-count N --window-latents W --overlap-latents O | Per-frame ghost-residual scan \|f[t] - (f[t-1] + f[t+1]) / 2\| (inverted, normalized) on an assembled loop output latent. Reports top-K ghost-scoring frames, per-seam-band scores at each iteration-boundary, and a noise-floor baseline. CPU-only; reads saved latent tensors or video files. | Investigating iteration-boundary artifacts (seam ghosting, blend-flicker). Gating evidence before building a corrective seam-zone pass. |
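
The ghost-residual formula in the diagnose_overlap_seams.py row can be illustrated with a minimal pure-Python sketch. It treats each frame as a flat list of floats and scores how closely a frame matches the average of its two neighbours. The max-normalization and inversion shown here are assumptions about what "inverted, normalized" means, and ghost_scores is a hypothetical name, not the script's API.

```python
from typing import List


def ghost_scores(frames: List[List[float]]) -> List[float]:
    """Score interior frames; 1.0 means the frame equals the mean of its neighbours."""
    residuals = []
    for t in range(1, len(frames) - 1):
        prev, cur, nxt = frames[t - 1], frames[t], frames[t + 1]
        # Mean absolute value of f[t] - (f[t-1] + f[t+1]) / 2 over all elements.
        r = sum(abs(c - (p + n) / 2) for p, c, n in zip(prev, cur, nxt)) / len(cur)
        residuals.append(r)
    peak = max(residuals, default=0.0)
    if peak == 0.0:
        # Every interior frame is an exact blend of its neighbours.
        return [1.0 for _ in residuals]
    # Assumed normalization: divide by the largest residual, then invert so a
    # low residual (blend-like frame) yields a high ghost score.
    return [1.0 - r / peak for r in residuals]


# Five tiny two-element "frames"; frame 1 is the exact average of frames 0 and 2.
frames = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [0.0, 0.0], [2.0, 2.0]]
scores = ghost_scores(frames)  # scores[i] belongs to frame i + 1
```

A frame that is a linear blend of its neighbours gets the top score, which is the signature a seam-ghosting investigation would look for near iteration boundaries.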
audit_workflows.py is intentionally WorkflowEditor-independent —
raw orjson.loads + inline link scans. Debug tools must stay usable when
the editor they audit has a bug; don't DRY against WorkflowEditor.
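
As an illustration of what an editor-independent inline link scan can look like, here is a minimal sketch. It assumes the common ComfyUI layout where top-level links are arrays of [link_id, source_id, source_slot, target_id, target_slot, type]; scan_links is a hypothetical helper, and the stdlib json module stands in for orjson so the example has no dependencies.

```python
import json


def scan_links(wf: dict) -> list:
    """Return human-readable problems found in the top-level link table."""
    nodes = {n["id"]: n for n in wf.get("nodes", [])}
    problems = []
    for link_id, src_id, src_slot, dst_id, dst_slot, _type in wf.get("links", []):
        src, dst = nodes.get(src_id), nodes.get(dst_id)
        if src is None or dst is None:
            problems.append(f"link {link_id}: endpoint node missing")
            continue
        outputs = src.get("outputs") or []
        inputs = dst.get("inputs") or []
        if src_slot >= len(outputs) or dst_slot >= len(inputs):
            problems.append(f"link {link_id}: slot out of range")
            continue
        # The source output must list the link, and the target input must point at it.
        if link_id not in (outputs[src_slot].get("links") or []):
            problems.append(f"link {link_id}: not listed on source output")
        if inputs[dst_slot].get("link") != link_id:
            problems.append(f"link {link_id}: target input points elsewhere")
    return problems


# Tiny hand-written workflow: the target input references link 11, not link 10.
raw = """
{"nodes": [
   {"id": 1, "type": "A", "outputs": [{"name": "out", "links": [10]}]},
   {"id": 2, "type": "B", "inputs": [{"name": "in", "link": 11}]}
 ],
 "links": [[10, 1, 0, 2, 0, "MODEL"]]}
"""
problems = scan_links(json.loads(raw))
```

Because the scan touches only plain dicts and lists, it keeps working even when the editor class it would otherwise depend on is the thing under suspicion.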
audit_workflows.py runs in CI on every push. New invariants land here
paired with their apply scripts. Two flavors:
Named pattern checks (one per known fix):
| Check | Pairs with apply script | What it ERRs on |
|---|---|---|
| sage / sage_mode / sage_active | scripts/archive/apply_sage_mode.py | Missing or non-auto_mask_aware AudioLoopHelperSageAttention |
| iteration_stamp | scripts/archive/apply_iteration_stamp.py | Missing LoopIterationStamp |
| preprocess_symmetry (F2) | apply_loop_guide_preprocess_symmetry.py | Loop guide branch skips LTXVPreprocess |
| loop_cropguides_symmetry (F3) | apply_loop_cropguides_symmetry.py | Loop CFGGuider not via LTXVCropGuides |
| alc_seed_legacy_name (F4) | apply_alc_seed_rename.py | AudioLoopController has legacy seed/noise_seed input |
| iterations_autowired (F5) | apply_iterations_autowire.py | TensorLoopOpen.iterations_in not from AudioLoopPlanner.total_iterations |
| alc_widget_drift (F6) | apply_strip_alc_control_after_generate.py | AudioLoopController widgets_values has stale 6th 'randomize' entry |
| planner_no_stride_input (F7) | apply_planner_break_stride_cycle.py | AudioLoopPlanner has legacy stride_seconds input (closes a cycle) |
| dead_lora_loader_scaffolding_absent (F11) | apply_strip_dead_lora_loaders.py | Bypassed #1625/#1626/#1627 LoRA scaffolding nodes (inert UI clutter) still in canonical |
| iclora_video_reference_guide_in_loop_with_cropguides (F12a) | apply_iclora_video_reference.py | In-loop LTXAddVideoICLoRAGuide CONDITIONING outputs feed CFGGuider directly (must pass through LTXVCropGuides[NoLatent]) |
| iclora_loader_present_when_guide_present (F12b) | apply_iclora_video_reference.py | Subgraph has IC-LoRA guide but top-level has no LTXICLoRALoaderModelOnly |
| iclora_ref_video_preprocess_symmetry (F12c) | apply_iclora_video_reference.py | IC-LoRA guide present but no LTXVPreprocess(val=18) on the ref-video chain |
| model_sampling_shift (F13) | apply_strip_sd3_shift_node.py | ModelSamplingSD3 present and active on a distilled workflow (Lightricks's distilled inference applies no shift; the SD3 node distorts the sigma-to-timestep mapping). WARN-level. |
| trim_video_latent_to_audio_present + trim_image_batch_to_audio_present (F14, layered) | apply_trim_video_latent_to_audio.py + apply_trim_image_batch_to_audio.py | Loop workflows must have BOTH trims wired. Latent trim (pre-VAE-decode) snaps UP to the smallest LTX-valid count where decoded pixels ≥ int(audio*fps), which saves decode VRAM/time on overshoot frames. Image trim (post-decode) clips the 0-7 pixel-frame residue from snap-UP to exact audio length. Without the layered pair, ffmpeg -shortest either clips audio (when video < audio) or leaves silence at end (when video > audio and -c:v copy defeats -shortest). Reverted to layered architecture on 2026-05-10 after Option A (latent-only) caused user-reported audio clipping. Postmortem: internal/analysis/loop_audio_overshoot_analysis.md (private clone only). |
| run_id_layout_present (F15) | apply_run_id_layout.py | Loop workflow's VHS_VideoCombine.filename_prefix not fed by RunIdPrefix.video_prefix. WARN-level: without it, every render's mp4 + workflow-snapshot + audio-mux outputs spray flat with a global counter instead of clustering under <output>/<workflow_name>/<timestamp>/. Same apply script also adds a bypassed SaveLatent toggle wired from LatentConcat #1605 for the LoadLatent upscale path. User guide: guides/upscale_guide.md. |
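
The layered F14 trim arithmetic can be worked through with a short sketch. It assumes that an "LTX-valid" pixel-frame count is one satisfying (L-1) % 8 == 0 (the same invariant the audit names) and that one latent frame beyond the first decodes to 8 pixel frames; both function names are hypothetical.

```python
def snap_up_ltx_frames(target_pixel_frames: int) -> int:
    """Smallest count of the form 8*k + 1 that is >= target_pixel_frames."""
    if target_pixel_frames <= 1:
        return 1
    k = -(-(target_pixel_frames - 1) // 8)  # ceil((target - 1) / 8) with ints
    return 8 * k + 1


def layered_trim(audio_seconds: float, fps: float) -> tuple:
    """Return (target, snapped, latent_frames, residue) for the two trims."""
    target = int(audio_seconds * fps)        # exact pixel frames the audio needs
    snapped = snap_up_ltx_frames(target)     # latent trim keeps this many decoded frames
    latent_frames = (snapped - 1) // 8 + 1   # assumed 8x temporal compression
    residue = snapped - target               # image trim clips this many frames
    return target, snapped, latent_frames, residue


plan = layered_trim(10.0, 25.0)
```

For 10 seconds of audio at 25 fps the target is 250 frames, the snap-up lands on 257, and the post-decode image trim removes the remaining 7 frames.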
Generic structural invariants (catch CLASSES of drift without per-bug rules):
| Check | Catches |
|---|---|
| graph_acyclic | Top-level dependency cycles. ComfyUI rejects with "Dependency cycle detected" before any node executes. |
| widget_shape | Stray randomize/fixed/increment/decrement strings in widgets_values of nodes that don't legitimately have a control_after_generate dropdown. Catches partial schema migrations. |
| link_integrity | Top-level link record vs node-level link references desync (slot out of range, source's outputs[].links doesn't list the link id, target's inputs[].link != id). Plus subgraph linkIds references to non-existent links. |
| layout_no_orphans | Non-Note node at pos=[0, 0]. Catches the silent failure mode where an apply script inserts a node and never runs a layout pass; the node lands at canvas origin and is hard to spot in a busy workflow. Allowlisted types: Note only. |
| (no audit; AST test) tests/test_node_schemas.py::test_keyframe_idxs_cleared_to_none_not_empty_list | conditioning_set_values({"keyframe_idxs": []}) literal-list assignments. KJNodes' OuterSampleCallbackWrapper crashes on empty-list keyframe_idxs. |
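
A graph_acyclic-style check can be sketched with Kahn's algorithm over the top-level link table. This is an illustration rather than the audit's actual code: it assumes links are arrays of [link_id, source_id, source_slot, target_id, target_slot, type], ignores subgraphs and bypassed nodes, and find_cycle_nodes is a hypothetical name.

```python
from collections import defaultdict, deque


def find_cycle_nodes(wf: dict) -> list:
    """Return sorted ids of nodes that a topological sort cannot schedule."""
    node_ids = [n["id"] for n in wf.get("nodes", [])]
    indegree = {nid: 0 for nid in node_ids}
    downstream = defaultdict(list)
    for _link_id, src_id, _src_slot, dst_id, _dst_slot, _type in wf.get("links", []):
        if src_id in indegree and dst_id in indegree:
            downstream[src_id].append(dst_id)
            indegree[dst_id] += 1
    # Repeatedly schedule nodes whose inputs are all satisfied.
    ready = deque(nid for nid in node_ids if indegree[nid] == 0)
    while ready:
        nid = ready.popleft()
        for nxt in downstream[nid]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    # Anything left with unmet inputs is on a cycle or downstream of one.
    return sorted(nid for nid in node_ids if indegree[nid] > 0)


nodes = [{"id": i} for i in (1, 2, 3, 4)]
cyclic = {"nodes": nodes, "links": [
    [1, 1, 0, 2, 0, "X"], [2, 2, 0, 3, 0, "X"],
    [3, 3, 0, 2, 1, "X"], [4, 3, 0, 4, 0, "X"],
]}
acyclic = {"nodes": nodes, "links": cyclic["links"][:2] + cyclic["links"][3:]}
stuck = find_cycle_nodes(cyclic)
clean = find_cycle_nodes(acyclic)
```

An empty result means a full execution order exists; a non-empty result narrows the search to the nodes that could not be scheduled.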
Bake new topology constraints into audit_workflows.py. Every fix
that ships an apply script should ship a matching audit check (ERR with a
Run scripts/apply_X.py remediation pointer). Prevents silent regression
of fixes a sibling branch might revert.
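
A minimal sketch of an audit check paired with its remediation pointer, using the F7 rule (AudioLoopPlanner must not carry a legacy stride_seconds input) as the example. The Finding tuple, the function name, and the scripts/ prefix on the apply-script path are assumptions for illustration; the real check in audit_workflows.py may be shaped differently.

```python
from typing import List, NamedTuple


class Finding(NamedTuple):
    level: str    # "ERR" or "WARN"
    check: str    # audit check name
    message: str  # what is wrong plus how to fix it


def check_planner_no_stride_input(wf: dict) -> List[Finding]:
    """ERR when an AudioLoopPlanner still exposes the legacy stride_seconds input."""
    findings = []
    for node in wf.get("nodes", []):
        if node.get("type") != "AudioLoopPlanner":
            continue
        names = {i.get("name") for i in node.get("inputs") or []}
        if "stride_seconds" in names:
            findings.append(Finding(
                "ERR",
                "planner_no_stride_input",
                f"node {node['id']} has legacy stride_seconds input. "
                "Run scripts/apply_planner_break_stride_cycle.py",
            ))
    return findings


legacy = {"nodes": [{"id": 7, "type": "AudioLoopPlanner",
                     "inputs": [{"name": "stride_seconds"}]}]}
fixed = {"nodes": [{"id": 7, "type": "AudioLoopPlanner", "inputs": []}]}
legacy_findings = check_planner_no_stride_input(legacy)
# Mirror the documented behaviour of exiting 1 when any ERR is present.
exit_code = 1 if any(f.level == "ERR" for f in legacy_findings) else 0
```

Keeping the remediation command inside the message means a failing CI log already tells the reader which apply script to run.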
Workflow migrations live in scripts/apply_*.py. Each script:
- Default: mutates example_workflows/audio-loop-music-video_latent.json in place (accepts an optional path arg).
- Idempotent: md5sum before + after re-run must match. Guard with if _is_already_built(wf): return to avoid burning last_node_id on strip-then-readd.
- Has --revert that restores the pre-fix shape.
- Has --dry-run that reports what WOULD change without writing. Pair with audit_workflows.py to verify a hypothetical state (HyDE).
Three-tier staging:
- internal/scratch/: exploratory, gitignored.
- example_workflows/experimental/: cross-machine reviewable; opt-in to audit via EXPERIMENTAL_AUDITED_FILES allowlist in audit_workflows.py.
- example_workflows/: production, "ships AND stabilizes" per internal/PLAN.md (private clone only).
POCs that intentionally break a production invariant (e.g. F3 asymmetry)
ship a paired audit check that dispatches on a node-title prefix and ERRs
only if the rewire is damaged. Canonical TTC1 pair:
apply_ttc_init_guide_amplification_poc.py + ttc1_init_guide_amplification.
Scratch-build apply scripts use WorkflowEditor.from_scratch(output_path)
together with add_top_level_node + add_link. from_scratch returns an
empty-skeleton editor with a fresh uuid and reset
last_node_id/last_link_id. Canonical: scripts/apply_spectrogram_iclora_minimal.py.
Shared apply-script helpers live in scripts/_helpers/_apply_helpers.py
(add_link, find_node, remove_node_and_links, find_link_to_slot,
next_id). Import with aliases to preserve call-site names; don't
re-define inline.
Sweep orphan virtual GetNodes after fork-and-strip. A GetNode whose
widgets_values[0] matches no live SetNode is orphaned; ComfyUI tolerates
it at runtime but it clutters the graph. Add the ID to STRIP_IDS.
Detect via:

    [n["id"] for n in wf["nodes"] if n["type"] == "GetNode"
     and not ((n.get("outputs") or [{}])[0].get("links") or [])]

Templates: scripts/templates/apply_script_all_workflows.py (in-place
edits) and scripts/templates/apply_script_staged_variant.py
(experimental staging). Both include the canonical --revert,
--dry-run, idempotence, and require_nodes guards.
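
The orphan rule stated above (a GetNode whose widgets_values[0] matches no live SetNode) can also be checked by name rather than by outgoing links. The following is a sketch under the assumption that both node types keep the variable name as the first entry of a list-shaped widgets_values; orphan_get_nodes is a hypothetical helper.

```python
def _variable_name(node: dict):
    """First widget value, or None when widgets_values is missing or not a list."""
    values = node.get("widgets_values")
    if isinstance(values, list) and values:
        return values[0]
    return None


def orphan_get_nodes(wf: dict) -> list:
    """Ids of GetNodes whose variable name is published by no SetNode."""
    nodes = wf.get("nodes", [])
    live_names = {
        _variable_name(n) for n in nodes
        if n.get("type") == "SetNode" and _variable_name(n) is not None
    }
    return [
        n["id"] for n in nodes
        if n.get("type") == "GetNode" and _variable_name(n) not in live_names
    ]


wf = {"nodes": [
    {"id": 1, "type": "SetNode", "widgets_values": ["model"]},
    {"id": 2, "type": "GetNode", "widgets_values": ["model"]},
    {"id": 3, "type": "GetNode", "widgets_values": ["vae"]},  # no SetNode named "vae"
]}
orphans = orphan_get_nodes(wf)
```

The two criteria can disagree: a GetNode may still have outgoing links while its SetNode was stripped, so running both scans before filling STRIP_IDS is a reasonable precaution.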
Selected staged-variant apply scripts (stage drafts under
internal/workflows/; promote to example_workflows/experimental/ after
A/B validation):
| Script | Stages to | What it does |
|---|---|---|
| apply_lanczos_init_preprocess.py | loop_with_lanczos_preprocess.draft.json | Inserts a supersample-then-decimate ImageResizeKJv2 pair in front of the init-image resize. Targets residual aliasing on faces / fine textures when the source image is much larger than the schedule target dims. Idempotent, --revert, --dry-run. |
| apply_p3_retake_edit_lora.py | retake_edit.draft.json | Wires the section-targeted retake-edit pattern into a copy of the canonical retake workflow: LTXICLoRALoaderModelOnly (edit-anything LoRA: ADD / REMOVE / REPLACE / RESTYLE) into the MODEL chain, plus LTXVAddGuideMulti (strength=1, frame_idx=0) between LatentTemporalMask and SamplerCustomAdvanced. Existing positive CLIPTextEncode becomes the edit instruction. Idempotent, --revert, --dry-run. |
Scratch-build new workflows from constants — distinct from apply scripts (which mutate or stage variants of an existing canonical workflow).
| Script | Builds | Topology |
|---|---|---|
| scripts/build_keyframe_workflow.py | (per script) | Keyframe-schedule baseline. |
| scripts/build_upscale_workflow.py | internal/workflows/upscale_loop_output.draft.json | Post-loop spatial upscale: LoadLatent → LTXVLatentUpsampler (2×) → LTXVConcatAVLatent → SamplerCustomAdvanced (3-step σ-tail [0.85, 0.7250, 0.4219, 0.0], euler, CFG=1) → LTXVSeparateAVLatent → LTXVCropGuides → LTXVTiledVAEDecode → VHS_VideoCombine. Audio from LoadAudio of source mp3 (no mp4 needed). Empty audio latent sized via LatentFrameCount.pixel_frames. Model chain mirrors the canonical loop's perf/VRAM patches (UNETLoader → AudioLoopHelperSageAttention → LTXVChunkFeedForward → LTX2AttentionTunerPatch → CFGGuider). 27 nodes, 32 links; constants centralized at the top of the script. --dry-run, --revert. Pre-step: run loop with scripts/apply_save_assembled_latent.py applied so the assembled .latent file exists; move into ComfyUI's input dir. Chain apply_trim_video_latent_to_audio.py after re-building so the F14 latent-trim gets re-spliced. |
Build scripts share apply-script conventions for --dry-run / --revert
/ idempotence, but produce a deterministic file from constants rather
than editing an existing one. Re-running overwrites with byte-identical
output (constant node ids, deterministic link order).
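
The byte-identical property can be sketched as follows: build the workflow from module-level constants, serialize it with a fixed key order, and compare digests across two builds. The two-node graph and its empty widgets_values are placeholders, not the real output of either build script.

```python
import hashlib
import json

# Constants centralized at the top, as the build scripts do.
LOAD_ID = 1
UPSAMPLE_ID = 2
LINK_ID = 1


def build_workflow() -> dict:
    """Assemble a tiny placeholder graph from constants only."""
    nodes = [
        {"id": LOAD_ID, "type": "LoadLatent", "widgets_values": []},
        {"id": UPSAMPLE_ID, "type": "LTXVLatentUpsampler", "widgets_values": []},
    ]
    links = [[LINK_ID, LOAD_ID, 0, UPSAMPLE_ID, 0, "LATENT"]]
    return {"last_node_id": UPSAMPLE_ID, "last_link_id": LINK_ID,
            "nodes": nodes, "links": links}


def serialize(wf: dict) -> bytes:
    # sort_keys pins the key order so equal dicts always give equal bytes.
    return json.dumps(wf, indent=2, sort_keys=True).encode("utf-8")


first = hashlib.md5(serialize(build_workflow())).hexdigest()
second = hashlib.md5(serialize(build_workflow())).hexdigest()
```

Anything that varies per run, such as a fresh uuid or a timestamp, would break the equality, which is why such values need to be fixed constants in a deterministic build.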
| Source | Default path | RUN_ID path | What it captures |
|---|---|---|---|
| Exec log (COMFYUI_EXEC_LOG=... env var enables) | internal/analysis/runs/exec_log/exec_<ts>.jsonl | data/runs/${RUN_ID}/exec.jsonl | Per-node start/end with class_type, inputs, duration |
| Sage trace (AUDIOLOOPHELPER_SAGE_TRACE=auto) | internal/analysis/runs/sage/sage_<ts>.jsonl | data/runs/${RUN_ID}/sage.jsonl | Per-(kernel, mask) timing for sage attention |
| Profiler | internal/analysis/runs/profiler/<ts>/ | data/runs/${RUN_ID}/profiler/ | torch profiler trace.json + summary.txt + memory_timeline.html |
| ComfyUI stdout | <comfyui>/user/comfyui_8188.log (and .prev.log, .prev2.log) | (same) | Validation errors, prompt accepted/rejected, exception tracebacks |
| Output mp4 | (raw output dir) | data/runs/${RUN_ID}/output.mp4 (symlink) | The rendered video |
AUDIOLOOPHELPER_SAGE_TRACE=auto is the default in start_experiment.sh
at this repo's root. Plain <comfyui>/start.sh does NOT export it, so
launch via start_experiment.sh whenever a traced run is needed.
docs/reference/telemetry_and_tracing.md covers what each tracer captures
(and doesn't), retention, on/off semantics, and why prompt text can leak
via the exec logger but not via the sage tracer.
docs/reference/environment.md holds the env-var registry and the
single-helper-call-site DRY rule.
Conventions (post-2026-04-26 RUN_ID propagation):
- New per-render artifacts: data/runs/${RUN_ID}/<category>.<ext> via scripts/workflow_utils.py::run_artifact_path(category, ext).
- Multi-file artifacts: data/runs/${RUN_ID}/<subdir>/ via run_artifact_dir(subdir).
- Without RUN_ID: legacy fallback to internal/analysis/runs/ for most loggers. analyze_workflow_dag.py --save-run is the exception: it always writes under data/ (data/runs/${RUN_ID}/dag_<slug>.<ext> with RUN_ID, data/runs/dag/dag_<slug>_<ts>.<ext> without).
The single env-var read site for RUN_ID is
scripts/workflow_utils.py::_current_run_id. Route all reads through it.
Rendered mp4 lands at data/runs/${RUN_ID}/output.mp4 (symlink) once
internal/autoresearch/harness.py::_locate_and_link_output_mp4 runs
after poll_until_done. Source dir comes from COMFYUI_OUTPUT_DIR env
var (no hardcoded paths). Filename constant lives at
internal/autoresearch/metrics/__init__.py::OUTPUT_MP4_FILENAME — every
video-content metric (subject_consistency, av_consistency, future
style/aesthetic/palette) imports it; never inline "output.mp4" in a new
metric module.
When a workflow fails to run, work in this order before going inline:
1. Tail the ComfyUI log for the most recent prompt:

        tail -200 <comfyui>/user/comfyui_8188.log

   Validation errors ("Dependency cycle detected", "Failed to convert ... to a INT value") fail before any node executes, and the log shows them. Exception tracebacks fire mid-run and point you at the offending node.

2. Run the audit:

        uv run --group dev python scripts/audit_workflows.py

   ERRs map 1:1 to remediation apply scripts. The 4 generic checks (graph_acyclic, widget_shape, link_integrity, cond_metadata_types AST test) catch classes of drift.

3. Inspect execution order if a topology change is suspected:

        uv run --group dev python scripts/analyze_workflow_dag.py \
          example_workflows/audio-loop-music-video_latent.json \
          --format ascii --save-run

   Compare against a known-good baseline diff.

4. Trace the suspect node for source + wiring:

        uv run --group dev python scripts/trace_node_source.py <wf> <node-id> \
          --include-inputs

5. Cross-reference the exec log for what actually ran most recently:

        ls -lt internal/analysis/runs/exec_log/ | head

   (Or data/runs/${RUN_ID}/exec.jsonl if the run had RUN_ID.)

Skipping step 1 wastes the most time — ComfyUI's log usually identifies the failure class within the first matching line.
Iter-over-iter drift specifically: trace CONDITIONING paths in
parallel (initial vs loop). Asymmetries (missing LTXVConditioning,
frame_rate mismatch, CLIP in subgraph) are load-bearing bugs.