fix(inspector): route artifacts to format-specific extractors (#15) #45

Merged
AK11105 merged 1 commit into main from fix/15-inspector-format-routing
May 20, 2026

AK11105 commented May 19, 2026

Summary

Replaces the monolithic pickle-only inspector script with format-specific extractors. Fixes UnpicklingError crashes on .onnx and .safetensors files, and adds the state dict / layer metadata that was missing for .pt files.

Changes

  • app/cli/core/inspector.py — format detection via extension + magic bytes; TorchExtractor (weights_only=True), OnnxExtractor,
    SafetensorsExtractor (header-only), DirectoryExtractor (JSON reads only), PickleExtractor (joblib → pickle), GenericExtractor fallback;
    subprocess always exits 0; adds raw_facts, confidence, inspection_errors to ArtifactMetadata
  • pyproject.toml — adds torch, onnx, safetensors, frameworks optional dep groups
  • .github/workflows/ci.yml — installs all optional extras in CI; adds uv cache step
  • tests/test_cli_issue15_inspector.py — 24 new tests covering all extractors, graceful fallback, and new metadata fields; pytest.importorskip
    guards on all framework-specific tests
  • docs/cli/deploy.md — updated step 1 description and supported formats table
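
The "extension + magic bytes" detection in the first bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in app/cli/core/inspector.py: the `EXTENSION_FORMATS` table and the names `sniff_magic` and `detect_format` are hypothetical, and the byte checks are heuristics. ONNX files are plain protobuf with no magic prefix, so the sketch consults the extension first and only falls back to sniffing when the extension is unknown.

```python
from pathlib import Path
from typing import Optional

# Hypothetical extension table; the real mapping in inspector.py may differ.
EXTENSION_FORMATS = {
    ".pt": "torch",
    ".pth": "torch",
    ".onnx": "onnx",
    ".safetensors": "safetensors",
    ".pkl": "pickle",
    ".joblib": "pickle",
}


def sniff_magic(head: bytes) -> Optional[str]:
    """Guess a format from the first bytes of a file, or return None."""
    if head.startswith(b"PK\x03\x04"):
        # Zip container, the default torch.save layout since PyTorch 1.6.
        # Heuristic only: any other zip archive would match as well.
        return "torch"
    if len(head) >= 9 and head[8:9] == b"{":
        # safetensors: 8-byte little-endian header length, then a JSON object.
        return "safetensors"
    if head[:1] == b"\x80":
        # PROTO opcode that opens pickles written with protocol 2 or later.
        return "pickle"
    return None


def detect_format(path) -> str:
    """Pick an extractor name without deserializing anything."""
    p = Path(path)
    if p.is_dir():
        return "directory"
    by_extension = EXTENSION_FORMATS.get(p.suffix.lower())
    if by_extension is not None:
        return by_extension
    with p.open("rb") as fh:
        head = fh.read(16)
    return sniff_magic(head) or "generic"
```

Because nothing is loaded at this stage, a mislabeled or corrupt file can at worst be routed to the wrong extractor, which then reports its own error.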

Type

  • Bug fix

Testing

  • pytest passes
  • Coverage ≥ 70% (threshold still met after these changes)
  • Tested manually (.pt state dict, .onnx with dynamic axes, .safetensors, HuggingFace directory, bad bytes graceful fallback)
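
For reviewers who want a picture of the "bad bytes graceful fallback" case, here is a minimal sketch of the pattern. Only the field names `raw_facts`, `confidence`, and `inspection_errors` come from this PR; the dataclass layout, the `"high"` / `"medium"` / `"low"` values, and the helpers `pickle_facts`, `generic_facts`, and `inspect_bytes` are assumptions made for illustration.

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class ArtifactMetadata:
    # Only these three field names come from the PR; the types are assumed.
    raw_facts: dict = field(default_factory=dict)
    confidence: str = "low"
    inspection_errors: list = field(default_factory=list)


def pickle_facts(data: bytes) -> dict:
    """Hypothetical extractor; like pickle itself, only safe on trusted input."""
    return {"python_type": type(pickle.loads(data)).__name__}


def generic_facts(data: bytes) -> dict:
    """Hypothetical last-resort extractor that never deserializes."""
    return {"size_bytes": len(data)}


def inspect_bytes(data: bytes, extractors) -> ArtifactMetadata:
    """Try each (name, extractor) pair in order; record failures, never raise."""
    meta = ArtifactMetadata()
    for name, extract in extractors:
        try:
            meta.raw_facts = extract(data)
        except Exception as exc:  # deliberately broad: inspection must not crash
            meta.inspection_errors.append(f"{name}: {type(exc).__name__}: {exc}")
            continue
        # Downgrade confidence when an earlier extractor already failed.
        meta.confidence = "high" if not meta.inspection_errors else "medium"
        break
    return meta


EXTRACTORS = [("pickle", pickle_facts), ("generic", generic_facts)]
bad = inspect_bytes(b"not a pickle", EXTRACTORS)
good = inspect_bytes(pickle.dumps([1, 2, 3]), EXTRACTORS)
```

In this sketch `bad` ends up with one recorded pickle error plus the generic facts, while `good` carries no errors. A caller can then decide what to do from `confidence` and `inspection_errors` rather than from an exit code.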

Related Issues

Closes #15

- Add format detection via extension + magic bytes before any load attempt
- Add TorchExtractor: torch.load(..., weights_only=True), extracts state
  dict keys, param count, or layer names for full models
- Add OnnxExtractor: onnx.load, extracts opset, op types, inputs/outputs
  with dim_param preserved for dynamic axes
- Add SafetensorsExtractor: header-only read via safe_open, extracts
  tensor keys, shapes, metadata
- Add DirectoryExtractor: JSON reads only (config.json,
  tokenizer_config.json), no model loading
- PickleExtractor: joblib.load first, fallback to pickle.load
- GenericExtractor: fallback for unknown extensions
- Subprocess always exits 0; errors captured per-layer in raw[errors]
- Add raw_facts, confidence, inspection_errors to ArtifactMetadata
- Add format-specific optional dep groups to pyproject.toml
- Add tests covering all new extractors and graceful fallback paths
- Install optional extras in CI with uv cache; importorskip guards on
  all framework-specific tests
- Update docs/cli/deploy.md: replace pickle-centric framework table with
  format/extractor table; fix step 1 description
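
To show what "header-only" means for the safetensors bullet above, here is a stdlib-only sketch of the on-disk layout: an 8-byte little-endian length followed by a JSON header. The PR itself reads the header through `safe_open`; the function name `read_safetensors_header`, the returned dictionary shape, and the size cap below are assumptions made for this example.

```python
import json
import struct

# Arbitrary sanity cap chosen for this sketch, so that a corrupt length
# prefix cannot trigger a huge read.
MAX_HEADER_BYTES = 100 * 1024 * 1024


def read_safetensors_header(path):
    """Return tensor dtypes, shapes and metadata without reading tensor data."""
    with open(path, "rb") as fh:
        prefix = fh.read(8)
        if len(prefix) != 8:
            raise ValueError("file too short to hold a header length")
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > MAX_HEADER_BYTES:
            raise ValueError("implausible header length")
        raw = fh.read(header_len)
        if len(raw) != header_len:
            raise ValueError("truncated header")
        header = json.loads(raw.decode("utf-8"))
    # "__metadata__" is the optional free-form string map in the header.
    metadata = header.pop("__metadata__", {})
    tensors = {
        name: {"dtype": info["dtype"], "shape": info["shape"]}
        for name, info in header.items()
    }
    return {"tensors": tensors, "metadata": metadata}
```

The tensor payload that follows the header is never touched, so inspection cost depends on the number of tensors rather than on the size of the weights.
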
AK11105 requested a review from atkaridarshan04 May 19, 2026 05:36
AK11105 self-assigned this May 19, 2026
AK11105 merged commit f084880 into main May 20, 2026
10 checks passed
AK11105 deleted the fix/15-inspector-format-routing branch May 20, 2026 07:04

Development

Successfully merging this pull request may close these issues.

Inspector uses pickle.load for all artifact formats, crashing on ONNX and safetensors

2 participants