Add offline/air-gapped model resolution for inference-models cache#2187
Draft
Conversation
- Write model_id into model_config.json so it can be recovered without
the auto-resolution-cache (which expires and gets deleted)
- scan_cached_models now also walks models-cache/{slug}/{package_id}/
model_config.json under both MODEL_CACHE_DIR and INFERENCE_HOME,
covering models pre-populated via inference-models without a
corresponding model_type.json in MODEL_CACHE_DIR
- Prune models-cache/ from the model_type.json walk to avoid noise
- De-duplicate results by model_id (layout-1 takes precedence)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
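The scan described in the commit above could look roughly like the following sketch. This is an illustration only, not the PR's actual implementation: the `models-cache/{slug}/{package_id}/model_config.json` layout is taken from the commit message, while the function name and the exact JSON keys are assumptions.

```python
import json
from pathlib import Path


def scan_inference_models_cache(root: Path) -> dict:
    """Walk models-cache/{slug}/{package_id}/model_config.json under one
    cache root and map each stored model_id back to its package directory.
    In the PR this would run for both MODEL_CACHE_DIR and INFERENCE_HOME."""
    results = {}
    models_cache = root / "models-cache"
    if not models_cache.is_dir():
        return results
    for config_path in models_cache.glob("*/*/model_config.json"):
        try:
            config = json.loads(config_path.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or partially written config -- skip it
        model_id = config.get("model_id")
        if model_id and model_id not in results:
            # first hit wins, mirroring "layout-1 takes precedence"
            results[model_id] = config_path.parent
    return results
```

Reading the stored `model_id` (rather than deriving it from the slug in the path) is what lets the scan survive expiry of the auto-resolution cache.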
When model_type.json is missing (e.g. models pre-populated via
inference-models without going through the registry), check
models-cache/{slug}/{package_id}/model_config.json under both
MODEL_CACHE_DIR and INFERENCE_HOME. This allows get_model_type()
to resolve cached models without hitting the Roboflow API,
enabling fully air-gapped model loading.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
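A minimal sketch of the fallback described in this commit, assuming `model_config.json` stores the task type and architecture alongside `model_id` (the key names here are guesses, and the function name is hypothetical):

```python
import json
from pathlib import Path


def resolve_model_type_offline(model_id, slug, roots):
    """When model_type.json is missing, look for a model_config.json written
    at download time under each cache root (e.g. MODEL_CACHE_DIR, then
    INFERENCE_HOME). Returns (task_type, architecture) or None."""
    for root in roots:
        slug_dir = Path(root) / "models-cache" / slug
        if not slug_dir.is_dir():
            continue
        for config_path in slug_dir.glob("*/model_config.json"):
            config = json.loads(config_path.read_text())
            if config.get("model_id") == model_id:
                # assumed key names; the real file may differ
                return config.get("task_type"), config.get("model_architecture")
    return None  # caller falls through to the API-backed resolution path
```

Returning `None` rather than raising keeps the ordinary online path untouched: only fully air-gapped setups ever depend on this branch succeeding.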
When model weights are already cached in the inference-models layout
(models-cache/{slug}/{package_id}/), pass the local directory path
to AutoModel.from_pretrained() instead of the model ID. This triggers
load_model_from_local_storage() which skips the API call entirely,
enabling air-gapped model loading without modifying the inference-models
package.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
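The adapter-side resolution this commit describes can be sketched as follows. The slug argument and directory layout are taken from the paths mentioned in this PR; the function name and the use of `model_config.json` as the cache-hit marker are assumptions, and the real `_resolve_cached_model_path` in the diff may differ:

```python
from pathlib import Path


def resolve_model_id_or_path(model_id, slug, cache_root):
    """If weights are already cached under models-cache/{slug}/{package_id}/,
    return that directory so a from_pretrained-style loader takes its
    local-storage branch; otherwise return the model ID unchanged so the
    normal networked resolution still happens."""
    slug_dir = Path(cache_root) / "models-cache" / slug
    if slug_dir.is_dir():
        for package_dir in sorted(slug_dir.iterdir()):
            if (package_dir / "model_config.json").is_file():
                return str(package_dir)
    return model_id
```

Because a miss returns the original ID, this is safe to apply unconditionally: online behavior is unchanged, and only already-cached models short-circuit the API call.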
# Conflicts:
# inference/core/cache/air_gapped.py
# inference/core/interfaces/http/builder/routes.py
# inference/core/interfaces/http/handlers/workflows.py
# inference/core/workflows/core_steps/models/foundation/anthropic_claude/v1.py
# inference/core/workflows/core_steps/models/foundation/anthropic_claude/v2.py
# inference/core/workflows/core_steps/models/foundation/anthropic_claude/v3.py
# inference/core/workflows/core_steps/models/foundation/clip/v1.py
# inference/core/workflows/core_steps/models/foundation/clip_comparison/v1.py
# inference/core/workflows/core_steps/models/foundation/clip_comparison/v2.py
# inference/core/workflows/core_steps/models/foundation/cog_vlm/v1.py
# inference/core/workflows/core_steps/models/foundation/depth_estimation/v1.py
# inference/core/workflows/core_steps/models/foundation/easy_ocr/v1.py
# inference/core/workflows/core_steps/models/foundation/florence2/v1.py
# inference/core/workflows/core_steps/models/foundation/florence2/v2.py
# inference/core/workflows/core_steps/models/foundation/gaze/v1.py
# inference/core/workflows/core_steps/models/foundation/google_gemini/v1.py
# inference/core/workflows/core_steps/models/foundation/google_gemini/v2.py
# inference/core/workflows/core_steps/models/foundation/google_gemini/v3.py
# inference/core/workflows/core_steps/models/foundation/google_vision_ocr/v1.py
# inference/core/workflows/core_steps/models/foundation/llama_vision/v1.py
# inference/core/workflows/core_steps/models/foundation/lmm/v1.py
# inference/core/workflows/core_steps/models/foundation/lmm_classifier/v1.py
# inference/core/workflows/core_steps/models/foundation/moondream2/v1.py
# inference/core/workflows/core_steps/models/foundation/ocr/v1.py
# inference/core/workflows/core_steps/models/foundation/openai/v1.py
# inference/core/workflows/core_steps/models/foundation/openai/v2.py
# inference/core/workflows/core_steps/models/foundation/openai/v3.py
# inference/core/workflows/core_steps/models/foundation/openai/v4.py
# inference/core/workflows/core_steps/models/foundation/perception_encoder/v1.py
# inference/core/workflows/core_steps/models/foundation/qwen/v1.py
# inference/core/workflows/core_steps/models/foundation/qwen3_5vl/v1.py
# inference/core/workflows/core_steps/models/foundation/qwen3vl/v1.py
# inference/core/workflows/core_steps/models/foundation/seg_preview/v1.py
# inference/core/workflows/core_steps/models/foundation/segment_anything2/v1.py
# inference/core/workflows/core_steps/models/foundation/segment_anything3/v1.py
# inference/core/workflows/core_steps/models/foundation/segment_anything3/v2.py
# inference/core/workflows/core_steps/models/foundation/segment_anything3/v3.py
# inference/core/workflows/core_steps/models/foundation/segment_anything3_3d/v1.py
# inference/core/workflows/core_steps/models/foundation/smolvlm/v1.py
# inference/core/workflows/core_steps/models/foundation/stability_ai/image_gen/v1.py
# inference/core/workflows/core_steps/models/foundation/stability_ai/inpainting/v1.py
# inference/core/workflows/core_steps/models/foundation/stability_ai/outpainting/v1.py
# inference/core/workflows/core_steps/models/foundation/yolo_world/v1.py
# inference/core/workflows/core_steps/models/roboflow/instance_segmentation/v1.py
# inference/core/workflows/core_steps/models/roboflow/instance_segmentation/v2.py
# inference/core/workflows/core_steps/models/roboflow/keypoint_detection/v1.py
# inference/core/workflows/core_steps/models/roboflow/keypoint_detection/v2.py
# inference/core/workflows/core_steps/models/roboflow/multi_class_classification/v1.py
# inference/core/workflows/core_steps/models/roboflow/multi_class_classification/v2.py
# inference/core/workflows/core_steps/models/roboflow/multi_label_classification/v1.py
# inference/core/workflows/core_steps/models/roboflow/multi_label_classification/v2.py
# inference/core/workflows/core_steps/models/roboflow/object_detection/v1.py
# inference/core/workflows/core_steps/models/roboflow/object_detection/v2.py
# inference/core/workflows/core_steps/models/roboflow/semantic_segmentation/v1.py
# inference/core/workflows/core_steps/sinks/roboflow/custom_metadata/v1.py
# inference/core/workflows/core_steps/sinks/roboflow/dataset_upload/v1.py
# inference/core/workflows/core_steps/sinks/roboflow/dataset_upload/v2.py
# inference/core/workflows/core_steps/sinks/roboflow/model_monitoring_inference_aggregator/v1.py
# inference/core/workflows/core_steps/sinks/slack/notification/v1.py
# inference/core/workflows/core_steps/sinks/twilio/sms/v1.py
# inference/core/workflows/core_steps/sinks/twilio/sms/v2.py
# inference/core/workflows/core_steps/sinks/webhook/v1.py
# tests/unit/core/cache/test_air_gapped.py
# tests/unit/core/interfaces/http/test_blocks_describe_airgapped.py
# tests/unit/core/workflows/test_air_gapped_blocks.py
sberan
commented
Mar 31, 2026
)
from inference.core.workflows.prototypes.block import BlockAirGappedInfo

logger = logging.getLogger(__name__)
Collaborator
I am not sure, probably not
)
self._model: InstanceSegmentationModel = AutoModel.from_pretrained(
-    model_id_or_path=model_id,
+    model_id_or_path=_resolve_cached_model_path(model_id),
Collaborator
Why is this needed?
Wouldn't it be enough to just extend the auto-loading cache expiry? It loads for me even in detached mode, and you will most likely hit the ALLOW_INFERENCE_MODELS_DIRECTLY_ACCESS_LOCAL_PACKAGES guard in a standard setup.
Contributor
Author
Rather than manually loading from cache, we can extend the auto loader's TTL, and we will need to add a flag that raises an error instead of falling back to the API when files are missing or integrity cannot be verified. This will be a trivial change to AutoLoader.
Collaborator
@sberan could we quickly sync so that I understand what you're trying to achieve?
The inference-models cache layout uses opaque slug directory names, so deriving model_id from the path produces IDs like microsoft-coco-obj-det/22 instead of coco/22. This breaks the reverse alias lookup in /build/api/models, causing empty aliases and making models invisible in the air-gapped workflow builder dropdown. Now scan_cached_models also reads model_config.json (written by dump_model_config_for_offline_use) and uses its stored model_id, which matches REGISTERED_ALIASES and enables correct alias resolution.
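The alias breakage described above comes down to a reverse lookup over the registered aliases. A toy illustration, assuming the alias table maps friendly names to canonical model IDs (the exact shape of REGISTERED_ALIASES is not shown in this PR):

```python
def reverse_alias(model_id, registered_aliases):
    """Return the friendly alias whose registered target is model_id, if any.
    A slug-derived ID like 'microsoft-coco-obj-det/22' never matches a
    canonical target like 'coco/22', so the lookup comes back empty."""
    for alias, target in registered_aliases.items():
        if target == model_id:
            return alias
    return None
```

This is why scanning must report the `model_id` stored in `model_config.json` rather than one reconstructed from the opaque slug directory name.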
AutoModel.from_pretrained already handles resolving model IDs to local cache paths — duplicating that logic in the adapters layer is unnecessary.
- Add find_cached_model_package_dir() in inference_models as the single source of truth for locating cached model packages
- Catch RetryError in from_pretrained to fall back to the local cache when the network is unavailable
- Simplify roboflow.py and air_gapped.py to use the shared helper instead of duplicating slug/scan logic
- Remove _slugify_model_id from air_gapped.py (it was a copy of the canonical function in inference_models)
- Add tests for find_cached_model_package_dir, is_model_cached delegation, and the RetryError offline fallback
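The RetryError fallback in the second bullet can be sketched as below. A stand-in exception class is used here in place of the retry library's real RetryError, and the function and callback names are illustrative, not the PR's actual signatures:

```python
from pathlib import Path


class RetryError(Exception):
    """Stand-in for the retry library's exhausted-retries error."""


def from_pretrained_offline_fallback(model_id, download, find_cached_package_dir):
    """Try the networked download path first; if retries are exhausted,
    fall back to a previously cached package directory when one exists,
    otherwise re-raise so the caller sees the real network failure."""
    try:
        return download(model_id)
    except RetryError:
        cached = find_cached_package_dir(model_id)
        if cached is not None:
            return cached
        raise  # nothing cached locally -- surface the failure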
When the builder UI is rendered externally (outside of the inference server's built-in HTML page), there's no way to obtain the CSRF token needed for builder API calls. This endpoint lets external UIs fetch the token directly.
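One common shape for such a token endpoint is a timestamped HMAC that the external UI fetches once and echoes back on subsequent builder API calls. This is a generic sketch, not the PR's implementation; the secret handling and token format here are assumptions:

```python
import hashlib
import hmac
import secrets
import time

SECRET = secrets.token_bytes(32)  # per-process secret; real code would persist/configure this


def mint_csrf_token():
    """Token the endpoint would hand to an externally rendered builder UI."""
    ts = str(int(time.time()))
    sig = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    return f"{ts}.{sig}"


def verify_csrf_token(token, max_age=3600):
    """Check signature and freshness of a token echoed back by the UI."""
    try:
        ts, sig = token.split(".", 1)
    except ValueError:
        return False
    if not ts.isdigit():
        return False
    expected = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and int(time.time()) - int(ts) <= max_age
```

Using `hmac.compare_digest` avoids timing side channels when comparing the signature.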
Some block manifest classes are plain Pydantic models that don't implement get_compatible_task_types, get_air_gapped_availability, or get_supported_model_variants. Add hasattr guards so /workflows/blocks/describe?air_gapped=true doesn't crash.
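The guard pattern is straightforward: probe for each optional class method before calling it. A sketch (the describe-helper name and result keys are illustrative; only the three method names come from this PR):

```python
def describe_air_gapped(manifest_cls):
    """Collect optional air-gapped metadata from a block manifest class,
    tolerating plain Pydantic manifests that implement none of the hooks."""
    info = {}
    for attr in (
        "get_compatible_task_types",
        "get_air_gapped_availability",
        "get_supported_model_variants",
    ):
        if hasattr(manifest_cls, attr):
            # strip the 'get_' prefix to form the result key
            info[attr.removeprefix("get_")] = getattr(manifest_cls, attr)()
    return info
```

A manifest without any of the hooks simply yields an empty dict instead of an AttributeError, which is what keeps `/workflows/blocks/describe?air_gapped=true` from crashing.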
Summary
- Add _resolve_cached_model_path to inference-models adapters so AutoModel.from_pretrained can load directly from the local inference-models cache without calling the Roboflow API
- Read model_config.json in the inference-models cache layout when resolving model metadata, so air-gapped environments can discover models cached by inference-models
- Write model_id into model_config.json during model download so offline scanning can map cache entries back to their canonical model IDs
- Fix logger initialization in workflow handlers
Test plan
- model_config.json fallback returns the correct task type and architecture
- model_type.json cache layout still works as before