Skip to content

Guard save_compiled_model with try-catch to handle serialization failures#227

Open
aditya-dl wants to merge 1 commit intoROCm:wml-mainfrom
aditya-dl:fix-save-compiled-model-exception
Open

Guard save_compiled_model with try-catch to handle serialization failures#227
aditya-dl wants to merge 1 commit intoROCm:wml-mainfrom
aditya-dl:fix-save-compiled-model-exception

Conversation

@aditya-dl
Copy link
Copy Markdown

Description

Wrap only the save_compiled_model call (in both the initial compile and recompile paths) with a try-catch that logs a warning instead of propagating the exception. All other operations (SerializeToString, parse_onnx_buffer, compile_program) are confirmed to succeed and are left unwrapped.

Motivation and Context

save_compiled_model can throw std::bad_alloc during MXR cache serialization when the compiled model is large. This crashes the entire inference session even though the model compiled and runs correctly; the cache file is purely an optimization for subsequent loads. Inference should not fail because of a cache write failure.

@TedThemistokleous TedThemistokleous self-requested a review March 17, 2026 02:26
@TedThemistokleous
Copy link
Copy Markdown
Collaborator

TedThemistokleous commented Mar 17, 2026

@aditya-dl appreciate the patch to this. Curious where are you seeing this blow up?

How large of a file? What limits are you seeing this fail on and is that anywhere we can easily reproduce?

save_compiled_model(prog, model_cache_file);
} catch (const std::exception& e) {
LOGS_DEFAULT(WARNING) << ">>>>>> exception caught at Compile::save_compiled_model: " << e.what();
} catch (...) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't do a catch all here as we want to be explicit in what exceptions we're expecting and those that we aren't. If there's an MIGraphX API fail we're hitting since that's what save_compiled_model uses, I don't think we should catch and mask that but instead report it.

Copy link
Copy Markdown
Collaborator

@TedThemistokleous TedThemistokleous left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few questions/comments on this one. Let me know what error and things you're seeing so we can be explicit in catching certain exceptions. We want to be able to signal API failures if something unexpected happens.

This is a good fix is good to stop a higher level application from crashing but I want to be clear on what cases we're seeing this and dig further if you're hitting and MIGraphX failure and there's a bug somewhere else right now.

@TedThemistokleous TedThemistokleous added the Bugfix Fix to a bug or reported issue label Mar 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bugfix Fix to a bug or reported issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants