Guard save_compiled_model with try-catch to handle serialization failures#227
aditya-dl wants to merge 1 commit into ROCm:wml-main from
Conversation
@aditya-dl appreciate the patch to this. Curious where are you seeing this blow up? How large of a file? What limits are you seeing this fail on, and is that anywhere we can easily reproduce?
```cpp
  save_compiled_model(prog, model_cache_file);
} catch (const std::exception& e) {
  LOGS_DEFAULT(WARNING) << ">>>>>> exception caught at Compile::save_compiled_model: " << e.what();
} catch (...) {
```
Don't do a catch-all here, as we want to be explicit about which exceptions we're expecting and which we aren't. If there's an MIGraphX API failure we're hitting, since that's what save_compiled_model uses, I don't think we should catch and mask that but instead report it.
TedThemistokleous left a comment
Few questions/comments on this one. Let me know what errors you're seeing so we can be explicit in catching certain exceptions. We want to be able to signal API failures if something unexpected happens.
This is a good fix to stop a higher-level application from crashing, but I want to be clear on what cases we're seeing this in, and dig further if you're hitting an MIGraphX failure and there's a bug somewhere else right now.
Description
Wrap only the `save_compiled_model` call (in both the initial compile and recompile paths) with a try-catch that logs a warning instead of propagating the exception. All other operations (`SerializeToString`, `parse_onnx_buffer`, `compile_program`) are confirmed to succeed and are left unwrapped.

Motivation and Context
`save_compiled_model` can throw `std::bad_alloc` during MXR cache serialization when the compiled model is large. This crashes the entire inference session even though the model compiled and runs correctly; the cache file is purely an optimization for subsequent loads. Inference should not fail because of a cache write failure.