OTel: fix problematic Timeout class usages #10789

Open
yhabteab wants to merge 2 commits into master from otel-fixes

@yhabteab yhabteab commented Apr 13, 2026

According to the Asio docs [1], Strand::running_in_this_thread() returns true only if the current thread is executing a handler that was posted (boost::asio::post()), dispatched (boost::asio::dispatch()), or deferred (boost::asio::defer()) through the strand. In all other cases it returns false. That is what crashes Icinga 2: when OTel uses the Timeout class after a coroutine that was waiting on a timer via async_wait() is resumed, the assertion on strand.running_in_this_thread() fails. Interestingly, the crash only happens when Icinga 2 is built in release mode; in debug mode it works just fine. So the behavior of Strand::running_in_this_thread() seems to differ between release and debug builds, which could be due to compiler optimizations or Asio's own.

The fix is simple: remove the assertion completely. It is not required for the Timeout class to work correctly, and a false result from strand::running_in_this_thread() does not necessarily mean that the strand is being used incorrectly, so the assertion is not a reliable check for correct strand usage.

To reproduce the bug, apply the following patch to the OTel class:

diff --git a/lib/otel/otel.cpp b/lib/otel/otel.cpp
index 42be64f3a..1ed39d589 100644
--- a/lib/otel/otel.cpp
+++ b/lib/otel/otel.cpp
@@ -282,6 +282,7 @@ void OTel::Connect(boost::asio::yield_context& yc)
 				boost::system::error_code ec;
 				m_RetryExportAndConnTimer.expires_after(Backoff(attempt));
 				m_RetryExportAndConnTimer.async_wait(yc[ec]);
+				VERIFY(m_Strand.running_in_this_thread());
 			}
 		}
 	}
@@ -314,6 +315,7 @@ void OTel::ExportLoop(boost::asio::yield_context& yc)
 		// avoid waiting indefinitely in that case.
 		while (!m_Request && !m_Stopped) {
 			m_ExportAsioCV.Wait(yc);
+			VERIFY(m_Strand.running_in_this_thread());
 		}

 		if (m_Stopped) {
Crash Dumps
[2026-04-13 09:31:27 +0200] information/OTelExporter: Connecting to OpenTelemetry backend on host 'localhost:8428'.
[2026-04-13 09:31:27 +0200] critical/OTelExporter: Cannot connect to OpenTelemetry backend 'localhost:8428' (attempt #1): Connection refused [system:61 at /opt/homebrew/include/boost/asio/detail/reactive_socket_connect_op.hpp:97:37 in function 'do_complete']
/Users/yhabteab/Workspace/icinga2/lib/base/io-engine.hpp:240: assertion failed: strand.running_in_this_thread()
Caught SIGABRT.
Current time: 2026-04-13 09:31:27 +0200

[2026-04-13 09:31:27 +0200] critical/Application: Icinga 2 has terminated unexpectedly. Additional information can be found in '/Users/yhabteab/Workspace/icinga2/prefix/var/log/icinga2/crash/report.1776065487.747801'
[2026-04-13 09:31:27 +0200] notice/cli: Seamless worker (PID 14160) stopped, stopping as well
Stacktrace:
 0# icinga::Application::SigAbrtHandler(int) in /Users/yhabteab/Workspace/icinga2/prefix/lib/icinga2/sbin/icinga2
 1# _sigtramp in /usr/lib/system/libsystem_platform.dylib
 2# pthread_kill in /usr/lib/system/libsystem_pthread.dylib
 3# abort in /usr/lib/system/libsystem_c.dylib
 4# icinga::IoEngine::Get() in /Users/yhabteab/Workspace/icinga2/prefix/lib/icinga2/sbin/icinga2
 5# icinga::OTel::Connect(boost::asio::basic_yield_context<boost::asio::executor>&) in /Users/yhabteab/Workspace/icinga2/prefix/lib/icinga2/sbin/icinga2
 6# icinga::OTel::ExportLoop(boost::asio::basic_yield_context<boost::asio::executor>&) in /Users/yhabteab/Workspace/icinga2/prefix/lib/icinga2/sbin/icinga2
 7# void boost::context::detail::fiber_entry<boost::context::detail::fiber_record<boost::context::fiber, boost::context::basic_fixedsize_stack<boost::context::stack_traits>, boost::asio::detail::spawned_fiber_thread::entry_point<boost::asio::detail::spawn_entry_point<boost::asio::io_context::strand, void icinga::IoEngine::SpawnCoroutine<boost::asio::io_context::strand, icinga::OTel::Start()::$_0>(boost::asio::io_context::strand&, icinga::OTel::Start()::$_0)::'lambda'(boost::asio::basic_yield_context<boost::asio::executor>), boost::asio::detail::detached_handler>>>>(boost::context::detail::transfer_t) in /Users/yhabteab/Workspace/icinga2/prefix/lib/icinga2/sbin/icinga2

fixes #10783

Footnotes

  1. https://www.boost.org/doc/libs/1_81_0/doc/html/boost_asio/reference/strand/running_in_this_thread.html

@yhabteab yhabteab added this to the 2.16.0 milestone Apr 13, 2026
@yhabteab yhabteab added the bug and area/opentelemetry labels Apr 13, 2026
@cla-bot cla-bot bot added the cla/signed label Apr 13, 2026

Development

Successfully merging this pull request may close these issues.

OTLPMetricsWriter VictoriaMetrics - Error: Broken pipe
