Add memory tracking telemetry support for Tracy#22
Conversation
- Controllable via ENABLE_TELEMETRY_MEMORY_TRACKING, - Set to same value as CCP_TELEMETRY_ENABLED
Only rely on Tracy started/connected when sending memory free events
This unlocks the ability to do proper inspection of our interactions with Tracy. Nota bene: this class was created by Claude Code, and for tracy 0.11.1, so it may need some more TLC before it is fully usable. It might also be a breaking point when updating to newer tracy versions since it relies on Tracy's internal protocol.
So that tests can use it. Also provides a `TickTelemetry` wrapper around `CcpTickTelemetry` to deal with the synchronization between the test thread and the TracyTestClient thread.
Because that is how tracy works underneath the hood as well. Change made by claude code.
Interesting one! `SimpleZoneTest` crashes when entering the zone and accessing `t_taskletZoneStore`. This happens because the default value for `t_activeFiber` points at the first element in the `fiberNameStore`. Now, for whatever reason, when starting the telemetry integration set the unnamed root fiber name to trigger initialization of the corresponding zone stack, this does return the same iterator as `t_activeFiber` already contains. As a consequence, the initialization performs an early out, and does not create a relevant TaskletZoneStore. And without such a store, entering a zone clearly accesses invalid memory, causing the segmentation fault.
The internal book-keeping allows for multiple tracy zones sharing the same key (e.g. the `TaskletZoneStore` is a `stack`). However, due to the on-demand profiling book-keeping, there is an additional `set` that tracks whether a zone was manually created with a given key. Since that key is not required to be unique, it may only be removed from the manually tracked set once there are no more zones on that zone store's stack.
…nto tracy_memory_tracking
Our TracyTestClient needs access to (some of) the source files of Tracy in order to build correctly.
Likely still does not work on Windows, but at least it's not because of relying on Tracy internals.
Untested on Windows, written by claude on macOS.
…nto tracy_memory_tracking # Conflicts: # tests/CMakeLists.txt
…ations. CcpTelemetryIsConnected() has the check for: - TracyIsStarted - TracyIsConnected - checks s_profilerState == Started This means that any multiple start/stops within the same profiling session (in Tracy GUI visualizer) may fail. That's because it may try to allocate memory to the same memory address as previously done, but without registering the free event for that memory address. Because it got the free event while in a Tracy stopped state.
- Additionally sprinkle the code with fprintf statements, for debug purposes. To be removed later.
- This sets the m_shutdown boolean which controls the RecvLoop. - Also sprinkle in some debug fprintf statements (to be removed later)
- Call RequestShutdown() when StopRequested - Call ShutdownProfiler() when Stopped and safe to do so - Cleanup fiber names store and erease map, along with thread local zones stores - Make sure not to emit data for zones if telemetry is not connected. - Change indent of temp debug fprintf statements (all debug info to be removed later)
Added skeleton code for the following tests: - StartConnectStopDisconnectProfiling - StartConnectStopProfiling - StartConnectDisconnectProfiling - StartConnectDisconnectStopProfiling - ReconnectAfterStop
This maps onto TRACY_NO_CRASH_HANDLER, which should disable the internal Tracy crash handler. The no-crash-handler feature is specific to the carbonengine/vcpkg-registry port of Tracy. It replaces the default crash-handler INVERTED FEATURE in the official vcpkg registry.
Previously, the socket was only closed when `TracyTestClient::Disconnect` was called. However, it also needs to be closed when the profiler itself signals a disconnect.
# Conflicts: # CMakeLists.txt
Changes include: - Remove explicit call to tracy::ShutdownProfiler() - Revert Zone state bookkeeping changes/cleanup associated with earlier ShutdownProfiler change
…e into tracy_memory_tracking
Not sure why the tests weren't able to correctly deduce that it's a string literal without `constexpr`. The occurrence in `Throw` was a valid concern, as that just blindly trusts the input, which the compiler cannot assume as always being a string literal.
This keeps usage of the underlying profiling framework isolated.
Also: - Cleaned up unused/wrong tests - Added overload for SetUp() that keeps existing connection to TracyTestClient - Removed TODO statements
| SetUp(true); | ||
| } | ||
|
|
||
| void SetUp(bool doTestClientConnect) |
There was a problem hiding this comment.
I'm not entirely sure what the boolean parameter here is for. If there is desire to have different set up behaviour, this should be done through a specialised Test class instead.
| auto tracyZones = m_tracyClient.GetZones(); | ||
| // CcpTelemetryEnterZone passes the zone name as the Tracy "function" field | ||
| // (via the 6-param ___tracy_alloc_srcloc), so match against both fields. | ||
| auto pred = [&zoneName]( const TracyTestClient::ZoneInfo& elem ) -> bool { | ||
| return elem.function == zoneName; | ||
| }; | ||
| EXPECT_NE( tracyZones.end(), std::find_if( tracyZones.begin(), tracyZones.end(), pred ) ); |
There was a problem hiding this comment.
Seeing that this snippet is now used in multiple places, it would benefit from being refactored into a helper function that does the assertion.
| tracyZones = m_tracyClient.GetZones(); | ||
| EXPECT_EQ( tracyZones.end(), std::find_if( tracyZones.begin(), tracyZones.end(), pred ) ); |
There was a problem hiding this comment.
This is a good candidate for a helper function, too, since it's the inverse of the snippet right above.
| auto tracyZones = m_tracyClient.GetZones(); | ||
| auto pred = [&zoneName1]( const TracyTestClient::ZoneInfo& elem ) -> bool { | ||
| return elem.function == zoneName1; | ||
| }; | ||
| EXPECT_NE( tracyZones.end(), std::find_if( tracyZones.begin(), tracyZones.end(), pred ) ); |
There was a problem hiding this comment.
This shouldn't have been copy/pasted. Also, why is this necessary to check this as part of the restart-after-stop scenario?
| EXPECT_TRUE( m_tracyClient.GetZones().empty() ); | ||
| } | ||
|
|
||
| TEST_F( CcpTelemetryTest, ReStartAfterStop ) |
There was a problem hiding this comment.
Which specific restart-after-stop scenario is simulated here? The telemetry integration, or the client application?
There was a problem hiding this comment.
This is simulating:
- Press Start in ESP/Client
- Press Stop in ESP/Client
- Press Start again in ESP/Client
| CcpStopTelemetry(); | ||
| TickTelemetry(); // This processes the "StopRequested" state. | ||
| TickTelemetry(); // This processes the "Stopped" state. | ||
| EXPECT_TRUE( m_tracyClient.IsConnected() ) << "Connection should still be true at this point because the TracyTestClient hasn't been disconnected"; |
There was a problem hiding this comment.
Is this important to test here? Also, is this is a valid assumption? m_tracyClient.IsConnected() returns false once the TracyTestClient notices that the socket connection went away.
There was a problem hiding this comment.
m_connected only becomes false on TracyTestClient::Disconnect() and at the end of the RevcLoop() after sock.Close().
This test is not disconnecting from the TracyTestClient, to this will always be true.
But you're right in this test, we don't need to check if m_tracyClient.IsConnected() == true (because it will always be true as there is no disconnect happening)
| TickTelemetry(); // This processes the "StopRequested" state. | ||
| TickTelemetry(); // This processes the "Stopped" state. | ||
| EXPECT_TRUE( m_tracyClient.IsConnected() ) << "Connection should still be true at this point because the TracyTestClient hasn't been disconnected"; | ||
| EXPECT_FALSE( CcpTelemetryIsStarted() ) << "Internal profiler state should have changed: Started->StopRequested->Stopped"; |
There was a problem hiding this comment.
The assertion as written also holds if we accidentally end up in StartRequested. And it should be unnecessary in a world where we have a helper method for the tests to perform a stop of the telemetry integration.
| EXPECT_FALSE( CcpTelemetryIsStarted() ) << "Internal profiler state should have changed: Started->StopRequested->Stopped"; | ||
|
|
||
| // Simulate a new call to StartTelemetry | ||
| SetUp( false ); |
There was a problem hiding this comment.
As pointed out in my comments on the SetUp( bool ) overload, this really ought to call an explicit helper method.
| // Simulate a new call to StartTelemetry | ||
| SetUp( false ); | ||
| EXPECT_TRUE( m_tracyClient.IsConnected() ) << "Connection should still be true because the TracyTestClient hasn't never been disconnected"; | ||
| EXPECT_TRUE( CcpTelemetryIsStarted() ) << "Internal profiler state should have changed: Started->StopRequested->Stopped"; |
There was a problem hiding this comment.
The message seems to be copied from above, and it's missing the StartRequested phase in the flow. Also, it's an assertion that's implied by calling an appropriate helper method to started the telemetry integration.
| auto tracyZones2ndStart = m_tracyClient.GetZones(); | ||
| auto pred2nd = [&zoneName2]( const TracyTestClient::ZoneInfo& elem ) -> bool { | ||
| return elem.function == zoneName2; | ||
| }; | ||
| EXPECT_NE( tracyZones2ndStart.end(), std::find_if( tracyZones2ndStart.begin(), tracyZones2ndStart.end(), pred2nd ) ); |
There was a problem hiding this comment.
As mentioned above, this should be a helper method instead of copy/paste.
Changes include: