[DO NOT MERGE] chore: stderr heartbeat + thread dump for deadlock investigation #1027
Draft
Anatolii Yatsuk (tolik0) wants to merge 1 commit into
Adds a daemon thread that writes periodic status to stderr every 30s (message counts, bytes written, queue size/full state) and dumps all thread stack traces via sys._current_frames() when the source has been silent for 90+ seconds.
This is investigation tooling for diagnosing connector stalls and heartbeat timeouts. It is not intended for merge; it provides the evidence needed to classify stalls against known patterns (full-queue self-deadlock, socket read hang, concurrent-generator starvation).
Mirrors the diagnostic portion of #953, with the ConcurrentMessageRepository deadlock fix omitted (already on main via #977).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
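The heartbeat half of this description can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `HeartbeatStats`, `format_heartbeat`, and `start_heartbeat` are hypothetical names, the `t=` field is assumed to be a Unix timestamp, and the `print_blocked` field is left out because its semantics are not described here.

```python
import os
import queue
import threading
import time


class HeartbeatStats:
    """Hypothetical counters that the main emit loop would update."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.msgs = 0
        self.bytes_written = 0

    def record(self, payload: bytes) -> None:
        with self.lock:
            self.msgs += 1
            self.bytes_written += len(payload)


def format_heartbeat(stats: HeartbeatStats, q: queue.Queue, now: float) -> str:
    # Field layout is an assumption modelled on the status line in this PR.
    with stats.lock:
        msgs, nbytes = stats.msgs, stats.bytes_written
    return (
        f"STDOUT_HEARTBEAT: t={now:.0f} msgs={msgs} bytes={nbytes} "
        f"queue_size={q.qsize()} queue_full={q.full()}\n"
    )


def start_heartbeat(
    stats: HeartbeatStats,
    q: queue.Queue,
    interval_s: float = 30.0,
    fd: int = 2,
) -> threading.Event:
    """Start a daemon thread that writes one status line per interval to fd."""
    stop = threading.Event()

    def run() -> None:
        # Event.wait returns False on timeout, so the loop ticks until stop is set.
        while not stop.wait(interval_s):
            # os.write bypasses sys.stderr, so a blocked or replaced
            # Python-level stream does not silence the heartbeat.
            os.write(fd, format_heartbeat(stats, q, time.time()).encode())

    threading.Thread(target=run, name="stderr-heartbeat", daemon=True).start()
    return stop
```

Because the thread is a daemon, it does not keep the process alive on exit; calling `set()` on the returned event stops it earlier.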
👋 Greetings, Airbyte Team Member! Here are some helpful tips and reminders for your convenience.
Testing This CDK Version
You can test this version of the CDK using the following:
# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@tolik0/cdk/stderr-heartbeat-thread-dump#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch tolik0/cdk/stderr-heartbeat-thread-dump
PyTest Results (Fast)
593 tests ±0, 582 ✅ ±0, 3m 35s ⏱️ +4s
For more details on these failures, see this check.
Results for commit b6e1402. ± Comparison against base commit 19a7083.
This branch is investigation tooling for diagnosing connector stalls and heartbeat timeouts. It runs a daemon thread, so it should not be merged into `main`; engineers should cherry-pick locally when an investigation needs evidence.
What it adds
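- A daemon thread that writes `STDOUT_HEARTBEAT: t=... msgs=... bytes=... print_blocked=... queue_size=... queue_full=...` every 30s directly to fd 2, so Kubernetes captures it independently of stdout.
- A dump of all thread stack traces via `sys._current_frames()` when the source has been silent for 90+ seconds.
Why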
The thread dump produces the decisive evidence for classifying stalls against the known patterns: full-queue self-deadlock, socket read hang, and concurrent-generator starvation.
Without this, classifications rely on indirect symptoms; with it, the stack frame names the mechanism directly.
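As a rough illustration of how a dump built on `sys._current_frames()` exposes the blocked frame, here is a minimal sketch. The helper names `format_thread_dump` and `is_silent` are hypothetical and the output layout is an assumption; the real diff may format things differently.

```python
import sys
import threading
import traceback


def format_thread_dump() -> str:
    """Render the current stack of every live thread as text."""
    # Map thread idents to names so the dump is readable.
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append(f"--- thread {names.get(ident, '?')} (ident={ident}) ---\n")
        # format_stack walks from the outermost caller down to the frame
        # where the thread is currently executing or blocked.
        lines.extend(traceback.format_stack(frame))
    return "".join(lines)


def is_silent(last_output: float, now: float, threshold_s: float = 90.0) -> bool:
    """True once no output has been seen for at least threshold_s seconds."""
    return now - last_output >= threshold_s
```

A watchdog loop would call `is_silent` with monotonic timestamps and write `format_thread_dump()` to stderr when it returns true. In such a dump, the innermost frame of each thread is the interesting one: for example, a thread whose last frame is inside a queue `put` call would suggest the full-queue pattern, while one inside a socket read would suggest a read hang.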
Origin
Mirrors the diagnostic portion of #953. The `ConcurrentMessageRepository` deadlock-prevention diff is omitted because it has already landed in `main` via #977.
How to use locally
```bash
# In a connector workspace, pin this CDK ref:
poetry add 'git+https://github.com/airbytehq/airbyte-python-cdk.git@tolik0/cdk/stderr-heartbeat-thread-dump'
# Then run the connector normally; heartbeat lines and thread dumps land in stderr.
```