feat(pluginDaemon): expose shareProcessNamespace on pod spec #402

Merged
BorisPolonsky merged 1 commit into BorisPolonsky:master from wangchongyu:feat-plugin-daemon-shareprocessnamespace on Apr 20, 2026

@wangchongyu (Contributor) commented Apr 17, 2026

Summary

Add pluginDaemon.shareProcessNamespace values key (default false) and interpolate it into the plugin-daemon Deployment's pod spec. This lets operators work around a design flaw in the embedded dify_plugin Python SDK that makes plugins self-terminate on Kubernetes.

Why

Every installed plugin ships its own copy of dify_plugin in its .venv. The SDK's core/server/io_server.py::_parent_alive_check runs as a background thread and calls os._exit(-1) whenever os.getppid() == 1:

def _parent_alive_check(self):
    while True:
        time.sleep(0.5)
        parent_process_id = os.getppid()
        if parent_process_id == 1:
            os._exit(-1)

This is a naive orphan-detection check that assumes the container has an init process as PID 1. That holds under Docker Compose when tini is injected (docker run --init, or init: true in the Compose file), but not on Kubernetes by default. On K8s, the plugin-daemon container runs /app/main as PID 1, so every Python plugin subprocess it spawns has getppid() == 1 and self-exits ~500 ms after launch. The Go daemon's scheduleLoop then respawns the plugin, and the cycle repeats.
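To make the failure mode concrete, here is a minimal sketch (not the SDK's code) that contrasts the quoted PPID == 1 test with one possible alternative: record the parent PID at startup and treat only a change as orphaning. The function names are hypothetical and exist only for this illustration.

```python
import os


def sdk_style_orphaned(current_ppid: int) -> bool:
    """Same decision as the quoted check: any parent with PID 1 counts as orphaned."""
    return current_ppid == 1


def make_reparent_check(initial_ppid: int):
    """Hypothetical alternative: orphaned only if the parent PID changed since startup."""
    def orphaned(current_ppid: int) -> bool:
        return current_ppid != initial_ppid
    return orphaned


if __name__ == "__main__":
    # Kubernetes default: the daemon is PID 1, so a healthy plugin sees PPID 1.
    k8s_check = make_reparent_check(initial_ppid=1)
    print("PPID==1 test fires on healthy K8s child:", sdk_style_orphaned(1))
    print("re-parent test fires on healthy K8s child:", k8s_check(1))

    # Values for the current process, for comparison.
    here = os.getppid()
    print("this process: ppid =", here, "PPID==1 test fires:", sdk_style_orphaned(here))
```

The alternative has its own limits: when the real parent is PID 1 and exits, the container's PID namespace is torn down anyway, so the plugin never observes a changed parent. It is shown only to illustrate why a fixed comparison against 1 misfires here.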

Observable symptom: ValidateProviderCredentials and other SSE-dispatch calls fail with PluginDaemonInternalServerError: no proper instance. Inside the pod, ps -eo pid,ppid,etime shows Python subprocesses cycling every ~3–4 s with PPID=1. This is reproducible on plain GKE with dify/dify 0.36.0 + dify-plugin-daemon:0.5.8-local.
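The ps check above can be automated. The following is a rough sketch that parses `ps -eo pid,ppid,etime` output and lists processes whose parent is PID 1 and which are only a few seconds old. The helper names, the 5-second threshold, and the sample text are all made up for illustration; the sample is not captured from a real pod.

```python
def parse_etime(etime: str) -> int:
    """Convert a ps etime value ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:  # pad missing hours
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def young_children_of_init(ps_output: str, max_age_s: int = 5):
    """Return (pid, age_seconds) for processes with PPID 1 that are at most max_age_s old."""
    hits = []
    for line in ps_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip the header row and malformed lines
        pid, ppid, age = int(fields[0]), int(fields[1]), parse_etime(fields[2])
        if ppid == 1 and age <= max_age_s:
            hits.append((pid, age))
    return hits


# Hand-written sample in the shape of `ps -eo pid,ppid,etime` output.
SAMPLE = """
  PID  PPID     ELAPSED
    1     0    02:13:07
  388     1    01:10:00
  412     1       00:02
"""

if __name__ == "__main__":
    print(young_children_of_init(SAMPLE))
```

Running the scan a few times in a row and seeing new PIDs appear each time with small ages would be consistent with the respawn cycle described above.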

Fix

Set shareProcessNamespace: true on the pod. Kubernetes then puts the pause container at PID 1 and the daemon at PID ≥ 2, so plugin subprocesses inherit PPID ≥ 2 and the SDK's check stops firing.
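For a cluster that is already running, the same setting can be applied by hand with a merge patch on the Deployment's pod template. The sketch below only builds the patch body and the corresponding kubectl command line; it does not run anything. The Deployment name and namespace are placeholders, not values taken from the chart.

```python
import json
import shlex


def share_pid_namespace_patch(enabled: bool = True) -> str:
    """Build a merge-patch body that sets shareProcessNamespace on the pod template."""
    patch = {"spec": {"template": {"spec": {"shareProcessNamespace": enabled}}}}
    return json.dumps(patch, separators=(",", ":"))


def kubectl_patch_command(deployment: str, namespace: str) -> str:
    """Return a shell-quoted kubectl command; the caller decides whether to run it."""
    args = [
        "kubectl", "patch", "deployment", deployment,
        "-n", namespace,
        "--type", "merge",
        "-p", share_pid_namespace_patch(),
    ]
    return shlex.join(args)


if __name__ == "__main__":
    # "dify-plugin-daemon" and "dify" are placeholder names for illustration.
    print(kubectl_patch_command("dify-plugin-daemon", "dify"))
```

Because the patch changes the pod template, applying it triggers a rollout of the Deployment. A manual patch is also likely to be overwritten by the next helm upgrade unless the values key is set as well.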

This PR:

  • adds pluginDaemon.shareProcessNamespace to charts/dify/values.yaml (default false — no change for existing users)
  • adds a conditional block in charts/dify/templates/plugin-daemon-deployment.yaml that emits shareProcessNamespace: true when the flag is enabled

Test plan

  • helm template with --set pluginDaemon.shareProcessNamespace=true emits shareProcessNamespace: true at the correct pod-spec level
  • helm template with defaults emits no shareProcessNamespace field (backward compatible)
  • Applied equivalent manual kubectl patch on a live GKE cluster — OpenAI plugin subprocess stays alive; ValidateProviderCredentials succeeds end-to-end
  • Reviewer can verify by running helm template with and without the flag
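The helm template checks in this test plan could be scripted roughly as follows. This is a sketch under assumptions: it uses a plain-text heuristic (the flag must sit at the same indentation as a `containers:` key in the same document) rather than a YAML parser, the release name "dify" is a placeholder, and the sample manifest is hand-written, not actual chart output.

```python
import subprocess


def share_flag_at_pod_level(document: str) -> bool:
    """True if 'shareProcessNamespace: true' is indented like a 'containers:' key."""
    flag_indent = None
    containers_indents = set()
    for line in document.splitlines():
        stripped = line.lstrip()
        indent = len(line) - len(stripped)
        if stripped.startswith("shareProcessNamespace:"):
            if stripped.split(":", 1)[1].strip() == "true":
                flag_indent = indent
        elif stripped.startswith("containers:"):
            containers_indents.add(indent)
    return flag_indent is not None and flag_indent in containers_indents


def render(chart_dir: str, enable: bool) -> str:
    """Render the chart with helm template; requires helm on PATH."""
    cmd = ["helm", "template", "dify", chart_dir]  # "dify" is a placeholder release name
    if enable:
        cmd += ["--set", "pluginDaemon.shareProcessNamespace=true"]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def any_document_has_flag(rendered: str) -> bool:
    """Check each YAML document in the rendered output separately."""
    return any(share_flag_at_pod_level(doc) for doc in rendered.split("\n---"))


# Hand-written fragment for illustration only.
SAMPLE_ON = """apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      shareProcessNamespace: true
      containers:
        - name: plugin-daemon
"""
SAMPLE_OFF = SAMPLE_ON.replace("      shareProcessNamespace: true\n", "")

if __name__ == "__main__":
    print(share_flag_at_pod_level(SAMPLE_ON), share_flag_at_pod_level(SAMPLE_OFF))
```

Against the real chart, `any_document_has_flag(render("charts/dify", True))` would be the check to run. Parsing the output with a proper YAML library and inspecting only the plugin-daemon Deployment would be more robust than this indentation heuristic.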

Notes

  • Consequences of enabling: plugin subprocesses become visible across containers in the pod (only relevant if sidecars are added); zombie reaping shifts to the pause container (net improvement). No impact for single-container plugin-daemon pods.
  • Alternative approaches rejected: rebuilding the daemon image with tini as ENTRYPOINT (forks the upstream image); wrapping command with sh -c (fragile signal/zombie handling); patching the SDK in place (has to be reapplied per-plugin-install).
  • Happy to adjust scope or naming — e.g. putting this under a generic pluginDaemon.podSpec override hook — if the project prefers that direction.

The embedded dify_plugin Python SDK (in every plugin's .venv) has an
orphan-detection thread that calls os._exit(-1) when os.getppid()==1.
This assumes Docker --init (tini) is injecting PID 1, which Docker
Compose does by default. On Kubernetes the plugin-daemon container
runs /app/main as PID 1, so every Python plugin subprocess self-exits
~500ms after launch and dispatch fails with "no proper instance".

Setting shareProcessNamespace: true on the pod makes the K8s pause
container PID 1, the daemon moves to PID >=2, and plugin subprocesses
have PPID >1 so the SDK check passes.

Defaults to false to preserve existing behaviour; users on K8s should
set pluginDaemon.shareProcessNamespace=true.
@BorisPolonsky BorisPolonsky merged commit aa17b2c into BorisPolonsky:master Apr 20, 2026
1 check passed
@BorisPolonsky BorisPolonsky added the enhancement New feature or request label Apr 20, 2026
BorisPolonsky added a commit that referenced this pull request Apr 20, 2026
Add a values.yaml note linking the pluginDaemon.shareProcessNamespace flag
to the related pull request for operators who need the workaround.
BorisPolonsky added a commit that referenced this pull request Apr 20, 2026
)

* fix(schema): add `pluginDaemon.shareProcessNamespace` to `values.schema.json`

* feat(helm): configurable shareProcessNamespace across deployments

* docs(dify): comment on shareProcessNamespace workaround for PR #402

Add a values.yaml note linking the pluginDaemon.shareProcessNamespace flag
to the related pull request for operators who need the workaround.