fix(aws-vault-proxy): add a cli flag to check the running service #307

Open
gdamjan wants to merge 1 commit into ByteNess:main from gdamjan:vault-proxy-check-running

Conversation


@gdamjan gdamjan commented Feb 19, 2026

now that we build the aws-vault-proxy docker image from scratch, the image doesn't contain pgrep, curl, or any other executables.

implement a healthcheck command in the same executable that sends an http request to itself on localhost, and assume healthy for any http response. if the proxy process is stuck, it'll become unhealthy.

can be tested by running:

```
docker compose kill --signal STOP aws-vault-proxy
```

which will make the container "stuck". docker will soon enough (after the default timeouts and retries) mark the service/container unhealthy.

Unstuck it with:

```
docker compose kill --signal CONT aws-vault-proxy
```

@gdamjan gdamjan requested a review from mbevc1 as a code owner February 19, 2026 19:10
@github-actions github-actions bot added the fix label Feb 19, 2026
Author

gdamjan commented Feb 19, 2026

I can't see where the "Validate PR" fails now?

@mbevc1 mbevc1 changed the title fix: aws-vault-proxy: add a cli flag to check the running service fix(aws-vault-proxy): add a cli flag to check the running service Feb 19, 2026
Contributor

mbevc1 commented Feb 19, 2026

I think double colon might be confusing the check, because strictly features go in brackets 🤔

Author

gdamjan commented Feb 19, 2026

> I think double colon might be confusing the check, because strictly features go in brackets 🤔

ahh, thanks

Comment on lines +33 to +41

```go
	req, err := http.NewRequest("HEAD", "http://127.0.0.1:80/", nil)
	if err != nil {
		os.Exit(1)
	}
	_, err = http.DefaultClient.Do(req)
	if err != nil {
		os.Exit(1)
	}
}
```
Contributor
Could I suggest an improvement here and add a timeout, without building a body:

Suggested change

```diff
-	req, err := http.NewRequest("HEAD", "http://127.0.0.1:80/", nil)
-	if err != nil {
-		os.Exit(1)
-	}
-	_, err = http.DefaultClient.Do(req)
-	if err != nil {
-		os.Exit(1)
-	}
-}
+	client := &http.Client{
+		Timeout: 2 * time.Second,
+	}
+	resp, err := client.Get("http://127.0.0.1:8080/health")
+	if err != nil {
+		os.Exit(1)
+	}
+	defer resp.Body.Close()
+	if resp.StatusCode != http.StatusOK {
+		os.Exit(1)
+	}
+	os.Exit(0)
```

Author
it'll need to be port 80 though, otherwise of course

Contributor
Ah, of course - my typo here!

Author
some concerns:

- the timeout is actually handled by the docker runtime itself: if the check command doesn't exit in time, docker considers the container unhealthy and stops it.
  - hard-coding 2 seconds might not be what docker users want?
- signaling unhealthiness on `resp.StatusCode != http.StatusOK` is not great, since the proxy can be alive and well while `GET /health` returns 502 or 404 depending on what's behind the proxy. We don't need to restart the proxy in that case IMHO.

Contributor

@mbevc1 mbevc1 Feb 19, 2026

hm, I was actually thinking about that, and I think the question is: is the proxy useful if nothing is responding? Do we need a health check if this check is irrelevant, or if we don't want to restart?

On the 2s, that was just an example and I'm happy to change to whatever feels like a better fit here.

Author

🤦
I've pushed to the wrong branch

Author

can you review this now?

Contributor

Thanks @gdamjan. Looks good; the only thing I wonder is whether we should still add a timeout on the connection. I don't have a strong opinion on what it should be, but I'd be interested in your thoughts.

Author

I'll add some

Contributor

Thanks

@gdamjan gdamjan force-pushed the vault-proxy-check-running branch from 9d976ef to cba4ca5 Compare March 9, 2026 11:52