Skip to content
This repository was archived by the owner on May 19, 2026. It is now read-only.

Add staging proxy back + Switch ghfe/scheduler scaling to CPU usage; halve resource limits#24

Merged
luhenry merged 4 commits into
mainfrom
go
May 15, 2026
Merged

Add staging proxy back + Switch ghfe/scheduler scaling to CPU usage; halve resource limits#24
luhenry merged 4 commits into
mainfrom
go

Conversation

@luhenry

@luhenry luhenry commented May 15, 2026

Copy link
Copy Markdown
Contributor

No description provided.

luhenry added 2 commits May 15, 2026 21:52
Restore the staging-proxy behavior the Python ghfe had: when the prod
instance receives a workflow_job for an entity/repo listed in
EntityConfig.Staging, forward the unmodified body and every request
header to the staging ghfe (STAGING_URL) and relay its response back to
GitHub. The prod instance records a single proxied_to_staging row and
does not write to the jobs table, so the staging environment can be
exercised end-to-end against a real repository (riscv-runner-sample)
without provisioning a prod runner.

Transport errors return 502 so GitHub redelivers; an audit row is still
written. The proxy client lives on App as StagingProxy so tests can
inject a stub RoundTripper. Three tests pin the contract: happy path
(body + headers + status + response headers relayed, one event row, no
job row), upstream failure → 502, and shouldProxyToStaging's four
negative cases.

https://claude.ai/code/session_01Vda2TpwJnGYRYuw1Xg46Da
Both containers were scaling on Scaleway's default trigger (concurrent
requests), which is a poor signal for our workload: ghfe handles short
webhook bursts where queue depth doesn't reflect work in flight, and the
scheduler is pinned 1/1 anyway. Switch both to scalingOption.type =
cpu_usage at 70% so replicas grow when there's real CPU pressure, and
halve cpuLimit (500 -> 250 mCPU) and memoryLimit (512 -> 256 MiB) to
match the actual footprint of the Go binaries.

https://claude.ai/code/session_01Vda2TpwJnGYRYuw1Xg46Da
Comment thread container/cmd/ghfe/webhook.go Outdated
ownerLogin, _ := owner["login"].(string)
installID := asInt64(install["id"])
repoFullName, _ := repo["full_name"].(string)
entity := internal.Entity{Type: internal.EntityType(ownerType), Name: ownerLogin, ID: ownerID}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bypasses the code at https://github.com/riseproject-dev/riscv-runner-app/pull/24/changes#diff-b822dcbad97356546fd951f87c3d46dfdba6afad40d64a302612da3e82486509R217-R219. Reconcile the two, make sure we do checking on the value of ownerType.

Comment thread container/cmd/ghfe/webhook.go Outdated
Comment on lines 184 to 186
@@ -184,6 +186,21 @@ func (a *App) handleWorkflowJobEvent(w http.ResponseWriter, r *http.Request, pay
EntityName: &ownerLogin,

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the entity you just built for these three fields

Two PR #24 review threads pointed at the same restructure:

- The early entity construction cast ownerType to internal.EntityType
  directly, silently producing a zero-value Type for any unknown string
  and shipping it onward. The ParseEntityType check downstream never
  ran for the proxy path (and didn't run at all for ignored_action
  rows). Move the validation up next to the ownerID-missing check so
  every workflow_job code path -- proxy, ignored_action, the three
  real actions -- builds entity from a parsed Type or returns 400.
- The base eventRecord still pointed at the loose ownerType/ownerID/
  ownerLogin locals. Switch the three pointer fields to take the
  addresses of entity.Type (via a string local since eventRecord.EntityType
  is *string), entity.ID, and entity.Name.

Slight behavioral side-effect: a workflow_job with a malformed
owner.type (in practice impossible from GitHub) now returns 400 with
no row written, where before an ignored_action row would still have
been recorded for non-{queued,in_progress,completed} actions.

https://claude.ai/code/session_01Vda2TpwJnGYRYuw1Xg46Da
Comment thread container/cmd/ghfe/webhook.go Outdated
return
}
entity := internal.Entity{Type: et, Name: ownerLogin, ID: ownerID}
entityTypeStr := string(entity.Type)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you inline that at db7a527#diff-b822dcbad97356546fd951f87c3d46dfdba6afad40d64a302612da3e82486509R196? Also, given EntityType is a string type EntityType string, why can't you just pass &entity.Type at line 194?

Per PR #24 review: drop the entityTypeStr local and convert the
pointer at the eventRecord literal directly. `(*string)(&entity.Type)`
works because Go allows explicit conversion between unnamed pointer
types whose base types have identical underlying types -- here
internal.EntityType's underlying is string. The implicit form
(`&entity.Type`) doesn't compile because eventRecord.EntityType is
declared as *string, and Go won't auto-coerce *internal.EntityType to
*string.

https://claude.ai/code/session_01Vda2TpwJnGYRYuw1Xg46Da
@luhenry luhenry marked this pull request as ready for review May 15, 2026 22:22
@luhenry luhenry merged commit dc26cba into main May 15, 2026
4 checks passed
luhenry added a commit that referenced this pull request May 15, 2026
…halve resource limits (#24)

* Add staging proxy to Go ghfe

Restore the staging-proxy behavior the Python ghfe had: when the prod
instance receives a workflow_job for an entity/repo listed in
EntityConfig.Staging, forward the unmodified body and every request
header to the staging ghfe (STAGING_URL) and relay its response back to
GitHub. The prod instance records a single proxied_to_staging row and
does not write to the jobs table, so the staging environment can be
exercised end-to-end against a real repository (riscv-runner-sample)
without provisioning a prod runner.

Transport errors return 502 so GitHub redelivers; an audit row is still
written. The proxy client lives on App as StagingProxy so tests can
inject a stub RoundTripper. Three tests pin the contract: happy path
(body + headers + status + response headers relayed, one event row, no
job row), upstream failure → 502, and shouldProxyToStaging's four
negative cases.

https://claude.ai/code/session_01Vda2TpwJnGYRYuw1Xg46Da

* Switch ghfe/scheduler scaling to CPU usage; halve resource limits

Both containers were scaling on Scaleway's default trigger (concurrent
requests), which is a poor signal for our workload: ghfe handles short
webhook bursts where queue depth doesn't reflect work in flight, and the
scheduler is pinned 1/1 anyway. Switch both to scalingOption.type =
cpu_usage at 70% so replicas grow when there's real CPU pressure, and
halve cpuLimit (500 -> 250 mCPU) and memoryLimit (512 -> 256 MiB) to
match the actual footprint of the Go binaries.

https://claude.ai/code/session_01Vda2TpwJnGYRYuw1Xg46Da

* ghfe: validate ownerType once, use entity in event record

Two PR #24 review threads pointed at the same restructure:

- The early entity construction cast ownerType to internal.EntityType
  directly, silently producing a zero-value Type for any unknown string
  and shipping it onward. The ParseEntityType check downstream never
  ran for the proxy path (and didn't run at all for ignored_action
  rows). Move the validation up next to the ownerID-missing check so
  every workflow_job code path -- proxy, ignored_action, the three
  real actions -- builds entity from a parsed Type or returns 400.
- The base eventRecord still pointed at the loose ownerType/ownerID/
  ownerLogin locals. Switch the three pointer fields to take the
  addresses of entity.Type (via a string local since eventRecord.EntityType
  is *string), entity.ID, and entity.Name.

Slight behavioral side-effect: a workflow_job with a malformed
owner.type (in practice impossible from GitHub) now returns 400 with
no row written, where before an ignored_action row would still have
been recorded for non-{queued,in_progress,completed} actions.

https://claude.ai/code/session_01Vda2TpwJnGYRYuw1Xg46Da

* ghfe: inline EntityType pointer via explicit conversion

Per PR #24 review: drop the entityTypeStr local and convert the
pointer at the eventRecord literal directly. `(*string)(&entity.Type)`
works because Go allows explicit conversion between unnamed pointer
types whose base types have identical underlying types -- here
internal.EntityType's underlying is string. The implicit form
(`&entity.Type`) doesn't compile because eventRecord.EntityType is
declared as *string, and Go won't auto-coerce *internal.EntityType to
*string.

https://claude.ai/code/session_01Vda2TpwJnGYRYuw1Xg46Da
luhenry added a commit that referenced this pull request May 15, 2026
…halve resource limits (#24)

* Add staging proxy to Go ghfe

Restore the staging-proxy behavior the Python ghfe had: when the prod
instance receives a workflow_job for an entity/repo listed in
EntityConfig.Staging, forward the unmodified body and every request
header to the staging ghfe (STAGING_URL) and relay its response back to
GitHub. The prod instance records a single proxied_to_staging row and
does not write to the jobs table, so the staging environment can be
exercised end-to-end against a real repository (riscv-runner-sample)
without provisioning a prod runner.

Transport errors return 502 so GitHub redelivers; an audit row is still
written. The proxy client lives on App as StagingProxy so tests can
inject a stub RoundTripper. Three tests pin the contract: happy path
(body + headers + status + response headers relayed, one event row, no
job row), upstream failure → 502, and shouldProxyToStaging's four
negative cases.

https://claude.ai/code/session_01Vda2TpwJnGYRYuw1Xg46Da

* Switch ghfe/scheduler scaling to CPU usage; halve resource limits

Both containers were scaling on Scaleway's default trigger (concurrent
requests), which is a poor signal for our workload: ghfe handles short
webhook bursts where queue depth doesn't reflect work in flight, and the
scheduler is pinned 1/1 anyway. Switch both to scalingOption.type =
cpu_usage at 70% so replicas grow when there's real CPU pressure, and
halve cpuLimit (500 -> 250 mCPU) and memoryLimit (512 -> 256 MiB) to
match the actual footprint of the Go binaries.

https://claude.ai/code/session_01Vda2TpwJnGYRYuw1Xg46Da

* ghfe: validate ownerType once, use entity in event record

Two PR #24 review threads pointed at the same restructure:

- The early entity construction cast ownerType to internal.EntityType
  directly, silently producing a zero-value Type for any unknown string
  and shipping it onward. The ParseEntityType check downstream never
  ran for the proxy path (and didn't run at all for ignored_action
  rows). Move the validation up next to the ownerID-missing check so
  every workflow_job code path -- proxy, ignored_action, the three
  real actions -- builds entity from a parsed Type or returns 400.
- The base eventRecord still pointed at the loose ownerType/ownerID/
  ownerLogin locals. Switch the three pointer fields to take the
  addresses of entity.Type (via a string local since eventRecord.EntityType
  is *string), entity.ID, and entity.Name.

Slight behavioral side-effect: a workflow_job with a malformed
owner.type (in practice impossible from GitHub) now returns 400 with
no row written, where before an ignored_action row would still have
been recorded for non-{queued,in_progress,completed} actions.

https://claude.ai/code/session_01Vda2TpwJnGYRYuw1Xg46Da

* ghfe: inline EntityType pointer via explicit conversion

Per PR #24 review: drop the entityTypeStr local and convert the
pointer at the eventRecord literal directly. `(*string)(&entity.Type)`
works because Go allows explicit conversion between unnamed pointer
types whose base types have identical underlying types -- here
internal.EntityType's underlying is string. The implicit form
(`&entity.Type`) doesn't compile because eventRecord.EntityType is
declared as *string, and Go won't auto-coerce *internal.EntityType to
*string.

https://claude.ai/code/session_01Vda2TpwJnGYRYuw1Xg46Da
luhenry added a commit that referenced this pull request May 15, 2026
…halve resource limits (#24)

* Add staging proxy to Go ghfe

Restore the staging-proxy behavior the Python ghfe had: when the prod
instance receives a workflow_job for an entity/repo listed in
EntityConfig.Staging, forward the unmodified body and every request
header to the staging ghfe (STAGING_URL) and relay its response back to
GitHub. The prod instance records a single proxied_to_staging row and
does not write to the jobs table, so the staging environment can be
exercised end-to-end against a real repository (riscv-runner-sample)
without provisioning a prod runner.

Transport errors return 502 so GitHub redelivers; an audit row is still
written. The proxy client lives on App as StagingProxy so tests can
inject a stub RoundTripper. Three tests pin the contract: happy path
(body + headers + status + response headers relayed, one event row, no
job row), upstream failure → 502, and shouldProxyToStaging's four
negative cases.

https://claude.ai/code/session_01Vda2TpwJnGYRYuw1Xg46Da

* Switch ghfe/scheduler scaling to CPU usage; halve resource limits

Both containers were scaling on Scaleway's default trigger (concurrent
requests), which is a poor signal for our workload: ghfe handles short
webhook bursts where queue depth doesn't reflect work in flight, and the
scheduler is pinned 1/1 anyway. Switch both to scalingOption.type =
cpu_usage at 70% so replicas grow when there's real CPU pressure, and
halve cpuLimit (500 -> 250 mCPU) and memoryLimit (512 -> 256 MiB) to
match the actual footprint of the Go binaries.

https://claude.ai/code/session_01Vda2TpwJnGYRYuw1Xg46Da

* ghfe: validate ownerType once, use entity in event record

Two PR #24 review threads pointed at the same restructure:

- The early entity construction cast ownerType to internal.EntityType
  directly, silently producing a zero-value Type for any unknown string
  and shipping it onward. The ParseEntityType check downstream never
  ran for the proxy path (and didn't run at all for ignored_action
  rows). Move the validation up next to the ownerID-missing check so
  every workflow_job code path -- proxy, ignored_action, the three
  real actions -- builds entity from a parsed Type or returns 400.
- The base eventRecord still pointed at the loose ownerType/ownerID/
  ownerLogin locals. Switch the three pointer fields to take the
  addresses of entity.Type (via a string local since eventRecord.EntityType
  is *string), entity.ID, and entity.Name.

Slight behavioral side-effect: a workflow_job with a malformed
owner.type (in practice impossible from GitHub) now returns 400 with
no row written, where before an ignored_action row would still have
been recorded for non-{queued,in_progress,completed} actions.

https://claude.ai/code/session_01Vda2TpwJnGYRYuw1Xg46Da

* ghfe: inline EntityType pointer via explicit conversion

Per PR #24 review: drop the entityTypeStr local and convert the
pointer at the eventRecord literal directly. `(*string)(&entity.Type)`
works because Go allows explicit conversion between unnamed pointer
types whose base types have identical underlying types -- here
internal.EntityType's underlying is string. The implicit form
(`&entity.Type`) doesn't compile because eventRecord.EntityType is
declared as *string, and Go won't auto-coerce *internal.EntityType to
*string.

https://claude.ai/code/session_01Vda2TpwJnGYRYuw1Xg46Da
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant