Conversation

@codex review

To use Codex here, create a Codex account and connect to GitHub.

@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4c5606eb60
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
"commit_sha": args.string(),
"workspace": args.string(default = "."),
"build_url": args.string(default = "-"),
"bazel_flag": args.string(),
Declare bazel_flag as a list argument
bazel_flag is defined as args.string(), but _delivery_impl treats ctx.args.bazel_flag as a mutable list (append) and as a list for joining. If a user passes --bazel_flag=..., this value is a string and the first append call fails at runtime, so aspect delivery breaks whenever custom Bazel flags are supplied.
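The failure mode can be sketched in plain Python. `delivery_impl` below is a stand-in for `_delivery_impl`, and the list-declared variant is analogous to a hypothetical list declaration such as `args.string_list()` (the exact AXL API name is an assumption):

```python
# Stand-in for _delivery_impl: it appends to and joins bazel_flag,
# which only works when the argument is declared as a list.
def delivery_impl(bazel_flag):
    bazel_flag.append("--config=release")  # mutates the list in place
    return " ".join(bazel_flag)

# Declared as a list: custom flags work.
assert delivery_impl(["--jobs=8"]) == "--jobs=8 --config=release"

# Declared as a string (the current args.string() declaration): the first
# append fails at runtime, breaking aspect delivery.
try:
    delivery_impl("--jobs=8")
    raise AssertionError("expected AttributeError")
except AttributeError:
    pass
```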
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 47c47cb408
gh_owner = repo_parts[0] if len(repo_parts) >= 2 else ""
gh_repo = repo_parts[1] if len(repo_parts) >= 2 else ""

task.config.strategy = make_github_strategy(
Replace task.config usage with fragment-based config access
This configure hook mutates task.config, but ConfiguredTask in this commit no longer exposes a config attribute (only name/group/symbol/path), so when configure_rules_lint runs (e.g. when aspect_rules_lint is declared), evaluation fails with an attribute error instead of configuring lint behavior.
@devin-ai-integration review
let build = this.downcast_ref::<Build>().unwrap();
let execlog_stream = build.execlog_stream.borrow();
let execlog_stream = execlog_stream.as_ref().ok_or(anyhow::anyhow!(
    "call `ctx.bazel.build` with `execution_logs = true` in order to receive execution log events."
crates/axl-runtime/src/engine/bazel/build.rs:R361
Error message references old parameter name execution_logs instead of renamed execution_log
The error message at line 361 tells users to call ctx.bazel.build with execution_logs = true, but the parameter was renamed from execution_logs to execution_log (singular) in this PR.
Impact
The Starlark API parameter is now execution_log (see crates/axl-runtime/src/engine/bazel/mod.rs:92), but the error message still says execution_logs. Users following this error message would pass the wrong parameter name and get a different error about unexpected keyword arguments, making debugging confusing.
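A small Python stand-in for the fix (the real code is Rust in crates/axl-runtime/src/engine/bazel/build.rs); it just keeps the error message in sync with the renamed parameter:

```python
# The parameter was renamed from execution_logs to execution_log (singular),
# so the error text must use the new name.
RENAMED_PARAM = "execution_log"  # was: execution_logs

def execlog_stream_error():
    # Message users see when the execution log stream was not requested.
    return (
        "call `ctx.bazel.build` with `%s = true` "
        "in order to receive execution log events." % RENAMED_PARAM
    )

# The message should reference the new name and never the old one.
assert "execution_log = true" in execlog_stream_error()
assert "execution_logs" not in execlog_stream_error()
```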
This giant PR is rough on reviewers. I understand if it's too much work, but ideally it would have one or more of these things broken out into its own PR:
host = host_port.split(":")[0]
blob_hash = parts[2]
blob_size = parts[3]
return _BB_CLIENTD_ROOT + "/cas/" + host + "/blobs/sha256/file/" + blob_hash + "-" + blob_size
Only needed on the Aspect runner and only if bbclientd is enabled... we could probably detect that via a combination of the ASPECT_WORKFLOWS_RUNNER env + the ASPECT_WORKFLOWS_REMOTE url if it is a bbclientd path
Hmm. If this is some non-workflows runner and a remote cache is used then we will see a bytestream:// URL here but we won't be able to just read the file from a FUSE mount via bbclientd. In that case, we'll need to translate to the expected bazel-out path for that test log I think. You could leave a TODO for that case and land this with that case not yet working.
Another thing we could do is not depend on this at all. AXL supports talking to the remote cache directly using grpc bindings, so we could technically do all of this ourselves without sweating too much. Would that be more reasonable?
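The translation discussed above can be sketched end to end. This assumes a bytestream URL of the shape `bytestream://<host>:<port>/blobs/sha256/<hash>/<size>` (consistent with the `parts[2]`/`parts[3]` indexing in the quoted code); the bb_clientd mount root is illustrative:

```python
# Illustrative mount point; the real _BB_CLIENTD_ROOT depends on the runner.
_BB_CLIENTD_ROOT = "/home/user/bb_clientd"

def bytestream_to_clientd_path(url):
    # Split "host:port/blobs/sha256/<hash>/<size>" into host and blob path.
    rest = url[len("bytestream://"):]
    host_port, _, path = rest.partition("/")
    host = host_port.split(":")[0]
    parts = path.split("/")
    blob_hash = parts[2]
    blob_size = parts[3]
    # bb_clientd exposes CAS blobs under cas/<host>/blobs/sha256/file/<hash>-<size>.
    return _BB_CLIENTD_ROOT + "/cas/" + host + "/blobs/sha256/file/" + blob_hash + "-" + blob_size

assert bytestream_to_clientd_path(
    "bytestream://cache.example.com:8980/blobs/sha256/abc123/42"
) == "/home/user/bb_clientd/cas/cache.example.com/blobs/sha256/file/abc123-42"
```

As noted above, on a non-Workflows runner with a remote cache this FUSE path will not exist, so a fallback to the bazel-out path would still be needed (left as a TODO).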
return "github"
elif env.var("BUILDKITE_AGENT_ACCESS_TOKEN"):
    return "buildkite"
elif env.var("CI_JOB_TOKEN"):
GitLab and CCI didn't get tested yet since we don't have CI on those platforms for aspect-cli?
Yes, those platforms are MIA for now. More work is needed here; that is going to be another bet, since we are out of time for this cycle.
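The detection chain above can be sketched in plain Python. The GitHub condition is an assumption, since its check falls outside the quoted snippet; BUILDKITE_AGENT_ACCESS_TOKEN and CI_JOB_TOKEN are the real Buildkite and GitLab agent token variables:

```python
# Detect the CI platform from well-known environment variables.
def detect_ci_platform(env):
    if env.get("GITHUB_ACTIONS"):  # assumed check; not shown in the diff
        return "github"
    elif env.get("BUILDKITE_AGENT_ACCESS_TOKEN"):
        return "buildkite"
    elif env.get("CI_JOB_TOKEN"):  # GitLab sets this per job
        return "gitlab"
    return "unknown"

assert detect_ci_platform({"CI_JOB_TOKEN": "t"}) == "gitlab"
assert detect_ci_platform({"BUILDKITE_AGENT_ACCESS_TOKEN": "t"}) == "buildkite"
assert detect_ci_platform({}) == "unknown"
```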
print("\tCLI: 'gcloud logging read --project " + meta["account_id"] + " \"resource.type=gce_instance resource.labels.instance_id=" + meta["instance_id"] + " log_name=projects/" + meta["account_id"] + "/logs/google_metadata_script_runner\" --format=\"value(jsonPayload.message)\" --freshness=30d | tac'")

def configure_workflows_env(fs):
Let's drop the "configure" and add "runner" to this naming. It is not configuring anything, just displaying information about the Workflows runner environment.
}

def read_platform_config(fs, platform_dir = DEFAULT_PLATFORM_DIR):
What is load bearing from the platform config files still?
There is nothing here as far as I can tell; only warming status, since that can't go into environment variables.
I'll clean this up now.
return config

def read_warming_config(fs, platform_dir = DEFAULT_PLATFORM_DIR):
Is there already warming logic in this PR, or can we leave that for a follow-up? We still need to decide where to draw the line between the runner and AXL.
This is just reading the warming status from the platform directory for informational purposes. No warming logic here.
health = ctx.bazel.health_check()
fragment = ctx.fragments[BazelFragment]

if fragment.post_health_check:
Let's make this pre_bazel_health_check and run it before ctx.bazel.health_check(); this matches what we currently do.
If we're unhealthy, I believe the current code cycles the runner. Let's do that as well. You can leave it as a TODO for a follow-up, as this is already huge and should land soon.
Hmm, I was thinking that if the Bazel health check fails here, the config.axl would inject behavior and cycle the runner. How do you imagine we'd do that if we don't know whether the bazel server is healthy?
Oh.. I see the wiring now. You pass the health from ctx.bazel.health_check() to the post_health_check. Let me think for a minute...
So the post_health_check in the BazelFragment seems like a place for general health check hooks, with one of its parameters being the result of bazel.health_check():

    bazel_health = ctx.bazel.health_check()
    health_check_fragment = ctx.fragments[HealthCheckFragment]
    if health_check_fragment.health_check:
        health_check_fragment.health_check(ctx, bazel_health)

On Workflows runners, the between-job health check (which checks a lot more than Bazel) is also considered implicitly, and the runner is cycled if it is deemed unhealthy from all sources.
Tying the post_health_check to the BazelFragment is wrong, as not all tasks will run bazel but all tasks should run a health check.
Okay. So what we're missing is a well-known binary to run to signal the instance unhealthy, which will get called by health_check_fragment.health_check if the bazel server is unhealthy, and then it will fail the job. If we're trusting our bazel health check, then there is no point in starting the job when the bazel server is unhealthy.
I'll put up a change in silo
Tying the post_health_check to the BazelFragment is wrong, as not all tasks will run bazel but all tasks should run a health check.
Hmm, this is interesting. Since all tasks will be performing this check, should we have a separate Fragment for health checks then? e.g. HealthCheck(system_health_check, bazel_health_check)? Or introduce another SystemFragment which has a health_check attribute?
As for the binary, would it be better to design an HTTP server exposed by the fleet service which we use to control the runner lifecycle? That would also be useful for querying warming status and other dynamic values that can't go into environment variables.
platform_config: dict from read_platform_config()
host_config: dict from read_host_config()
bazel_version: str like "7.0.0" or None
workspace: workspace name (default ".")
Workspace is just the current directory and is no longer needed.
Done
Changes are visible to end-users: yes/no
Test plan