Skip to content

feat: workflows + axl#937

Closed
thesayyn wants to merge 0 commit intomainfrom
axel-f
Closed

feat: workflows + axl#937
thesayyn wants to merge 0 commit intomainfrom
axel-f

Conversation

@thesayyn
Copy link
Member

@thesayyn thesayyn commented Feb 17, 2026


Changes are visible to end-users: yes/no

  • Searched for relevant documentation and updated as needed: yes/no
  • Breaking change (forces users to change their own code or config): yes/no
  • Suggested release notes appear below: yes/no

Test plan

  • Covered by existing test cases
  • New test cases added
  • Manual testing; please provide instructions so we can reproduce:

Open with Devin

@alexeagle
Copy link
Member

@codex review

@chatgpt-codex-connector
Copy link

To use Codex here, create a Codex account and connect to github.

@alexeagle
Copy link
Member

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4c5606eb60

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

"commit_sha": args.string(),
"workspace": args.string(default = "."),
"build_url": args.string(default = "-"),
"bazel_flag": args.string(),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Declare bazel_flag as a list argument

bazel_flag is defined as args.string(), but _delivery_impl treats ctx.args.bazel_flag as a mutable list (append) and as a list for joining. If a user passes --bazel_flag=..., this value is a string and the first append call fails at runtime, so aspect delivery breaks whenever custom Bazel flags are supplied.

Useful? React with 👍 / 👎.

@alexeagle alexeagle marked this pull request as draft February 26, 2026 15:18
@alexeagle alexeagle marked this pull request as ready for review February 26, 2026 15:18
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 47c47cb408

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

gh_owner = repo_parts[0] if len(repo_parts) >= 2 else ""
gh_repo = repo_parts[1] if len(repo_parts) >= 2 else ""

task.config.strategy = make_github_strategy(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Replace task.config usage with fragment-based config access

This config mutates task.config, but ConfiguredTask in this commit no longer exposes a config attribute (only name/group/symbol/path), so when configure_rules_lint runs (e.g. when aspect_rules_lint is declared), evaluation fails with an attribute error instead of configuring lint behavior.

Useful? React with 👍 / 👎.

@alexeagle
Copy link
Member

@devin-ai-integration review

let build = this.downcast_ref::<Build>().unwrap();
let execlog_stream = build.execlog_stream.borrow();
let execlog_stream = execlog_stream.as_ref().ok_or(anyhow::anyhow!(
"call `ctx.bazel.build` with `execution_logs = true` in order to receive execution log events."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crates/axl-runtime/src/engine/bazel/build.rs:R361

Error message references old parameter name execution_logs instead of renamed execution_log

The error message at line 361 tells users to call ctx.bazel.build with execution_logs = true, but the parameter was renamed from execution_logs to execution_log (singular) in this PR.

Impact

The Starlark API parameter is now execution_log (see crates/axl-runtime/src/engine/bazel/mod.rs:92), but the error message still says execution_logs. Users following this error message would pass the wrong parameter name and get a different error about unexpected keyword arguments, making debugging confusing.

@alexeagle
Copy link
Member

This giant PR is rough on reviewers. I understand if it's too much work, but ideally it would have one or more of these things broken out to its own PR:

  1. Fragment system: fragment() / attr() replace record() / field()
  2. Module system: use_config() and config wiring in main.rs
  3. Builtins relocated from axl-runtime to aspect-cli
  4. HTTP: unix sockets, new methods, streaming body, Future combinators
  5. Bazel: health check, execution log sinks, StreamingFile
  6. Remote execution API and starbuf-derive enhancements
  7. CI/Build infrastructure

Copy link

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 9 additional findings in Devin Review.

Open in Devin Review

gregmagolan added a commit that referenced this pull request Feb 26, 2026
host = host_port.split(":")[0]
blob_hash = parts[2]
blob_size = parts[3]
return _BB_CLIENTD_ROOT + "/cas/" + host + "/blobs/sha256/file/" + blob_hash + "-" + blob_size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only needed on Aspect runner and only if bbclientd is enabled... we could probably detect that via combination of the ASPECT_WORKFLOWS_RUNNER env + the ASPECT_WORKFLOWS_REMOTE url if it is a bbclientd path

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. If this is some non-workflows runner and a remote cache is used then we will see a bytestream:// URL here but we won't be able to just read the file from a FUSE mount via bbclientd. In that case, we'll need to translate to the expected bazel-out path for that test log I think. You could leave a TODO for that case and land this with that case not yet working.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing we could do is: not depend on this at all, AXL supports talking to the remote cache directly using grpc bindings, so we could technically do all this ourselves without sweating too much, would that be more reasonable?

return "github"
elif env.var("BUILDKITE_AGENT_ACCESS_TOKEN"):
return "buildkite"
elif env.var("CI_JOB_TOKEN"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitLab and CCI didn't get tested yet since we don't have CI on those platforms for aspect-cli?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, those platforms are MIA for now. Will need more work here, that is going to be another bet since we are out of time for this cycle

print("\tCLI: 'gcloud logging read --project " + meta["account_id"] + " \"resource.type=gce_instance resource.labels.instance_id=" + meta["instance_id"] + " log_name=projects/" + meta["account_id"] + "/logs/google_metadata_script_runner\" --format=\"value(jsonPayload.message)\" --freshness=30d | tac'")


def configure_workflows_env(fs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets drop the "configure" and add "runner" to this naming. It is not configuring anything. Just displaying information about the Workflows runner environment.

}


def read_platform_config(fs, platform_dir = DEFAULT_PLATFORM_DIR):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is load bearing from the platform config files still?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing here as far as i can tell. only warming status since those can't go into environment variables.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll clean this up now.

return config


def read_warming_config(fs, platform_dir = DEFAULT_PLATFORM_DIR):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there already warming logic in this PR or can we leave that for a follow-up. We need to decide where to draw the line between the runner and AXL still.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just reading the warming status from the platform directory to informational purposes. No warming logic here.

health = ctx.bazel.health_check()
fragment = ctx.fragments[BazelFragment]

if fragment.post_health_check:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets make this pre_bazel_health_check and run it before ctx.bazel.health_check(); this matches what we currently do.

If we're unhealthy, I believe the current code cycles the runner. Lets do that as well. You can leave it as a TODO for a follow-up as this is already huge and should land soon.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, i was thinking that if the Bazel health check fails here the config.axl would inject behavior and cycle the runner. How do you imagine we'd do that if we don't know if the bazel server is healthy?

Copy link
Member

@gregmagolan gregmagolan Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh.. I see the wiring now. You pass the health from the ctx.bazel.health_check() to the post_health_check. Let me think for a minute...

So the post_health_check in the BazelFrament seems like a place for general health check hooks with one of its parameters is the result of bazel.health_check():

bazel_health = ctx.bazel.health_check()
health_check_fragment = ctx.fragments[HealthCheckFragment]
if health_check_fragment.health_check:
    health_check_fragment.health_check(ctx, bazel_health)

On Workflows runners, the between job health check (which checks a lot more than Bazel) is also considered implicitly and the runner is cycled if the runner is deemed unhealthy from all sources.

Tying the post_health_check to the BazelFragment is wrong as not all tasks will run bazel but all tasks should a health check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. So what we're missing is a well know binary to run to signal the instance unhealthy which will get call by health_check_fragment.health_check if the bazel server is unhealthy and then it will fail the job. If we're trusting our bazel health check then there is no point in starting the job if the bazel server is unhealthy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll put up a change in silo

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tying the post_health_check to the BazelFragment is wrong as not all tasks will run bazel but all tasks should a health check.

Hmm this is interesting, so all tasks will be performing this check, so should we be having a separate Fragment for healthcheck then? eg: HealthCheck(system_health_check, bazel_health_check)? or introduce another SystemFragment which has a health_check attribute?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the binary, would it be better to design a http server exposed by fleet service which we use to control runner lifecycle? This would also be useful for querying warming status and other dynamic values that can't go into environment variables.

platform_config: dict from read_platform_config()
host_config: dict from read_host_config()
bazel_version: str like "7.0.0" or None
workspace: workspace name (default ".")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Workspace is just currrent directory and no longer needed

@thesayyn
Copy link
Member Author

This giant PR is rough on reviewers. I understand if it's too much work, but ideally it would have one or more of these things broken out to its own PR:

  1. Fragment system: fragment() / attr() replace record() / field()
  2. Module system: use_config() and config wiring in main.rs
  3. Builtins relocated from axl-runtime to aspect-cli
  4. HTTP: unix sockets, new methods, streaming body, Future combinators
  5. Bazel: health check, execution log sinks, StreamingFile
  6. Remote execution API and starbuf-derive enhancements
  7. CI/Build infrastructure

Done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants