[WIP]: Support IAM role for access to S3 #951

Draft
TomAugspurger wants to merge 1 commit into rapidsai:main from TomAugspurger:tom/imds-credential

@TomAugspurger TomAugspurger commented Apr 3, 2026

(I need to perform a self-review first).

This PR adds support for IAM role-based authentication to S3 in KvikIO. Briefly, this lets compute resources such as an EC2 instance authenticate with S3 without the application or user explicitly handling credentials. Instead, a role is assigned to the EC2 instance, and applications and libraries (like kvikio) follow a protocol to obtain temporary credentials.

User API for specifying credentials

This PR includes a user-facing change in how users should provide credentials to kvikio. Previously, we accepted named arguments like aws_access_key_id=..., aws_secret_access_key=... or environment variables like AWS_ACCESS_KEY_ID=.... This is fine when all of your authentication methods (static tokens, env vars) ultimately map to the same bits of data (access key ID, secret access key, etc.). It breaks down when different authentication methods need different arguments to configure how the credentials are obtained.

So, as a precursor to the main change, we introduce a new credential=... argument, which is responsible for describing how kvikio should get the credentials.

  1. Using static credentials: credential=AwsStaticCredential(aws_access_key_id=..., ...)
  2. Using environment variables: credential=AwsEnvironmentCredential()
  3. Using IAM roles: credential=AwsIamRoleCredential()

The new default tries several authentication methods in turn, including environment variables, which preserves backwards compatibility. The previous method of specifying named arguments is deprecated.
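
To illustrate the idea of a credential object that describes how credentials are obtained, here is a minimal, self-contained sketch of a provider chain. It is not kvikio's implementation: the class names (ResolvedKeys, StaticCredential, EnvironmentCredential, ChainCredential) and the resolve() method are hypothetical stand-ins that loosely mirror the options listed above. The only real names are the standard AWS environment variables.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class ResolvedKeys:
    """The data every authentication method ultimately produces."""
    access_key_id: str
    secret_access_key: str
    session_token: Optional[str] = None


class StaticCredential:
    """Hypothetical provider: keys are passed in explicitly."""

    def __init__(self, access_key_id: str, secret_access_key: str,
                 session_token: Optional[str] = None):
        self._keys = ResolvedKeys(access_key_id, secret_access_key, session_token)

    def resolve(self) -> Optional[ResolvedKeys]:
        return self._keys


class EnvironmentCredential:
    """Hypothetical provider: keys come from the standard AWS env vars."""

    def __init__(self, environ: Optional[Mapping[str, str]] = None):
        # Accepting a mapping makes the provider easy to test.
        self._environ = os.environ if environ is None else environ

    def resolve(self) -> Optional[ResolvedKeys]:
        key_id = self._environ.get("AWS_ACCESS_KEY_ID")
        secret = self._environ.get("AWS_SECRET_ACCESS_KEY")
        if not key_id or not secret:
            return None  # signal "not configured" so a chain can move on
        return ResolvedKeys(key_id, secret, self._environ.get("AWS_SESSION_TOKEN"))


class ChainCredential:
    """Hypothetical default: try each provider in order, use the first hit."""

    def __init__(self, providers: Sequence):
        self._providers = list(providers)

    def resolve(self) -> ResolvedKeys:
        for provider in self._providers:
            keys = provider.resolve()
            if keys is not None:
                return keys
        raise RuntimeError("no credential provider produced credentials")


# Environment variables are unset here, so the chain falls through
# to the static provider.
chain = ChainCredential([EnvironmentCredential({}), StaticCredential("id", "secret")])
keys = chain.resolve()
```

The point of the design is that each provider can take its own configuration arguments, while callers only ever see the resolved access key ID, secret access key, and optional session token.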

IAM Role Credentials

See the linked issue for more detail. In brief, this new credential provider retrieves temporary credentials from the AWS-provided Instance Metadata Service (IMDS) endpoint. Once fetched, the credentials should be cached and reused until (close to) their expiry, at which point the application (or library, like kvikio) should refresh them.

The credentials retrieved from the IMDS are used in exactly the same way as credentials are used today.
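
The fetch-then-cache behaviour described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code in this PR: CachedCredentials, parse_expiration, fetch_imds_credentials, and the 300-second refresh margin are hypothetical choices. The IMDS paths, the headers, and the JSON field names (AccessKeyId, SecretAccessKey, Token, Expiration) follow the public IMDSv2 protocol.

```python
import json
import threading
import time
import urllib.request
from datetime import datetime, timezone
from typing import Callable

IMDS = "http://169.254.169.254"  # link-local IMDS address on EC2


def parse_expiration(value: str) -> float:
    """Convert an IMDS 'Expiration' string (e.g. 2030-01-01T00:00:00Z) to epoch seconds."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc).timestamp()


def fetch_imds_credentials(timeout: float = 2.0) -> dict:
    """Fetch temporary role credentials via IMDSv2 (only works on an EC2 instance)."""
    # Step 1: obtain a session token for subsequent metadata requests.
    token_req = urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        headers = {"X-aws-ec2-metadata-token": resp.read().decode()}

    # Step 2: discover the role attached to the instance.
    base = f"{IMDS}/latest/meta-data/iam/security-credentials/"
    with urllib.request.urlopen(
        urllib.request.Request(base, headers=headers), timeout=timeout
    ) as resp:
        role = resp.read().decode().splitlines()[0]

    # Step 3: fetch the temporary credentials for that role.
    with urllib.request.urlopen(
        urllib.request.Request(base + role, headers=headers), timeout=timeout
    ) as resp:
        return json.loads(resp.read())


class CachedCredentials:
    """Hypothetical cache: reuse credentials until shortly before they expire."""

    def __init__(self, fetch: Callable[[], dict], refresh_margin_s: float = 300.0,
                 now: Callable[[], float] = time.time):
        self._fetch = fetch
        self._margin = refresh_margin_s  # assumed value, refresh this early
        self._now = now
        self._lock = threading.Lock()
        self._creds = None
        self._expires_at = 0.0

    def get(self) -> dict:
        with self._lock:
            stale = self._creds is None or self._now() >= self._expires_at - self._margin
            if stale:
                creds = self._fetch()
                self._expires_at = parse_expiration(creds["Expiration"])
                self._creds = creds
            return self._creds
```

On an EC2 instance with a role attached, CachedCredentials(fetch_imds_credentials).get() would return a dict whose AccessKeyId, SecretAccessKey, and Token fields can be used like any other set of AWS credentials. Injecting the fetch function and the clock keeps the caching logic testable without network access.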

Closes #950

copy-pr-bot bot commented Apr 3, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@TomAugspurger
Contributor Author

/ok to test ae3d3b8

Development

Successfully merging this pull request may close these issues.

Support AWS IAM Role for reading from S3