Skip to content

Reduce Github GetTektonDir API requests from 3 requests to 1 request#2721

Draft
aThorp96 wants to merge 2 commits into
tektoncd:mainfrom
aThorp96:github-api-performance
Draft

Reduce Github GetTektonDir API requests from 3 requests to 1 request#2721
aThorp96 wants to merge 2 commits into
tektoncd:mainfrom
aThorp96:github-api-performance

Conversation

@aThorp96
Copy link
Copy Markdown
Member

@aThorp96 aThorp96 commented May 12, 2026

📝 Description of the Change

This PR updates Github's GetTektonDir method to use a single GraphQL call to fetch the tekton directory provenance and content, reducing the API request volume from 3 requests to 1 request.

Since processing any supported event requires fetching the tekton directory, GetTektonDir is a hot path and API calls can add up without any PipelineRuns executing. Reducing the load in this hot path has an outsized impact. In some cases I saw, this change would have eliminated over 1000 API calls in an hour for a single repository.

Before this change, fetching the Tekton directory made the following API requests:

  1. (optionally) get the commit sha for the default branch
  2. Get the root directory contents to get the .tekton tree id/sha
  3. Get the list of blob shas under the .tekton tree
  4. Get the contents of all blob shas in the list

After this change, fetching the tekton directory makes the following API request(s):

  1. (optionally) get the commit sha for the default branch
  2. Get the .tekton directory, its commit sha, all the blobs inside .tekton, and all their contents, in one GraphQL request

🔗 Linked GitHub Issue

Fixes #

🧪 Testing Strategy

  • Unit tests
  • Integration tests
  • End-to-end tests
  • Manual testing
  • Not Applicable

🤖 AI Assistance

AI assistance can be used for various tasks, such as code generation,
documentation, or testing.

Please indicate whether you have used AI assistance
for this PR and provide details if applicable.

  • I have not used any AI assistance for this PR.
  • I have used AI assistance for this PR.

Important

Slop will be simply rejected, if you are using AI assistance you need to make sure you
understand the code generated and that it meets the project's standards. you
need at least know how to run the code and deploy it (if needed). See
startpaac to make it easy
to deploy and test your code changes.

If the majority of the code in this PR was generated by an AI, please add a Co-authored-by trailer to your commit message.
For example:

Co-authored-by: Claude noreply@anthropic.com

✅ Submitter Checklist

  • 📝 My commit messages are clear, informative, and follow the project's How to write a git commit message guide. The Gitlint linter ensures in CI it's properly validated
  • ✨ I have ensured my commit message prefix (e.g., fix:, feat:) matches the "Type of Change" I selected above.
  • ♽ I have run make test and make lint locally to check for and fix any
    issues. For an efficient workflow, I have considered installing
    pre-commit and running pre-commit install to
    automate these checks.
  • 📖 I have added or updated documentation for any user-facing changes.
  • 🧪 I have added sufficient unit tests for my code changes.
  • 🎁 I have added end-to-end tests where feasible. See README for more details.
  • 🔎 I have addressed any CI test flakiness or provided a clear reason to bypass it.
  • If adding a provider feature, I have filled in the following and updated the provider documentation:
    • GitHub App
    • GitHub Webhook
    • Gitea/Forgejo
    • GitLab
    • Bitbucket Cloud
    • Bitbucket Data Center

@codecov-commenter
Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 46.30872% with 80 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.31%. Comparing base (c615efb) to head (b754b3d).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
pkg/test/github/github.go 0.00% 73 Missing ⚠️
pkg/provider/github/graphql.go 92.45% 2 Missing and 2 partials ⚠️
pkg/provider/github/github.go 86.95% 2 Missing and 1 partial ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2721      +/-   ##
==========================================
+ Coverage   59.25%   59.31%   +0.05%     
==========================================
  Files         208      208              
  Lines       20573    20507      -66     
==========================================
- Hits        12191    12163      -28     
+ Misses       7610     7582      -28     
+ Partials      772      762      -10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the GitHub provider to fetch the .tekton directory using a single GraphQL query, reducing the number of API calls. Review feedback identifies several critical issues: the YAML concatenation logic is non-deterministic due to map iteration, a potential nil pointer dereference exists in the GraphQL client, and the new implementation lacks support for recursive directory structures. Additionally, the mock implementation for tests was found to be inconsistent with the actual GitHub API regarding path handling.

Comment thread pkg/provider/github/github.go
Comment thread pkg/provider/github/graphql.go
Comment on lines +96 to +114
query = `query($owner: String!, $name: String!, $tektonExpr: String!) {
repository(owner: $owner, name: $name) {
%s
tektonTree: object(expression: $tektonExpr) {
... on Tree {
entries {
name
type
path
oid
object {
... on Blob {
text
}
}
}
}
}
}
}`, strings.Join(aliases, "\n"))
}`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This change introduces a regression in functionality by losing support for recursive directory structures. The new GraphQL query only fetches the immediate entries of the .tekton directory. The previous REST-based implementation used recursive=true, allowing users to organize their Tekton resources in subdirectories (e.g., .tekton/pull-requests/). With this change, any YAML files located in subfolders will be silently ignored.

References
  1. Repository pipeline files are conventionally located in the .tekton/ subdirectory, and the system should support the hierarchical structure within this directory.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bug and not something I caught during testing. Marking this PR as a Draft to prevent merging this regression

Comment thread pkg/test/github/github.go
Comment on lines +129 to +148
pathInTekton := strings.TrimPrefix(relPath, tektonPath+"/")
if pathInTekton == "" {
continue
}

// Read file content
content, err := os.ReadFile(fileInfo.name)
if err != nil {
continue
}

// Only include .yaml and .yml files (matching implementation behavior)
if !strings.HasSuffix(relPath, ".yaml") && !strings.HasSuffix(relPath, ".yml") {
continue
}

entry := map[string]any{
"name": filepath.Base(pathInTekton),
"type": "blob",
"path": pathInTekton,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The mock implementation of handleTektonTreeQuery is inconsistent with the real GitHub GraphQL API. The mock returns paths relative to the .tekton directory (e.g., pipeline.yaml), whereas the real GitHub API returns the full path from the repository root (e.g., .tekton/pipeline.yaml) for TreeEntry.path. This discrepancy can cause tests to pass with incorrect assumptions about path formats, which may lead to failures in production or when running against real GitHub instances, especially in error reporting or path matching logic.

References
  1. Repository pipeline files are identified by their location in a subdirectory (e.g., .tekton/), and path handling should consistently use the full repository path to maintain this convention.

aThorp96 added 2 commits May 12, 2026 14:22
Fetch .tekton directory tree with inline blob contents in single GraphQL
query instead of 3 sequential API calls. Since processing any supported
event requires fetching the tekton directory, GetTektonDir is a hot path
and API calls can add up before without any PipelineRuns executing.
Reducing the load in this hot path has an outsized impact. In some cases
I saw, this change would have eliminated over 1000 API calls in an hour
for a single repository.

Assisted-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Delete 7 test cases that duplicate coverage already provided by
SetupGitTree integration tests. The SetupGitTree helper automatically
constructs GraphQL responses from testdata directories, making manual
GraphQL response construction redundant for happy path testing.

Deleted from TestFetchTektonDirGraphQL (6 cases):
- successful fetch with yaml files
- no tekton directory
- tekton path is file not directory
- mixed yaml and yml extensions
- subdirectories with yaml files
- empty tekton directory

Deleted from TestGetTektonDirGraphQL (1 case):
- invalid yaml validation failure (duplicate of existing test)

Kept only error cases that cannot be represented via filesystem:
- http error, graphql errors, null blob content

This reduces test code by 213 lines (34% reduction) while maintaining
full coverage through integration tests.

Assisted-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@aThorp96 aThorp96 force-pushed the github-api-performance branch from b754b3d to 20b9c5d Compare May 12, 2026 18:22
@aThorp96
Copy link
Copy Markdown
Member Author

/hold

@aThorp96 aThorp96 marked this pull request as draft May 12, 2026 21:51
Copy link
Copy Markdown
Member

@theakshaypant theakshaypant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven’t gone through the changes properly yet, just sharing thoughts based on the description.

IMO, the title might be a bit misleading for the actual problem this PR is trying to solve (per my understanding: rate limiting). While this reduces the number of requests to 1, it’s a GraphQL request, and GraphQL rate limits are calculated differently from REST requests. Ref
Even this single reduced GraphQL request could end up consuming more points than the REST equivalent. Would love for that last statement to be wrong, but I think it would be worth comparing how optimised this single request is from a rate-limit consumption perspective across a few scenarios:

  • no .tekton dir
  • 3–4 PipelineRuns in .tekton
  • > 100 manifests in the dir

WDYT @aThorp96? Unless I’ve completely misunderstood the description and “API requests” was referring specifically to request count limits when you mentioned:

In some cases I saw, this change would have eliminated over 1000 API calls in an hour for a single repository.

Adding another reference for REST vs GraphQL rate limits here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants