Skip to content

Add proxy cache filter proposal #280

Open
stonezdj wants to merge 7 commits into goharbor:main from stonezdj:26apr02_add_proxy_cache_filter

stonezdj commented Apr 2, 2026

No description provided.

stonezdj requested review from a team as code owners April 2, 2026 09:38
stonezdj force-pushed the 26apr02_add_proxy_cache_filter branch from 0ab43d1 to a0559f5 April 2, 2026 09:40
Signed-off-by: stonezdj <stonezdj@gmail.com>
stonezdj force-pushed the 26apr02_add_proxy_cache_filter branch from a0559f5 to e97b57b April 2, 2026 09:43
stonezdj added 3 commits April 2, 2026 17:45
Signed-off-by: stonezdj <stonezdj@gmail.com>
Signed-off-by: stonezdj <stonezdj@gmail.com>
Signed-off-by: stonezdj <stonezdj@gmail.com>
wy65701436 requested a review from chlins April 21, 2026 08:48
Comment thread proposals/new/filter_repository_proxycache.md Outdated
Comment thread proposals/new/filter_repository_proxycache.md Outdated
stonezdj and others added 3 commits May 7, 2026 13:13
Signed-off-by: stonezdj <stonezdj@gmail.com>
Signed-off-by: stonezdj <stonezdj@gmail.com>
…a keys

Use proxy_filter_pattern and proxy_filter_kind as separate string metadata
keys instead of a single JSON-encoded repository_filter value, consistent
with Harbor's existing project metadata conventions.

Co-authored-by: Cursor <cursoragent@cursor.com>
Member

chlins left a comment

lgtm

@@ -0,0 +1,156 @@
Proposal: Limit proxy cache repositories by filter
Contributor

Just a general question: do we plan to support regex for proxy cache? If so, should we extend this capability to other related features, such as replication and tag retention?

I propose that in v2.16 we keep the scope to doublestar, and introduce an enhancement in v2.17 that adds a new filter kind: regex.

Member

Vad1mo, May 27, 2026

Supporting only doublestar (`**`) now and adding regex later makes `**` a legacy feature from the start, one that must be supported forever and maintained alongside regex.

Regex is the more powerful capability; we should opt for regex only.


1. Existing proxy cache projects without filters keep current behavior (allow all).
2. New behavior is opt-in by configuring `proxy_filter_pattern` and `proxy_filter_kind`.
3. Clients receive `404 Not Found` for repositories that do not match the configured filter.
Contributor

Returning 404 Not Found is a good compatibility/security choice because it avoids revealing whether the repository exists upstream.

However, the log should clearly state:

proxy cache repository blocked by filter
project=<project>
repository=<repo>
filter_kind=<kind>
filter_pattern=<pattern>


Proposed metadata keys:

- `proxy_filter_pattern`: the filter pattern string (e.g. `library/**` or `^library/.*`).
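As an illustration of how the two proposed keys would travel in a project-creation request, here is a Go sketch that builds the JSON body. The struct mirrors only a subset of Harbor's project request (`project_name`, `registry_id`, `metadata`); the filter keys use the names proposed here, which may still be renamed in review. `projectReq` and `proxyProjectBody` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// projectReq mirrors only the fields used in this sketch; other fields are omitted.
type projectReq struct {
	ProjectName string            `json:"project_name"`
	RegistryID  int64             `json:"registry_id"` // upstream registry endpoint for the proxy cache
	Metadata    map[string]string `json:"metadata"`    // project metadata values are strings
}

// proxyProjectBody builds a request body that sets the two proposed filter keys.
func proxyProjectBody(name string, registryID int64, pattern, kind string) (string, error) {
	req := projectReq{
		ProjectName: name,
		RegistryID:  registryID,
		Metadata: map[string]string{
			"public":               "true",
			"proxy_filter_pattern": pattern,
			"proxy_filter_kind":    kind,
		},
	}
	b, err := json.Marshal(req) // map keys are emitted in sorted order
	return string(b), err
}

func main() {
	body, err := proxyProjectBody("dockerhub-proxy", 1, "library/**", "doublestar")
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

Storing the pattern and kind as two flat string values, rather than one JSON-encoded value, matches the metadata convention described in the commit message above.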
Contributor

More explicit names may be better: `proxy_cache_filter_pattern` and `proxy_cache_filter_kind`.

2. `PUT /api/v2.0/projects/{project_name_or_id}`
3. `GET /api/v2.0/projects/{project_name_or_id}` and list APIs should include configured filters in the response.

`proxy_filter_pattern` and `proxy_filter_kind` are metadata for proxy cache projects only. For non-proxy projects, these keys are ignored on create/update and omitted from the response.
Contributor

Why not reject these fields for non-proxy projects with `400 Bad Request`?


Notes on behavior:

1. Invalid pattern in `proxy_filter_pattern` is treated as non-match at runtime.
Contributor

Treating invalid patterns as runtime non-matches makes configuration errors hard to detect and can cause unintended silent blockages. Instead, we should compile and validate the pattern during project metadata mutation (POST/PUT APIs) and return 400 Bad Request if the pattern is invalid.
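A Go sketch of the suggested mutation-time check: compile the pattern when the project metadata is written, and let the API layer turn any error into `400 Bad Request`. `validateFilter` is a hypothetical name, and this sketch does not check doublestar syntax.

```go
package main

import (
	"fmt"
	"net/http"
	"regexp"
)

// validateFilter (hypothetical) runs during POST/PUT on project metadata, so a
// bad pattern is reported to the caller instead of silently blocking pulls later.
func validateFilter(kind, pattern string) error {
	switch kind {
	case "", "doublestar":
		return nil // doublestar syntax validation is out of scope for this sketch
	case "regex":
		if _, err := regexp.Compile(pattern); err != nil {
			return fmt.Errorf("invalid proxy_filter_pattern %q: %w", pattern, err)
		}
		return nil
	default:
		return fmt.Errorf("unknown proxy_filter_kind %q", kind)
	}
}

func main() {
	if err := validateFilter("regex", "library/(redis"); err != nil {
		// The API handler would respond with this status instead of persisting the metadata.
		fmt.Println(http.StatusBadRequest, err)
	}
}
```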


### Request enforcement point

Enforcement should happen in proxy middleware before upstream manifest proxy operations:
Contributor

To ensure a complete fail-closed implementation, we need to explicitly cover all manifest entry points in the middleware pre-check layer. This includes manifest GET and HEAD requests, whether resolved by tag or by digest. Harbor must return a 404.
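A Go sketch of a pre-check that recognizes every manifest entry point named here: GET and HEAD on the distribution API path `/v2/<name>/manifests/<reference>`, where the reference is a tag or a digest. `manifestTarget` and `isDigest` are hypothetical helpers, not Harbor's middleware.

```go
package main

import (
	"fmt"
	"net/http"
	"regexp"
	"strings"
)

// manifestRoute matches /v2/<name>/manifests/<reference>; <name> may contain slashes.
var manifestRoute = regexp.MustCompile(`^/v2/(.+)/manifests/([^/]+)$`)

// manifestTarget reports whether the request is a manifest GET or HEAD and, if so,
// returns the repository name and the reference (tag or digest).
func manifestTarget(method, urlPath string) (repo, reference string, ok bool) {
	if method != http.MethodGet && method != http.MethodHead {
		return "", "", false
	}
	m := manifestRoute.FindStringSubmatch(urlPath)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

// isDigest tells a digest reference (algorithm:hex) from a tag; tags cannot contain ':'.
func isDigest(reference string) bool {
	return strings.Contains(reference, ":")
}

func main() {
	repo, ref, ok := manifestTarget(http.MethodHead, "/v2/dockerhub-proxy/library/redis/manifests/sha256:abc")
	fmt.Println(repo, ref, ok, isDigest(ref))
}
```

Because tag and digest requests both resolve to the same `repo` value, the filter can be applied once to that value and neither reference style can bypass it.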

Matching semantics:

1. If `proxy_filter_pattern` is empty or not set, all repositories are allowed (same as today).
2. If `proxy_filter_pattern` is configured, Harbor allows the pull only when the requested repository matches the configured pattern.
Contributor

Which repository namespace does `proxy_filter_pattern` match against: the upstream registry path (`library/redis`) or Harbor's local proxy project namespace (`dockerhub-proxy/library/redis`)?
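One possible answer, sketched in Go under the assumption that the filter matches the normalized upstream path: strip the Harbor proxy project prefix and, for Docker Hub, expand single-segment official images to the implicit `library/` namespace. `upstreamRepo` is a hypothetical helper; the proposal has not settled this question.

```go
package main

import (
	"fmt"
	"strings"
)

// upstreamRepo maps the repository path Harbor receives to the path the
// filter would match, under the assumptions stated above.
func upstreamRepo(project, localRepo string, dockerHub bool) (string, bool) {
	prefix := project + "/"
	if !strings.HasPrefix(localRepo, prefix) {
		return "", false // not a repository of this proxy project
	}
	repo := strings.TrimPrefix(localRepo, prefix)
	if repo == "" {
		return "", false
	}
	// Docker Hub official images live under "library/" even when pulled as "redis".
	if dockerHub && !strings.Contains(repo, "/") {
		repo = "library/" + repo
	}
	return repo, true
}

func main() {
	for _, r := range []string{"dockerhub-proxy/redis", "dockerhub-proxy/library/redis"} {
		up, ok := upstreamRepo("dockerhub-proxy", r, true)
		fmt.Println(r, "->", up, ok)
	}
}
```

With this normalization, a pattern such as `library/**` covers both `dockerhub-proxy/redis` and `dockerhub-proxy/library/redis`, which refer to the same upstream image, so users do not have to write the project name into every pattern.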

Member

Vad1mo left a comment

Just like other policies in Harbor, the proxy policy should be a first-class citizen of Harbor, not a toggle during project creation.

@@ -0,0 +1,156 @@
Proposal: Limit proxy cache repositories by filter
Member

Conceptually and logically, project creation and proxy policy should be separated. Just as we have a replication policy, we should have a proxy policy. This is a good opportunity to split them.

On top of that, it would allow multiple rules per proxy policy.


Proposed metadata keys:

- `proxy_filter_pattern`: the filter pattern string (e.g. `library/**` or `^library/.*`).
Member

Regex anchoring is inconsistent: line 19 uses the unanchored `(library|goharbor|myorg)/.*`, while line 85 uses the anchored `^library/.*`.

Either auto-anchor (`^…$`) or document that the author must anchor the pattern. I strongly recommend auto-anchoring.


1. `proxy_filter_pattern` can be empty (empty means match all).
2. `proxy_filter_kind` must be `regex` or `doublestar` when specified; defaults to `doublestar` if omitted.
3. An unrecognised `proxy_filter_kind` value is rejected with `400 Bad Request` during project metadata validation.
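Rules 2 and 3 above can be expressed as a small typed parser, sketched here in Go. `FilterKind` and `parseFilterKind` are hypothetical names; the error would be reported as `400 Bad Request` by the metadata validation layer.

```go
package main

import "fmt"

// FilterKind is a hypothetical typed form of proxy_filter_kind.
type FilterKind string

const (
	KindDoublestar FilterKind = "doublestar"
	KindRegex      FilterKind = "regex"
)

// parseFilterKind defaults an omitted kind to doublestar (rule 2) and rejects
// any other value (rule 3). Matching is case-sensitive in this sketch.
func parseFilterKind(raw string) (FilterKind, error) {
	switch FilterKind(raw) {
	case "":
		return KindDoublestar, nil
	case KindDoublestar, KindRegex:
		return FilterKind(raw), nil
	default:
		return "", fmt.Errorf("unrecognised proxy_filter_kind %q: want %q or %q", raw, KindRegex, KindDoublestar)
	}
}

func main() {
	for _, raw := range []string{"", "regex", "glob"} {
		k, err := parseFilterKind(raw)
		fmt.Printf("%q -> %q, err=%v\n", raw, k, err)
	}
}
```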
Member

This contradicts line 76:

Invalid pattern in proxy_filter_pattern is treated as non-match at runtime.

Member

bupd left a comment

I suggest improving the wording.

@stonezdj

Matching semantics:

1. If `proxy_filter_pattern` is empty or not set, all repositories are allowed (same as today).
2. If `proxy_filter_pattern` is configured, Harbor allows the pull only when the requested repository matches the configured pattern.
Member

Suggested change
2. If `proxy_filter_pattern` is configured, Harbor allows the pull only when the requested repository matches the configured pattern.
2. If `proxy_filter_pattern` is configured, Harbor allows new upstream proxying only when the normalized upstream repository path matches the configured pattern.

1. If `proxy_filter_pattern` is empty or not set, all repositories are allowed (same as today).
2. If `proxy_filter_pattern` is configured, Harbor allows the pull only when the requested repository matches the configured pattern.
3. `proxy_filter_kind` specifies the matching mode: `regex` or `doublestar`. It is optional; if omitted, Harbor uses `doublestar` matching by default.
4. If the repository does not match, Harbor returns `404 Not Found` and does not request content from upstream.
Member

Suggested change
4. If the repository does not match, Harbor returns `404 Not Found` and does not request content from upstream.
4. If the normalized repository does not match, Harbor returns `404 Not Found` for the upstream proxy attempt and does not request content from upstream. Already cached local content is not affected by this filter.

I believe this proposal would be better if it focused only on preventing new cache fills.
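A Go sketch of the ordering these suggestions imply: check the local cache first, so artifacts cached before a filter was configured remain pullable, and apply the filter only to new upstream fetches. `decidePull` is hypothetical and deliberately ignores how Harbor refreshes cached tags against the upstream registry.

```go
package main

import "fmt"

// pullAction is the outcome of the proxy pre-check under the suggested semantics.
type pullAction string

const (
	serveCached   pullAction = "serve cached artifact"
	notFound      pullAction = "404 Not Found, no upstream request"
	proxyAndCache pullAction = "proxy from upstream and cache"
)

// decidePull gates only new cache fills: a local hit is served regardless of the filter.
func decidePull(cachedLocally, filterMatches bool) pullAction {
	switch {
	case cachedLocally:
		return serveCached
	case !filterMatches:
		return notFound
	default:
		return proxyAndCache
	}
}

func main() {
	fmt.Println(decidePull(true, false))
	fmt.Println(decidePull(false, false))
	fmt.Println(decidePull(false, true))
}
```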


1. Existing proxy cache projects without filters keep current behavior (allow all).
2. New behavior is opt-in by configuring `proxy_filter_pattern` and `proxy_filter_kind`.
3. Clients receive `404 Not Found` for repositories that do not match the configured filter.
Member

Suggested change
3. Clients receive `404 Not Found` for repositories that do not match the configured filter.
3. For existing proxy cache projects where a filter is later configured, the filter only affects new upstream proxy requests and cache fills. Artifacts already cached locally remain accessible through existing Harbor proxy behaviour.
4. Clients receive `404 Not Found` for repositories that do not match the configured filter.

Better, more transparent wording.

5 participants