Add proxy cache filter proposal#280
Signed-off-by: stonezdj <stonezdj@gmail.com>
Signed-off-by: stonezdj <stonezdj@gmail.com>
…a keys Use proxy_filter_pattern and proxy_filter_kind as separate string metadata keys instead of a single JSON-encoded repository_filter value, consistent with Harbor's existing project metadata conventions. Co-authored-by: Cursor <cursoragent@cursor.com>
| @@ -0,0 +1,156 @@
| Proposal: Limit proxy cache repositories by filter
Just a general question: do we plan to support regex for proxy cache? If so, should we extend this capability to other related functionality, such as replication and tag retention?
I propose that in v2.16 we keep the scope to double star. We can then introduce an enhancement in v2.17 that provides a new filter kind: regex.
There was a problem hiding this comment.
Supporting only double star `**` now and adding regex later makes `**` a legacy feature from the beginning, one that needs to be supported forever and maintained alongside regex.
Regexes are the more powerful capability; we should opt for regex only.
| 1. Existing proxy cache projects without filters keep current behavior (allow all).
| 2. New behavior is opt-in by configuring `proxy_filter_pattern` and `proxy_filter_kind`.
| 3. Clients receive `404 Not Found` for repositories that do not match the configured filter.
Returning 404 Not Found is a good compatibility/security choice because it avoids revealing whether the repository exists upstream.
But the log should clearly say:
proxy cache repository blocked by filter
project=<project>
repository=<repo>
filter_kind=<kind>
filter_pattern=<pattern>
| Proposed metadata keys:
|
| - `proxy_filter_pattern`: the filter pattern string (e.g. `library/**` or `^library/.*`).
More explicit names may be better: `proxy_cache_filter_pattern` and `proxy_cache_filter_kind`.
| 2. `PUT /api/v2.0/projects/{project_name_or_id}`
| 3. `GET /api/v2.0/projects/{project_name_or_id}` and list APIs should include configured filters in response.
|
| `proxy_filter_pattern` and `proxy_filter_kind` are metadata for proxy cache projects only. For non-proxy projects, these keys are ignored on create/update and omitted from response.
Why not reject these fields for non-proxy projects with `400 Bad Request`?
| Notes on behavior:
|
| 1. Invalid pattern in `proxy_filter_pattern` is treated as non-match at runtime.
Treating invalid patterns as runtime non-matches makes configuration errors hard to detect and can cause unintended silent blockages. Instead, we should compile and validate the pattern during project metadata mutation (POST/PUT APIs) and return 400 Bad Request if the pattern is invalid.
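A minimal Go sketch of what validation at mutation time could look like. It assumes a hypothetical helper `validateProxyFilter` called from the POST/PUT handlers, which would map any returned error to `400 Bad Request`. The function name and constants are illustrative and not Harbor's actual code; doublestar syntax checking is left out because it depends on the glob library in use.

```go
package main

import (
	"fmt"
	"regexp"
)

// Filter kinds named in the proposal.
const (
	kindRegex      = "regex"
	kindDoublestar = "doublestar"
)

// validateProxyFilter is a hypothetical helper: it rejects an unknown kind
// and a regex that does not compile, so the API layer can answer 400
// instead of silently treating the pattern as a non-match at runtime.
func validateProxyFilter(kind, pattern string) error {
	if kind == "" {
		kind = kindDoublestar // the proposal's default when the kind is omitted
	}
	switch kind {
	case kindRegex:
		if _, err := regexp.Compile(pattern); err != nil {
			return fmt.Errorf("invalid proxy_filter_pattern %q: %w", pattern, err)
		}
	case kindDoublestar:
		// Assumption: glob syntax validation is delegated to the doublestar
		// library Harbor already uses; it is not reimplemented here.
	default:
		return fmt.Errorf("unsupported proxy_filter_kind %q", kind)
	}
	return nil
}
```

With this in place, a request carrying `proxy_filter_kind=regex` and an unbalanced pattern such as `^library/(` fails at save time rather than blocking every pull later.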
| ### Request enforcement point
|
| Enforcement should happen in proxy middleware before upstream manifest proxy operations:
To ensure a complete fail-closed implementation, we need to explicitly cover all manifest entry points in the middleware pre-check layer. This includes manifest GET and HEAD requests, resolved by either tag or digest. Harbor must return a 404
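
As a rough illustration of the pre-check described above, the Go sketch below recognises manifest GET and HEAD requests, whether addressed by tag or by digest, from the method and URL path. The helper name `manifestTarget` is hypothetical, the path shape follows the distribution API's `/v2/<name>/manifests/<reference>` layout, and blob requests are deliberately not handled in this sketch.

```go
package main

import (
	"net/http"
	"regexp"
)

// manifestPath matches /v2/<name>/manifests/<reference>, where <reference>
// is either a tag or a digest such as sha256:<hex>.
var manifestPath = regexp.MustCompile(`^/v2/(.+)/manifests/([^/]+)$`)

// manifestTarget is a hypothetical helper. It reports whether a request is a
// manifest GET or HEAD and, if so, returns the repository and reference that
// the filter pre-check would need to inspect.
func manifestTarget(method, urlPath string) (repo, ref string, ok bool) {
	if method != http.MethodGet && method != http.MethodHead {
		return "", "", false
	}
	m := manifestPath.FindStringSubmatch(urlPath)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}
```

Because both methods and both reference forms go through the same function, the filter check cannot be bypassed by switching from a tag to a digest or from GET to HEAD.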
| Matching semantics:
|
| 1. If `proxy_filter_pattern` is empty or not set, all repositories are allowed (same as today).
| 2. If `proxy_filter_pattern` is configured, Harbor allows the pull only when the requested repository matches the configured pattern.
Which repository namespace does the proxy_filter_pattern match against—the upstream registry path (library/redis) or Harbor's local proxy project namespace (dockerhub-proxy/library/redis)?
Vad1mo left a comment:
Just like other policies in Harbor, the proxy policy should be a first-class citizen and not a toggle set during project creation.
| @@ -0,0 +1,156 @@
| Proposal: Limit proxy cache repositories by filter
Conceptually and logically, project creation and proxy policy should be separated. Just as we have a replication policy, we should have a proxy policy. This is a good opportunity to split them.
On top of that, it would also allow us to have multiple rules per proxy policy.
| Proposed metadata keys:
|
| - `proxy_filter_pattern`: the filter pattern string (e.g. `library/**` or `^library/.*`).
Regex anchoring is inconsistent. Line 19 uses the unanchored `(library|goharbor|myorg)/.*`; line 85 uses the anchored `^library/.*`.
Either auto-anchor (`^…$`) or document that the author must anchor. I strongly recommend auto-anchoring.
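
A minimal Go sketch of auto-anchoring, assuming a hypothetical `compileAnchored` helper that wraps whatever the user supplied before compiling it. The name is illustrative only and is not taken from the proposal.

```go
package main

import "regexp"

// compileAnchored is a hypothetical helper that forces a full-string match.
// The non-capturing group keeps a top-level alternation such as "a|b" from
// escaping the anchors (without it, "^a|b$" would anchor each branch on one
// side only).
func compileAnchored(pattern string) (*regexp.Regexp, error) {
	return regexp.Compile(`^(?:` + pattern + `)$`)
}
```

Patterns that the author already anchored, such as `^library/.*`, still compile and behave the same, so auto-anchoring would not break the examples in the proposal.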
| 1. `proxy_filter_pattern` can be empty (empty means match all).
| 2. `proxy_filter_kind` must be `regex` or `doublestar` when specified; defaults to `doublestar` if omitted.
| 3. An unrecognised `proxy_filter_kind` value is rejected with `400 Bad Request` during project metadata validation.
This contradicts line 76: "Invalid pattern in `proxy_filter_pattern` is treated as non-match at runtime."
| Matching semantics:
|
| 1. If `proxy_filter_pattern` is empty or not set, all repositories are allowed (same as today).
| 2. If `proxy_filter_pattern` is configured, Harbor allows the pull only when the requested repository matches the configured pattern.
Suggested change:
| - 2. If `proxy_filter_pattern` is configured, Harbor allows the pull only when the requested repository matches the configured pattern.
| + 2. If `proxy_filter_pattern` is configured, Harbor allows new upstream proxying only when the normalized upstream repository path matches the configured pattern.
| 1. If `proxy_filter_pattern` is empty or not set, all repositories are allowed (same as today).
| 2. If `proxy_filter_pattern` is configured, Harbor allows the pull only when the requested repository matches the configured pattern.
| 3. `proxy_filter_kind` specifies the matching mode: `regex` or `doublestar`. It is optional; if omitted, Harbor uses `doublestar` matching by default.
| 4. If the repository does not match, Harbor returns `404 Not Found` and does not request content from upstream.
Suggested change:
| - 4. If the repository does not match, Harbor returns `404 Not Found` and does not request content from upstream.
| + 4. If the normalized repository does not match, Harbor returns `404 Not Found` for the upstream proxy attempt and does not request content from upstream. Already cached local content is not affected by this filter.
I believe things would be better if this proposal only focuses on preventing new cache fills.
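
To make the "cache fills only" semantics concrete, here is a small Go sketch of the decision order it implies. The type and function names are hypothetical, and the two booleans stand in for whatever lookups Harbor would actually perform.

```go
package main

// decision describes what the proxy layer does with a pull request.
type decision int

const (
	serveLocal     decision = iota // artifact already cached: filter is not consulted
	proxyUpstream                  // not cached and the repository matches the filter
	rejectNotFound                 // not cached and no match: 404, no upstream request
)

// decide is a hypothetical helper: the filter gates new upstream fetches
// only, so locally cached content stays reachable after a filter is added.
func decide(cachedLocally, matchesFilter bool) decision {
	if cachedLocally {
		return serveLocal
	}
	if matchesFilter {
		return proxyUpstream
	}
	return rejectNotFound
}
```

Checking the local cache first is the design choice that keeps existing artifacts accessible; reversing the order would turn the filter into an access-control rule over cached content as well.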
| 1. Existing proxy cache projects without filters keep current behavior (allow all).
| 2. New behavior is opt-in by configuring `proxy_filter_pattern` and `proxy_filter_kind`.
| 3. Clients receive `404 Not Found` for repositories that do not match the configured filter.
Suggested change:
| - 3. Clients receive `404 Not Found` for repositories that do not match the configured filter.
| + 3. For existing proxy cache projects where a filter is later configured, the filter only affects new upstream proxy requests and cache fills. Artifacts already cached locally remain accessible through existing Harbor proxy behaviour.
| + 4. Clients receive `404 Not Found` for repositories that do not match the configured filter.
Better, more transparent wording.
