Proposal: Custom Project Roles #276
Signed-off-by: Max <max.grau.stenzel@gmail.com>
correction Signed-off-by: Max <max.grau.stenzel@gmail.com>
| - **Minimal schema changes:** Only extend `role` table with metadata columns (`is_builtin`, `description`, `modified`, `created_by`, `modified_by`, timestamps) | ||
| - **Discriminator pattern:** `role_permission.role_type` distinguishes 'project-role' (users/groups) from 'robotaccount' (direct permissions) | ||
| - **System admin only:** Only system administrators can create/modify custom roles (project admins assign roles, existing workflow unchanged) | ||
| - **Built-in role protection:** Built-in roles can be modified but not deleted; modifications are tracked and reversible |
There was a problem hiding this comment.
I don't think the built-in roles should be modifiable. Allowing modification of the standard Project Admin, Maintainer, Developer, Guest, and Limited Guest roles introduces serious security risks: if a system administrator changes projectAdmin or developer, it immediately alters the baseline security assumptions of all existing projects.
We should keep all the existing definitions as they are and only introduce custom roles.
There was a problem hiding this comment.
Yes, as discussed in the community meeting I fully accept your point. I will change this.
There was a problem hiding this comment.
Built-in roles must remain completely immutable to serve as a secure baseline.
| ✅ **Migration strategy:** | ||
| - Zero-downtime migration | ||
| - Built-in role permissions migrated from `rbac_role.go` to `role_permission` table |
There was a problem hiding this comment.
Migrating rbac_role.go's static policies into role_permission is cleaner, but we must protect against:
- High DB query volume during login/CLI handshakes.
- Missing indexes on `role_permission` combined with `role_type` and associated keys.
| 2. **API:** Add role CRUD endpoints for system administrators | ||
| 3. **UI:** Add role management interface in System Administration section | ||
| 4. **Security:** Implement privilege escalation prevention and audit logging | ||
| 5. **Caching:** Load permissions at login (session-scoped cache) |
There was a problem hiding this comment.
Sorry for the back and forth.
The proposal suggests session-scoped caching at login, with updates applying on the next login. While this is simple, it poses a severe security risk in enterprise, high-availability, or multi-replica Harbor setups:
- If a custom role's permission is revoked (e.g., a "Security Auditor" role has access to artifacts revoked), the user can continue executing actions because their session-scoped memory cache on their specific pod is stale.
- Logouts or token expirations are not immediate enough for compliance-critical revocations.
There was a problem hiding this comment.
As I understand it, the same caching mechanism is used for role assignment itself. The case where somebody has a critical role revoked while logged in is probably more likely, and it is handled in the same way.
There was a problem hiding this comment.
Yeah, I get where you're coming from about role assignments having some cache lag today, but there’s a big difference here.
Right now, standard roles are hardcoded as compile-time constants in rbac_role.go. Since they’re locked in Go memory, they can never change while the server is running. But by moving these to the database for custom roles, we're introducing a fully dynamic state. We can’t really apply the same static caching assumptions to dynamic DB records, especially when a single role change can instantly impact thousands of users across multiple projects.
Since we want to ship this safely in v2.16.0, we should make sure changes to custom roles propagate across all Harbor instances without a massive lag. Instead of building a super complex sync or tracking lock system, maybe we can go with a lightweight compromise:
- Whenever an admin edits or deletes a custom role, we write a simple version tag or timestamp in Redis (like custom_roles:last_update).
- When evaluating permissions during a session, the evaluator does a quick check against that Redis timestamp. If the local cache is out of date, it lazily reloads the role definitions.
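A rough sketch of that version check, under stated assumptions: the shared tag is hidden behind a hypothetical `VersionStore` interface (in a real deployment it might be backed by the `custom_roles:last_update` key), and `memStore`, `RoleCache`, and the `load` callback are illustrative names, not existing Harbor types:

```go
package main

import (
	"fmt"
	"sync"
)

// VersionStore abstracts the shared version tag. Hypothetical interface;
// an in-memory stand-in is used below so the sketch runs on its own.
type VersionStore interface {
	Version() int64
}

// memStore simulates the shared store for demonstration purposes only.
type memStore struct {
	mu sync.Mutex
	v  int64
}

func (m *memStore) Version() int64 { m.mu.Lock(); defer m.mu.Unlock(); return m.v }

// Bump is what an admin edit or delete of a custom role would trigger.
func (m *memStore) Bump() { m.mu.Lock(); m.v++; m.mu.Unlock() }

// RoleCache keeps role definitions plus the version they were loaded at.
type RoleCache struct {
	mu      sync.Mutex
	store   VersionStore
	load    func() map[string][]string // reloads role -> permissions, e.g. from the DB
	loaded  bool
	version int64
	roles   map[string][]string
}

// Permissions returns the permissions of a role and lazily reloads all
// role definitions when the shared version differs from the cached one.
func (c *RoleCache) Permissions(role string) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v := c.store.Version(); !c.loaded || v != c.version {
		c.roles = c.load()
		c.version = v
		c.loaded = true
	}
	return c.roles[role]
}

func main() {
	store := &memStore{}
	db := map[string][]string{"securityAuditor": {"artifact:read"}}
	cache := &RoleCache{store: store, load: func() map[string][]string {
		snapshot := make(map[string][]string, len(db))
		for name, perms := range db {
			snapshot[name] = append([]string(nil), perms...)
		}
		return snapshot
	}}

	fmt.Println("before revoke:", cache.Permissions("securityAuditor"))

	// An admin revokes the permission and bumps the shared version.
	db["securityAuditor"] = nil
	store.Bump()
	fmt.Println("after revoke:", cache.Permissions("securityAuditor"))
}
```

The cost per evaluation is one version read. Whether that read is cheap enough on hot paths would still need to be measured.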
How does that sound to you?
There was a problem hiding this comment.
I would not use Redis here. This adds another dimension of complexity. A KISS approach would be: if a custom role changes, we invalidate all sessions, forcing users to log in again. A custom role change is something that will happen once or twice during the entire lifecycle of a custom role. Once a role is defined, it is unlikely to ever change again. Admins who are able to change roles will rarely do so, because they are disconnected from the user base and cannot predict the consequences for users.
There was a problem hiding this comment.
Maybe not invalidate all sessions, but only the sessions associated with the updated role.
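A minimal sketch of that idea, assuming sessions can be indexed by the role IDs they were created with. `SessionIndex` and its methods are hypothetical and do not correspond to Harbor's session handling. The sketch is not safe for concurrent use:

```go
package main

import (
	"fmt"
	"sort"
)

// SessionIndex tracks which sessions carry which role so that a role
// change can invalidate only the affected sessions. Hypothetical type.
type SessionIndex struct {
	byRole map[int]map[string]struct{} // role ID -> set of session IDs
}

func NewSessionIndex() *SessionIndex {
	return &SessionIndex{byRole: make(map[int]map[string]struct{})}
}

// Track records that a session was created with the given role IDs.
func (s *SessionIndex) Track(sessionID string, roleIDs ...int) {
	for _, id := range roleIDs {
		if s.byRole[id] == nil {
			s.byRole[id] = make(map[string]struct{})
		}
		s.byRole[id][sessionID] = struct{}{}
	}
}

// InvalidateRole removes every session that holds roleID from the index
// and returns their IDs (sorted) so the caller can force a re-login.
func (s *SessionIndex) InvalidateRole(roleID int) []string {
	affected := make([]string, 0, len(s.byRole[roleID]))
	for sessionID := range s.byRole[roleID] {
		affected = append(affected, sessionID)
	}
	sort.Strings(affected)
	// Drop the affected sessions from every role bucket, not just this one.
	for _, sessionID := range affected {
		for _, sessions := range s.byRole {
			delete(sessions, sessionID)
		}
	}
	return affected
}

func main() {
	idx := NewSessionIndex()
	idx.Track("session-alice", 1, 100)
	idx.Track("session-bob", 100)
	idx.Track("session-carol", 2)

	// Custom role 100 was edited: only alice and bob must log in again.
	fmt.Println("must log in again:", idx.InvalidateRole(100))
}
```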
There was a problem hiding this comment.
Sorry for the confusion earlier — let me restate my point more clearly.
In current Harbor, when you assign, remove, or change a logged-in user's (or group's) project role, the change is not reflected immediately. It's evaluated at the next login and stored in the session scope — exactly the same mechanism the role feature uses.
I'd argue this existing behavior is actually more security-critical than custom role changes, because role assignment is a routine, day-to-day operation for Admins and ProjectAdmins. For example: if a user who is no longer trusted is removed from a project, that removal is also not reflected until their session refreshes. So the stale-cache window already exists today for the most common revocation case.
Given that, applying the same session mechanism to custom role modifications should be sufficient and consistent with current behavior — it introduces no new class of risk beyond what Harbor already accepts for role assignment.
If a mechanism to invalidate session information (enabling live propagation of administrative changes) is introduced in the future, the role feature can easily be updated to support it. Ideally that mechanism would cover both role assignment and role modification together, since they share the same underlying caching.
There was a problem hiding this comment.
@maxgraustenzel-create Hi, could you please clarify the Harbor scenario you mentioned earlier? I just tested a specific use case with two users: admin and test.
First, admin granted test permissions for Project A and Project B. Once test logged in, both projects were visible. Then, admin revoked these permissions and removed test from both projects. As a result, test immediately lost access to Projects A and B without needing to re-log in.
| **Owner:** Max Graustenzel | ||
| **Timeline:** Completed | ||
| ### Phase 4: Security Validation (🔄 In Progress - 30%) |
There was a problem hiding this comment.
Have we considered the following?
- If a custom role includes `ResourceRobot:ActionCreate`, how do we prevent a user with this role from creating a robot account with permissions that exceed the user's custom role scope?
- A user with role-management permissions within a project must not be allowed to assign custom roles that contain permissions higher than their own assigned role.
There was a problem hiding this comment.
I have implemented mechanisms that prevent privilege escalation (on frontend and backend).
- Users can only create robots with permissions less than or equal to their own. Where necessary, I implemented a mapping between robot and role permissions.
- Users can only assign roles with permissions less than or equal to their own.
| **Performance Impact:** | ||
| - **Login:** +50-100ms (one-time permission load) |
There was a problem hiding this comment.
We need to run a proper stress test with a sustained load of 500 requests, not just a single benchmark. We need accurate cost metrics, and our commercial deployments handle massive loads. Also, can you grab the CPU/Memory trends for the DB pod? I'm worried the current resource limits won't be enough after we bump Harbor to v2.16.
This data is essential for my evaluation of the solution.
There was a problem hiding this comment.
Can you please provide additional information about the required stress test? Is there already a scenario for evaluating login performance that I can reuse?
There was a problem hiding this comment.
We can use https://github.com/goharbor/perf for the stress test. cc @chlins, can you help?
There was a problem hiding this comment.
@maxgraustenzel-create Yes, you can use https://github.com/goharbor/perf to benchmark some Harbor APIs. Please refer to the README for usage, and feel free to reach out to me if you have any questions.
Summary
Proposal to add custom project roles to Harbor, enabling system administrators to create roles with flexible permission combinations.
Related Issues
Implementation
origin/18124-custom-role-feature
Discussion
This proposal is ready for community review and discussion at the next community meeting.
Community Meeting
If desired, I can present this proposal and a demonstration of the role functionality at the next community meeting for feedback and discussion.
/kind proposal