4 changes: 2 additions & 2 deletions README.md
@@ -164,8 +164,8 @@ For config file examples, environment variable usage, CLI flags, SAM parameter u

## ⚠️ Limitations

* **Group Limit**: The AWS SSO SCIM API has a limit of 50 groups per request. Please support the feature request on the [AWS Support site](https://repost.aws/questions/QUqqnVkIo_SYyF_SlX5LcUjg/aws-sso-scim-api-pagination-for-methods) to help get this limit increased.
* **Throttling**: With a large number of users and groups, you may encounter a `ThrottlingException` from the AWS SSO SCIM API. This project uses the [httpx](https://github.com/slashdevops/httpx) library with automatic retry and jitter backoff to mitigate this, but it's still a possibility.
* **Group Page Size**: The AWS IAM Identity Center SCIM `ListGroups` endpoint returns at most 100 groups per page. Since v0.45.0 this project walks every page via cursor-based pagination, so a larger directory no longer requires manual configuration.
* **Throttling**: With a very large number of users and groups, you may still encounter a `ThrottlingException` from the AWS IAM Identity Center SCIM API. The new member-resolution algorithm (one `members.value` query per user, see [docs/Whats-New.md](docs/Whats-New.md)) is roughly two orders of magnitude lighter than the old brute-force path, but the underlying SCIM endpoint is still rate-limited. This project uses the [httpx](https://github.com/slashdevops/httpx) library with automatic retry and jitter backoff to mitigate this.
* **User Status**: The Google Workspace API doesn't differentiate between normal and guest users except for their status. This project only syncs `ACTIVE` users.

## For `ssosync` Users
32 changes: 32 additions & 0 deletions docs/Whats-New.md
@@ -4,6 +4,38 @@ This document tracks notable changes, new features, and bug fixes across release

## Unreleased

### SCIM members sync — major security & performance improvement (closes [#520](https://github.com/slashdevops/idp-scim-sync/issues/520))

> [!IMPORTANT]
> This release replaces the brute-force algorithm that reconstructed group memberships on the AWS IAM Identity Center side. The change is **internal-only** (no CLI/config change) but materially improves the **security posture** and lowers the **runtime cost** of every sync.

#### Security improvements

* **Drastically smaller attack & failure surface per sync.** The number of authenticated SCIM requests issued per sync drops by ~2 orders of magnitude (see *Performance* below). Each request is a credential-bearing call to AWS IAM Identity Center — fewer requests means fewer opportunities for a credential leak, log capture, in-flight tampering, or partial-failure half-state to be observed.
* **Shorter sync window = smaller inconsistency window.** Previously, a sync of a few hundred users could run for many minutes while ~100k requests trickled out behind a hand-rolled 10–150 ms random sleep. During that window, group membership in AWS could be partially reconciled — an externally-observable inconsistent state. The new path completes in a fraction of the time, shrinking that window proportionally.
* **No more time-based throttling band-aid.** The previous `time.Sleep(rand.Intn(...))` jitter existed solely to avoid tripping AWS SCIM throttles under the brute-force call volume. It has been removed: the new call profile is light enough that artificial gapping is no longer required. This eliminates a source of non-determinism in the sync path and removes timing-dependent behavior from a security-sensitive code path.
* **Deterministic pagination.** Cursor-based pagination (`?cursor` + `nextCursor`) walks the full result set deterministically, so memberships can no longer be silently truncated by hitting an undocumented page cap mid-sync.

#### Performance improvements

* **Before:** `internal/scim.GetGroupsMembersBruteForce` issued one `ListGroups` call for *every* (group, user) combination — `O(N_groups × N_users)` requests per sync, throttled by a 10–150 ms random sleep and capped at concurrency 5. For an org with 200 groups and 500 users this is **~100,000 calls per sync run**.
* **Now:** `internal/scim.GetGroupsMembers(ctx, groups, users)` issues one cursor-paginated `?cursor&filter=members.value eq "<user-id>"` request per user (plus one extra request per additional page of memberships, when a single user belongs to more than 100 groups). The result is then inverted into the group → members map the rest of the pipeline expects. For the same 200-group / 500-user org this is **~500 calls per sync — roughly two orders of magnitude fewer requests**.
* **Lower Lambda execution time and cost.** Fewer requests + no random sleeps directly reduces billable Lambda duration on every scheduled invocation.
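
A minimal sketch of the per-user strategy described above, assuming a simplified model rather than the project's actual `internal/scim` code: the hypothetical `ListGroupsPage` callback stands in for one cursor-paginated `members.value eq "<user-id>"` request, and `Page`, `groupsMembers`, and `run` are illustrative names. It walks every page for each user, then inverts the user → groups answers into the group → members map, skipping groups outside the in-scope set.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"strconv"
)

// Page is a hypothetical, trimmed-down view of one ListGroups response page:
// the IDs of the groups on the page plus the cursor for the next page
// (empty when there are no more pages).
type Page struct {
	GroupIDs   []string
	NextCursor string
}

// ListGroupsPage is a hypothetical stand-in for one cursor-paginated
// `members.value eq "<user-id>"` ListGroups request.
type ListGroupsPage func(ctx context.Context, userID, cursor string) (Page, error)

// groupsMembers issues one paginated query per user and inverts the
// user -> groups answers into a group -> members map. Groups outside
// inScope (for example, groups created directly in AWS) are ignored.
func groupsMembers(ctx context.Context, inScope, users []string, list ListGroupsPage) (map[string][]string, error) {
	result := make(map[string][]string, len(inScope))
	for _, g := range inScope {
		result[g] = []string{} // every in-scope group is present, even if empty
	}
	for _, u := range users {
		cursor := "" // empty cursor = first page
		for {
			page, err := list(ctx, u, cursor)
			if err != nil {
				return nil, fmt.Errorf("listing groups for user %q: %w", u, err)
			}
			for _, g := range page.GroupIDs {
				if _, ok := result[g]; ok {
					result[g] = append(result[g], u)
				}
			}
			if page.NextCursor == "" {
				break
			}
			cursor = page.NextCursor
		}
	}
	return result, nil
}

// run wires groupsMembers to an in-memory fake: user-1 is in group-1 and in
// an out-of-scope group; user-2's memberships span two pages.
func run() (map[string][]string, int, error) {
	pages := map[string][]Page{
		"user-1": {{GroupIDs: []string{"group-1", "aws-only"}}},
		"user-2": {
			{GroupIDs: []string{"group-2"}, NextCursor: "1"},
			{GroupIDs: []string{"group-3"}},
		},
	}
	calls := 0
	fake := func(_ context.Context, userID, cursor string) (Page, error) {
		calls++
		idx := 0
		if cursor != "" {
			n, err := strconv.Atoi(cursor)
			if err != nil {
				return Page{}, err
			}
			idx = n
		}
		return pages[userID][idx], nil
	}
	got, err := groupsMembers(context.Background(),
		[]string{"group-1", "group-2", "group-3"},
		[]string{"user-1", "user-2"}, fake)
	return got, calls, err
}

func main() {
	got, calls, err := run()
	if err != nil {
		panic(err)
	}
	groups := make([]string, 0, len(got))
	for g := range got {
		groups = append(groups, g)
	}
	sort.Strings(groups)
	for _, g := range groups {
		fmt.Println(g, got[g])
	}
	fmt.Println("requests:", calls)
}
```

With two users, one of whom has memberships spread over two pages, the sketch issues three requests in total (one per page), so the request count grows with users and pages rather than with groups × users.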

This is enabled by two AWS IAM Identity Center SCIM features documented at <https://docs.aws.amazon.com/singlesignon/latest/developerguide/listgroups.html>:

* The `members.value eq "<user-id>"` filter, which returns every group containing a given user.
* Cursor-based pagination (`?cursor` + `nextCursor`), which lifts the historical 50-result page cap to 100 results per page and supports walking the full result set deterministically.

**API changes (internal-only — no user-facing CLI/config change):**

* `pkg/aws.SCIMService` gained `ListGroupsWithCursor(ctx, filter, cursor) (*ListGroupsResponse, error)`. `ListGroups` is unchanged and remains the non-paginated single-page call.
* `pkg/aws.ListResponse` gained a `NextCursor string` field.
* `internal/core.SCIMService.GetGroupsMembers` now takes `(ctx, *model.GroupsResult, *model.UsersResult)`. The previous `GetGroupsMembers(ctx, gr)` and `GetGroupsMembersBruteForce(ctx, gr, ur)` methods, plus their AWS-side brute-force scaffolding, have been removed entirely — there is no compatibility shim.
* Memberships pointing at AWS-side groups that are *not* part of the in-scope `gr` (for example AWS-managed groups created outside the sync) are silently ignored, matching prior behavior.
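
For the `NextCursor` field, a minimal decoding sketch: `groupsPage` and `decodePage` are hypothetical, and the `nextCursor` JSON name is taken from the AWS documentation linked above rather than from the project's actual struct tags. An empty cursor marks the last page, which is the loop-termination condition a paginated walk relies on.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// groupsPage is a hypothetical, trimmed-down decoding target for one
// ListGroups response. The `nextCursor` JSON name follows the AWS docs; the
// real pkg/aws.ListResponse may declare its tags differently.
type groupsPage struct {
	ItemsPerPage int    `json:"itemsPerPage"`
	NextCursor   string `json:"nextCursor,omitempty"`
	Resources    []struct {
		ID string `json:"id"`
	} `json:"Resources"`
}

// decodePage returns the group IDs on a page and the cursor for the next
// page; an empty cursor means the result set has been fully walked.
func decodePage(body []byte) (ids []string, next string, err error) {
	var p groupsPage
	if err = json.Unmarshal(body, &p); err != nil {
		return nil, "", fmt.Errorf("decoding ListGroups page: %w", err)
	}
	for _, r := range p.Resources {
		ids = append(ids, r.ID)
	}
	return ids, p.NextCursor, nil
}

func main() {
	pages := [][]byte{
		[]byte(`{"itemsPerPage":1,"nextCursor":"abc123","Resources":[{"id":"group-1"}]}`),
		[]byte(`{"itemsPerPage":1,"Resources":[{"id":"group-2"}]}`),
	}
	for _, body := range pages {
		ids, next, err := decodePage(body)
		if err != nil {
			panic(err)
		}
		fmt.Printf("groups=%v nextCursor=%q last=%t\n", ids, next, next == "")
	}
}
```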

**Tests:** the concurrency cap is now exercised under `testing/synctest` (part of the standard library since Go 1.25; see <https://go.dev/blog/testing-time>), so the test asserts the true peak in-flight count using virtual time instead of racing a wall-clock sleep.

### IAM least-privilege hardening for the state-file Lambda role

Tightens the Lambda execution role in `template.yaml` so it can only touch the single state object via the single intended path. No behavior change for normal operation; the role is now strictly scoped.
5 changes: 1 addition & 4 deletions internal/core/actions.go
@@ -72,10 +72,7 @@ func scimSync(
totalUsersResult = model.MergeUsersResult(usersCreated, usersUpdated, usersEqual)

slog.Info("getting SCIM Groups Members")
// unfortunately, the SCIM service does not support the getGroupsMembers method in and efficient way
// see: "Nor Supported" section in: https://docs.aws.amazon.com/singlesignon/latest/developerguide/listgroups.html
// scimGroupsMembersResult, err := scim.GetGroupsMembers(ctx, &totalGroupsResult) // not supported yet
scimGroupsMembersResult, err := scim.GetGroupsMembersBruteForce(ctx, totalGroupsResult, totalUsersResult)
scimGroupsMembersResult, err := scim.GetGroupsMembers(ctx, totalGroupsResult, totalUsersResult)
if err != nil {
return nil, nil, nil, fmt.Errorf("error getting groups members from the SCIM service: %w", err)
}
10 changes: 5 additions & 5 deletions internal/core/scim.go
@@ -35,11 +35,11 @@ type SCIMService interface {
// DeleteUsers deletes users in the SCIM Service given a list of users.
DeleteUsers(ctx context.Context, ur *model.UsersResult) error

// GetGroupsMembers get the Groups and their Members from the SCIM service.
GetGroupsMembers(ctx context.Context, gr *model.GroupsResult) (*model.GroupsMembersResult, error)

// GetGroupsMembersBruteForce get the Groups and their Members from the SCIM service using brute force.
GetGroupsMembersBruteForce(ctx context.Context, gr *model.GroupsResult, ur *model.UsersResult) (*model.GroupsMembersResult, error)
// GetGroupsMembers gets the in-scope Groups and their Members from the
// SCIM service. The implementation queries AWS with the members.value
// filter for each user in ur and assigns memberships back to groups in
// gr; memberships pointing at groups outside gr are ignored.
GetGroupsMembers(ctx context.Context, gr *model.GroupsResult, ur *model.UsersResult) (*model.GroupsMembersResult, error)

// CreateGroupsMembers create groups members in the SCIM Service given a list of groups members.
CreateGroupsMembers(ctx context.Context, gmr *model.GroupsMembersResult) (*model.GroupsMembersResult, error)
127 changes: 37 additions & 90 deletions internal/core/sync_test.go
@@ -309,39 +309,15 @@ func TestSyncService_SyncGroupsAndTheirMembers(t *testing.T) {
createUser2ResponseJSONBytes, err := json.Marshal(createUser2Response)
assert.NoError(t, err)

listGroupsResponseGroup1User1 := &aws.ListGroupsResponse{
ListResponse: aws.ListResponse{
StartIndex: 1,
ItemsPerPage: 1,
TotalResults: 1,
Schemas: []string{"urn:ietf:params:scim:schemas:core:2.0:ListResponse"},
},
Resources: []*aws.Group{
{
ID: "group-1",
Meta: aws.Meta{
ResourceType: "Group",
Created: "2020-01-01T00:00:00Z",
LastModified: "2020-01-01T00:00:00Z",
},
Schemas: []string{"urn:ietf:params:scim:schemas:core:2.0:Group"},
DisplayName: "group 1",
Members: []*aws.Member{}, // AWS SSO SCIM API don't return members for list groups only TotalResults is returned
},
},
}
listGroupsResponseGroup1User1JSONBytes, err := json.Marshal(listGroupsResponseGroup1User1)
assert.NoError(t, err)

listGroupsResponseGroup1User2 := &aws.ListGroupsResponse{
ListResponse: aws.ListResponse{
StartIndex: 1,
ItemsPerPage: 1,
TotalResults: 0,
Schemas: []string{"urn:ietf:params:scim:schemas:core:2.0:ListResponse"},
},
Resources: []*aws.Group{
{
// listGroupsByMember returns the AWS ListGroups response for a
// `members.value eq "<user-id>"` cursor query. The test scenario is:
// group-1 contains user-1; group-2 contains user-2.
listGroupsByMember := func(userSCIMID string) []byte {
schemas := []string{"urn:ietf:params:scim:api:messages:2.0:ListResponse"}
groups := []*aws.Group{}
switch userSCIMID {
case "user-1":
groups = append(groups, &aws.Group{
ID: "group-1",
Meta: aws.Meta{
ResourceType: "Group",
@@ -350,22 +326,10 @@ func TestSyncService_SyncGroupsAndTheirMembers(t *testing.T) {
},
Schemas: []string{"urn:ietf:params:scim:schemas:core:2.0:Group"},
DisplayName: "group 1",
Members: []*aws.Member{}, // AWS SSO SCIM API don't return members for list groups only TotalResults is returned
},
},
}
listGroupsResponseGroup1User2JSONBytes, err := json.Marshal(listGroupsResponseGroup1User2)
assert.NoError(t, err)

listGroupsResponseGroup2User1 := &aws.ListGroupsResponse{
ListResponse: aws.ListResponse{
StartIndex: 1,
ItemsPerPage: 1,
TotalResults: 0,
Schemas: []string{"urn:ietf:params:scim:schemas:core:2.0:ListResponse"},
},
Resources: []*aws.Group{
{
Members: []*aws.Member{},
})
case "user-2":
groups = append(groups, &aws.Group{
ID: "group-2",
Meta: aws.Meta{
ResourceType: "Group",
@@ -374,36 +338,20 @@ func TestSyncService_SyncGroupsAndTheirMembers(t *testing.T) {
},
Schemas: []string{"urn:ietf:params:scim:schemas:core:2.0:Group"},
DisplayName: "group 2",
Members: []*aws.Member{}, // AWS SSO SCIM API don't return members for list groups only TotalResults is returned
},
},
}
listGroupsResponseGroup2User1JSONBytes, err := json.Marshal(listGroupsResponseGroup2User1)
assert.NoError(t, err)

listGroupsResponseGroup2User2 := &aws.ListGroupsResponse{
ListResponse: aws.ListResponse{
StartIndex: 1,
ItemsPerPage: 1,
TotalResults: 1,
Schemas: []string{"urn:ietf:params:scim:schemas:core:2.0:ListResponse"},
},
Resources: []*aws.Group{
{
ID: "group-2",
Meta: aws.Meta{
ResourceType: "Group",
Created: "2020-01-01T00:00:00Z",
LastModified: "2020-01-01T00:00:00Z",
},
Schemas: []string{"urn:ietf:params:scim:schemas:core:2.0:Group"},
DisplayName: "group 2",
Members: []*aws.Member{}, // AWS SSO SCIM API don't return members for list groups only TotalResults is returned
Members: []*aws.Member{},
})
}
resp := &aws.ListGroupsResponse{
ListResponse: aws.ListResponse{
ItemsPerPage: len(groups),
Schemas: schemas,
},
},
Resources: groups,
}
body, err := json.Marshal(resp)
assert.NoError(t, err)
return body
}
listGroupsResponseGroup2User2JSONBytes, err := json.Marshal(listGroupsResponseGroup2User2)
assert.NoError(t, err)

// mock Google Workspace API calls
svrIDP := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
@@ -456,21 +404,20 @@ func TestSyncService_SyncGroupsAndTheirMembers(t *testing.T) {
switch r.URL.Path {
case "/Groups":
filter := r.URL.Query().Get("filter")

switch filter {
case "": // first time getting groups
// AWS SCIM cursor-paginated calls include a bare "cursor"
// param; the new GetGroupsMembers algorithm queries one
// `members.value eq "<user-id>"` page per user.
const prefix = `members.value eq "`
if strings.HasPrefix(filter, prefix) && strings.HasSuffix(filter, `"`) {
userSCIMID := filter[len(prefix) : len(filter)-1]
_, _ = w.Write(listGroupsByMember(userSCIMID))
return
}
if filter == "" {
_, _ = w.Write([]byte(`{}`))
case "id eq \"group-1\" and members eq \"user-1\"":
_, _ = w.Write(listGroupsResponseGroup1User1JSONBytes)
case "id eq \"group-1\" and members eq \"user-2\"":
_, _ = w.Write(listGroupsResponseGroup1User2JSONBytes) // user 2 is not in group 1
case "id eq \"group-2\" and members eq \"user-1\"":
_, _ = w.Write(listGroupsResponseGroup2User1JSONBytes) // user 1 is not in group 2
case "id eq \"group-2\" and members eq \"user-2\"":
_, _ = w.Write(listGroupsResponseGroup2User2JSONBytes)
default:
w.WriteHeader(http.StatusBadRequest)
return
}
w.WriteHeader(http.StatusBadRequest)
case "/Users":
_, _ = w.Write([]byte(`{}`))
}