
Refactor world cluster proxy #120

Draft

Krilliac wants to merge 10 commits into MangosServer:main from Krilliac:claude/refactor-world-cluster-proxy-LJNDu

Conversation

Krilliac (Collaborator) commented May 4, 2026


claude added 9 commits May 4, 2026 20:18
Removes the merged "all-in-one" GameServer executable. The distributed
WorldCluster + WorldServer pair is now the only supported topology.

- Delete src/server/GameServer/ project and its Dockerfile.
- Drop GameServer entries from Mangos.sln and dev/docker-compose.yml;
  update dev/README.md startup steps.
- Mangos.World.csproj: remove <AssemblyName>WorldServer</AssemblyName>;
  the standalone WorldServer/ exe owns that role.
- RealmServer.csproj: drop unused Mangos.Cluster project reference; no
  source under RealmServer/ uses cluster code, so this was a stray
  layering violation.
- No source-level circular dependencies between Mangos.Cluster and
  Mangos.World remain; both share only Mangos.Cluster.Interop.

First step toward the world-cluster proxy redesign.

https://claude.ai/code/session_016a7UDPm1QxBbGTNt5weY4F
…uckets

Recasts the cluster <-> world IPC so the cluster reads as a packet proxy
rather than a 30+ method RPC server. The C# call surface (ICluster /
IWorld) is unchanged so all existing call sites in Mangos.World keep
compiling; only the underlying wire format and dispatcher organization
change.

Three explicit buckets, mirrored across InteropMethodId, ICluster,
IWorld, and both dispatchers/proxies:

  1. Relay (0x0200-0x020F): per-client lifecycle plus the WoW packet
     hot path. Cluster decrypts inbound and forwards as one-way envelopes;
     PacketOut flows back the same way.

  2. Directives (0x0210-0x021F): fire-and-forget actions a world asks
     the cluster to perform on its behalf (drop, transfer, update,
     chat-flag, broadcasts, group-fanout).

  3. Control RPC (0x0220-0x023F): request/response for things that
     can't be fire-and-forget - registration, instance lifecycle,
     character creation, group/guild/battlefield orchestration.

Old ClusterConnect/WorldClientPacket-style names are gone from the
wire; ICluster.Connect now ships as ControlWorldHello, ICluster.ClientSend
as PacketOut, IWorld.ClientPacket as PacketIn, etc. The 0x0300+ range is
deliberately reserved for the federation channel coming in PR MangosServer#4.
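
As a sketch, the bucketed layout of InteropMethodId might look like the following; the range boundaries and the PacketIn / PacketOut / ControlWorldHello names come from this change, while the exact numeric assignments and the remaining members are illustrative:

```csharp
// Sketch only: ranges and the three named members are from the commit;
// everything else here is a guess at plausible neighbours.
public enum InteropMethodId : ushort
{
    // Bucket 1 - Relay (0x0200-0x020F): client lifecycle + packet hot path.
    PacketIn = 0x0200,            // cluster -> world (decrypted client packet)
    PacketOut = 0x0201,           // world -> cluster (one-way envelope back)

    // Bucket 2 - Directives (0x0210-0x021F): fire-and-forget world -> cluster.
    DirectiveClientDrop = 0x0210,
    DirectiveBroadcast = 0x0211,

    // Bucket 3 - Control RPC (0x0220-0x023F): request/response.
    ControlWorldHello = 0x0220,   // was ICluster.Connect on the old wire

    // 0x0300+ deliberately left unassigned for the federation channel (PR #4).
}
```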

https://claude.ai/code/session_016a7UDPm1QxBbGTNt5weY4F
…xit codes

PR MangosServer#3 in the world-cluster-proxy series. The cluster gains a process
supervisor; worlds gain documented exit codes and a real autonomous mode.

What landed:

* ExitCodes (Mangos.Cluster.Interop/ExitCodes.cs): 0 clean, 2 config,
  3 db-version, 10 restart-requested, 11 stop-requested, 20 fatal,
  30 orphaned. Used by both processes; the supervisor reads them to
  decide respawn behaviour (sketched after this list).

* SupervisorConfiguration (Mangos.Configuration): per-world entries
  with Mode = Internal | External so cluster either forks the process
  itself (Internal) or just tracks state (External, for systemd /
  docker / k8s). Heartbeat interval, stale/dead thresholds, and
  exponential respawn backoff are all configurable. Disabled by
  default so existing single-process setups are unchanged.

* WorldSupervisor + SupervisedWorld (Mangos.Cluster/Supervision/):
  reconcile loop spawns Idle worlds, heartbeats the live ones via
  IWorld.Ping/GetServerInfo, marks Stale/Dead on missed beats,
  drains-then-kills on stop/restart. Cross-platform:
  System.Diagnostics.Process for both Linux and Windows;
  Process.Kill(entireProcessTree) for hard kills. PickLeastLoaded(mapId)
  does load-aware placement using the new ServerInfo fields - this is
  what will back .instance spawn in PR MangosServer#5.

* ServerInfo extended with PlayerCount, InstanceCount,
  BattlegroundCount, UptimeMs. WS_Network.WorldServerClass.GetServerInfo
  now populates them, so heartbeats carry real load data from day one.

* Cluster wiring: WorldServerClass.Connect now calls
  supervisor.OnWorldHello(uri, world, maps); Disconnect calls
  OnWorldGoodbye(uri). WorldCluster/Program.cs starts the supervisor
  before the IPC server and registers ProcessExit / Ctrl-C handlers.

* WorldServer/Program.cs now has a 60s grace window when the cluster
  is unreachable, after which it exits Orphaned (30) so the supervisor
  can respawn it on cluster recovery. Console Ctrl-C exits Clean (0);
  fatal startup failures exit FatalCrash (20).
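
A sketch of the exit-code contract and the per-world configuration shape described above. The numeric values, the Mode = Internal | External split, and the Clean / FatalCrash / Orphaned names are from this change; the remaining member and property names are assumptions:

```csharp
using System.Collections.Generic;

// Exit-code contract (values documented above; some member names assumed).
public static class ExitCodes
{
    public const int Clean = 0;             // normal shutdown (Ctrl-C)
    public const int ConfigError = 2;       // bad or missing configuration
    public const int DbVersionError = 3;    // schema/db_version mismatch
    public const int RestartRequested = 10; // supervisor should respawn
    public const int StopRequested = 11;    // supervisor should not respawn
    public const int FatalCrash = 20;       // fatal startup/runtime failure
    public const int Orphaned = 30;         // cluster unreachable past grace window
}

// Per-world supervision entry; property names are illustrative.
public enum SupervisionMode { Internal, External }

public sealed class SupervisedWorldEntry
{
    public SupervisionMode Mode { get; set; }            // Internal: cluster forks the process
    public int HeartbeatIntervalSeconds { get; set; }    // IWorld.Ping/GetServerInfo cadence
    public int StaleThresholdSeconds { get; set; }       // missed beats -> Stale
    public int DeadThresholdSeconds { get; set; }        // missed beats -> Dead, respawn
    public double RespawnBackoffMultiplier { get; set; } // exponential respawn backoff
}
```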

No source-level cycles; Mangos.Cluster -> Mangos.Configuration only,
Mangos.Configuration -> Mangos.Common only. Worlds and clusters can
each survive without the other and supervise each other on restart.

https://claude.ai/code/session_016a7UDPm1QxBbGTNt5weY4F
PR MangosServer#4 in the world-cluster-proxy series. Introduces the federation bus
that PRs MangosServer#5 and MangosServer#6 will reuse for in-game GM commands and cross-realm
chat/groups respectively.

Highlights:

* New project Mangos.Cluster.Admin holds the shared protocol so the
  in-game chat handler, the cluster console, and (future) external CLI
  all speak the same wire format. Lives at the same layer as
  Mangos.Cluster.Interop; both are referenced by Mangos.Cluster.

* Federation transport: FederationLink wraps an InteropConnection but
  uses its own 0x0300+ method-id range (PeerHello/Ack, Heartbeat,
  AdminCommand). FederationServer accepts inbound dials, verifies a
  HMAC-SHA256 PeerHello using a per-peer shared secret from config,
  and tracks authenticated links keyed by remote cluster id.

* Admin command schema: AdminCommand carries verb +
  TargetRealmId/WorldId/InstanceId/MapId/GraceSeconds/Extras.
  AdminCommandReply is status + lines. Same struct used by all
  three entry points (in-game chat, cluster console, external CLI).
  AdminCommandParser handles the unified ".server / .instance /
  .realm" syntax, including --realm cross-cluster routing flags (a
  usage sketch follows this list).

* ClusterAdminCommandHandler implements IAdminCommandHandler against
  the supervisor: server list/info/start/stop/restart, instance list,
  realm list. The remaining verbs (instance spawn/restart/etc.) hook
  in once PR MangosServer#5 wires the in-game side that calls them.

* ConsoleAdminRepl runs in the background on cluster startup, reading
  stdin and dispatching through the same handler used by the
  federation transport.

* SQL migration sql/Updates/Accounts/Rel21_02_002.sql adds the
  realmlist columns (clusterId, clusterAdminEndpoint, displayTag,
  markerPosition), the per-account federation_show_markers toggle,
  and the federation_group / federation_group_member tables for
  PR MangosServer#6's cross-realm group state. Uses ADD COLUMN IF NOT EXISTS so
  it's idempotent. Bumps db_version to (21,2,2);
  MangosGlobalConstants.RevisionDbRealmContent goes 1 -> 2.

* FederationConfiguration with Enabled/LocalClusterId/LocalDisplayTag/
  ListenAddress/ListenPort/MarkerMode/Peers wired into
  MangosConfiguration. Sample blocks added to dev/configuration.json,
  dev/configuration.distributed.json, and the in-tree sample. All
  ship Enabled=false so existing single-cluster setups are unchanged.
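
A hypothetical usage sketch of the shared parser. Parse's exact signature and any flag other than --realm are assumptions; Verb, TargetRealmId, and GraceSeconds are fields named in the schema above:

```csharp
// Assumed API shape, not the real one: Parse takes the raw command text.
AdminCommand local = AdminCommandParser.Parse(".server restart --grace 30");
// local.Verb        -> a server-restart verb (exact name assumed)
// local.GraceSeconds -> 30

AdminCommand remote = AdminCommandParser.Parse(".instance list --realm 5");
// remote.TargetRealmId == 5: the command is routed to peer cluster 5
// over the federation link instead of being executed locally.
```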

Federation peers are authenticated by symmetric HMAC over
(clusterId || nonce); rotate the secret to evict. Listener and
admin transport share the framed-TCP InteropConnection format - one
transport everywhere, separate listener port (default 50101 vs the
world IPC's 50001).
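
A minimal sketch of the PeerHello authenticator under the scheme just described (HMAC-SHA256 over clusterId || nonce, keyed by the per-peer shared secret). Function and parameter names are assumptions, not the real API:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class PeerHelloMac
{
    // Compute HMAC-SHA256 over (clusterId || nonce) with the shared secret.
    public static byte[] Compute(uint clusterId, byte[] nonce, string sharedSecret)
    {
        using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(sharedSecret));
        var message = new byte[4 + nonce.Length];
        BitConverter.GetBytes(clusterId).CopyTo(message, 0); // clusterId ||
        nonce.CopyTo(message, 4);                            // nonce
        return hmac.ComputeHash(message);
    }

    // The receiver recomputes with its configured secret for that peer and
    // compares in constant time before marking the link authenticated.
    public static bool Verify(uint clusterId, byte[] nonce, byte[] mac, string secret)
        => CryptographicOperations.FixedTimeEquals(Compute(clusterId, nonce, secret), mac);
}
```

Rotating a peer's secret on either side makes the next Verify fail, which is the eviction mechanism the paragraph above describes.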

https://claude.ai/code/session_016a7UDPm1QxBbGTNt5weY4F
PR MangosServer#5 in the world-cluster-proxy series. Adds the in-game entry point
for the admin command surface introduced in PR MangosServer#4 so a GM can list
worlds, restart instances, and (with --realm) target a peer cluster
without leaving the client.

Wire-level: ICluster gains a single new control RPC,
RunAdminCommand(byte[]) -> byte[]. The world serializes an AdminCommand,
calls the cluster, and the cluster's WorldServerClass forwards to
the locally-registered IAdminCommandHandler. Same handler used by the
console REPL and inbound federation calls, so all three entry points
share one execution path.

In-game: new partial class WS_Commands.Admin.cs registers three
[ChatCommand] entries - .server, .instance, .realm - all gated at
AccessLevel.Admin. Each forwards the raw arguments through the shared
AdminCommandParser. Reply lines are echoed back through
character.CommandResponse so the GM sees a status header followed by
the individual reply lines.
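
A hypothetical sketch of one of the three entries. The attribute form and handler signature follow my reading of the codebase's existing chat-command pattern, and Serialize/Deserialize stand in for whatever the real AdminCommand codec is:

```csharp
// Sketch only: attribute arguments and handler signature are assumptions.
[ChatCommand("server", "Cluster/world administration.", AccessLevel.Admin)]
public bool Command_Server(ref WS_PlayerData.CharacterObject character, string arguments)
{
    AdminCommand cmd = AdminCommandParser.Parse("server", arguments); // shared parser
    byte[] replyBytes = cluster.RunAdminCommand(Serialize(cmd));      // the one new control RPC
    AdminCommandReply reply = Deserialize(replyBytes);

    character.CommandResponse($"[{reply.Status}]");                   // status header,
    foreach (string line in reply.Lines)                              // then the reply lines
        character.CommandResponse(line);
    return true;
}
```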

Cross-realm: passing --realm N on any of the commands sets
AdminCommand.TargetRealmId; PR MangosServer#4's ClusterAdminCommandHandler will
route to the peer once the federation table lookup lands (planned in
PR MangosServer#6 since it ships alongside chat federation that needs the same
table). Local-cluster commands work today.

New project reference: Mangos.World -> Mangos.Cluster.Admin. No source
cycles introduced; Mangos.Cluster.Admin only references
Mangos.Cluster.Interop (the contracts project) so the layering stays
flat.

https://claude.ai/code/session_016a7UDPm1QxBbGTNt5weY4F
PR MangosServer#6 in the world-cluster-proxy series. Lays the federation Phase A
groundwork: whisper/party/raid envelopes, group invite/roster
replication envelopes, presence queries, the cluster-side router,
and the in-game .realm marker preference toggle.

Wire format (federation transport, AdminMethodId 0x0320-0x037F):
* ChatEnvelope: SenderRealmId/Tag, SenderGuid/Name, Channel
  (whisper/party/raid/guild/etc), recipient name+guid, group id,
  language, body. Marker tag set by sender; receiving cluster
  applies its local marker policy.
* GroupInviteEnvelope / GroupInviteResponseEnvelope /
  GroupRosterUpdateEnvelope: shape the leader-cluster -> peer-cluster
  flow for forming and replicating cross-realm groups. Roster updates
  are authoritative snapshots; receivers overwrite local state.
* PresenceQuery / PresenceReply: lookup-by-name across realms, used
  for whisper resolution and invite addressing.
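
A sketch of the ChatEnvelope fields named above as a C# record; the field types and ordering are assumptions about the wire struct:

```csharp
// Illustrative shape; only the field names above are from the commit.
public sealed record ChatEnvelope(
    uint SenderRealmId,
    string SenderRealmTag,   // marker tag set by the sender
    ulong SenderGuid,
    string SenderName,
    byte Channel,            // whisper / party / raid / guild / ...
    string RecipientName,
    ulong RecipientGuid,
    ulong GroupId,
    uint Language,
    string Body);
```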

Cluster side:
* FederationRouter (Mangos.Cluster/Federation/) maintains dial-out
  links by clusterId, multiplexes envelopes, and lazily reconnects.
* RealmMarkers.Decorate centralises tag rendering with
  FederationMarkerMode (Always / ClientPreference / Off) and a
  whisper-always-shows-tag carve-out so reply targeting still works
  even when markers are disabled.
* WC_Handlers_Chat.TryRouteFederatedWhisper intercepts a
  "Name-RealmTag" form when no local character matches, builds a
  ChatEnvelope, and ships it via the router. Best-effort delivery;
  unreachable peers fall through to SMSG_CHAT_PLAYER_NOT_FOUND so
  the client experience matches today's behaviour.
* FederationServer accepts inbound peer dials and now exposes
  OnLinkAccepted so the router can attach handlers per link.
* WorldCluster/Program.cs wires the router into both inbound
  (FederationServer.OnLinkAccepted) and outbound paths.
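
A hypothetical sketch of the whisper intercept described above. It runs only when no local character matched; TryResolveTag, BuildWhisperEnvelope, and router.SendChatAsync are assumed helper names:

```csharp
// Sketch of the "Name-RealmTag" fallback path; names are illustrative.
bool TryRouteFederatedWhisper(string target, ulong senderGuid, string body)
{
    int dash = target.LastIndexOf('-');
    if (dash <= 0)
        return false;                                  // no tag: not a federated target

    string name = target[..dash];
    string tag = target[(dash + 1)..];
    if (!TryResolveTag(tag, out uint peerClusterId))   // displayTag -> clusterId
        return false;

    ChatEnvelope envelope = BuildWhisperEnvelope(senderGuid, name, body);
    _ = router.SendChatAsync(peerClusterId, envelope); // best-effort delivery;
    return true;                                       // failures surface as
}                                                      // SMSG_CHAT_PLAYER_NOT_FOUND
```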

World side:
* ICluster gains RouteFederatedChat / RouteFederatedGroupInvite
  control RPCs (0x0241 / 0x0242) so the world can hand the cluster
  an envelope without learning federation internals.
* WS_Commands.Admin.cs's .realm command now accepts show/hide at
  Player access (own-marker preference) and list/peers at Admin
  level. The cluster-side handler stubs the per-account DB write -
  schema landed in PR MangosServer#4 (account.federation_show_markers); wiring
  the actual UPDATE through the calling player's account row is a
  follow-up since AdminCommand doesn't carry an invoker today.

Cross-realm admin: AdminCommand.TargetRealmId != 0 routes through
FederationRouter.GetOrOpenAsync (PR MangosServer#5 left this unrouted; PR MangosServer#6
plumbs it). Endpoint resolution (realmId -> host:port) is stubbed at
DI registration; the realmlist DB lookup lands as a follow-up that
doesn't touch this PR's wire format.

What's deliberately not in this PR:
* Group accept popup integration: the GroupInviteEnvelope arrives at
  the recipient cluster but PR MangosServer#6 doesn't yet pop a client-side
  dialog. The envelope shape is final so the gameplay-side bind is
  drop-in.
* Phase B shard co-location (the heavy "make grouped players share
  an instance shard" path). The Shard envelopes carry method ids
  0x0360+ but have no consumers yet.

https://claude.ai/code/session_016a7UDPm1QxBbGTNt5weY4F
…nvite popup, Phase B shard scaffold

Closes the punch list of items previously documented as deferred in
PR MangosServer#5/MangosServer#6 commit messages. Each piece is small enough to land on its
own; bundled here because they share the same DI graph and config.

* The local realm id provider now reads cfg.Federation.LocalClusterId
  instead of a hard-coded 0, so .realm list / cross-realm dispatch sees
  the right id.

* Mangos.MySql gains two new commands (IGetFederationPeersQuery and
  IUpdateFederationMarkerCommand) plus their .sql resources. Queries
  follow the existing Dapper pattern (UpdateAccountCommand etc.).

* FederationRouter now optionally takes IGetFederationPeersQuery,
  starts a 60-second refresh loop in StartAsync(), and consults its
  PeerInfo cache before falling back to the caller-supplied lambda.
  GetOrOpenAsync resolves clusterId -> endpoint from realmlist.
  WorldCluster/Program.cs awaits federationRouter.StartAsync().

* WS_Commands.Admin.cs's `.realm show / .realm hide` now flips
  account.federation_show_markers via the new MySql command, resolving
  the calling player's account through character.client.Account.
  Persists across logins. ClusterAdminCommandHandler's RealmMarker verb
  retains the cross-realm flow for operators flipping a remote account
  via --account <name>.

* GetServerInfo now reports BATTLEGROUNDs.Count instead of zero so
  the supervisor's load-aware placement sees real BG load.

* Federated group invites: FederatedGroupInviter binds onto
  FederationRouter.OnGroupInvite, finds the local recipient, and
  sends SMSG_GROUP_INVITE so the standard popup appears with the
  leader's name prefixed by their realm tag, then replies to the
  leader's cluster with a GroupInviteResponseEnvelope. On the outbound
  side, the existing CMSG_GROUP_INVITE handler now detects
  "Name-RealmTag" patterns (numeric clusterId or textual displayTag)
  and forwards via the router. The same Name-RealmTag resolution is
  applied to the whisper path.

* Phase B shard scaffold: ShardClaim / ShardRelease envelopes
  (AdminMethodId 0x0360/0x0361), FederationLink dispatch, and an
  in-memory ShardRegistry that tracks (mapId, shardKey) -> owner
  cluster + relay endpoint. Bound at cluster startup. The world's
  enter-zone path will consult ShardRegistry.GetShard in a follow-up
  PR; this commit lays the wire format and registry so that follow-up
  is a one-file add.
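
A sketch of the in-memory registry just described: (mapId, shardKey) -> owning cluster + relay endpoint. GetShard is named in the commit; Claim, Release, and the ShardOwner record are assumptions:

```csharp
using System.Collections.Concurrent;

public sealed class ShardRegistry
{
    private readonly ConcurrentDictionary<(uint MapId, ulong ShardKey), ShardOwner> shards = new();

    // Bound to inbound ShardClaim envelopes (AdminMethodId 0x0360).
    public void Claim(uint mapId, ulong shardKey, uint ownerClusterId, string relayEndpoint)
        => shards[(mapId, shardKey)] = new ShardOwner(ownerClusterId, relayEndpoint);

    // Bound to inbound ShardRelease envelopes (AdminMethodId 0x0361).
    public void Release(uint mapId, ulong shardKey)
        => shards.TryRemove((mapId, shardKey), out _);

    // Consulted by the world's enter-zone path in the follow-up PR.
    public ShardOwner? GetShard(uint mapId, ulong shardKey)
        => shards.TryGetValue((mapId, shardKey), out var owner) ? owner : null;
}

public sealed record ShardOwner(uint OwnerClusterId, string RelayEndpoint);
```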

Cycle audit: Mangos.Cluster -> Mangos.Cluster.Admin and
Mangos.MySql; both already referenced. Mangos.World -> Mangos.MySql
already referenced. No new cycles.

https://claude.ai/code/session_016a7UDPm1QxBbGTNt5weY4F
… invite reply, peer maintenance

Closes the last loose ends from the world-cluster proxy series. Build
verified locally: dotnet build of Mangos.sln passes with 0 errors and
all 15 existing tests still green; the 106 remaining warnings are all
pre-existing legacy code (deprecated SQL.Query and unused fields).

What changed:

* ClusterAdminCommandHandler: implements every previously-stubbed
  AdminVerb. ServerClaimMaps asks a world to host a comma-separated
  --maps list. InstanceSpawn picks the least-loaded eligible world via
  the supervisor's load heuristic. InstanceShutdown / InstanceRestart
  fan out to every world that claims the map. InstanceInfo reports
  hosting world plus any active shard claim. InstanceKick is a hint
  alias for shutdown. RealmList now lists peers (clusterId, tag,
  endpoint) when federation is on. RealmPeers shows live authenticated
  links from the router's Peers map. RealmMarker accepts --account
  for cross-realm operator overrides and points self-toggle users at
  the in-game .realm show / .realm hide.

* FederatedChatDeliverer wires onto FederationRouter.OnChat and
  delivers every inbound envelope as a standard SMSG_MESSAGECHAT to
  the local recipient(s). Whisper -> single recipient by name; Party
  / Raid -> all local members of the federated group; System ->
  broadcast. Marker rendering goes through RealmMarkers.Decorate so
  whispers always carry the [tag] regardless of preference.

* Federated invite accept/decline now closes the loop. Inviter
  remembers each pending invite by recipient guid; the cluster's
  CMSG_GROUP_ACCEPT and CMSG_GROUP_DECLINE handlers consult
  TryHandleAccept/TryHandleDecline first and short-circuit to a
  GroupInviteResponseEnvelope back to the leader's cluster, instead
  of crashing on the missing local Group object.

* FederationRouter.StartAsync now spins both a peer-table refresh
  loop (60s realmlist DB poll) and a maintenance loop (15s) that
  proactively dials every known peer and heartbeats live links via
  the new FederationLink.HeartbeatAsync. Failed heartbeats drop the
  link so the next iteration redials, routing around peer outages
  without operator intervention (sketched after this list).

* WorldCluster/Program.cs binds the new chat deliverer alongside the
  existing inviter and shard registry hookups.
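
A sketch of the 15s maintenance loop inside FederationRouter, per the description above. GetOrOpenAsync and FederationLink.HeartbeatAsync are named in this series; knownPeers, DropLink, and PeerInfo's shape are assumptions:

```csharp
// Fragment of FederationRouter; surrounding fields are assumed.
private async Task MaintenanceLoopAsync(CancellationToken ct)
{
    while (!ct.IsCancellationRequested)
    {
        foreach (PeerInfo peer in knownPeers.Values)
        {
            try
            {
                // Proactively dial: opens the link if it is not live yet.
                FederationLink link = await GetOrOpenAsync(peer.ClusterId);
                await link.HeartbeatAsync(ct);
            }
            catch
            {
                // Failed heartbeat: drop the link so the next
                // iteration redials once the peer recovers.
                DropLink(peer.ClusterId);
            }
        }
        await Task.Delay(TimeSpan.FromSeconds(15), ct);
    }
}
```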

No new project references; layering unchanged.

https://claude.ai/code/session_016a7UDPm1QxBbGTNt5weY4F
Closes the last open item from the world-cluster proxy series. The
shard registry that arrived in PR MangosServer#6 is now actually consulted when a
player logs in, and the leader's cluster emits the claims that
populate peer registries. Build verified: 0 errors, 15/15 tests
pass.

End-to-end flow once a federated group exists:

1. Leader on Cluster A invites a player on Cluster B; B accepts.
2. Cluster B's FederatedGroupInviter sends a
   GroupInviteResponseEnvelope with TargetRealmId = B's clusterId
   (previously left at 0).
3. Cluster A's new FederatedShardClaimer receives the response,
   looks up the leader by groupId, and emits a ShardClaimEnvelope
   keyed by (leader.Map, groupId) back to Cluster B.
4. Cluster B's ShardRegistry records the claim.
5. The Cluster-B player tries to log in to a character on that map.
   World's ClientLogin now asks ICluster.QueryShard(mapId, guid).
6. Cluster B's QueryShard resolves guid -> group -> shardKey, finds
   the registry entry, sees OwnerClusterId != local, returns
   ShardLookupResult { Kind = Foreign, OwnerClusterId, OwnerEndpoint,
   OwnerDisplayTag }.
7. World logs the result, sends a SMSG_MESSAGECHAT system message to
   the client telling them which realm hosts their group's instance,
   and calls cluster.ClientDrop so they reconnect via the standard
   realmlist flow on the host realm.

Wire format: new InteropMethodId.ControlQueryShard (0x0243) with a
ShardLookupResult payload (Kind + OwnerClusterId + OwnerEndpoint +
OwnerDisplayTag). ICluster gains QueryShard; dispatcher and proxy
extended; serializer gains Read/WriteShardLookupResult.
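
A sketch of the lookup contract and the cluster-side decision in steps 5-6 above. The ShardLookupResult field names come from this commit; ShardKind's None/Local members and the two helper methods are assumptions:

```csharp
public enum ShardKind { None, Local, Foreign }

public sealed record ShardLookupResult(
    ShardKind Kind,
    uint OwnerClusterId,
    string OwnerEndpoint,
    string OwnerDisplayTag);

// Rough shape of the cluster-side QueryShard; helpers are illustrative.
public ShardLookupResult QueryShard(uint mapId, ulong characterGuid)
{
    ulong shardKey = ResolveGroupId(characterGuid);        // guid -> group -> Group.Id
    ShardOwner? owner = shardRegistry.GetShard(mapId, shardKey);
    if (owner is null)
        return new ShardLookupResult(ShardKind.None, 0, "", "");

    if (owner.OwnerClusterId == localClusterId)            // from FederationConfiguration
        return new ShardLookupResult(ShardKind.Local, owner.OwnerClusterId, "", "");

    // Foreign: the world tells the client which realm hosts the
    // group's instance, then drops the session (step 7).
    return new ShardLookupResult(ShardKind.Foreign, owner.OwnerClusterId,
        owner.RelayEndpoint, LookupDisplayTag(owner.OwnerClusterId));
}
```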

WorldServerClass on the cluster side now takes a ShardRegistry and a
local-cluster-id provider via DI. QueryShard walks
WorldCluster.CharacteRs to resolve guid -> group, uses Group.Id as
the shard key, looks up ShardRegistry.GetShard(mapId, shardKey), and
compares OwnerClusterId against the local id from
FederationConfiguration.

Layering note: no new project references. Mangos.Cluster.Interop ->
Mangos.Common only; Mangos.Cluster -> Mangos.Cluster.Admin was already
in place.

https://claude.ai/code/session_016a7UDPm1QxBbGTNt5weY4F
Krilliac marked this pull request as draft May 4, 2026 22:04
The CI pipeline runs `dotnet format --verify-no-changes` after build
and test; it caught style drift that my new files introduced
(auto-property brace style, import ordering). This commit is the
result of running `dotnet format` on the solution; only whitespace
and import ordering change. No semantic changes, no behaviour change,
the build is still clean (0 errors), tests still pass 15/15, and
`dotnet format --verify-no-changes` now exits 0.

Two pre-existing files in Mangos.MySql/Connections/ and
Mangos.Tests/Logging/ also had stale formatting; the formatter
normalised those too.

https://claude.ai/code/session_016a7UDPm1QxBbGTNt5weY4F
