Refactor world cluster proxy #120
Draft
Krilliac wants to merge 10 commits into MangosServer:main from
Conversation
Removes the merged "all-in-one" GameServer executable. The distributed WorldCluster + WorldServer pair is now the only supported topology.

- Delete the src/server/GameServer/ project and its Dockerfile.
- Drop GameServer entries from Mangos.sln and dev/docker-compose.yml; update dev/README.md startup steps.
- Mangos.World.csproj: remove <AssemblyName>WorldServer</AssemblyName>; the standalone WorldServer/ exe owns that role.
- RealmServer.csproj: drop the unused Mangos.Cluster project reference; no source under RealmServer/ uses cluster code, so this was a stray layering violation.

No source-level circular dependencies between Mangos.Cluster and Mangos.World remain; both share only Mangos.Cluster.Interop. First step toward the world-cluster proxy redesign.

https://claude.ai/code/session_016a7UDPm1QxBbGTNt5weY4F
…uckets
Recasts the cluster <-> world IPC so the cluster reads as a packet proxy
rather than a 30+ method RPC server. The C# call surface (ICluster /
IWorld) is unchanged so all existing call sites in Mangos.World keep
compiling; only the underlying wire format and dispatcher organization
change.
Three explicit buckets, mirrored across InteropMethodId, ICluster,
IWorld, and both dispatchers/proxies:
1. Relay (0x0200-0x020F): per-client lifecycle plus the WoW packet
hot path. Cluster decrypts inbound and forwards as one-way envelopes;
PacketOut flows back the same way.
2. Directives (0x0210-0x021F): fire-and-forget actions a world asks
the cluster to perform on its behalf (drop, transfer, update,
chat-flag, broadcasts, group-fanout).
3. Control RPC (0x0220-0x023F): request/response for things that
can't be fire-and-forget - registration, instance lifecycle,
character creation, group/guild/battlefield orchestration.
Old ClusterConnect/WorldClientPacket-style names are gone from the
wire; ICluster.Connect now ships as ControlWorldHello, ICluster.ClientSend
as PacketOut, IWorld.ClientPacket as PacketIn, etc. The 0x0300+ range is
deliberately reserved for the federation channel coming in PR MangosServer#4.
https://claude.ai/code/session_016a7UDPm1QxBbGTNt5weY4F
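The three ranges above can be sketched as a small classifier. The range boundaries come from the commit message; the `InteropBucket` enum and the `BucketOf` helper are illustrative names, not the real `InteropMethodId` API.

```csharp
// Sketch of the three interop method-id buckets described above.
// Only the numeric ranges come from the commit; the type and method
// names here are hypothetical.
public enum InteropBucket { Relay, Directive, ControlRpc, Reserved, Unknown }

public static class InteropBuckets
{
    public static InteropBucket BucketOf(ushort methodId) => methodId switch
    {
        >= 0x0200 and <= 0x020F => InteropBucket.Relay,      // per-client lifecycle + packet hot path
        >= 0x0210 and <= 0x021F => InteropBucket.Directive,  // fire-and-forget world -> cluster actions
        >= 0x0220 and <= 0x023F => InteropBucket.ControlRpc, // request/response control calls
        >= 0x0300               => InteropBucket.Reserved,   // federation channel (PR MangosServer#4)
        _                       => InteropBucket.Unknown,
    };
}
```

Keeping the ranges non-overlapping lets both dispatchers route on the method id alone, without consulting a per-method table.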
…xit codes

PR MangosServer#3 in the world-cluster-proxy series. The cluster gains a process supervisor; worlds gain documented exit codes and a real autonomous mode.

What landed:

* ExitCodes (Mangos.Cluster.Interop/ExitCodes.cs): 0 clean, 2 config, 3 db-version, 10 restart-requested, 11 stop-requested, 20 fatal, 30 orphaned. Used by both processes; the supervisor reads them to decide respawn behaviour.
* SupervisorConfiguration (Mangos.Configuration): per-world entries with Mode = Internal | External, so the cluster either forks the process itself (Internal) or just tracks state (External, for systemd / docker / k8s). Heartbeat interval, stale/dead thresholds, and exponential respawn backoff are all configurable. Disabled by default, so existing single-process setups are unchanged.
* WorldSupervisor + SupervisedWorld (Mangos.Cluster/Supervision/): the reconcile loop spawns Idle worlds, heartbeats the live ones via IWorld.Ping/GetServerInfo, marks them Stale/Dead on missed beats, and drains-then-kills on stop/restart. Cross-platform: System.Diagnostics.Process for both Linux and Windows; Process.Kill(entireProcessTree) for hard kills. PickLeastLoaded(mapId) does load-aware placement using the new ServerInfo fields - this is what will back .instance spawn in PR MangosServer#5.
* ServerInfo extended with PlayerCount, InstanceCount, BattlegroundCount, and UptimeMs. WS_Network.WorldServerClass.GetServerInfo now populates them, so heartbeats carry real load data from day one.
* Cluster wiring: WorldServerClass.Connect now calls supervisor.OnWorldHello(uri, world, maps); Disconnect calls OnWorldGoodbye(uri). WorldCluster/Program.cs starts the supervisor before the IPC server and registers ProcessExit / Ctrl-C handlers.
* WorldServer/Program.cs now has a 60s grace window when the cluster is unreachable, after which it exits Orphaned (30) so the supervisor can respawn it on cluster recovery. Console Ctrl-C exits Clean (0); fatal startup failures exit FatalCrash (20).

No source-level cycles: Mangos.Cluster -> Mangos.Configuration only, and Mangos.Configuration -> Mangos.Common only. Worlds and clusters can each survive without the other and recover from each other's restarts.

https://claude.ai/code/session_016a7UDPm1QxBbGTNt5weY4F
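The exit-code table can be sketched as an enum. Clean, FatalCrash, and Orphaned are named in the commit; ConfigError, DbVersionMismatch, RestartRequested, and StopRequested are names inferred from their descriptions, and the respawn policy shown is one plausible reading of "the supervisor reads them to decide respawn behaviour", not the actual implementation.

```csharp
// Exit-code contract between world and supervisor, per the values
// documented in the commit. Member names other than Clean, FatalCrash
// and Orphaned are inferred, not taken from ExitCodes.cs.
public enum ExitCode
{
    Clean = 0,             // normal shutdown (e.g. console Ctrl-C)
    ConfigError = 2,       // bad or missing configuration
    DbVersionMismatch = 3, // database schema out of date
    RestartRequested = 10, // world asked to be respawned
    StopRequested = 11,    // world asked to stay down
    FatalCrash = 20,       // fatal startup/runtime failure
    Orphaned = 30,         // cluster unreachable past the grace window
}

public static class Respawn
{
    // One plausible supervisor policy (assumption): respawn on
    // restart-requested, orphaned, and fatal crashes; stay down on
    // clean exits, explicit stops, and config/db problems that a
    // respawn cannot fix.
    public static bool ShouldRespawn(ExitCode code) => code switch
    {
        ExitCode.RestartRequested or ExitCode.Orphaned or ExitCode.FatalCrash => true,
        _ => false,
    };
}
```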
PR MangosServer#4 in the world-cluster-proxy series. Introduces the federation bus that PRs MangosServer#5 and MangosServer#6 will reuse for in-game GM commands and cross-realm chat/groups respectively.

Highlights:

* New project Mangos.Cluster.Admin holds the shared protocol, so the in-game chat handler, the cluster console, and a (future) external CLI all speak the same wire format. Lives at the same layer as Mangos.Cluster.Interop; both are referenced by Mangos.Cluster.
* Federation transport: FederationLink wraps an InteropConnection but uses its own 0x0300+ method-id range (PeerHello/Ack, Heartbeat, AdminCommand). FederationServer accepts inbound dials, verifies an HMAC-SHA256 PeerHello using a per-peer shared secret from config, and tracks authenticated links keyed by remote cluster id.
* Admin command schema: AdminCommand carries a verb plus TargetRealmId/WorldId/InstanceId/MapId/GraceSeconds/Extras. AdminCommandReply is status + lines. The same struct is used by all three entry points (in-game chat, cluster console, external CLI). AdminCommandParser handles the unified ".server / .instance / .realm" syntax, including --realm cross-cluster routing flags.
* ClusterAdminCommandHandler implements IAdminCommandHandler against the supervisor: server list/info/start/stop/restart, instance list, realm list. The remaining verbs (instance spawn/restart/etc.) hook in once PR MangosServer#5 wires the in-game side that calls them.
* ConsoleAdminRepl runs in the background on cluster startup, reading stdin and dispatching through the same handler used by the federation transport.
* SQL migration sql/Updates/Accounts/Rel21_02_002.sql adds the realmlist columns (clusterId, clusterAdminEndpoint, displayTag, markerPosition), the per-account federation_show_markers toggle, and the federation_group / federation_group_member tables for PR MangosServer#6's cross-realm group state. Uses ADD COLUMN IF NOT EXISTS so it's idempotent. Bumps db_version to (21,2,2); MangosGlobalConstants.RevisionDbRealmContent goes 1 -> 2.
* FederationConfiguration with Enabled/LocalClusterId/LocalDisplayTag/ListenAddress/ListenPort/MarkerMode/Peers, wired into MangosConfiguration. Sample blocks added to dev/configuration.json, dev/configuration.distributed.json, and the in-tree sample. All ship Enabled=false, so existing single-cluster setups are unchanged.

Federation peers are authenticated by a symmetric HMAC over (clusterId || nonce); rotate the secret to evict a peer. Listener and admin transport share the framed-TCP InteropConnection format - one transport everywhere, with a separate listener port (default 50101 vs the world IPC's 50001).

https://claude.ai/code/session_016a7UDPm1QxBbGTNt5weY4F
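The PeerHello check described above - HMAC-SHA256 over (clusterId || nonce) with a per-peer shared secret - can be sketched as follows. The `PeerAuth` class and its method names are hypothetical; only the MAC construction comes from the commit.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Minimal sketch of the PeerHello authentication scheme described
// above. Names are illustrative; only HMAC-SHA256 over
// (clusterId || nonce) with a shared secret is from the commit.
public static class PeerAuth
{
    public static byte[] Sign(uint clusterId, byte[] nonce, byte[] secret)
    {
        // Payload = 4-byte little-endian clusterId followed by the nonce.
        var payload = new byte[4 + nonce.Length];
        BitConverter.GetBytes(clusterId).CopyTo(payload, 0);
        nonce.CopyTo(payload, 4);
        using var hmac = new HMACSHA256(secret);
        return hmac.ComputeHash(payload);
    }

    public static bool Verify(uint clusterId, byte[] nonce, byte[] mac, byte[] secret)
        // Constant-time compare so verification does not leak timing.
        => CryptographicOperations.FixedTimeEquals(Sign(clusterId, nonce, secret), mac);
}
```

Because the secret is symmetric and per-peer, rotating it in config on one side is enough to evict that peer: its next PeerHello fails verification.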
PR MangosServer#5 in the world-cluster-proxy series. Adds the in-game entry point for the admin command surface introduced in PR MangosServer#4, so a GM can list worlds, restart instances, and (with --realm) target a peer cluster without leaving the client.

Wire-level: ICluster gains a single new control RPC, RunAdminCommand(byte[]) -> byte[]. The world serializes an AdminCommand, calls the cluster, and the cluster's WorldServerClass forwards to the locally-registered IAdminCommandHandler. This is the same handler used by the console REPL and inbound federation calls, so all three entry points share one execution path.

In-game: the new partial class WS_Commands.Admin.cs registers three [ChatCommand] entries - .server, .instance, .realm - all gated at AccessLevel.Admin. Each forwards the raw arguments through the shared AdminCommandParser. Reply lines are echoed back through character.CommandResponse, so the GM sees a status header followed by the individual reply lines.

Cross-realm: passing --realm N on any of the commands sets AdminCommand.TargetRealmId; PR MangosServer#4's ClusterAdminCommandHandler will route to the peer once the federation table lookup lands (planned in PR MangosServer#6, since it ships alongside chat federation that needs the same table). Local-cluster commands work today.

New project reference: Mangos.World -> Mangos.Cluster.Admin. No source cycles introduced; Mangos.Cluster.Admin only references Mangos.Cluster.Interop (the contracts project), so the layering stays flat.

https://claude.ai/code/session_016a7UDPm1QxBbGTNt5weY4F
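The shared-handler design can be sketched as a pair of records behind one interface. The record shapes here are heavily simplified stand-ins (the real AdminCommand carries more fields, and the real handler is ClusterAdminCommandHandler); only the one-handler-for-three-entry-points shape is from the commit.

```csharp
// Simplified sketch of the shared admin-command execution path.
// Real AdminCommand/AdminCommandReply carry more fields; these
// cut-down records and EchoHandler are illustrative only.
public record AdminCommand(string Verb, uint TargetRealmId = 0);
public record AdminCommandReply(bool Ok, string[] Lines);

public interface IAdminCommandHandler
{
    AdminCommandReply Handle(AdminCommand command);
}

// Stand-in for ClusterAdminCommandHandler. In-game chat, the console
// REPL, and inbound federation calls would all dispatch through the
// same registered IAdminCommandHandler instance.
public sealed class EchoHandler : IAdminCommandHandler
{
    public AdminCommandReply Handle(AdminCommand c)
        => new(true, new[] { $"ran {c.Verb}" });
}
```

The payoff of funneling all three entry points through one interface is that adding a verb is a single change to the handler, with no per-transport plumbing.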
PR MangosServer#6 in the world-cluster-proxy series. Lays the federation Phase A groundwork: whisper/party/raid envelopes, group invite/roster replication envelopes, presence queries, the cluster-side router, and the in-game .realm marker preference toggle.

Wire format (federation transport, AdminMethodId 0x0320-0x037F):

* ChatEnvelope: SenderRealmId/Tag, SenderGuid/Name, Channel (whisper/party/raid/guild/etc), recipient name + guid, group id, language, body. The marker tag is set by the sender; the receiving cluster applies its local marker policy.
* GroupInviteEnvelope / GroupInviteResponseEnvelope / GroupRosterUpdateEnvelope: shape the leader-cluster -> peer-cluster flow for forming and replicating cross-realm groups. Roster updates are authoritative snapshots; receivers overwrite local state.
* PresenceQuery / PresenceReply: lookup-by-name across realms, used for whisper resolution and invite addressing.

Cluster side:

* FederationRouter (Mangos.Cluster/Federation/) maintains dial-out links by clusterId, multiplexes envelopes, and lazily reconnects.
* RealmMarkers.Decorate centralises tag rendering with FederationMarkerMode (Always / ClientPreference / Off) and a whisper-always-shows-tag carve-out, so reply targeting still works even when markers are disabled.
* WC_Handlers_Chat.TryRouteFederatedWhisper intercepts a "Name-RealmTag" form when no local character matches, builds a ChatEnvelope, and ships it via the router. Delivery is best-effort; unreachable peers fall through to SMSG_CHAT_PLAYER_NOT_FOUND, so the client experience matches today's behaviour.
* FederationServer accepts inbound peer dials and now exposes OnLinkAccepted so the router can attach handlers per link.
* WorldCluster/Program.cs wires the router into both inbound (FederationServer.OnLinkAccepted) and outbound paths.

World side:

* ICluster gains RouteFederatedChat / RouteFederatedGroupInvite control RPCs (0x0241 / 0x0242) so the world can hand the cluster an envelope without learning federation internals.
* WS_Commands.Admin.cs's .realm command now accepts show/hide at Player access (own-marker preference) and list/peers at Admin level. The cluster-side handler stubs the per-account DB write - the schema landed in PR MangosServer#4 (account.federation_show_markers); wiring the actual UPDATE through the calling player's account row is a follow-up, since AdminCommand doesn't carry an invoker today.

Cross-realm admin: AdminCommand.TargetRealmId != 0 routes through FederationRouter.GetOrOpenAsync (PR MangosServer#5 left this unrouted; PR MangosServer#6 plumbs it). Endpoint resolution (realmId -> host:port) is stubbed at DI registration; the realmlist DB lookup lands as a follow-up that doesn't touch this PR's wire format.

What's deliberately not in this PR:

* Group accept popup integration: the GroupInviteEnvelope arrives at the recipient cluster, but PR MangosServer#6 doesn't yet pop a client-side dialog. The envelope shape is final, so the gameplay-side bind is drop-in.
* Phase B shard co-location (the heavy "make grouped players share an instance shard" path). The Shard envelopes carry method-ids 0x0360+ but have no consumers yet.

https://claude.ai/code/session_016a7UDPm1QxBbGTNt5weY4F
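The "Name-RealmTag" interception above hinges on splitting the whisper target into a character name and a realm tag. A minimal sketch of that split, assuming the tag is everything after the last '-' (the helper name and exact matching rules are assumptions; only the addressing form comes from the commit):

```csharp
// Sketch of the Name-RealmTag parse used when no local character
// matches a whisper target. Illustrative; not the real
// TryRouteFederatedWhisper implementation.
public static class FederatedNames
{
    public static bool TrySplit(string target, out string name, out string realmTag)
    {
        name = target;
        realmTag = "";
        int i = target.LastIndexOf('-');
        // Require a non-empty name and a non-empty tag.
        if (i <= 0 || i == target.Length - 1) return false;
        name = target[..i];
        realmTag = target[(i + 1)..];
        return true;
    }
}
```

Using the last '-' keeps hyphenated display tags unambiguous on the name side; if the split fails, the handler falls through to the normal not-found path, matching today's behaviour.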
…nvite popup, Phase B shard scaffold

Closes the punch list of items previously documented as deferred in the PR MangosServer#5/MangosServer#6 commit messages. Each piece is small enough to land on its own; they are bundled here because they share the same DI graph and config.

* The local realm id provider now reads cfg.Federation.LocalClusterId instead of a hard-coded 0, so .realm list / cross-realm dispatch sees the right id.
* Mangos.MySql gains two new commands (IGetFederationPeersQuery and IUpdateFederationMarkerCommand) plus their .sql resources. Queries follow the existing Dapper pattern (UpdateAccountCommand etc.).
* FederationRouter now optionally takes IGetFederationPeersQuery, starts a 60-second refresh loop in StartAsync(), and consults its PeerInfo cache before falling back to the caller-supplied lambda. GetOrOpenAsync resolves clusterId -> endpoint from realmlist. WorldCluster/Program.cs awaits federationRouter.StartAsync().
* WS_Commands.Admin.cs's `.realm show / .realm hide` now flips account.federation_show_markers via the new MySql command, resolving the calling player's account through character.client.Account. The preference persists across logins. ClusterAdminCommandHandler's RealmMarker verb retains the cross-realm flow for operators flipping a remote account via --account <name>.
* GetServerInfo now reports BATTLEGROUNDs.Count instead of zero, so the supervisor's load-aware placement sees real BG load.
* Federated group invites: FederatedGroupInviter binds onto FederationRouter.OnGroupInvite, finds the local recipient, and sends SMSG_GROUP_INVITE so the standard popup appears with the leader's name prefixed by their realm tag. It replies to the leader's cluster with a GroupInviteResponseEnvelope. On the outbound side, the existing CMSG_GROUP_INVITE handler now detects "Name-RealmTag" patterns (numeric clusterId or textual displayTag) and forwards via the router. The same Name-RealmTag resolution is applied to the whisper path.
* Phase B shard scaffold: ShardClaim / ShardRelease envelopes (AdminMethodId 0x0360/0x0361), FederationLink dispatch, and an in-memory ShardRegistry that tracks (mapId, shardKey) -> owner cluster + relay endpoint. Bound at cluster startup. The world's enter-zone path will consult ShardRegistry.GetShard in a follow-up PR; this commit lays the wire format and registry so that follow-up is a one-file add.

Cycle audit: Mangos.Cluster -> Mangos.Cluster.Admin and Mangos.MySql, both already referenced. Mangos.World -> Mangos.MySql, already referenced. No new cycles.

https://claude.ai/code/session_016a7UDPm1QxBbGTNt5weY4F
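The in-memory shard registry above can be sketched as a concurrent map keyed by (mapId, shardKey). GetShard and the Claim/Release envelope pair are named in the commit; the ShardOwner record shape and the other member names are assumptions.

```csharp
using System.Collections.Concurrent;

// Sketch of the in-memory ShardRegistry: tracks
// (mapId, shardKey) -> owning cluster + relay endpoint.
// Record shape and Claim/Release names are illustrative.
public record ShardOwner(uint OwnerClusterId, string OwnerEndpoint);

public sealed class ShardRegistry
{
    private readonly ConcurrentDictionary<(uint MapId, ulong ShardKey), ShardOwner> _shards = new();

    // Applied when a ShardClaim envelope arrives; last claim wins.
    public void Claim(uint mapId, ulong shardKey, ShardOwner owner)
        => _shards[(mapId, shardKey)] = owner;

    // Applied when a ShardRelease envelope arrives.
    public void Release(uint mapId, ulong shardKey)
        => _shards.TryRemove((mapId, shardKey), out _);

    // Consulted by the (follow-up) enter-zone path; null means the
    // shard is unclaimed and the local cluster may host it.
    public ShardOwner? GetShard(uint mapId, ulong shardKey)
        => _shards.TryGetValue((mapId, shardKey), out var owner) ? owner : null;
}
```

A ConcurrentDictionary keeps lookups lock-free on the login hot path while claim/release traffic from peers stays safe to apply from the transport thread.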
… invite reply, peer maintenance

Closes the last loose ends from the world-cluster proxy series. Build verified locally: dotnet build of Mangos.sln passes with 0 errors and all 15 existing tests stay green; the 106 remaining warnings are all pre-existing legacy code (deprecated SQL.Query and unused fields).

What changed:

* ClusterAdminCommandHandler implements every previously-stubbed AdminVerb. ServerClaimMaps asks a world to host a comma-separated --maps list. InstanceSpawn picks the least-loaded eligible world via the supervisor's load heuristic. InstanceShutdown / InstanceRestart fan out to every world that claims the map. InstanceInfo reports the hosting world plus any active shard claim. InstanceKick is a hint alias for shutdown. RealmList now lists peers (clusterId, tag, endpoint) when federation is on. RealmPeers shows live authenticated links from the router's Peers map. RealmMarker accepts --account for cross-realm operator overrides and points self-toggle users at the in-game .realm show / .realm hide.
* FederatedChatDeliverer wires onto FederationRouter.OnChat and delivers every inbound envelope as a standard SMSG_MESSAGECHAT to the local recipient(s). Whisper -> a single recipient by name; Party / Raid -> all local members of the federated group; System -> broadcast. Marker rendering goes through RealmMarkers.Decorate, so whispers always carry the [tag] regardless of preference.
* Federated invite accept/decline now closes the loop. The inviter remembers each pending invite by recipient guid; the cluster's CMSG_GROUP_ACCEPT and CMSG_GROUP_DECLINE handlers consult TryHandleAccept/TryHandleDecline first and short-circuit to a GroupInviteResponseEnvelope back to the leader's cluster, instead of crashing on the missing local Group object.
* FederationRouter.StartAsync now spins up both a peer-table refresh loop (a 60s realmlist DB poll) and a maintenance loop (15s) that proactively dials every known peer and heartbeats live links via the new FederationLink.HeartbeatAsync. Failed heartbeats drop the link so the next iteration redials. This routes around peer outages without operator intervention.
* WorldCluster/Program.cs binds the new chat deliverer alongside the existing inviter and shard registry hookups.

No new project references; layering unchanged.

https://claude.ai/code/session_016a7UDPm1QxBbGTNt5weY4F
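One pass of the 15-second maintenance loop above - heartbeat a live link, drop it on failure so the next pass redials - can be sketched as follows. FederationLink.HeartbeatAsync is named in the commit; the ILink abstraction, FuncLink test double, and KeepAliveOnceAsync are hypothetical scaffolding.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch of one maintenance-loop iteration per link. Only the
// heartbeat-then-drop-on-failure behaviour is from the commit.
public interface ILink
{
    Task<bool> HeartbeatAsync(CancellationToken ct);
}

// Small test double so the sketch is self-contained.
public sealed record FuncLink(Func<Task<bool>> Beat) : ILink
{
    public Task<bool> HeartbeatAsync(CancellationToken ct) => Beat();
}

public static class LinkMaintainer
{
    public static async Task<bool> KeepAliveOnceAsync(ILink link, Action drop, CancellationToken ct)
    {
        try
        {
            if (await link.HeartbeatAsync(ct)) return true; // link healthy
        }
        catch (Exception)
        {
            // Treat transport errors the same as a failed beat.
        }
        drop(); // remove the link; the next 15s iteration redials
        return false;
    }
}
```

Dropping on the first failed beat is the aggressive end of the design space; a production loop might tolerate N misses before redialing, as the world supervisor's stale/dead thresholds do.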
Closes the last open item from the world-cluster proxy series. The shard registry that arrived in PR MangosServer#6 is now actually consulted when a player logs in, and the leader's cluster actually emits the claims that populate peer registries. Build verified: 0 errors, 15/15 tests pass.

End-to-end flow once a federated group exists:

1. The leader on Cluster A invites a player on Cluster B; B accepts.
2. Cluster B's FederatedGroupInviter sends a GroupInviteResponseEnvelope with TargetRealmId = B's clusterId (previously left at 0).
3. Cluster A's new FederatedShardClaimer receives the response, looks up the leader by groupId, and emits a ShardClaimEnvelope keyed by (leader.Map, groupId) back to Cluster B.
4. Cluster B's ShardRegistry records the claim.
5. The Cluster-B player tries to log in to a character on that map. The world's ClientLogin now asks ICluster.QueryShard(mapId, guid).
6. Cluster B's QueryShard resolves guid -> group -> shardKey, finds the registry entry, sees OwnerClusterId != local, and returns ShardLookupResult { Kind = Foreign, OwnerClusterId, OwnerEndpoint, OwnerDisplayTag }.
7. The world logs the result, sends an SMSG_MESSAGECHAT system message telling the client which realm hosts their group's instance, and calls cluster.ClientDrop so they reconnect via the standard realmlist flow on the host realm.

Wire format: new InteropMethodId.ControlQueryShard (0x0243) with a ShardLookupResult payload (Kind + OwnerClusterId + OwnerEndpoint + OwnerDisplayTag). ICluster gains QueryShard; the dispatcher and proxy are extended; the serializer gains Read/WriteShardLookupResult.

WorldServerClass on the cluster side now takes a ShardRegistry and a local-cluster-id provider via DI. QueryShard walks WorldCluster.CHARACTERs to resolve guid -> group, uses Group.Id as the shard key, looks up ShardRegistry.GetShard(mapId, shardKey), and compares OwnerClusterId against the local id from FederationConfiguration.

Layering note: no new project references. Mangos.Cluster.Interop -> Mangos.Common only; Mangos.Cluster -> Mangos.Cluster.Admin already referenced.

https://claude.ai/code/session_016a7UDPm1QxBbGTNt5weY4F
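The local-vs-foreign decision at the heart of QueryShard can be sketched as below. ShardLookupResult's field names come from the commit; the None/Local Kind values, the tuple-shaped registry entry, and the Resolve helper are assumptions (the real guid -> group -> shardKey walk is elided).

```csharp
// Sketch of the ControlQueryShard (0x0243) decision. Field names on
// ShardLookupResult are from the commit; ShardKind members other than
// Foreign, and the Resolve helper, are illustrative.
public enum ShardKind { None, Local, Foreign }

public record ShardLookupResult(
    ShardKind Kind,
    uint OwnerClusterId = 0,
    string OwnerEndpoint = "",
    string OwnerDisplayTag = "");

public static class ShardLookup
{
    // entry is the registry hit for (mapId, shardKey), or null when
    // no peer has claimed the shard.
    public static ShardLookupResult Resolve(
        uint localClusterId,
        (uint OwnerClusterId, string Endpoint, string Tag)? entry)
    {
        if (entry is null) return new(ShardKind.None);   // unclaimed: host locally
        var e = entry.Value;
        return e.OwnerClusterId == localClusterId
            ? new(ShardKind.Local)                        // we already own it
            : new(ShardKind.Foreign, e.OwnerClusterId, e.Endpoint, e.Tag);
    }
}
```

Only the Foreign branch triggers the client-visible flow in step 7; None and Local both let the login proceed unchanged.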
The CI pipeline runs `dotnet format --verify-no-changes` after build and test; it caught style drift that my new files introduced (auto-property brace style, import ordering). This commit is the result of running `dotnet format` on the solution; only whitespace and import ordering change. No semantic changes, no behaviour change; the build is still clean (0 errors), tests still pass 15/15, and `dotnet format --verify-no-changes` now exits 0. Two pre-existing files in Mangos.MySql/Connections/ and Mangos.Tests/Logging/ also had stale formatting; the formatter normalised those too.

https://claude.ai/code/session_016a7UDPm1QxBbGTNt5weY4F