Official implementation of the paper The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models by Yan Wang, Yitao Xu, Nanhan Shen, Jinyan Su, Jimin Huang, and Zining Zhu (2026).
Mixture-of-Experts (MoE) models are widely assumed to achieve domain specialization through sparse routing. We challenge this assumption by introducing COMMITTEEAUDIT, a post-hoc framework that analyzes routing behavior at the level of expert groups rather than individual experts. Across three representative MoE models (OLMoE, Qwen3-30B-A3B, DeepSeek-V2-Lite) and the MMLU benchmark, we uncover a domain-invariant Standing Committee: a compact coalition of routed experts that consistently captures the majority of routing mass across domains, layers, and routing budgets, even when the architecture already includes shared experts.
This repository reproduces and extends the core finding of the paper: MoE routing does not primarily form domain-isolated specialists. Instead, models repeatedly rely on a small, stable, cross-domain expert coalition ("Standing Committee").
The paper addresses three questions:
- Existence - Do stable cross-domain expert groups emerge?
- Dynamics - How does this structure evolve with depth and model scale?
- Functionality - What roles do committee experts play compared with peripheral experts?
Key empirical takeaways from the paper:
- Compact committee size: across full-layer audits, committees are usually very small (
|C|typically 1-4; up to 5 in Qwen3-30B-A3B). - High routing concentration: committee coverage often exceeds
50%, and reaches around70%in deeper layers for DeepSeek-V2-Lite. - Strong influence density: committee members can contribute
13x-45xmore (on average) than non-members under matched settings. - Functional criticality: masking committee experts sharply degrades MMLU performance (correct rate from
0.39baseline to as low as0.03in deep layers). - Cross-dataset robustness: the pattern persists from MMLU to a C4-based free-form web-text setting, supporting a general routing tendency rather than benchmark-specific behavior.
- Existence. A small, stable set of 2–5 experts per layer absorbs up to 70% of the total routing mass across diverse domains. This committee persists even in architectures with explicit shared experts, suggesting centralization is an emergent property of sparse routing rather than an architectural artifact.
- Dynamics. The committee size stays compact as model capacity (
E) and routing budget (k) grow. Masking committee experts sharply degrades MMLU accuracy (e.g., 0.39 → 0.09 on DeepSeek-V2-Lite middle layers), confirming their functional importance. - Core–periphery organization. Qualitative token-level analysis shows committee members anchor reasoning and syntactic structure (e.g.,
What,Which,the,in), while peripheral experts are recruited on demand to handle domain-specific knowledge. - Implication for training. Standard load-balancing auxiliary losses that enforce uniform expert utilization may be working against this natural computational hierarchy, limiting training efficiency.
COMMITTEEAUDIT is a model-agnostic, three-stage auditing pipeline:
| Stage | Purpose | Key Quantity |
|---|---|---|
| I. Task-conditioned routing profiles | Extract per-domain routing signatures from a pre-trained MoE model. | Expert Contribution Index ECI = E_{x ∈ D_τ}[G(x)_i] |
| II. Task-specificity scoring | Filter out domains whose routing is not distinctive enough to support group-level analysis. | Silhouette score over cosine distances of routing vectors |
| III. Standing Committee exploration | For each layer, rank experts within every domain and select the Pareto-optimal set that is both high-ranking (low μ) and stable (low σ²). |
Pareto set over (μ_i, σ_i²) among experts with presence ratio ≥ γ |
CommitteeAudit/
├── router_lens.py # (1) Extract per-sample router outputs via forward hooks
├── MOE_adjustment_example.py # Template: how to modify a model's MoE gate for data collection
├── get_frequency_data.py # (2a) Aggregate expert activation frequency per (domain, layer)
├── get_frequency_and_weight_data.py # (2b) Aggregate frequency × normalized routing weight per (domain, layer)
├── Audit_Committee.py # (3) Run Standing Committee audit on the aggregated top-k table
└── README.md
The full reproduction pipeline has four steps:
- Modify the MoE gate of your target model so its
forwardexposestopk_idx/topk_weight— seeMOE_adjustment_example.py. - Collect router outputs per prompt with
router_lens.py→ savesRW/*.pth. - Aggregate routing info per
(domain, layer)withget_frequency_and_weight_data.py(orget_frequency_data.pyfor frequency only) → produces a top-k Excel table. - Run the Standing Committee audit on the aggregated table with
Audit_Committee.py→ produces the final CSV.
Note.
MOE_adjustment_example.pytargets DeepSeek-V2-Lite; porting to other models (e.g., OLMoE, Qwen3-MoE) requires adapting the modification to that model's gating implementation. The modified gate is intended for offline data collection only — do not use it for downstream inference or training.
Requires Python 3.9+.
git clone https://github.com/The-FinAI/CommitteeAudit.git
cd CommitteeAudit
pip install -r requirements.txtMinimum dependencies:
torch,transformers,tqdm(forrouter_lens.py)pandas,numpy,openpyxl(for aggregation andAudit_Committee.py)
The standalone audit script (Audit_Committee.py) accepts a top-k routing table and produces one Standing Committee per layer. You can skip stages (1)–(2) and try the audit directly on your own routing data, as long as it matches one of the supported schemas below.
- Adapt your MoE gate's
forwardto expose{topk_idx, topk_weight}per layer, followingMOE_adjustment_example.py. - Edit the model id, layer range, and dataset path in
router_lens.py:MODEL_ID = "deepseek-ai/DeepSeek-V2-Lite" STABLE_PROMPTS_DATASET_PATH = "MMLU.json" TARGET_LAYERS = range(1, 27) MOE_path = "model.layers.XXX.mlp.gate"
- Run the extractor:
Router weights are saved to
python router_lens.py --st 0 --ed 5000
./RW/router_weight_*.pth.
python get_frequency_and_weight_data.py # frequency × normalized weight
# or
python get_frequency_data.py # activation frequency onlyBoth scripts output an Excel file with columns idx, domain, topk_indices, topk_values, which is exactly the input format expected by Audit_Committee.py.
python Audit_Committee.py \
--input path/to/topk_table.xlsx \
--output path/to/standing_committee.csvFull reference for Audit_Committee.py is below.
python Audit_Committee.py --input INPUT_FILE --output OUTPUT_FILEExample with an Excel-like input:
python Audit_Committee.py \
--input example/olmoe_top16_mean_norm_weight_multi_freq.xlsx \
--output output/olmoe_top16_ratio.csvExample with a long-format CSV input:
python Audit_Committee.py \
--input example/c4_with_domain.csv \
--output standing_committee_audit.csv \
--presence-ratio 0.8 \
--agg-weight meanExample with an explicit k for long-format input:
python Audit_Committee.py \
--input example/c4_with_domain.csv \
--output standing_committee_audit.csv \
--k 16| Argument | Description | Default |
|---|---|---|
--input |
Path to the input file. Supported: .csv, .xlsx, .xls. |
required |
--output |
Path to the output CSV file. | required |
--presence-ratio |
Minimum fraction of domains (γ in the paper) in which an expert must appear to qualify for committee selection. Must lie in (0, 1]. |
0.8 |
--k |
Top-k value for long-format input. If omitted, inferred automatically. |
None |
--agg-weight |
Aggregation for repeated experts within the same (layer, domain) group in long-format input. One of mean, sum, median. |
mean |
Required columns:
idx— layer index.domain— domain name or identifier.topk_indices— ordered list of expert indices.topk_values— ordered list of expert weights aligned withtopk_indices.
Example:
idx,domain,topk_indices,topk_values
1,news,"[3, 18, 42, 7]","[0.31, 0.22, 0.19, 0.14]"
1,code,"[18, 3, 5, 42]","[0.28, 0.24, 0.21, 0.15]"
2,news,"[9, 12, 3, 40]","[0.35, 0.20, 0.17, 0.11]"Notes:
topk_indicesandtopk_valuesmust have the same length within each row.- The order defines expert rank and matters.
- List cells can be stored as Python-like strings such as
"[1, 2, 3]".
Required columns:
layer— layer index.domain— domain name or identifier.expert_idx— expert ID.expert_weight— router weight, contribution, or ECI for that expert.
Optional column:
topk_rank— rank used for tie-breaking and automatickinference.
Example:
layer,domain,expert_idx,expert_weight,topk_rank
1,news,3,0.31,0
1,news,18,0.22,1
1,news,42,0.19,2
1,code,18,0.28,0
1,code,3,0.24,1
1,code,5,0.21,2Notes:
- When
topk_rankis present, it is used to inferkand break ties between experts with equal aggregated weight. - When
topk_rankis absent,kis inferred from the largest number of unique experts in any(layer, domain)group unless--kis passed. - Repeated
(layer, domain, expert_idx)rows are aggregated using--agg-weight.
- Reads the input file and auto-detects the schema.
- If long-format, reconstructs one ordered top-
klist per(layer, domain). - For each layer, computes per-expert statistics:
- presence count across domains,
- mean rank
μ(missing domains penalized ask + 1), - rank variance
σ², - average contribution across all domains.
- Filters experts whose presence fraction is below
--presence-ratio. - Selects Pareto-optimal experts (lower
μis better; lowerσ²is better). - Exports one summary row per layer.
A CSV file with the following columns:
| Column | Meaning |
|---|---|
Layer |
Layer ID reported by the audit. |
Committee |
Comma-separated list of selected expert IDs. |
Size |
Number of experts in the committee (` |
Avg_mu |
Average committee mean rank (μ, lower is better). |
Avg_sigma_sq |
Average committee rank variance (σ², lower is better). |
Coverage |
Fraction of total layer contribution captured by the committee (ECI coverage). |
Ratio |
Average member contribution divided by average non-member contribution (Influence Density Ratio). |
Example:
Layer,Committee,Size,Avg_mu,Avg_sigma_sq,Coverage,Ratio
0,"3, 18, 42",3,1.67,0.89,0.7421,58.31
1,"9, 12",2,1.50,0.25,0.6814,67.08| Symptom | Likely cause |
|---|---|
Input columns do not match a supported schema. |
Missing required columns — check the schema sections above. |
Unsupported file format: <ext> |
Extension is not .csv, .xlsx, or .xls. |
| Parse error on list cells | Invalid list syntax in topk_indices or topk_values. |
presence_ratio must be in the range (0, 1]. |
--presence-ratio outside the valid range. |
| Empty output | Input became empty after preprocessing (e.g., all rows filtered out). |
To keep names consistent and typo-free, this repo uses:
get_frequency_data.py(corrected from historical typoget_frequnecy_data.py)get_frequency_and_weight_data.pyAudit_Committee.pyrouter_lens.pyMOE_adjustment_example.py
If you find this work useful, please cite:
@misc{wang2026standingcommittee,
title = {The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models},
author = {Yan Wang and Yitao Xu and Nanhan Shen and Jinyan Su and Jimin Huang and Zining Zhu},
year = {2026},
eprint = {2601.03425},
archivePrefix= {arXiv},
primaryClass = {cs.LG},
url = {https://arxiv.org/abs/2601.03425}
}This project is released under the MIT License — see LICENSE for details.
For questions or issues, please open a GitHub issue or contact the authors:
- Yan Wang —
wy2266336@gmail.com - Zining Zhu —
zzhu41@stevens.edu