
Partcl/HRT Macro Placement Challenge


Win $20,000 by developing better macro placement algorithms!

Partcl and Hudson River Trading are excited to co-host a competition to solve the macro placement problem.

About Macro Placement

Macro placement is the problem of positioning large fixed-size blocks (SRAMs, IPs, analog macros, etc.) on a chip floorplan so that routing congestion, timing, power delivery, and area constraints are balanced. Unlike standard-cell placement, it must handle blocks with strong geometric and connectivity constraints, so the challenge is to explore a highly discrete design space while minimizing wirelength, avoiding blockages, and preserving downstream routability and timing quality.

For example, the ibm01 benchmark has:

  • 246 hard macros of varying sizes (ranging from 0.8 to 27 μm², with 33× size variation)
  • 7,269 nets connecting macros to each other and to 894 pre-placed standard cell clusters
  • A 22.9 × 23.0 μm canvas with 42.8% area utilization

[Animation: simulated annealing on ibm01]
[Animation: force-directed placement on ibm01]

About HRT Hardware

Hudson River Trading (HRT) is a leading quantitative trading firm at the forefront of technical innovation in global financial markets.

HRT’s Hardware team builds the high-performance compute systems at the core of our trading infrastructure. We use FPGAs and ASICs to drive low-latency decision-making and power custom solutions across the trading stack, from bespoke circuits to machine learning accelerators.

We’re proud to sponsor this competition because advancing macro placement and low-level hardware optimization directly aligns with the kinds of hard, performance-critical engineering challenges our teams tackle every day.

Joining Hudson River Trading’s hardware team means working alongside leading engineers in one of the most advanced computing environments in global finance. Learn more about open roles at hudsonrivertrading.com.

About Partcl

Partcl is rebuilding chip design infrastructure from the ground up for the GPU era.

Modern chip design is slow, fragmented, and fundamentally constrained by tools built decades ago. Critical workflows like timing analysis and placement still take hours to days, limiting how much engineers can explore and optimize.

We’re changing that.

Partcl develops GPU-accelerated systems for physical design that run orders of magnitude faster than legacy tools. Our goal is simple: make iteration cheap enough that design space exploration becomes the default, not the exception.

Background Papers

[1] An Updated Assessment of Reinforcement Learning for Macro Placement

[2] Assessment of Reinforcement Learning for Macro Placement

[3] Reevaluating Google's Reinforcement Learning for IC Macro Placement

[4] A graph placement methodology for fast chip design

🏆 Prizes

  • $20,000 — Grand Prize: The top 7 submissions by proxy score are evaluated through the OpenROAD flow on NG45 designs (including hidden designs). Among those 7, the submission that beats the SA and RePlAce baselines (reported in An Updated Assessment of Reinforcement Learning for Macro Placement) by the largest margin on WNS, TNS, and Area wins the Grand Prize.
  • $20,000 — First Place (Proxy): Awarded to the #1 submission by proxy score. Only awarded if no submission qualifies for the Grand Prize.
  • $5,000 — Second Place: Awarded to the runner-up of the Grand Prize. If no submission qualifies for the Grand Prize, awarded to the #2 submission by proxy score.
  • $4,000 — Innovation Award: Granted to the most creative or technically innovative approach among the top entries, as determined by the judging panel.
  • Swag: Every valid submission gets HRT swag!
  • Note: An additional score adjustment will be applied based on human-expert analysis of the resulting placement.

For full Grand Prize scoring rules, feasibility gate, tie-breaking, and ORFS-failure handling, see SCORING.md.

Submission Format

  • All submissions are made via Google Form. Submissions may be kept public or private until the end of judging.
  • Private submissions must share their repository with the judges so they can clone and evaluate the method.
  • Teams may include up to 5 individuals.
  • The deadline for submissions is May 21, 2026, at 11:59 PM Pacific Time.
  • Each team may submit only one algorithm.
  • All winning implementations must be open-sourced under Apache 2.0 or GPL.
  • All submissions must be registered via this Submission Link.
  • The macro placement algorithm must run end-to-end in under 1 hour per benchmark.
  • All submissions will be evaluated on an AMD EPYC 9655P (16 cores, 100 GB of memory) with an NVIDIA RTX 6000 Ada (48 GB).

Additional Rules

Allowed

  • Any algorithmic approach: SA, RL, GNN, analytical methods, hybrid approaches, learning-based, etc.
  • Any framework: PyTorch, TensorFlow, JAX, or pure Python/C++
  • Any optimization technique: Gradient descent, evolutionary algorithms, local search, etc.
  • Training on public benchmarks: You can learn from the IBM benchmark data
  • Hard-macro orientation flips (Klein-4 only: N, FN, FS, S) — carried to Tier 2 via an optional orientations.pt sidecar
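
A minimal sketch of how the four legal orientations act on a macro's pin geometry, plus one plausible way to write the orientations.pt sidecar. The helper, the pin-offset convention, and the dict schema are illustrative assumptions; the official sidecar format is whatever the organizers specify.

import torch

# The four legal orientations form the Klein four-group:
#   N: as designed                  FN: mirrored about the y-axis
#   FS: mirrored about the x-axis   S: rotated 180 degrees (FN then FS)
# Width and height are unchanged, so placement legality is unaffected.

def orient_pins(offsets: torch.Tensor, w: float, h: float, ori: str) -> torch.Tensor:
    """Map pin offsets (x, y measured from the macro's lower-left corner)."""
    x, y = offsets[:, 0], offsets[:, 1]
    if ori == "N":
        return torch.stack([x, y], dim=1)
    if ori == "FN":
        return torch.stack([w - x, y], dim=1)
    if ori == "FS":
        return torch.stack([x, h - y], dim=1)
    if ori == "S":
        return torch.stack([w - x, h - y], dim=1)
    raise ValueError(f"orientation {ori!r} is not allowed (no 90-degree rotations)")

# Hypothetical sidecar schema: macro name -> orientation, saved with torch.
torch.save({"sram_0": "FN", "sram_1": "N"}, "orientations.pt")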

Not Allowed

  • Modifying the evaluation functions (must use TILOS MacroPlacement evaluator as-is)
  • Hardcoding solutions for specific benchmarks (must be general algorithm)
  • Using external/proprietary placement tools (must be open-source submission)
  • Exceeding runtime limits (1 hour per benchmark hard timeout)
  • Overlaps in resulting placement (strictly zero overlap between hard macros — no tolerance. Participants should add small gaps in their legalization to avoid float-precision edge cases; see the self-check sketch after this list.)
  • 90° rotations of hard macros (R90, R270, FE, FW) — the fakeram45 SRAMs in our benchmarks aren't designed for rotation (pin access and internal metal direction assume a fixed orientation)
  • Resizing soft macros — soft-macro size is a proxy-only concept for density/congestion that doesn't translate to Tier 2; sizes are locked to the initial .plc values on every compute_proxy_cost call
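
As referenced in the overlap rule above, a minimal legality self-check; the O(n²) loop and the gap value are illustrative choices, not official tolerances.

def separated(a, b, gap=1e-4):
    """True if axis-aligned rectangles a and b, each (x, y, w, h),
    are at least `gap` apart along some axis."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return (ax + aw + gap <= bx or bx + bw + gap <= ax or
            ay + ah + gap <= by or by + bh + gap <= ay)

def zero_overlap(macros, gap=1e-4):
    """Pairwise check over all hard macros; O(n^2) is fine for n <= 537."""
    return all(separated(macros[i], macros[j], gap)
               for i in range(len(macros))
               for j in range(i + 1, len(macros)))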

Evaluation Details

Evaluation is two-tiered:

Tier 1: Proxy Cost Ranking (All Submissions)

All submissions are ranked by proxy cost across the 17 IBM benchmarks. This is the primary qualifying metric. Proxy cost is computed using the TILOS MacroPlacement evaluator:

Proxy Cost = 1.0 × Wirelength + 0.5 × Density + 0.5 × Congestion
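
Stated as code, with the three components taken as already computed (normalized) by the TILOS evaluator; this helper just applies the contest weights.

# Contest weights applied to the evaluator's normalized components.
W_WIRELENGTH, W_DENSITY, W_CONGESTION = 1.0, 0.5, 0.5

def proxy_cost(wirelength: float, density: float, congestion: float) -> float:
    return (W_WIRELENGTH * wirelength
            + W_DENSITY * density
            + W_CONGESTION * congestion)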

Baseline numbers are from: An Updated Assessment of Reinforcement Learning for Macro Placement

Tier 2: OpenROAD Flow Validation (Top Submissions)

The top 7 submissions by proxy score will be evaluated through the full OpenROAD flow on NG45 designs to measure real PnR outcomes: WNS, TNS, and Area.

  • The Grand Prize ($20K) is awarded to the highest-scoring submission by the geometric mean of improvement ratios across WNS, TNS, and Area vs. the averaged SA/RePlAce baseline (sketched after this list).
  • To qualify, submissions must pass a feasibility gate: timing (WNS, TNS) must not be worse than both baselines on any design.
  • To avoid overfitting, we will also evaluate on 1-2 hidden NG45 designs.
  • Full scoring rules: SCORING.md
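
A sketch of the aggregation described above, assuming each metric is expressed as a positive magnitude where smaller is better (e.g. |WNS|, |TNS|, Area) and the improvement ratio is baseline over submission; the authoritative definition is in SCORING.md.

from math import prod

METRICS = ("wns", "tns", "area")

def tier2_score(sub: dict, base: dict) -> float:
    """Geometric mean of per-metric improvement ratios vs. the averaged
    SA/RePlAce baseline (assumed ratio convention; see SCORING.md)."""
    ratios = [base[m] / sub[m] for m in METRICS]
    return prod(ratios) ** (1 / len(ratios))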

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/partcleda/partcl-macro-place-challenge.git
cd partcl-macro-place-challenge

# Initialize TILOS MacroPlacement submodule (required for evaluation)
git submodule update --init external/MacroPlacement

# Install the package and all dependencies
uv sync

# Verify the setup
uv run evaluate submissions/examples/greedy_row_placer.py -b ibm01

Run Your First Example

# Run the greedy row placer on ibm01
uv run evaluate submissions/examples/greedy_row_placer.py

# Run on all 17 IBM benchmarks
uv run evaluate submissions/examples/greedy_row_placer.py --all

# Run on NG45 commercial designs (ariane133, ariane136, mempool_tile, nvdla)
uv run evaluate submissions/examples/greedy_row_placer.py --ng45

# Visualize the result
uv run evaluate submissions/examples/greedy_row_placer.py --vis
uv run evaluate submissions/examples/greedy_row_placer.py --all --vis

Running on all benchmarks produces a summary like:

Benchmark     Proxy        SA   RePlAce     vs SA  vs RePlAce  Overlaps
   ibm01    2.0463    1.3166    0.9976    -55.4%     -105.1%         0
   ibm02    2.0431    1.9072    1.8370     -7.1%      -11.2%         0
   ...
     AVG    2.2109    2.1251    1.4578     -4.0%      -51.7%         0

The greedy placer achieves zero overlaps but makes no attempt to optimize wirelength or connectivity — your job is to do better! See SETUP.md for the full API reference and submissions/examples/ for working examples.
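
Wirelength dominates the proxy cost, so half-perimeter wirelength (HPWL) is the first thing most placers optimize. Below is a self-contained sketch with an illustrative data model; the evaluator's actual representation is documented in SETUP.md.

def hpwl(nets, centers):
    """Half-perimeter wirelength: for each net, the half-perimeter of the
    bounding box of the components it connects. `nets` is a list of index
    lists and `centers` a list of (x, y) positions; both are illustrative,
    not the evaluator's data structures."""
    total = 0.0
    for net in nets:
        xs = [centers[i][0] for i in net]
        ys = [centers[i][1] for i in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

# Two components 3 um apart in x and 1 um apart in y give HPWL = 4.0.
print(hpwl([[0, 1]], [(0.0, 0.0), (3.0, 1.0)]))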

🎯 IBM Benchmark Suite (ICCAD04)

We evaluate on 17 designs from the ICCAD04 IBM benchmark suite (ibm05 is excluded because it contains no macros):

Benchmark Macros Nets Canvas (μm) Area Util. SA Baseline RePlAce Baseline
ibm01 246 7,269 22.9×23.0 42.8% 1.3166 0.9976
ibm02 254 7,538 23.2×23.5 43.1% 1.9072 1.8370
ibm03 269 8,045 24.1×24.3 44.2% 1.7401 1.3222
ibm04 285 8,654 24.8×25.1 44.8% 1.5037 1.3024
ibm06 318 9,745 26.1×26.5 46.1% 2.5057 1.6187
ibm07 335 10,328 26.8×27.2 46.8% 2.0229 1.4633
ibm08 352 10,901 27.5×27.9 47.4% 1.9239 1.4285
ibm09 369 11,463 28.1×28.5 48.0% 1.3875 1.1194
ibm10 387 12,018 28.8×29.2 48.6% 2.1108 1.5009
ibm11 405 12,568 29.4×29.8 49.2% 1.7111 1.1774
ibm12 423 13,111 30.1×30.5 49.8% 2.8261 1.7261
ibm13 441 13,647 30.7×31.1 50.4% 1.9141 1.3355
ibm14 460 14,178 31.4×31.8 51.0% 2.2750 1.5436
ibm15 479 14,704 32.0×32.4 51.6% 2.3000 1.5159
ibm16 498 15,225 32.7×33.1 52.2% 2.2337 1.4780
ibm17 517 15,741 33.3×33.7 52.8% 3.6726 1.6446
ibm18 537 16,253 34.0×34.4 53.4% 2.7755 1.7722

Each benchmark includes:

  • Hard macros (you place these)
  • Soft macros (you can also place these)
  • Nets connecting all components
  • Initial placement (hand-crafted, serves as reference)

Baseline Analysis:

  • RePlAce (⭐) consistently outperforms SA across all benchmarks
  • RePlAce achieves roughly 4-55% lower proxy cost than SA (smallest gap on ibm02, largest on ibm17)
  • To qualify for the Grand Prize, your placement must also produce better WNS, TNS, and Area than both baselines when evaluated through the OpenROAD flow on NG45 designs
  • Both baselines achieve zero overlaps (enforced as hard constraint)

💡 Why This Is Hard

Despite "only" 246-537 macros, this problem is extremely challenging:

  1. Massive search space: ~10^800 possible placements (even with constraints; see the back-of-envelope sketch after this list)
  2. Conflicting objectives: Wirelength wants clustering, density wants spreading, congestion wants routing space
  3. Non-convex landscape: Millions of local minima, discontinuities, plateaus
  4. Long-range dependencies: Moving one macro affects costs globally through thousands of nets
  5. Hard constraints: No overlaps between heterogeneous sizes (33× size variation)
  6. Tight packing: 43-53% area utilization leaves little slack
  7. Runtime matters: Must be fast enough to be practical (< 5 minutes ideal)
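
A back-of-envelope version of item 1, counting raw grid configurations before overlap and boundary constraints prune them (the 0.1 μm pitch is an illustrative assumption):

import math

# ibm01: 246 macros on a 22.9 x 23.0 um canvas discretized at 0.1 um.
# Each macro independently picks one of ~52,670 sites, so the raw count
# is positions**n_macros; constraints prune this toward the ~10^800
# figure quoted above.
n_macros, W, H, pitch = 246, 22.9, 23.0, 0.1
positions = (W / pitch) * (H / pitch)
log10_space = n_macros * math.log10(positions)
print(f"~10^{log10_space:.0f} raw configurations")  # ~10^1162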

Classical methods (SA, RePlAce) have been refined for decades but still have room for improvement!

📖 Documentation

  • Setup & API Reference: SETUP.md - Infrastructure details, benchmark format, cost computation, testing
  • Example Submissions: submissions/examples/ - Working placer examples

📚 References

  • TILOS MacroPlacement: GitHub Repository

    • Source of evaluation infrastructure
    • ICCAD04 benchmarks
    • SA and RePlAce baseline implementations
  • ICCAD04 Benchmarks: Classic macro placement benchmark suite used in academic research

🏅 Leaderboard

Submissions are ranked by average proxy cost across all 17 IBM benchmarks (lower is better). Zero overlaps required on all benchmarks. Scores are unverified until confirmed by judges.

Rank Team Avg Proxy Cost Best Worst Overlaps Runtime Verified Notes
1 "vmallela" 1.0109 0.7644 1.2921 0 15.5h total Verified 1.0109 (self-reported 1.1)
2 "Cezar" 1.037 0 55min/bench Resubmitted 5/3. Verification blocked on missing steps/analytical.py import. Previous variant verified 1.2224.
3 "KLA MACH" 1.2121 0.8527 1.6532 0 2h15min total Verified 1.2121 (self-reported 1.2355). Consolidates UTDA / Chuanqi Chen / KLA MACH submissions (one algorithm per team).
4 "Hoop Dreams" 1.2206 0 20min/bench New 5/1. Verification blocked on Python 3.12 / 3.11 ABI mismatch.
5 "Shoom" 1.2353 0 42min/bench Resubmitted 5/1 (was 1.3381).
6 "ArzunPD" 1.2478 0 55min/bench Resubmitted 5/1 (was HyperPlace, verified 1.4421).
7 "RoRa" 1.2788 0.9577 1.6222 0 2.6h total Verified 1.2788 (self-reported 1.2723). Resubmitted 5/1.
8 "William Zhang" 1.2767 0 259s/bench Resubmitted 5/2 (was "Convex Optimization", verified 1.4556). Blocked on missing casadi module.
9 "MTK" 1.2818 0.9073 1.6529 0 37s/bench (GPU) Verified 1.2818 (self-reported 1.317).
10 "Electric Beatle" 1.3253 0 2000s/bench (GPU) Resubmitted 4/30 (was verified 1.3913).
11 "UToronto Analytical" 1.3323 0.9371 1.6545 0 24min total Verified 1.3323 (self-reported 1.3325).
12 "V5" 1.3382 0 850s/bench New 4/23.
13 "Archgen" 1.3479 0 2404s total New 4/24.
14 "Varun's Parallel Worlds" 1.4017 1.0362 1.7298 0 27s/bench
15 "UT Austin - AS" 1.4076 0 17s/bench
16 "ByteDancer" 1.4151 1.0236 1.7792 0 38min/bench
17 "TAISPlAce" 1.4321 0 28min/bench
18 "Two-IIITK-Kids" 1.436 0 38min/bench New 5/2, resubmitted 5/4.
19 "Pragnay" 1.4427 0 632s/bench Blocked on compute_proxy_cost(..., plc=None) in fallback path.
20 "No Man's Sky" 1.4445 0 8.8min/bench New 5/4. Repo not accessible.
21 "another Waterloo kid" 1.4568 0 118s/bench Blocked on Modal cloud dispatch — can't run air-gapped.
RePlAce (baseline) 1.4578 0.9976 1.8370 0
22 "W3 Solutions" 1.4824 0 90s/bench Runtime exceeds 1h/bench cap.
23 "Jiangban Ya" 1.4943 1.0891 1.8099 0 49s/bench
24 "UTAUSTIN-CT" 1.5062 1.1363 1.7941 0 6s/bench
25 "oracleX" 1.5130 1.1340 1.7937 0 11s/bench
26 "SEVmakers" 1.5200 0 200s/bench Private repo — pending judge access.
27 "CA" 1.5247 1.2226 1.7945 0 2s/bench Verified 1.5247 (self-reported 1.5238).
28 "#5 ubc cpen student" 1.5337 1.1411 1.8084 0 13s/bench
29 Will Seed (Partcl) 1.5338 1.1625 1.7965 0 35s total
30 "RUDY Can't Fail" 1.5397 1.1927 1.8881 0 6min total Verified 1.5397 (self-reported 1.3605).
31 "UT Austin - RH" 1.6037 0 4.5s/bench
32 "UT Austin - CT" 1.8706 0 187s/bench
33 "AS" 1.9121 1.4614 2.3508 0 0.16s total
34 "Adi's Team" 2.0025 0 3726s/bench Blocked on compute_proxy_cost(skip_congestion=True) kwarg.
35 "Sharc #1" 2.0433 1.5143 2.4336 0 96s/bench
SA (baseline) 2.1251 1.3166 3.6726 0
Greedy Row (demo) 2.2109 1.6728 2.7696 0 0.3s total
"Binghamton" pending
"MacroBio" pending
DQ "Mike Gao" self-reported 1.3255 1939 16min/bench 1939 overlaps across 17 benchmarks.
DQ "BakaBobo" self-reported 1.4044 282s/bench Missing import — code won't run.

Submit your results via the Submission Link!

🤔 FAQ

Q: What benchmarks are used? A: Tier 1 (proxy cost) uses 17 IBM ICCAD04 benchmarks — the standard academic suite with well-established baselines. Tier 2 (OpenROAD flow) uses NG45 commercial designs (ariane133, ariane136, mempool_tile, nvdla) plus 1-2 hidden designs. You can evaluate on both with --all (IBM) and --ng45 (NG45).

Q: What if I beat one baseline but not the other? A: You must beat BOTH SA and RePlAce baselines on WNS, TNS, and Area to qualify for the Grand Prize. You can still win the Proxy or Innovation prizes regardless.

Q: Are there hidden test cases? A: All 17 IBM benchmarks for proxy cost ranking are public. The 4 NG45 designs are also public. For the OpenROAD flow evaluation (Tier 2), we will additionally test on 1-2 hidden NG45 designs to ensure generalization.

Q: What counts as "beating" the baseline? A: For proxy cost (Tier 1), your aggregate score across all IBM benchmarks must be lower than the baselines. For the Grand Prize (Tier 2), your OpenROAD results for WNS, TNS, and Area must surpass both SA and RePlAce baselines on NG45 designs.

📧 Contact

📄 License

This project is licensed under the Apache License 2.0; see LICENSE.md for details.

Competition Updates

The organizers may update or clarify rules, evaluation details, timelines, prizes, or infrastructure as needed to ensure fairness, technical accuracy, and smooth operation of the competition. Any updates will be communicated through official channels and will apply going forward.

Participation in the competition constitutes acceptance of the current rules and any subsequent updates. The organizers’ decisions regarding scoring, eligibility, and interpretation of these rules are final.

Submissions & contact information may be shared with sponsors.
