45 commits
8142f03
Add project setup: PLAN.md, uv config, and ashvin/ working directory
ashvin-verma Mar 21, 2026
62ab734
Implement placement solver: 0.0000 overlap on all tests 1-10
ashvin-verma Mar 22, 2026
2da2e57
Achieve 0.0000 overlap on all 12 tests including 100K cells
ashvin-verma Mar 22, 2026
67c4890
Add WL optimization: gradient polish + re-legalize cycles
ashvin-verma Mar 22, 2026
3402250
Add cell swap WL optimization + gradient polish pipeline
ashvin-verma Mar 22, 2026
5387f73
Add optuna tuning + cell swap optimization
ashvin-verma Mar 22, 2026
09194ca
Add min-disturbance legalization (kept as option), edge visualization
ashvin-verma Mar 22, 2026
872ae60
Add multi-start solver, spectral init, fast cell swaps
ashvin-verma Mar 22, 2026
e87c03a
Update best config from optuna v3: WL 0.4091 on all tests
ashvin-verma Mar 22, 2026
a1298a5
Add barycentric refinement, scatter solver, streamline pipeline
ashvin-verma Mar 22, 2026
f181770
Targeted scatter: 12% WL improvement (0.45 → 0.40)
ashvin-verma Mar 22, 2026
af9718c
Multi-scatter WL optimization: 0.3842 avg WL, nuclear loss experiments
ashvin-verma Mar 22, 2026
1dc0c1b
Region-aware pre-positioning tested (reverted), 5-scatter iterations
ashvin-verma Mar 22, 2026
0d08356
Multi-pass compiler-style pipeline: legalize→scatter→GD→re-legalize
ashvin-verma Mar 22, 2026
d196dc2
Momentum barycentric, WL-aware legalization, spatial hash overlap check
ashvin-verma Mar 22, 2026
e1e4289
WIP: Net-aware legalizer (candidate-slot based)
ashvin-verma Mar 22, 2026
80fff43
Hybrid legalization: row-pack then net-aware refinement. WL 0.3613
ashvin-verma Mar 22, 2026
1b4240c
Detailed placement engine: pair swaps + cell reinsertion
ashvin-verma Mar 22, 2026
e3de003
New best: 0.3540 avg WL, rank 9. Detailed placement working.
ashvin-verma Mar 22, 2026
7246044
Add cell inflation, anchor loss, topology-preserving legalization
ashvin-verma Mar 23, 2026
40b0d24
Add iterative swap engine with cross-row reinsertion
ashvin-verma Mar 23, 2026
da96f37
Save architecture overhaul plan + Abacus legalizer + visual analysis
ashvin-verma Mar 23, 2026
757c809
Revert interleaved GD-legalize (regressed), save constructive plan
ashvin-verma Mar 23, 2026
41f1000
Test & revert row-aware GD (Step 2) — same ceiling as GD approach
ashvin-verma Mar 23, 2026
f7e10f9
Island-clustered init as multistart strategy
ashvin-verma Mar 23, 2026
be9b2ff
Test 4 init strategies: random wins 5/7 tests (avg 0.371)
ashvin-verma Mar 23, 2026
33ea8e6
WL-aware Abacus: wins 7/9 in isolation but pipeline co-adaptation blo…
ashvin-verma Mar 24, 2026
0775355
Constructive v2: legal-from-the-start, no GD needed
ashvin-verma Mar 24, 2026
97f196e
Macro avoidance via boundary projection in constructive v2
ashvin-verma Mar 24, 2026
7917d75
Zero overlap constructive placement via precomputed blocked intervals
ashvin-verma Mar 24, 2026
43cad7b
Zero overlap on all tests via precomputed blocked intervals
ashvin-verma Mar 24, 2026
ba81450
Reduce macro gaps + stronger swap engine with oscillation prevention
ashvin-verma Mar 24, 2026
aad3315
Bidirectional compaction + right-sized macro gaps = zero overlap all …
ashvin-verma Mar 24, 2026
3156149
Cluster-then-spread constructive: avg WL 0.406 (was 0.417)
ashvin-verma Mar 24, 2026
dd8b62f
BFS from macros tested, iterative averaging wins (0.406 vs 0.410)
ashvin-verma Mar 24, 2026
50e7cf7
BFS+avg converges to same local min as averaging alone — revert to si…
ashvin-verma Mar 24, 2026
64caa6c
WL-aware overlap resolution: avg 0.404 (was 0.406)
ashvin-verma Mar 24, 2026
75890a6
Hybrid GD+constructive spreading: avg 0.402, T2 best ever 0.315
ashvin-verma Mar 25, 2026
63bb79d
Within-row-only compact: same result as WL-aware spreading
ashvin-verma Mar 26, 2026
229dac9
Row redistribution: net zero (helps T2/T5/T8, hurts T6/T7)
ashvin-verma Mar 26, 2026
61e3e58
Load-balanced row assignment: avg 0.405 (was 0.406)
ashvin-verma Mar 27, 2026
c8dcb2f
GPU-friendly pair generation: torch ops instead of Python loops
ashvin-verma Mar 27, 2026
fad7151
Log GPU overlap speedup: 1.3-2.5x across all tests
ashvin-verma Mar 27, 2026
2f074d3
Fully vectorized pair generation for N<=2000
ashvin-verma Mar 27, 2026
25bc343
Final leaderboard submission: projected GD shelf legalizer
ashvin-verma Apr 22, 2026
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.12
41 changes: 41 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,41 @@
# CLAUDE.md

## How to Run

This project runs under WSL. From a Windows terminal:

wsl -d Ubuntu-24.04

Then from the repo root (`/mnt/c/Users/ashvi/Documents/intern_challenge`):

uv run python test.py # upstream test suite (all 12 tests)
uv run python ashvin/run_tests.py # instrumented runner (timing + CSV)
uv run python ashvin/run_tests.py --tests 1,2,3 # run specific tests
uv run python ashvin/run_tests.py --tag experiment1 # tag for CSV filename

## Environment

- Python 3.12 (managed by uv)
- PyTorch with CUDA 12.8 (RTX 3080 Ti)
- Package manager: uv (see pyproject.toml)
- OS: WSL Ubuntu 24.04 on Windows 11

## Project Structure

- `placement.py` — challenge code (we implement `overlap_repulsion_loss()` here)
- `test.py` — upstream test harness (DO NOT MODIFY)
- `PLAN.md` — strategic roadmap
- `HISTORY.md` — raw experiment results log
- `PROGRESS.md` — analysis of each run: what worked, why, what to try next
- `ashvin/` — all custom code
- `ashvin/run_tests.py` — instrumented test runner with CSV output
- `ashvin/instrumented_train.py` — training wrapper with per-phase timing
- `ashvin/results/` — CSV output from experiments

## Conventions

1. `placement.py` is modified only for `overlap_repulsion_loss()` (the challenge). `test.py` is read-only.
2. All custom code goes in `ashvin/`.
3. Log every experiment: raw data in `HISTORY.md`, analysis in `PROGRESS.md`.
4. Primary metric: overlap_ratio (lower = better, 0.0 = perfect).
5. Secondary metric: normalized_wl (lower = better).
45 changes: 45 additions & 0 deletions HISTORY.md
@@ -0,0 +1,45 @@
# Experiment History

## Baseline — Placeholder Overlap Loss (2026-03-21)

**Config:** Default `train_placement()` params (1000 epochs, Adam lr=0.01, lambda_wl=1.0, lambda_overlap=10.0). `overlap_repulsion_loss()` is a placeholder returning constant 1.0.

| Test | Cells | Overlap | Norm WL | Time (s) |
|------|-------|---------|---------|----------|
| 1 | 22 | 0.9091 | 0.3435 | 16.16 |
| 2 | 28 | 0.8929 | 0.3450 | 0.62 |
| 3 | 32 | 0.9375 | 0.3492 | 0.59 |
| 4 | 53 | 0.8302 | 0.3866 | 0.93 |
| 5 | 79 | 0.9367 | 0.4173 | 0.82 |
| 6 | 105 | 0.7429 | 0.3443 | 0.83 |
| 7 | 155 | 0.7548 | 0.3403 | 0.90 |
| 8 | 157 | 0.8662 | 0.3784 | 0.89 |
| 9 | 208 | 0.6394 | 0.3787 | 0.87 |
| 10 | 2010 | 0.7846 | 0.3441 | 1.82 |

**Averages (tests 1-10):** overlap=0.8294, wl=0.3627, total_time=24.43s

**Notes:** Tests 11 (10K cells) and 12 (100K cells) not run — `calculate_cells_with_overlaps()` uses O(N^2) Python loops, too slow for large designs.

## Naive Overlap Loss — N×N Broadcasting (2026-03-21)

**Config:** Same default params. Implemented `overlap_repulsion_loss()` using pairwise broadcasting: `relu((w1+w2)/2 - |x1-x2|) * relu((h1+h2)/2 - |y1-y2|)`, upper triangle mask, normalized by pair count.

| Test | Cells | Overlap | Norm WL | Time (s) | Overlap Loss (s) |
|------|-------|---------|---------|----------|-------------------|
| 1 | 22 | 0.4091 | 0.5036 | 13.60 | 0.17 |
| 2 | 28 | 0.6429 | 0.4124 | 0.94 | 0.15 |
| 3 | 32 | 0.5000 | 0.6023 | 0.91 | 0.14 |
| 4 | 53 | 0.6038 | 0.4607 | 1.14 | 0.18 |
| 5 | 79 | 0.6076 | 0.5398 | 1.09 | 0.17 |
| 6 | 105 | 0.6476 | 0.4323 | 1.18 | 0.21 |
| 7 | 155 | 0.7097 | 0.3982 | 1.48 | 0.29 |
| 8 | 157 | 0.6815 | 0.4341 | 1.55 | 0.32 |
| 9 | 208 | 0.6202 | 0.4094 | 1.80 | 0.41 |
| 10 | 2010 | 0.8164 | 0.3486 | 67.84 | 30.79 |

**Averages (tests 1-10):** overlap=0.6239, wl=0.4541, total_time=91.53s

**vs Baseline:** overlap 0.83→0.62 (-25%), but wirelength worse 0.36→0.45 (tradeoff). Test 10 bottleneck: overlap loss 30.8s + backward 33.9s out of 66s training. O(N²) approach unusable for N>2000.

**Next:** Need hyperparameter tuning (more epochs, higher lambda_overlap, LR schedule) and scalable overlap engine for tests 11-12.
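The broadcasting loss described above can be sketched as follows. This is a NumPy stand-in for illustration, not the project's torch code; the torch version is the same with `np` → `torch` and `np.maximum(·, 0)` → `relu`.

```python
import numpy as np

def pairwise_overlap_loss(x, y, w, h):
    """Mean rectangle-overlap area over all unique cell pairs,
    via relu((w1+w2)/2 - |x1-x2|) * relu((h1+h2)/2 - |y1-y2|)."""
    dx = np.abs(x[:, None] - x[None, :])                      # N x N center distances
    dy = np.abs(y[:, None] - y[None, :])
    ox = np.maximum((w[:, None] + w[None, :]) / 2 - dx, 0.0)  # x-overlap length (ReLU)
    oy = np.maximum((h[:, None] + h[None, :]) / 2 - dy, 0.0)  # y-overlap length
    iu = np.triu_indices(len(x), k=1)                         # upper triangle = unique pairs
    return (ox * oy)[iu].mean()                               # normalized by pair count

# Two unit squares overlapping by 0.5 in x, plus one cell far away:
# one pair overlaps with area 0.5, so the mean over 3 pairs is 0.5/3.
x = np.array([0.0, 0.5, 10.0]); y = np.zeros(3)
w = np.ones(3); h = np.ones(3)
loss = pairwise_overlap_loss(x, y, w, h)
```

Both the memory and the time of this form are O(N²), which is exactly the scaling wall the entry above hits on test 10.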
253 changes: 253 additions & 0 deletions PLAN.md
@@ -0,0 +1,253 @@
# Goal
Build a strong solver for the partcl intern placement challenge.

Primary metric: overlap ratio = number of cells involved in overlaps / total cells.
Secondary metric: normalized wirelength.

Important constraint:
The test suite includes designs up to 10 macros + 100000 standard cells.
Do NOT use O(N^2) all-pairs overlap tensors except for tiny debugging cases.

# Problem framing
This is a scalable mixed-size overlap-removal problem, not full production PnR.
The best solution will likely be:
1. macro-aware
2. coarse-to-fine
3. spatially local
4. GPU-friendly
5. driven by search over solver schedules, not raw coordinate chromosomes

# Immediate tasks

## Task 1: inspect and instrument
- Read placement.py and test.py.
- Add timing breakdowns for:
- overlap loss
- wirelength loss
- optimizer step
- total runtime
- Add per-test logging:
- overlap_ratio
- num_cells_with_overlaps
- normalized_wl
- runtime
- Add seed control and CSV logging.

## Task 2: build a scalable overlap engine
Implement a spatial-hash or uniform-grid overlap candidate generator:
- bin cells by center
- only compare cells in same or neighboring bins
- support macros and std cells
- return candidate pairs
- compute overlap penalties only on candidate pairs

Need both:
- exact overlap metric for evaluation
- differentiable overlap loss for optimization
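A minimal sketch of the candidate generator, in plain Python with dict binning (the later GPU task replaces exactly this structure). Names here are illustrative; a real version would also insert each macro into every bin its bounding box touches, not just its center bin.

```python
from collections import defaultdict

def candidate_pairs(x, y, bin_size):
    """Uniform-grid candidate generator: bin cells by center, then
    compare only cells in the same or neighboring bins. Returns
    unique (i, j) pairs with i < j."""
    bins = defaultdict(list)
    for i, (cx, cy) in enumerate(zip(x, y)):
        bins[(int(cx // bin_size), int(cy // bin_size))].append(i)
    pairs = set()
    for (bx, by), cells in bins.items():
        for dx in (-1, 0, 1):                     # 3x3 neighborhood
            for dy in (-1, 0, 1):
                for j in bins.get((bx + dx, by + dy), ()):
                    for i in cells:
                        if i < j:
                            pairs.add((i, j))
    return pairs
```

Overlap penalties are then computed only on the returned pairs, so cost scales with local density rather than N².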

## Task 3: add a density term
Implement a bin overflow / density penalty:
- accumulate cell area into bins
- penalize overflow above target density
- make it differentiable if practical
- start with a simple smooth penalty
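A simple evaluation-side sketch of the bin-overflow penalty, assuming a square die of side `extent` and a quadratic penalty on overflow (both assumptions, not fixed by the plan). A differentiable torch version would spread each cell's area bilinearly across bins and accumulate with `index_add_`.

```python
import numpy as np

def density_penalty(x, y, area, nx, ny, extent, target_util=0.5):
    """Accumulate cell area into an nx x ny grid and penalize area
    above target utilization with a smooth quadratic penalty."""
    bw, bh = extent / nx, extent / ny
    bins = np.zeros((nx, ny))
    ix = np.clip((x / bw).astype(int), 0, nx - 1)   # bin index of each cell center
    iy = np.clip((y / bh).astype(int), 0, ny - 1)
    np.add.at(bins, (ix, iy), area)                 # unbuffered accumulate per bin
    overflow = np.maximum(bins - target_util * bw * bh, 0.0)
    return (overflow ** 2).sum()
```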

## Task 4: macro-first pipeline
Add a 2-stage solver:
- stage A: place / legalize macros first
- stage B: place std cells given macro anchors
- optional stage C: allow small macro nudges if hot bins remain

For macro placement, try:
- simulated annealing on macro coordinates
- or greedy local search with restarts

## Task 5: hot-bin repair
Implement a local repair pass:
- identify bins with highest overlap / overflow
- collect cells in those bins
- try batch local moves:
- small translations
- nearest-low-density-bin snap
- pair swaps
- short local reorder
- accept moves that reduce overlap first, wirelength second
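Two pieces of the repair pass above, sketched with illustrative names: bin selection and the lexicographic accept rule.

```python
import numpy as np

def hottest_bins(bin_overlap, k=3):
    """Pick the k grid bins with the largest accumulated overlap;
    the repair pass then tries local moves for cells in those bins."""
    flat = np.argsort(bin_overlap, axis=None)[::-1][:k]
    return [tuple(int(v) for v in np.unravel_index(i, bin_overlap.shape))
            for i in flat]

def accept_move(d_overlap, d_wl):
    """Accept rule from the list above: reduce overlap first,
    wirelength second (deltas are candidate minus current)."""
    return d_overlap < 0 or (d_overlap == 0 and d_wl < 0)
```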

## Task 6: outer-loop search
Do NOT use GA over all cell coordinates.
Use evolutionary search over solver parameters and schedules:
- overlap weight schedule
- density weight schedule
- learning rate / temperature schedule
- bin size
- number of repair passes
- macro move radius
- restart count
- stage transition criteria

Represent one candidate as a compact config dict.
Each candidate decodes into a deterministic or semi-deterministic run.
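One candidate might look like the dict below. The keys are illustrative, not the project's actual parameter names; the point is that the EA mutates this flat genome, never raw coordinates.

```python
# One outer-loop candidate: a solver schedule as a compact config dict.
candidate = {
    "lambda_overlap_start": 10.0,   # overlap-weight ramp endpoints
    "lambda_overlap_end": 200.0,
    "lambda_density": 1.0,
    "lr": 0.01,
    "lr_schedule": "cosine",
    "bin_size": 4.0,
    "repair_passes": 3,
    "macro_move_radius": 8.0,
    "restarts": 2,
    "seed": 2001,                   # fixed seed -> deterministic decode
}
```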

## Task 7: GPU acceleration
Port bottlenecks first:
- binning
- sorting / grouping
- candidate pair generation
- overlap scoring
- batch move scoring

Use PyTorch or Triton if convenient.
Do not port high-level orchestration until kernels matter.

# Experiments to run

## Baseline set
1. repo baseline
2. scalable overlap only
3. scalable overlap + density
4. macro-first + scalable overlap + density
5. macro-first + hot-bin repair
6. outer-loop EA over schedules
7. macro SA + deterministic cell spreading
8. parallel multi-start SA on macro-only state

## Ablations
- no macro-first
- no density term
- no hot-bin repair
- no outer-loop search
- different bin sizes
- different overlap penalties:
- area
- squared area
- softplus on overlap lengths before multiply
- different schedules:
- fixed
- ramped overlap weight
- overlap-first then WL polish

# Acceptance criteria
- Solver handles all benchmark sizes without OOM
- Overlap ratio driven to ~0 on most or all tests
- Runtime remains competitive
- Wirelength improves once overlap is solved

# Guardrails
- Never introduce full NxN tensors for large cases
- Do not use GA over raw coordinates
- Do not spend time on RL or learned policies yet
- Keep every change behind a config flag
- Always run ablations and save results to CSV

# Deliverables
- clean solver code
- config-driven experiment runner
- CSV results
- short notes on what helped, what failed, and why

# Post-algo: competitive analysis & tuning

## Task 8: compile & document what we did
- Write up each heuristic, what worked, what didn't, with numbers
- Clean up PROGRESS.md into a coherent narrative
- Ensure all code is well-organized in ashvin/

## Task 9: competitor analysis
- Download competitor solutions from the old leaderboard PRs (partcleda/intern_challenge)
- Run their solutions through our test suite
- Compare: overlap, wirelength, runtime
- Plot inspections: how do their placements look vs ours?
- Identify what they got right that we missed

## Task 10: new heuristics (informed by competitor analysis + literature)

### Key competitor insights (old leaderboard, all achieved 0.0000 overlap):
- **Annealed softplus** (not ReLU): beta ramps 0.1→4.0. Smooth early, sharp late. Used by top 3.
- **Lambda ramping**: overlap weight 20→200 linear (Shashank) or 4*(e/E)^10 exponential (Brayden)
- **Warmup + cosine LR**: LinearLR 5% warmup then CosineAnnealing. Pawan's 1.74s solution.
- **Deterministic legalization**: row-packing guarantees 0.0000. Marcos, 2.3s for 100K cells.
- **Soft-Coulomb repulsion**: 1/r² global field for spreading (manuhalapeth, WL=0.2630)
- **Cell swaps on high-WL edges**: Shashank's WL secret (0.1310)

### Strategies to implement:

**Strategy A: Annealed activation + lambda ramp + more epochs**
- Replace ReLU with annealed softplus/GELU/leaky-ReLU (try all three)
- Ramp lambda_overlap from 10% to 100% over training
- Warmup LR (5%) + cosine decay
- Double epochs (1000 per stage → 2000 total or more)
- This is the common thread across ALL zero-overlap competitors
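The two schedules above can be sketched in a few lines; the endpoint values follow the ramps quoted from the leaderboard, and the function names are ours.

```python
import math

def softplus(x, beta):
    """Softplus with sharpness beta: (1/beta) * log(1 + exp(beta*x)).
    Approaches ReLU as beta grows. Written in the numerically
    stable form max(x, 0) + log1p(exp(-beta*|x|)) / beta."""
    return max(x, 0.0) + math.log1p(math.exp(-beta * abs(x))) / beta

def schedules(epoch, total, beta_lo=0.1, beta_hi=4.0, lam_lo=20.0, lam_hi=200.0):
    """Linear anneal of softplus beta (0.1 -> 4.0: smooth early, sharp
    late) and of lambda_overlap (20 -> 200) over training."""
    t = epoch / max(total - 1, 1)
    return beta_lo + t * (beta_hi - beta_lo), lam_lo + t * (lam_hi - lam_lo)
```

Early training then sees a soft, wide-gradient penalty that spreads cells globally; late training sees a near-ReLU penalty that drives residual overlaps to zero.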

**Strategy B: Simulated annealing for macro placement (Stage A replacement)**
- SA naturally maximizes entropy → spreads macros apart
- Perturbation: random macro translations, swaps
- Energy = overlap_area + alpha * wirelength
- Temperature schedule: high→low over iterations
- Accept worse moves probabilistically → escapes local minima
- Literature: TimberWolf (Sechen 1986), Dragon (Wang+ 2000) use SA for macro placement
- Our current gradient descent on 10 macros gets stuck; SA explores better
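The SA loop itself is generic; a sketch with geometric cooling and Metropolis acceptance is below. The `energy` and `perturb` callables (overlap + alpha·WL, and random macro translations/swaps) are supplied by the caller and are assumptions here.

```python
import math
import random

def anneal(state, energy, perturb, t0=1.0, t1=1e-3, iters=2000, seed=0):
    """Simulated annealing: geometric cooling t0 -> t1, accept worse
    moves with probability exp(-delta / t), track the best state seen."""
    rng = random.Random(seed)
    cur, cur_e = state, energy(state)
    best, best_e = cur, cur_e
    for k in range(iters):
        t = t0 * (t1 / t0) ** (k / (iters - 1))   # geometric temperature schedule
        cand = perturb(cur, rng)
        e = energy(cand)
        if e < cur_e or rng.random() < math.exp((cur_e - e) / t):
            cur, cur_e = cand, e                  # Metropolis accept
            if e < best_e:
                best, best_e = cand, e
    return best, best_e
```

On a toy 1-D quadratic this converges near the minimum; on macros, `state` would be the coordinate array and the same loop applies unchanged.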

**Strategy C: Deterministic legalization (guarantees 0.0000)**
- After gradient descent + SA, run row-based greedy packing
- Sort cells by x-coordinate, assign to rows, resolve conflicts by shifting
- Handles macros as fixed obstacles, packs std cells around them
- Marcos achieves 100K cells in 2.3s with this approach
- Eliminates need for our current greedy repair (which doesn't guarantee 0)
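A minimal sketch of the row-packing idea, without macro obstacles (a real version, as noted, treats macros as pre-blocked intervals per row). Function and argument names are ours.

```python
def legalize_rows(cells, row_ys, row_x0, row_x1):
    """Greedy row packing: sort cells by x, assign each to the
    nearest-y row with space left, place it at that row's current
    frontier. Overlap is 0 by construction.
    cells: list of (x, y, w); returns list of legal (x, y)."""
    frontier = {ry: row_x0 for ry in row_ys}      # next free x in each row
    placed = [None] * len(cells)
    order = sorted(range(len(cells)), key=lambda i: cells[i][0])
    for i in order:
        x, y, w = cells[i]
        for ry in sorted(row_ys, key=lambda ry: abs(ry - y)):  # nearest row first
            if frontier[ry] + w <= row_x1:        # cell still fits in this row
                placed[i] = (frontier[ry], ry)
                frontier[ry] += w
                break
    return placed
```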

**Strategy D: WL-aware post-optimization**
- After legalization (overlap = 0 guaranteed), optimize wirelength
- Cell swaps: for each high-WL edge, try swapping endpoints with neighbors
- Barycentric refinement: move each cell toward weighted center of its connected cells
- Accept moves only if overlap stays at 0
- This is where Shashank gets 0.1310 WL vs everyone else's 0.26+
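One barycentric pass can be sketched as below (names and the `alpha` damping factor are assumptions). In the real pipeline each proposed move is kept only if overlap stays at 0.

```python
def barycentric_step(pos, nets, alpha=0.5):
    """Move each cell a fraction alpha toward the mean position of
    all cells it shares a net with.
    pos: {cell: (x, y)}; nets: list of cell-name lists."""
    neighbors = {c: [] for c in pos}
    for net in nets:
        for c in net:
            neighbors[c].extend(n for n in net if n != c)
    new = {}
    for c, (x, y) in pos.items():
        ns = neighbors[c]
        if not ns:                                 # unconnected cell: leave in place
            new[c] = (x, y)
            continue
        bx = sum(pos[n][0] for n in ns) / len(ns)  # barycenter of connected cells
        by = sum(pos[n][1] for n in ns) / len(ns)
        new[c] = (x + alpha * (bx - x), y + alpha * (by - y))
    return new
```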

### Implementation order:
1. Strategy A first (quick win, config changes only)
2. Strategy C next (guarantees 0.0000, enables WL optimization)
3. Strategy B if Stage A still fails on some seeds
4. Strategy D last (WL polish, competitive edge)

## Task 11: optuna hyperparameter tuning
- Define search space over: activation type, beta schedule, lambda ramp curve,
LR + warmup, epochs per stage, bin_size, repair params
- Objective: minimize overlap_ratio, tiebreak on normalized_wl
- Run on tests 1-10 with budget ~100-200 trials
- Also evaluate on alternate seeds (2001-2010) to prevent overfitting
- Apply best config, verify generalization
- Record best config and results

## Task 12: GPU acceleration (originally Task 7)
- Port pair generation to GPU (current bottleneck for 100K cells)
- Vectorize bin assignment + neighbor lookup
- Use torch sorting + searchsorted instead of Python defaultdict
- Target: test 12 under 60s (currently 392s)
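The sort + searchsorted replacement for the defaultdict can be sketched as below. It is written with NumPy so it runs anywhere; `torch.sort` and `torch.searchsorted` have the same shape and run the identical logic on the GPU.

```python
import numpy as np

def group_by_bin(bin_ids):
    """Vectorized bin grouping: sort cell indices by bin id, then find
    each bin's segment with searchsorted. Cells of bin uniq[k] are
    order[starts[k]:starts[k+1]] -- no Python dict, no per-cell loop."""
    order = np.argsort(bin_ids, kind="stable")     # cell indices sorted by bin
    sorted_ids = bin_ids[order]
    uniq = np.unique(sorted_ids)                   # occupied bins, ascending
    starts = np.searchsorted(sorted_ids, uniq)     # segment start per bin
    return order, np.append(starts, len(bin_ids)), uniq
```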

# Longer-term plan (playing to win)

## Phase 1: Zero overlap (current priority)
- Strategy A + C should get us to 0.0000 on all tests
- Optuna tunes the exact schedule
- Validate on alternate seeds

## Phase 2: WL optimization (competitive edge)
- Strategy D (cell swaps + barycentric refinement)
- Multi-start: run solver 3-5 times with different seeds, pick best WL
- Edge sampling for large designs (Marcos: 50-80K edges/epoch)

## Phase 3: Scale + speed
- GPU acceleration for 100K cell tests
- Adaptive epoch count by problem size
- Target: all 12 tests under 60s total

## Phase 3.5: Benchmark competitors on tests 11-12
- Run top competitor solutions on the NEW test suite (tests 11-12: 10K and 100K cells)
- Most competitors used O(N²) approaches — they will OOM or timeout on 100K cells
- Document which competitors scale and which don't
- This is our competitive advantage: scalable spatial hash + legalization

## Phase 4: Outlandish ideas (if gap remains)
- Soft-Coulomb repulsion field (manuhalapeth)
- Graph neural network for initial placement prediction
- Force-directed placement with momentum
- Spectral placement (eigenvector-based initial positions)
- Reinforcement learning for schedule selection