275 changes: 275 additions & 0 deletions ONBOARDING.md
@@ -0,0 +1,275 @@
# Onboarding: `placement.py` library

This document orients new contributors to the VLSI-style **cell placement** code in [`placement.py`](placement.py): what it does, how data is laid out, the public API (inputs, outputs, purpose), and where performance matters.

---

## Problem and geometry

The library optimizes **2D positions** of rectangular **cells** (macros and standard cells) so that:

1. **Overlap is minimized** (primary objective in the challenge).
2. **Wirelength** between connected **pins** is reduced (secondary).

**Convention:** Each cell is an axis-aligned rectangle **centered** at `(x, y)` with given `width` and `height`. Overlap between two cells is computed by comparing the center-to-center separation against the sum of half-widths and half-heights. The same criterion is used everywhere: the vectorized checks use strict `<`, so touching edges (`|dx| == (w_i + w_j)/2`) count as non-overlapping.
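
For orientation, a minimal sketch of that criterion for a single pair (function name and structure are illustrative, not from `placement.py`):

```python
import torch

def pair_overlaps(ci: torch.Tensor, cj: torch.Tensor) -> bool:
    """Strict-inequality overlap test for two rows of cell_features.

    Columns: 2 = X (center), 3 = Y (center), 4 = WIDTH, 5 = HEIGHT.
    Touching edges (|dx| == (w_i + w_j) / 2) count as NOT overlapping.
    """
    dx = (ci[2] - cj[2]).abs()
    dy = (ci[3] - cj[3]).abs()
    return bool((dx < (ci[4] + cj[4]) / 2) and (dy < (ci[5] + cj[5]) / 2))
```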

In **`generate_placement_input`**, pin `PIN_X` / `PIN_Y` are sampled in **cell-local** coordinates measured from the cell's **lower-left corner** (between the margins and the cell width/height). **`wirelength_attraction_loss`** then forms world coordinates as **`cell_center + (PIN_X, PIN_Y)`** (the same columns at every step), so only `cell_features[:, 2:4]` is optimized. Be aware that this mixes a "center + offset" formula with offsets generated in the lower-left frame: a pin sampled at the geometric center of a 1×1 cell has local offset `(0.5, 0.5)`, which the loss then places at the cell's top-right corner.

---

## Quick mental model

```mermaid
flowchart LR
    subgraph inputs [Inputs]
        CF[cell_features N x 6]
        PF[pin_features P x 7]
        EL[edge_list E x 2]
    end
    subgraph train [train_placement]
        WL[wirelength_attraction_loss]
        OL[overlap_repulsion_loss]
        ADAM[Adam on positions]
        CF --> WL
        CF --> OL
        PF --> WL
        EL --> WL
        WL --> ADAM
        OL --> ADAM
    end
    subgraph eval [Evaluation]
        MET[calculate_normalized_metrics]
        CF2[final cell_features]
        CF2 --> MET
    end
    ADAM --> CF2
```

---

## Main areas of responsibility

| Section | Role |
|--------|------|
| **Setup** | Synthetic netlist generation (`generate_placement_input`). |
| **Optimization** | Differentiable losses and `train_placement` (the part you typically edit). |
| **Evaluation** | Non-differentiable metrics for reporting and tests (`calculate_*`). |
| **Visualization** | Optional Matplotlib plots (`plot_*`). |
| **Demo** | `main()` end-to-end script. |

The test harness [`test.py`](test.py) imports `generate_placement_input`, `train_placement`, and `calculate_normalized_metrics`.

---

## Data structures

### `CellFeatureIdx` / `PinFeatureIdx`

`IntEnum` types indexing columns of feature tensors. Prefer these over magic numbers.

### `cell_features` — shape `[N, 6]`

| Index | Name (enum) | Meaning |
|------|----------------|---------|
| 0 | `AREA` | Cell area (scalar used in normalization and generation). |
| 1 | `NUM_PINS` | Pin count for that cell (informational / generation). |
| 2 | `X` | Cell center **x** (optimized in training). |
| 3 | `Y` | Cell center **y** (optimized in training). |
| 4 | `WIDTH` | Full width of the rectangle. |
| 5 | `HEIGHT` | Full height of the rectangle. |

Only columns **2–3** receive gradients during `train_placement`; other columns are fixed physical parameters.

### `pin_features` — shape `[P, 7]`

| Index | Name (enum) | Meaning |
|------|----------------|---------|
| 0 | `CELL_IDX` | Index of owning cell in `[0, N)`. |
| 1 | `PIN_X` | Pin offset **x** relative to cell (used in loss). |
| 2 | `PIN_Y` | Pin offset **y** relative to cell (used in loss). |
| 3 | `X` | Absolute **x** at init / legacy column (the loss does not rely on it staying updated). |
| 4 | `Y` | Absolute **y** at init / legacy column. |
| 5 | `WIDTH` | Pin width (e.g. 0.1). |
| 6 | `HEIGHT` | Pin height. |

**Note:** `wirelength_attraction_loss` uses **cell centers + columns 1–2** (not columns 3–4, which are not kept in sync during training).

### `edge_list` — shape `[E, 2]`, `dtype` long

Each row is `[src_pin_idx, tgt_pin_idx]` into `pin_features`. Undirected connectivity is represented as one row per edge (order may follow generation logic).
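
As a quick illustration of column access (a sketch that assumes the enum members listed above), prefer the enums over magic numbers:

```python
import torch

from placement import CellFeatureIdx, PinFeatureIdx  # IntEnum column indices

def pins_world_xy(cell_features: torch.Tensor, pin_features: torch.Tensor) -> torch.Tensor:
    """World pin positions the way the wirelength loss forms them: cell center + local offset."""
    owners = pin_features[:, PinFeatureIdx.CELL_IDX].long()                 # [P]
    offsets = pin_features[:, PinFeatureIdx.PIN_X:PinFeatureIdx.PIN_Y + 1]  # [P, 2]
    centers = cell_features[:, CellFeatureIdx.X:CellFeatureIdx.Y + 1]       # [N, 2]
    return centers[owners] + offsets                                        # [P, 2]
```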

---

## Public API reference

### `generate_placement_input(num_macros, num_std_cells)`

| | |
|--|--|
| **Input** | Two nonnegative integers: macro count, standard-cell count. |
| **Output** | `(cell_features, pin_features, edge_list)` as described above. |
| **Utility** | Builds a random synthetic design: macro areas in `[MIN_MACRO_AREA, MAX_MACRO_AREA]`, standard cells from `STANDARD_CELL_AREAS`, random pins per cell, random edges with deduplication. Prints a short summary. |

Implementation mixes vectorized tensor ops with Python loops over cells/pins for pin placement and edge wiring.
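
A typical call, assuming the signature above (pin and edge counts depend on the random draw):

```python
import torch

from placement import generate_placement_input

torch.manual_seed(0)  # generation is random; seed for reproducibility
cell_features, pin_features, edge_list = generate_placement_input(
    num_macros=4, num_std_cells=100
)
print(cell_features.shape)  # torch.Size([104, 6])
print(pin_features.shape)   # [P, 7]
print(edge_list.shape)      # [E, 2]
```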

---

### `wirelength_attraction_loss(cell_features, pin_features, edge_list)`

| | |
|--|--|
| **Input** | Full feature tensors and edge list. |
| **Output** | Scalar `torch.Tensor`: mean per-edge cost (sum of edge terms divided by `E`). |
| **Utility** | Differentiable **wirelength proxy**: absolute pins = `cell_positions[cell_indices] + pin_features[:, 1:3]`; for each edge, nonnegative `dx, dy` from `abs` differences; per-edge cost **`alpha * logsumexp([dx/alpha, dy/alpha])`** (a smooth **maximum**-like blend of the two separations, not `dx + dy`), with `alpha = 0.1`. Returns **0** with `requires_grad=True` if `E == 0`. |

**Vectorization:** Indexing `pin_absolute_*` by `src_pins` and `tgt_pins` avoids Python loops over edges.
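
A condensed sketch of that edge cost (a paraphrase of the description above, not a verbatim excerpt from `placement.py`):

```python
import torch

ALPHA = 0.1  # smoothing temperature from the description above

def edge_cost_sketch(cell_features, pin_features, edge_list):
    # World pin positions: owning-cell center + local offset (columns 1:3).
    owners = pin_features[:, 0].long()
    pins_xy = cell_features[owners, 2:4] + pin_features[:, 1:3]
    if edge_list.numel() == 0:
        return torch.zeros((), requires_grad=True)  # empty design: zero loss
    src, tgt = edge_list[:, 0], edge_list[:, 1]
    d = (pins_xy[src] - pins_xy[tgt]).abs()               # [E, 2]: nonnegative dx, dy
    per_edge = ALPHA * torch.logsumexp(d / ALPHA, dim=1)  # smooth max of dx and dy
    return per_edge.mean()                                # sum of edge terms / E
```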

---

### `overlap_repulsion_loss(cell_features, pin_features, edge_list, mode="fast")`

| | |
|--|--|
| **Input** | `cell_features`; `pin_features` and `edge_list` are **ignored** (deleted) but kept for a uniform call signature with wirelength loss. |
| **Output** | Scalar differentiable penalty. |
| **Utility** | Penalizes axis-aligned overlap between all **unordered** pairs (upper triangle only). Modes: **`fast`** — sum of overlap areas divided by `(count_overlapping_pairs + 1)`; **`area`** / **`squared`** / **`both`** — mean overlap area and/or mean squared overlap area over `N(N-1)/2` pairs. |

**Vectorization:** `N×N` tensors via `unsqueeze` broadcast for `dx`, `dy`, `min_sep_*`, `relu` penetration, and `torch.triu` mask.
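
To make the broadcasting concrete, a sketch of the pairwise penetration term in `fast` mode (names illustrative; the real function also supports the other modes):

```python
import torch

def overlap_penalty_sketch(cell_features: torch.Tensor) -> torch.Tensor:
    x, y = cell_features[:, 2], cell_features[:, 3]
    w, h = cell_features[:, 4], cell_features[:, 5]
    # [N, N] center separations vs. minimum non-overlapping separations.
    dx = (x.unsqueeze(0) - x.unsqueeze(1)).abs()
    dy = (y.unsqueeze(0) - y.unsqueeze(1)).abs()
    min_sep_x = (w.unsqueeze(0) + w.unsqueeze(1)) / 2
    min_sep_y = (h.unsqueeze(0) + h.unsqueeze(1)) / 2
    # relu(...) is zero for separated pairs; only overlapping pairs get gradient.
    overlap_area = torch.relu(min_sep_x - dx) * torch.relu(min_sep_y - dy)
    # Unordered pairs only: strict upper triangle.
    mask = torch.triu(torch.ones_like(overlap_area, dtype=torch.bool), diagonal=1)
    areas = overlap_area[mask]
    # "fast" mode per the table: total overlap area / (overlapping-pair count + 1).
    return areas.sum() / ((areas > 0).sum() + 1)
```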

---

### `_fast_overlap_ratio(cell_features)` (private)

| | |
|--|--|
| **Input** | `cell_features` `[N, 6]`. |
| **Output** | Python `float`: fraction of cells that participate in **at least one** overlap. |
| **Utility** | Fast, **vectorized** estimate aligned with the overlap definition used in `calculate_cells_with_overlaps` (strict `<` on separations). Used during training for optional plot annotations. |
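
The same pairwise structure, reduced with `any` over rows (a sketch consistent with the strict-`<` criterion, not the private function itself):

```python
import torch

def fast_overlap_ratio_sketch(cell_features: torch.Tensor) -> float:
    x, y, w, h = (cell_features[:, i] for i in (2, 3, 4, 5))
    dx = (x.unsqueeze(0) - x.unsqueeze(1)).abs()
    dy = (y.unsqueeze(0) - y.unsqueeze(1)).abs()
    hits = (dx < (w.unsqueeze(0) + w.unsqueeze(1)) / 2) & (
        dy < (h.unsqueeze(0) + h.unsqueeze(1)) / 2
    )
    hits.fill_diagonal_(False)                    # a cell never overlaps itself
    return hits.any(dim=1).float().mean().item()  # fraction of cells with >= 1 overlap
```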

---

### `_lr_cosine_anneal(progress, lr_max, lr_min_frac=0.1)` (private)

| | |
|--|--|
| **Input** | `progress` in `[0, 1]`, peak LR, minimum fraction of peak. |
| **Output** | Scalar learning rate. |
| **Utility** | Cosine decay from `lr_max` down to `lr_max * lr_min_frac`. |
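
One standard realization of that schedule (a sketch; the private helper may differ in detail):

```python
import math

def lr_cosine_anneal_sketch(progress: float, lr_max: float, lr_min_frac: float = 0.1) -> float:
    """Cosine decay from lr_max at progress=0 to lr_max * lr_min_frac at progress=1."""
    lr_min = lr_max * lr_min_frac
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```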

---

### `train_placement(cell_features, pin_features, edge_list, ...)`

| | |
|--|--|
| **Input** | Initial features and graph. Defaults (overridable): `num_epochs=10000`, `lr=0.05`, `lambda_wirelength=0.1`, `lambda_overlap=50`, `overlap_loss_mode="fast"`, `verbose=True`, `log_interval=100`, optional `loss_plot_path`, `overlap_ratio_tag_interval=2000`, `per_cell_grad_clip_norm=2.44343` (or `None` to disable clipping). |
| **Output** | `dict` with `final_cell_features`, `initial_cell_features`, `loss_history`, `lambda_wirelength`, `lambda_overlap`, `num_epochs`. |
| **Utility** | Runs **Adam** on a detached `cell_positions` tensor (columns 2–3 only). Each epoch: **`_lr_cosine_anneal(epoch/span, lr)`** sets the optimizer LR (`span = max(num_epochs-1, 1)`). **`λ_wl` and `λ_ol` are fixed** for the whole run (`lambda_wirelength`, `lambda_overlap`). Forward: `total_loss = λ_wl * L_wl + λ_ol * L_ol`; backward; optional **per-cell** L2 grad clip; `optimizer.step()`. `loss_history` records raw/weighted losses, **constant** `scheduled_lambda_wl` / `scheduled_lambda_ol`, `learning_rate`, and optional `overlap_ratio_tags` when `overlap_ratio_tag_interval` is nonzero. Optional loss figure via `plot_training_loss_curves`. |

**Training graph:** `cell_features` is cloned; each epoch builds `cell_features_current` by copying and injecting `cell_positions` into columns 2–3, so the backward path flows into `cell_positions` only.
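
Putting the table together, the epoch skeleton looks roughly like this (a sketch of the wiring described above; clipping, logging, and plotting are omitted, and `lr_cosine_anneal_sketch` is the helper sketched earlier):

```python
import torch

from placement import overlap_repulsion_loss, wirelength_attraction_loss

def train_sketch(cell_features, pin_features, edge_list,
                 num_epochs=10_000, lr=0.05, lam_wl=0.1, lam_ol=50.0):
    cell_features = cell_features.clone()
    # Only the position columns (2-3) are learnable.
    cell_positions = cell_features[:, 2:4].detach().clone().requires_grad_(True)
    optimizer = torch.optim.Adam([cell_positions], lr=lr)
    span = max(num_epochs - 1, 1)
    for epoch in range(num_epochs):
        # Cosine-annealed LR, per the schedule sketched above.
        for group in optimizer.param_groups:
            group["lr"] = lr_cosine_anneal_sketch(epoch / span, lr)
        cell_features_current = cell_features.clone()
        cell_features_current[:, 2:4] = cell_positions  # backward flows here only
        total_loss = (
            lam_wl * wirelength_attraction_loss(cell_features_current, pin_features, edge_list)
            + lam_ol * overlap_repulsion_loss(cell_features_current, pin_features, edge_list)
        )
        optimizer.zero_grad()
        total_loss.backward()
        optimizer.step()
    cell_features[:, 2:4] = cell_positions.detach()
    return cell_features
```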

---

### `calculate_overlap_metrics(cell_features)`

| | |
|--|--|
| **Input** | `cell_features` `[N, 6]`. |
| **Output** | `dict`: `overlap_count` (pair count), `total_overlap_area`, `max_overlap_area`, `overlap_percentage` (implemented as `(overlap_count / N) * 100` when `total_area > 0`). |
| **Utility** | **Ground-truth** reporting using NumPy loops over pairs; **not** differentiable. |

---

### `calculate_cells_with_overlaps(cell_features)`

| | |
|--|--|
| **Input** | `cell_features` `[N, 6]`. |
| **Output** | Python `set` of cell indices that appear in at least one overlapping pair. |
| **Utility** | Defines the **official overlap_ratio** used in tests: `len(set) / N`. |
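
A typical use (assuming the call signature above; `final_cell_features` is the tensor returned by training):

```python
from placement import calculate_cells_with_overlaps

overlapping = calculate_cells_with_overlaps(final_cell_features)
overlap_ratio = len(overlapping) / final_cell_features.shape[0]  # the official metric
```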

---

### `calculate_normalized_metrics(cell_features, pin_features, edge_list)`

| | |
|--|--|
| **Input** | Final placement tensors. |
| **Output** | `dict`: `overlap_ratio`, `normalized_wl`, `num_cells_with_overlaps`, `total_cells`, `num_nets`. |
| **Utility** | Single entry point for leaderboard-style metrics: overlap from `calculate_cells_with_overlaps`; wirelength from `wirelength_attraction_loss` × `E` then `(total_wirelength / num_nets) / sqrt(total_area)`. |
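
The wirelength normalization, spelled out as arithmetic (a sketch of the round-trip described above; argument names are illustrative):

```python
import math

def normalized_wl_sketch(mean_edge_loss: float, num_edges: int,
                         num_nets: int, total_area: float) -> float:
    total_wirelength = mean_edge_loss * num_edges  # undo the mean over E
    return (total_wirelength / num_nets) / math.sqrt(total_area)
```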

---

### `plot_placement(initial_cell_features, final_cell_features, pin_features, edge_list, filename=...)`

| | |
|--|--|
| **Input** | Initial/final cells, pins, edges; output basename. |
| **Output** | Writes PNG under `OUTPUT_DIR` (script directory) unless `filename` is absolute. |
| **Utility** | Side-by-side rectangles + overlap summary text; requires Matplotlib. |

---

### `plot_training_loss_curves(loss_history, lambda_wirelength, lambda_overlap, filename=...)`

| | |
|--|--|
| **Input** | History dict from `train_placement`; lambdas for title. |
| **Output** | Saves a 2×2 figure (log-scaled curves + overlap share + annotations from `overlap_ratio_tags`). |
| **Utility** | Debugging convergence; optional dependency on Matplotlib/NumPy. |

Nested helper `_positive_log_y` masks non-positive/non-finite values for log axes.

---

### `main()`

| | |
|--|--|
| **Input** | None (uses fixed demo sizes and seed). |
| **Output** | Console logs, optional plots, success/fail message. |
| **Utility** | Demonstrates full pipeline: generate → random radial spread → train → metrics → `plot_placement`. |

---

## Runtime and memory notes

### Where work scales as **O(N²)**

- **`overlap_repulsion_loss`:** Builds `N×N` pairwise tensors. Dominates cost for large `N` on GPU/CPU. Fully **vectorized** (no pair loop in Python).
- **`_fast_overlap_ratio`:** Same pairwise structure; **vectorized**; `any` over rows for “cell has any overlap.”
- **`calculate_overlap_metrics` / `calculate_cells_with_overlaps`:** **Double Python loops** over pairs — simpler but slower for large `N`. Evaluation is usually run once per test, not per epoch.

### Where work scales as **O(E)** or **O(P)**

- **`wirelength_attraction_loss`:** O(P) indexing for pin absolutes, O(E) edge reduction. **Vectorized** along edges.

### Training loop overhead

- Each epoch: **`cell_features.clone()`** plus assignment of positions — allocates a full `[N, 6]` tensor every step. For huge `N`, reducing clones (e.g. only swapping in position columns without full tensor duplicate) would be a possible optimization; current code favors clarity and correct autograd wiring into `cell_positions`.
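
One possible variant (an untested sketch, not the current implementation): slice out the static columns once and `torch.cat` the learnable positions in each epoch, replacing the full clone plus indexed copy with a single concatenation:

```python
import torch

def assemble_current(cell_features: torch.Tensor, cell_positions: torch.Tensor) -> torch.Tensor:
    """Build the per-epoch view without cloning the full [N, 6] tensor and then
    overwriting columns 2-3; gradients still flow into cell_positions via cat."""
    static_left = cell_features[:, :2]   # AREA, NUM_PINS (fixed)
    static_right = cell_features[:, 4:]  # WIDTH, HEIGHT (fixed)
    return torch.cat([static_left, cell_positions, static_right], dim=1)
```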

### Differentiable overlap

- **`relu(min_sep - |delta|)`** gives zero gradient when pairs are separated; gradients flow only through overlapping pairs. **`fast`** mode changes magnitude via divide-by-overlap-count (+1), which can affect gradient scaling when few pairs overlap.

### Device

- Code assumes a single default tensor device (typically CPU in the reference harness). For CUDA, ensure all tensors (`mask`, constants) live on the same device as `cell_features` — `overlap_repulsion_loss` already uses `device=x.device` for the upper-triangular mask.

### Optional plotting

- Matplotlib is **lazy-imported** inside the plotting functions, so headless/CI runs work without the dependency until you actually plot.

---

## Related files

| File | Role |
|------|------|
| [`test.py`](test.py) | Batch runs `TEST_CASES` with `train_placement(..., verbose=False)`; prints per-test metrics and an **aggregate** block (average overlap, average normalized wirelength, total runtime) plus a one-line summary at the end. |
| [`tune_optuna.py`](tune_optuna.py) | Hyperparameter search (external to core library API). |
| [`README.md`](README.md) | Challenge statement and leaderboard. |

---

Welcome aboard — when changing losses or training, keep **tensor shapes**, **centered-rectangle geometry**, and **test metrics** (`calculate_cells_with_overlaps` + normalized wirelength) consistent with this document.
35 changes: 19 additions & 16 deletions README.md
@@ -29,6 +29,8 @@ We will review submissions on a rolling basis.

## Leaderboard (sorted by overlap)

<!-- Oscar Mattia row: ran on macbookair m1 base model, developed on an early morning flight :) -->

| Rank | Name | Overlap | Wirelength (um) | Runtime (s) | Notes |
|------|-----------------|-------------|-----------------|-------------|----------------------|
| 1 | Brayden Rudisill | 0.0000 | 0.2611 | 50.51 | Timed on a mac air |
@@ -38,23 +40,24 @@ We will review submissions on a rolling basis.
| 5 | William Pan | 0.0000 | 0.2848 | 155.33 | |
| 6 | Ashmit Dutta | 0.0000 | 0.2870 | 995.58 | Spent my entire morning (12 am - 6 am) doing this :P |
| 7 | Pawan Paleja | 0.0000 | 0.3311 | 1.74 | Implemented hint for loss func, cosine annealing on learning rate with warmup, std annealing on lambda weight. Used optuna to tune hyperparam. Tested on gh codespaces 2-core. |
| 8 | Shashank Shriram | 0.0000 | 0.3312 | 11.32 | 🏎️💥 |
| 9 | Gabriel Del Monte | 0.0000 | 0.3427 | 606.07 | |
| 10 | Aleksey Valouev | 0.0000 | 0.3577 | 118.98 | |
| 11 | Oscar Mattia | 0.0000 | 0.4933 | 580.66 | ran on macbookair m1 base model, developed on an early morning flight :) |
| 12 | Mohul Shukla | 0.0000 | 0.5048 | 54.60 | |
| 13 | Ryan Hulke | 0.0000 | 0.5226 | 166.24 | |
| 14 | Neel Shah | 0.0000 | 0.5445 | 45.40 | Zero overlaps on all tests, adaptive schedule + early stop |
| 15 | Nawel Asgar | 0.0000 | 0.5675 | 81.49 | Adaptive penalty scaling with cubic gradients and design-size optimization |
| 16 | Shiva Baghel | 0.0000 | 0.5885 | 491.00 | Stable zero-overlap with balanced optimization |
| 17 | Vansh Jain | 0.0000 | 0.9352 | 86.36 | |
| 18 | Akash Pai | 0.0006 | 0.4933 | 326.25 | |
| 19 | Zade Mahayni | 0.00665 | 0.5157 | 127.4 | Will try again tomorrow |
| 20 | Nithin Yanna | 0.0148 | 0.5034 | 247.30 | aggressive overlap penalty with quadratic scaling |
| 21 | Sean Ko | 0.0271 | 0.5138 | 31.83 | lr increase, decrease epoch, increase lambda overlap and decreased lambda wire_length + log penalty loss |
| 22 | Keya Gohil | 0.0155 | 0.4678 | 1513.07 | Still working |
| 23 | Prithvi Seran | 0.0499 | 0.4890 | 398.58 | |
| 24 | partcl example | 0.8 | 0.4 | 5 | example |
| 25 | Add Yours! | | | | |

> **To add your results:**
> Insert a new row in the table above with your name, overlap, wirelength, and any notes. Ensure you sort by overlap.