Aswincloud/ttnn-ops-coverage

TTNN Ops Coverage Matrix

An interactive, zero-dependency dashboard for the TTNN (Tenstorrent) operation test matrix — visualizing how every operation behaves across every dtype × layout × memory × broadcast configuration, and how numerically accurate each result is.

Built to deploy as a Cloudflare Workers Static Assets site. The entire front end is hand-rolled HTML/CSS/SVG with no runtime libraries, so it loads instantly and works fully offline.

Live: https://ttnn-ops-coverage.aswincloud.com/


What it shows

The source data (eltwise_support_matrix.csv) is produced by eltwise_support_probe.py — a sweep over every op × dtype × layout × memory × broadcast configuration (8 dtypes × 2 layouts × 5 memory configs: interleaved dram/l1 + sharded height/width/block; binary ops additionally tested with scalar/row/col broadcasting). For every config the probe records whether the op ran, whether the output matched a torch golden, the input range fed to it, the PCC vs the golden, and the max per-element ULP error. Each run's raw pcc_or_reason column is classified into a clean status taxonomy:

| Status | Meaning |
| --- | --- |
| 🟢 Pass | Output matched the golden reference within PCC threshold |
| 🟠 PCC Fail | Ran, but numerically inaccurate (PCC below threshold / NaN) |
| 🔴 Hard Error | TT_FATAL / TT_THROW: crashed before producing a result |
| 🔵 No Golden | Ran, but no reference output to verify against |
| Skipped | Config unsupported / intentionally skipped |
| Not in TTNN | Operation not implemented |

PCC thresholds: 0.99 default, 0.97 for bfloat8_b, 0.90 for bfloat4_b; integer dtypes are graded by exact equality (PCC shown for reference only).
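
The threshold rule above can be sketched as a small lookup. This is a minimal illustration, not the logic in process.py: the function names, the `exact_match` flag, and the assumption that integer dtype names start with `int` or `uint` are all hypothetical.

```python
import math

# Thresholds as stated above; any dtype not listed uses the default.
PCC_THRESHOLDS = {"bfloat8_b": 0.97, "bfloat4_b": 0.90}
DEFAULT_PCC_THRESHOLD = 0.99


def pcc_threshold(dtype: str) -> float:
    """Return the PCC pass threshold for a floating-point dtype."""
    return PCC_THRESHOLDS.get(dtype, DEFAULT_PCC_THRESHOLD)


def passes(dtype: str, pcc: float, exact_match: bool = False) -> bool:
    """Decide pass/fail for a config that ran and has a golden reference.

    Assumption: integer dtypes are recognised by an "int"/"uint" name prefix.
    They are graded by exact equality, so their PCC value is ignored here.
    """
    if dtype.startswith(("int", "uint")):
        return exact_match
    if math.isnan(pcc):
        return False  # a NaN PCC counts as a numerical failure
    return pcc >= pcc_threshold(dtype)
```

For example, a PCC of 0.98 would fail for bfloat16 under the default threshold but pass for bfloat8_b.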

Dashboard panels

  • KPI cards — pass rate, hard errors, PCC failures, op total, config total
  • Result-distribution donut — click any slice to solo that status across the table
  • Outcome by axis — stacked pass/fail composition per dtype, layout, memory, and broadcast mode
  • Numerical accuracy (ULP) — distribution of max per-element error in ULP, log-bucketed; toggle between dtypes
  • Top hard-error signatures — the most common device assertions, grouped by source_file:line
  • Coverage snapshot — how configs split between verifiable and unverifiable
  • Operation leaderboard — every op, sortable by any column and searchable. Click a row to expand a dtype × layout·mem heatmap. For binary/ternary ops, an inline broadcast-mode toggle (none / scalar / row / col) in the matrix header switches which mode the grid shows, swapping in place so you never scroll away. Hover any cell for the exact status, broadcast mode, input range, PCC, ULP, and failure reason.
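
As an illustration of the log-bucketed ULP panel, the sketch below groups max-ULP values into power-of-ten buckets. The bucket edges and labels are assumptions chosen for the example; the dashboard's real bucket boundaries are not documented here.

```python
import math
from collections import Counter


def ulp_bucket(ulp: float) -> str:
    """Return a power-of-ten bucket label for one max-ULP value."""
    if math.isnan(ulp) or math.isinf(ulp):
        return "non-finite"
    if ulp <= 0:
        return "0"
    if ulp < 1:
        return "(0, 1)"
    exp = 0
    # Integer powers avoid floating-point log10 rounding at bucket edges.
    while ulp >= 10 ** (exp + 1):
        exp += 1
    return f"[1e{exp}, 1e{exp + 1})"


def ulp_histogram(ulps) -> Counter:
    """Count how many configs fall into each ULP bucket."""
    return Counter(ulp_bucket(u) for u in ulps)
```

Calling `ulp_histogram` once per dtype would give the per-dtype distributions that the toggle switches between.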

Run-to-run comparison

A Changes button diffs the current matrix against the previous dated probe snapshot (history/eltwise_support_matrix_YYYY-MM-DD.csv), surfacing which configs newly pass, regressed, were added or removed, or had a meaningful PCC/ULP shift. The diff is computed at build time in process.py. Until two dated snapshots exist, the panel reports "no baseline snapshot yet" instead of a diff. See PROBE.md for the --dated workflow.
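
A run-to-run diff of this kind can be sketched as a comparison of two keyed dictionaries. Everything below is illustrative: the key tuple, the status labels, and the `pcc_eps` cutoff for a "meaningful" shift are assumptions, and the real diff in process.py may categorise changes differently.

```python
def diff_runs(prev: dict, curr: dict, pcc_eps: float = 0.005) -> dict:
    """Compare two snapshots that map a config key to (status, pcc_or_None)."""
    changes = {
        "added": sorted(curr.keys() - prev.keys()),
        "removed": sorted(prev.keys() - curr.keys()),
        "newly_pass": [],
        "regressed": [],
        "pcc_shift": [],
    }
    for key in sorted(curr.keys() & prev.keys()):
        prev_status, prev_pcc = prev[key]
        curr_status, curr_pcc = curr[key]
        if prev_status != "pass" and curr_status == "pass":
            changes["newly_pass"].append(key)
        elif prev_status == "pass" and curr_status != "pass":
            changes["regressed"].append(key)
        elif (
            prev_pcc is not None
            and curr_pcc is not None
            and abs(curr_pcc - prev_pcc) >= pcc_eps
        ):
            # Same pass/fail side, but accuracy moved by at least pcc_eps.
            changes["pcc_shift"].append(key)
    return changes
```

Keys here would be tuples such as `(op, dtype, layout, memory, broadcast)`, so that the same configuration lines up across both snapshots.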

Suggest / feedback

A Suggest button opens a modal that POSTs to /api/feedback (handled by the Worker → Resend email), for reporting a result mismatch — e.g. an op marked failed that actually works.

Keyboard: / focuses search · Esc clears search/solo or closes a modal.


Project layout

.
├── public/                 # ← static assets served by the Worker
│   ├── index.html          #   markup + design system (CSS)
│   ├── app.js              #   chart/table renderer (no deps)
│   └── data.js             #   generated — window.DASH payload (gitignored)
├── worker/index.js         # serves assets + POST /api/feedback → Resend
├── eltwise_support_matrix.csv # source data (regenerate data.js from this)
├── process.py              # CSV → public/data.js transformer + classifier + run diff
├── eltwise_support_probe.py # the probe that GENERATES the matrix CSV (see PROBE.md)
├── history/                # dated probe snapshots (--dated); power the "Changes" diff
├── PROBE.md                # how the probe sweep works / how to run it
├── scripts/                # CI validators (check_data.py, check_code.mjs)
├── .github/workflows/ci.yml # data + code + lint gates on every push/PR
├── eslint.config.js        # flat ESLint config for the shipped JS
├── wrangler.jsonc          # Cloudflare Workers config (assets + feedback API)
└── package.json

public/data.js is a build artifact — it is not committed (gitignored). CI and Cloudflare regenerate it from eltwise_support_matrix.csv on every deploy (and you regenerate it locally with python3 process.py). eltwise_support_matrix.csv is the single source of truth — it's the probe's native output, overwritten by each daily run (which also drops a dated copy in history/).


Continuous integration

Every push and PR runs .github/workflows/ci.yml — three gates that catch what Cloudflare won't (a CSV that parses but yields wrong totals, malformed columns, broken JS, an un-reconciling or accidentally-committed data.js):

| Job | Checks |
| --- | --- |
| data integrity | eltwise_support_matrix.csv shape + columns, PCC numeric-or-empty, process.py rebuilds and statusCounts sum == meta.total == row count |
| code checks | node --check on app.js + worker, boots data.js in a sandbox and reconciles, asserts data.js is not git-tracked |
| lint | ESLint over the shipped JS |

Run them all locally with npm run check.
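
The reconciliation rule in the data integrity job (statusCounts sum == meta.total == row count) amounts to a three-way equality check. A minimal sketch follows, assuming the counts are available as a plain dictionary; the function name and error message are hypothetical and do not describe scripts/check_data.py.

```python
def reconcile(status_counts: dict, meta_total: int, row_count: int) -> None:
    """Raise ValueError unless all three totals agree."""
    counted = sum(status_counts.values())
    if not (counted == meta_total == row_count):
        raise ValueError(
            f"totals do not reconcile: statusCounts={counted}, "
            f"meta.total={meta_total}, rows={row_count}"
        )
```

Failing loudly here is what lets CI catch a CSV that parses cleanly but yields wrong totals.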


Local development

# Regenerate the dashboard data from the CSV
python3 process.py            # → writes public/data.js

# Serve locally (any static server works)
npm run serve                 # http://localhost:8080  (binds 0.0.0.0)
#   or
npx wrangler dev              # http://localhost:8787  (emulates the edge)

Updating the data is just: replace eltwise_support_matrix.csv → run python3 process.py → refresh. No rebuild step for HTML/JS.


Deploy to Cloudflare Workers

This repo is configured for Workers Static Assets: Cloudflare serves public/ directly from the edge, and worker/index.js handles POST /api/feedback.

npm install          # pulls wrangler
npx wrangler login   # one-time, opens browser
npm run deploy       # = python3 process.py && wrangler deploy

After the first deploy it's live at:

https://ttnn-ops-coverage.<your-subdomain>.workers.dev

Custom domain: https://ttnn-ops-coverage.aswincloud.com/

Auto-deploy on push

This repo is connected to Cloudflare Workers Builds (CF-native Git integration — configured in the Cloudflare dashboard, nothing required in the repo itself). On every push to main, Cloudflare runs:

| Step | Command |
| --- | --- |
| Build | npm run build → python3 process.py rebuilds public/data.js from eltwise_support_matrix.csv |
| Deploy | npx wrangler deploy → ships public/ to the edge |

So updating the dashboard is just: replace eltwise_support_matrix.csv, commit, push → live in ~a minute. Because the build regenerates data.js, eltwise_support_matrix.csv is the only file you ever need to touch — and the daily updater does exactly that.


Regenerating data.js

process.py does all the heavy lifting:

  1. Parses eltwise_support_matrix.csv with an RFC 4180-compliant CSV reader, since failure reasons contain embedded commas and newlines from C++ backtraces.
  2. Classifies each row's accepted + pcc_or_reason into the 6-status taxonomy.
  3. Collapses verbose TT_FATAL/TT_THROW backtraces into KIND file:line — message signatures and groups them.
  4. Computes per-op, per-dtype, per-layout, per-memory aggregations and the ULP distribution.
  5. Diffs against the previous dated snapshot in history/ to build the "Changes" payload.
  6. Emits a compact window.DASH = {…} payload (interned strings + integer-indexed rows) to public/data.js.
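
Steps 1 and 3 above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in process.py: the `op` column name, the `TT_FATAL @ path:line: message` shape of a failure reason, and the helper names are all hypothetical.

```python
import csv
import re
from collections import Counter

# Assumed shape of a failure reason: "TT_FATAL @ path/to/file.cpp:123: message".
SIG_RE = re.compile(r"(TT_FATAL|TT_THROW)[ \t]*@[ \t]*(\S+?):(\d+):?[ \t]*([^\n]*)")


def read_rows(fileobj):
    """Parse the CSV; quoted fields may contain commas and newlines."""
    return list(csv.DictReader(fileobj))


def signature(reason: str):
    """Collapse a verbose backtrace into ("KIND file:line", first message line)."""
    match = SIG_RE.search(reason)
    if match is None:
        return None
    kind, path, line, message = match.groups()
    filename = path.rsplit("/", 1)[-1]  # drop the directory part of the path
    return f"{kind} {filename}:{line}", message.strip()


def top_signatures(rows, n=5):
    """Count the most common hard-error signatures across all rows."""
    found = (signature(row["pcc_or_reason"]) for row in rows)
    return Counter(sig[0] for sig in found if sig is not None).most_common(n)
```

When reading from disk, opening the file with `open(path, newline="")` lets the csv module handle newlines inside quoted fields itself, as the Python documentation recommends.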

Totals always reconcile to the row count — the script prints a status breakdown on every run.
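
The compact payload in step 6 relies on string interning: each distinct string is stored once and rows refer to it by integer index. The sketch below shows the idea only. The payload keys (`columns`, `strings`, `rows`) and function names are hypothetical, and the actual window.DASH schema is not described in this README.

```python
import json


def intern_rows(rows, columns):
    """Replace repeated string values with indexes into a shared string table."""
    strings = []
    index = {}

    def intern(value):
        # Assign the next free index the first time a value is seen.
        if value not in index:
            index[value] = len(strings)
            strings.append(value)
        return index[value]

    packed = [[intern(row[col]) for col in columns] for row in rows]
    return {"columns": list(columns), "strings": strings, "rows": packed}


def emit_js(payload) -> str:
    """Serialise the payload as a script that assigns window.DASH."""
    return "window.DASH = " + json.dumps(payload, separators=(",", ":")) + ";\n"
```

Because values such as dtype and layout names repeat across many rows, storing them once can shrink the generated file considerably compared with repeating each string per row.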
