[Model] SparseMatrixCompression

## Motivation

SPARSE MATRIX COMPRESSION (P161) from Garey & Johnson, A4 SR13. A classical NP-complete problem arising in the compact storage of sparse matrices. The idea is to overlay (superimpose) the rows of a binary matrix into a one-dimensional storage vector by assigning each row a shift offset, such that non-zero entries from different rows do not collide. This technique is used in practice for storing sparse DFA transition tables and parser tables (Ziegler's method). The problem asks whether the total storage vector length can be bounded by n + K, where n is the number of columns and K is the maximum shift range. Even, Lichtenstein, and Shiloach (1977) proved NP-completeness via reduction from Graph 3-Colorability, and the problem remains NP-complete even for fixed K = 3.

**Associated rules:**
- R107: Graph 3-Colorability -> Sparse Matrix Compression (this model is the target)

## Definition

**Name:** `SparseMatrixCompression`
**Canonical name:** SPARSE MATRIX COMPRESSION
**Reference:** Garey & Johnson, *Computers and Intractability*, A4 SR13, p.229

**Mathematical definition:**

INSTANCE: An m x n matrix A with entries a_{ij} in {0,1}, 1 <= i <= m, 1 <= j <= n, and a positive integer K <= mn.
QUESTION: Is there a sequence (b_1, b_2, ..., b_{n+K}) of integers b_i, each satisfying 0 <= b_i <= m, and a function s: {1,2,...,m} -> {1,2,...,K} such that, for 1 <= i <= m and 1 <= j <= n, the entry a_{ij} = 1 if and only if b_{s(i)+j-1} = i?

## Variables


- **Count:** m + (n + K) variables total. The shift function s assigns each of the m rows a shift value in {1,...,K} (m variables with domain K). The storage vector b has n + K entries (n + K variables with domain {0,...,m}).
- **Per-variable domain:** s(i) in {1, 2, ..., K} for each row i; b_j in {0, 1, ..., m} for each storage position j.
- **Meaning:** s(i) is the offset at which row i is placed in the storage vector. b_j identifies which row (if any) "owns" storage position j. The constraint b_{s(i)+j-1} = i for a_{ij}=1 means each non-zero entry of row i appears at the correct offset position in the storage vector, and no two rows' non-zero entries collide at the same position.

## Schema (data type)


**Type name:** `SparseMatrixCompression`
**Variants:** none (no graph or weight parameters)

| Field | Type | Description |
|-------|------|-------------|
| `matrix` | `Vec<Vec<bool>>` | The m x n binary matrix A |
| `num_rows` | `usize` | Number of rows m |
| `num_cols` | `usize` | Number of columns n |
| `bound_k` | `usize` | Maximum shift range K (storage vector length = n + K) |

**Notes:**
- This is a satisfaction (decision) problem: `Metric = bool`, implementing `SatisfactionProblem`.
- No weight type is needed.
- The storage vector length is n + K (fixed given the instance).
- Alternative encoding: store the matrix in CSR (compressed sparse row) format for efficiency.

## Complexity


- **Best known exact algorithm:** Brute-force: enumerate all K^m possible shift assignments s and check validity. For each assignment, construct the storage vector in O(m * n) and verify no conflicts. Total: O(K^m * m * n). For fixed K=3, this is O(3^m * m * n).
- **Complexity string:** `"(bound_k ^ num_rows) * num_rows * num_cols"`
- **Approximation:** Ziegler's greedy heuristic (place rows one by one, choosing the smallest valid shift) is widely used in practice. Jugé et al. (2026) showed the greedy algorithm has approximation ratio Theta(sqrt(m)) where m is the optimal solution value (Theorem 3.1).
- **NP-completeness:** NP-complete [Even, Lichtenstein, and Shiloach, 1977], via transformation from Graph 3-Colorability. Remains NP-complete for fixed K = 3.
- **References:**
  - S. Even, D. I. Lichtenstein, and Y. Shiloach (1977). "Remarks on Ziegler's method for matrix compression." Unpublished manuscript.
  - V. Jugé, D. Köppl, V. Limouzy, A. Marino, J. Olbrich, G. Punzi, and T. Uno (2026). "Revisiting the Sparse Matrix Compression Problem." arXiv:2602.15314.

## Extra Remark

**Full book text:**

INSTANCE: An m x n matrix A with entries aij in {0,1}, 1 <= i <= m, 1 <= j <= n, and a positive integer K <= mn.
QUESTION: Is there a sequence (b1,b2,...,bn+K) of integers bi, each satisfying 0 <= bi <= m, and a function s: {1,2,...,m} -> {1,2,...,K} such that, for 1 <= i <= m and 1 <= j <= n, the entry aij = 1 if and only if bs(i)+j-1 = i?
Reference: [Even, Lichtenstein, and Shiloach, 1977]. Transformation from GRAPH 3-COLORABILITY.
Comment: Remains NP-complete for fixed K = 3.

**Practical context:** Sparse matrix compression via row overlaying (Ziegler's method) is used to store DFA transition tables compactly. The rows of the transition matrix are overlaid into a single vector, with each row shifted by some offset. Aho, Sethi, and Ullman (the "Dragon Book") recommend this technique for lexer and parser table compression. The NP-completeness result means that finding the optimal compression is intractable, justifying the use of greedy heuristics.

## How to solve

- [x] It can be solved by (existing) bruteforce -- enumerate all K^m shift assignments and check for valid overlay.
- [x] It can be solved by reducing to integer programming -- encode shift assignments as integer variables s_i in {1,...,K}, storage vector as integer variables b_j in {0,...,m}, add constraints b_{s_i + j - 1} = i for each a_{ij} = 1 (linearize using big-M or indicator constraints).
- [x] Other: Greedy heuristic (Ziegler's method): process rows in decreasing order of density, assign the smallest shift that avoids conflicts. Also reducible to graph coloring: construct a conflict graph where rows are vertices, edges connect rows that would conflict at some shift difference, and K-color this graph.

## Example Instance

**Instance (4x4 matrix, K=2):**

Matrix A (4 x 4):

| Row | c1 | c2 | c3 | c4 |
|-----|----|----|----|----|
| r1  |  1 |  0 |  0 |  1 |
| r2  |  0 |  1 |  0 |  0 |
| r3  |  0 |  0 |  1 |  0 |
| r4  |  1 |  0 |  0 |  0 |

K = 2, so the storage vector has length n + K = 4 + 2 = 6.

Each row's non-zero column set: r1 = {c1, c4}, r2 = {c2}, r3 = {c3}, r4 = {c1}.

**Shift assignment:** s(r1) = 2, s(r2) = 2, s(r3) = 2, s(r4) = 1.

**Verification:** For each row i, the "shifted support" is the set of storage positions {s(i) + j - 1 : a_{ij} = 1}:
- r1 (s=2): positions {2+1-1, 2+4-1} = {2, 5}
- r2 (s=2): positions {2+2-1} = {3}
- r3 (s=2): positions {2+3-1} = {4}
- r4 (s=1): positions {1+1-1} = {1}

All shifted supports are pairwise disjoint: {1}, {2, 5}, {3}, {4}. No conflicts.

**Storage vector:** b = (b_1, b_2, b_3, b_4, b_5, b_6) = (4, 1, 2, 3, 1, 0).

**Biconditional check** (a_{ij} = 1 iff b_{s(i)+j-1} = i):
- a_{1,1}=1: b_{2} = 1 = r1 ✓ | a_{1,2}=0: b_{3} = 2 ≠ 1 ✓ | a_{1,3}=0: b_{4} = 3 ≠ 1 ✓ | a_{1,4}=1: b_{5} = 1 = r1 ✓
- a_{2,1}=0: b_{2} = 1 ≠ 2 ✓ | a_{2,2}=1: b_{3} = 2 = r2 ✓ | a_{2,3}=0: b_{4} = 3 ≠ 2 ✓ | a_{2,4}=0: b_{5} = 1 ≠ 2 ✓
- a_{3,1}=0: b_{2} = 1 ≠ 3 ✓ | a_{3,2}=0: b_{3} = 2 ≠ 3 ✓ | a_{3,3}=1: b_{4} = 3 = r3 ✓ | a_{3,4}=0: b_{5} = 1 ≠ 3 ✓
- a_{4,1}=1: b_{1} = 4 = r4 ✓ | a_{4,2}=0: b_{2} = 1 ≠ 4 ✓ | a_{4,3}=0: b_{3} = 2 ≠ 4 ✓ | a_{4,4}=0: b_{4} = 3 ≠ 4 ✓

All 16 entries satisfy the biconditional. **Answer: YES.**

**Uniqueness:** This is the only valid shift assignment out of all 2^4 = 16 possible assignments. For example, s = (1, 1, 1, 1) fails because r1 and r4 both map to position 1 (b_1 would need to be both 1 and 4). The 15 failing assignments provide strong test coverage for correctness.

## Expected Outcome

This is a satisfaction problem. The instance above has exactly one satisfying assignment:
- **Satisfying:** s = (2, 2, 2, 1), with storage vector b = (4, 1, 2, 3, 1, 0).
- **Non-satisfying examples:** s = (1, 1, 1, 1) fails (r1 and r4 collide at position 1); s = (1, 2, 2, 2) fails (r1 and r4 collide at position 1 via different columns); s = (2, 2, 2, 2) fails (r1 and r4 collide at position 2).

Field	Type	Description
`matrix`	`Vec<Vec<bool>>`	The m x n binary matrix A
`num_rows`	`usize`	Number of rows m
`num_cols`	`usize`	Number of columns n
`bound_k`	`usize`	Maximum shift range K (storage vector length = n + K)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Model] SparseMatrixCompression #416

Motivation

Definition

Variables

Schema (data type)

Complexity

Extra Remark

How to solve

Example Instance

Expected Outcome

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Model] SparseMatrixCompression #416

Description

Motivation

Definition

Variables

Schema (data type)

Complexity

Extra Remark

How to solve

Example Instance

Expected Outcome

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions