Motivation
SPARSE MATRIX COMPRESSION (P161) from Garey & Johnson, A4 SR13. A classical NP-complete problem arising in the compact storage of sparse matrices. The idea is to overlay (superimpose) the rows of a binary matrix into a one-dimensional storage vector by assigning each row a shift offset, such that non-zero entries from different rows do not collide. This technique is used in practice for storing sparse DFA transition tables and parser tables (Ziegler's method). The problem asks whether the total storage vector length can be bounded by n + K, where n is the number of columns and K is the maximum shift range. Even, Lichtenstein, and Shiloach (1977) proved NP-completeness via reduction from Graph 3-Colorability, and the problem remains NP-complete even for fixed K = 3.
Associated rules:
- R107: Graph 3-Colorability -> Sparse Matrix Compression (this model is the target)
Definition
Name: SparseMatrixCompression
Canonical name: SPARSE MATRIX COMPRESSION
Reference: Garey & Johnson, Computers and Intractability, A4 SR13, p.229
Mathematical definition:
INSTANCE: An m x n matrix A with entries a_{ij} in {0,1}, 1 <= i <= m, 1 <= j <= n, and a positive integer K <= mn.
QUESTION: Is there a sequence (b_1, b_2, ..., b_{n+K}) of integers b_i, each satisfying 0 <= b_i <= m, and a function s: {1,2,...,m} -> {1,2,...,K} such that, for 1 <= i <= m and 1 <= j <= n, the entry a_{ij} = 1 if and only if b_{s(i)+j-1} = i?
Variables
- Count: m + (n + K) variables total. The shift function s assigns each of the m rows a shift value in {1,...,K} (m variables with domain K). The storage vector b has n + K entries (n + K variables with domain {0,...,m}).
- Per-variable domain: s(i) in {1, 2, ..., K} for each row i; b_j in {0, 1, ..., m} for each storage position j.
- Meaning: s(i) is the offset at which row i is placed in the storage vector. b_j identifies which row (if any) "owns" storage position j. The constraint b_{s(i)+j-1} = i for a_{ij}=1 means each non-zero entry of row i appears at the correct offset position in the storage vector, and no two rows' non-zero entries collide at the same position.
Schema (data type)
Type name: SparseMatrixCompression
Variants: none (no graph or weight parameters)
| Field |
Type |
Description |
matrix |
Vec<Vec<bool>> |
The m x n binary matrix A |
num_rows |
usize |
Number of rows m |
num_cols |
usize |
Number of columns n |
bound_k |
usize |
Maximum shift range K (storage vector length = n + K) |
Notes:
- This is a satisfaction (decision) problem:
Metric = bool, implementing SatisfactionProblem.
- No weight type is needed.
- The storage vector length is n + K (fixed given the instance).
- Alternative encoding: store the matrix in CSR (compressed sparse row) format for efficiency.
Complexity
- Best known exact algorithm: Brute-force: enumerate all K^m possible shift assignments s and check validity. For each assignment, construct the storage vector in O(m * n) and verify no conflicts. Total: O(K^m * m * n). For fixed K=3, this is O(3^m * m * n).
- Complexity string:
"(bound_k ^ num_rows) * num_rows * num_cols"
- Approximation: Ziegler's greedy heuristic (place rows one by one, choosing the smallest valid shift) is widely used in practice. Jugé et al. (2026) showed the greedy algorithm has approximation ratio Theta(sqrt(m)) where m is the optimal solution value (Theorem 3.1).
- NP-completeness: NP-complete [Even, Lichtenstein, and Shiloach, 1977], via transformation from Graph 3-Colorability. Remains NP-complete for fixed K = 3.
- References:
- S. Even, D. I. Lichtenstein, and Y. Shiloach (1977). "Remarks on Ziegler's method for matrix compression." Unpublished manuscript.
- V. Jugé, D. Köppl, V. Limouzy, A. Marino, J. Olbrich, G. Punzi, and T. Uno (2026). "Revisiting the Sparse Matrix Compression Problem." arXiv:2602.15314.
Extra Remark
Full book text:
INSTANCE: An m x n matrix A with entries aij in {0,1}, 1 <= i <= m, 1 <= j <= n, and a positive integer K <= mn.
QUESTION: Is there a sequence (b1,b2,...,bn+K) of integers bi, each satisfying 0 <= bi <= m, and a function s: {1,2,...,m} -> {1,2,...,K} such that, for 1 <= i <= m and 1 <= j <= n, the entry aij = 1 if and only if bs(i)+j-1 = i?
Reference: [Even, Lichtenstein, and Shiloach, 1977]. Transformation from GRAPH 3-COLORABILITY.
Comment: Remains NP-complete for fixed K = 3.
Practical context: Sparse matrix compression via row overlaying (Ziegler's method) is used to store DFA transition tables compactly. The rows of the transition matrix are overlaid into a single vector, with each row shifted by some offset. Aho, Sethi, and Ullman (the "Dragon Book") recommend this technique for lexer and parser table compression. The NP-completeness result means that finding the optimal compression is intractable, justifying the use of greedy heuristics.
How to solve
Example Instance
Instance (4x4 matrix, K=2):
Matrix A (4 x 4):
| Row |
c1 |
c2 |
c3 |
c4 |
| r1 |
1 |
0 |
0 |
1 |
| r2 |
0 |
1 |
0 |
0 |
| r3 |
0 |
0 |
1 |
0 |
| r4 |
1 |
0 |
0 |
0 |
K = 2, so the storage vector has length n + K = 4 + 2 = 6.
Each row's non-zero column set: r1 = {c1, c4}, r2 = {c2}, r3 = {c3}, r4 = {c1}.
Shift assignment: s(r1) = 2, s(r2) = 2, s(r3) = 2, s(r4) = 1.
Verification: For each row i, the "shifted support" is the set of storage positions {s(i) + j - 1 : a_{ij} = 1}:
- r1 (s=2): positions {2+1-1, 2+4-1} = {2, 5}
- r2 (s=2): positions {2+2-1} = {3}
- r3 (s=2): positions {2+3-1} = {4}
- r4 (s=1): positions {1+1-1} = {1}
All shifted supports are pairwise disjoint: {1}, {2, 5}, {3}, {4}. No conflicts.
Storage vector: b = (b_1, b_2, b_3, b_4, b_5, b_6) = (4, 1, 2, 3, 1, 0).
Biconditional check (a_{ij} = 1 iff b_{s(i)+j-1} = i):
- a_{1,1}=1: b_{2} = 1 = r1 ✓ | a_{1,2}=0: b_{3} = 2 ≠ 1 ✓ | a_{1,3}=0: b_{4} = 3 ≠ 1 ✓ | a_{1,4}=1: b_{5} = 1 = r1 ✓
- a_{2,1}=0: b_{2} = 1 ≠ 2 ✓ | a_{2,2}=1: b_{3} = 2 = r2 ✓ | a_{2,3}=0: b_{4} = 3 ≠ 2 ✓ | a_{2,4}=0: b_{5} = 1 ≠ 2 ✓
- a_{3,1}=0: b_{2} = 1 ≠ 3 ✓ | a_{3,2}=0: b_{3} = 2 ≠ 3 ✓ | a_{3,3}=1: b_{4} = 3 = r3 ✓ | a_{3,4}=0: b_{5} = 1 ≠ 3 ✓
- a_{4,1}=1: b_{1} = 4 = r4 ✓ | a_{4,2}=0: b_{2} = 1 ≠ 4 ✓ | a_{4,3}=0: b_{3} = 2 ≠ 4 ✓ | a_{4,4}=0: b_{4} = 3 ≠ 4 ✓
All 16 entries satisfy the biconditional. Answer: YES.
Uniqueness: This is the only valid shift assignment out of all 2^4 = 16 possible assignments. For example, s = (1, 1, 1, 1) fails because r1 and r4 both map to position 1 (b_1 would need to be both 1 and 4). The 15 failing assignments provide strong test coverage for correctness.
Expected Outcome
This is a satisfaction problem. The instance above has exactly one satisfying assignment:
- Satisfying: s = (2, 2, 2, 1), with storage vector b = (4, 1, 2, 3, 1, 0).
- Non-satisfying examples: s = (1, 1, 1, 1) fails (r1 and r4 collide at position 1); s = (1, 2, 2, 2) fails (r1 and r4 collide at position 1 via different columns); s = (2, 2, 2, 2) fails (r1 and r4 collide at position 2).
Motivation
SPARSE MATRIX COMPRESSION (P161) from Garey & Johnson, A4 SR13. A classical NP-complete problem arising in the compact storage of sparse matrices. The idea is to overlay (superimpose) the rows of a binary matrix into a one-dimensional storage vector by assigning each row a shift offset, such that non-zero entries from different rows do not collide. This technique is used in practice for storing sparse DFA transition tables and parser tables (Ziegler's method). The problem asks whether the total storage vector length can be bounded by n + K, where n is the number of columns and K is the maximum shift range. Even, Lichtenstein, and Shiloach (1977) proved NP-completeness via reduction from Graph 3-Colorability, and the problem remains NP-complete even for fixed K = 3.
Associated rules:
Definition
Name:
SparseMatrixCompressionCanonical name: SPARSE MATRIX COMPRESSION
Reference: Garey & Johnson, Computers and Intractability, A4 SR13, p.229
Mathematical definition:
INSTANCE: An m x n matrix A with entries a_{ij} in {0,1}, 1 <= i <= m, 1 <= j <= n, and a positive integer K <= mn.
QUESTION: Is there a sequence (b_1, b_2, ..., b_{n+K}) of integers b_i, each satisfying 0 <= b_i <= m, and a function s: {1,2,...,m} -> {1,2,...,K} such that, for 1 <= i <= m and 1 <= j <= n, the entry a_{ij} = 1 if and only if b_{s(i)+j-1} = i?
Variables
Schema (data type)
Type name:
SparseMatrixCompressionVariants: none (no graph or weight parameters)
matrixVec<Vec<bool>>num_rowsusizenum_colsusizebound_kusizeNotes:
Metric = bool, implementingSatisfactionProblem.Complexity
"(bound_k ^ num_rows) * num_rows * num_cols"Extra Remark
Full book text:
INSTANCE: An m x n matrix A with entries aij in {0,1}, 1 <= i <= m, 1 <= j <= n, and a positive integer K <= mn.
QUESTION: Is there a sequence (b1,b2,...,bn+K) of integers bi, each satisfying 0 <= bi <= m, and a function s: {1,2,...,m} -> {1,2,...,K} such that, for 1 <= i <= m and 1 <= j <= n, the entry aij = 1 if and only if bs(i)+j-1 = i?
Reference: [Even, Lichtenstein, and Shiloach, 1977]. Transformation from GRAPH 3-COLORABILITY.
Comment: Remains NP-complete for fixed K = 3.
Practical context: Sparse matrix compression via row overlaying (Ziegler's method) is used to store DFA transition tables compactly. The rows of the transition matrix are overlaid into a single vector, with each row shifted by some offset. Aho, Sethi, and Ullman (the "Dragon Book") recommend this technique for lexer and parser table compression. The NP-completeness result means that finding the optimal compression is intractable, justifying the use of greedy heuristics.
How to solve
Example Instance
Instance (4x4 matrix, K=2):
Matrix A (4 x 4):
K = 2, so the storage vector has length n + K = 4 + 2 = 6.
Each row's non-zero column set: r1 = {c1, c4}, r2 = {c2}, r3 = {c3}, r4 = {c1}.
Shift assignment: s(r1) = 2, s(r2) = 2, s(r3) = 2, s(r4) = 1.
Verification: For each row i, the "shifted support" is the set of storage positions {s(i) + j - 1 : a_{ij} = 1}:
All shifted supports are pairwise disjoint: {1}, {2, 5}, {3}, {4}. No conflicts.
Storage vector: b = (b_1, b_2, b_3, b_4, b_5, b_6) = (4, 1, 2, 3, 1, 0).
Biconditional check (a_{ij} = 1 iff b_{s(i)+j-1} = i):
All 16 entries satisfy the biconditional. Answer: YES.
Uniqueness: This is the only valid shift assignment out of all 2^4 = 16 possible assignments. For example, s = (1, 1, 1, 1) fails because r1 and r4 both map to position 1 (b_1 would need to be both 1 and 4). The 15 failing assignments provide strong test coverage for correctness.
Expected Outcome
This is a satisfaction problem. The instance above has exactly one satisfying assignment: