API reference

Environment (RS10Env)

RS10Env is a Gymnasium-compatible turn-based environment: on a fixed-size board, each step selects a rectangle whose cells sum to a target (default 10) and clears it, until no valid move remains.

Board and parameters

Board: H × W grid (default 16×10). Each cell is an integer 0–9 (0 = empty, 1–9 = value).
Parameters: H, W, target_sum (default 10), device ("cpu" / "cuda").

Observation space

Type: gymnasium.spaces.Box
Shape: (10, H, W), dtype=np.float32, values in [0, 1].
Meaning: One-hot encoding of the board; channel k indicates whether the cell value is k (0 = empty, 1–9 = digits).

Action space

Type: gymnasium.spaces.Discrete(N) with N = number of valid rectangles.
Meaning: Each action is one axis-aligned rectangle (all cells in [r1..r2] × [c1..c2]), with area ≥ 2.

Valid actions (mask)

An action is valid only if:

The sum of cell values in that rectangle equals target_sum.
Diagonal rule: at least one diagonal pair of the rectangle’s corners has both cells non-zero.

The env provides info["action_mask"] (boolean, shape (N,)) for the current step.

Transition and reward

Transition: Applying action a sets all cells in rect_masks[a] to 0.
Reward (step): (number of cells cleared this step) / (H×W).
Termination bonus: If the episode ends with no valid moves, add (total cells cleared this episode) / (H×W) to the final reward.
terminated: no valid action; truncated: step count ≥ max_steps = (H*W)//2.

API summary

reset(seed=None, options=None, **kwargs)
Use board=... (numpy or torch, shape (H,W)) to set the board, or a random 1–9 board is generated.
Returns (obs, info) with info["step"], info["action_mask"].
step(action, **kwargs)
Returns (obs, reward, terminated, truncated, info) with info["step"], info["total_zeros"], info["action_mask"].

API usage

Single board, multiple strategies

import numpy as np
from rs10env import run_strategies_on_board, STRATEGY_NAMES

board = np.random.randint(1, 10, size=(16, 10), dtype=np.int32)
results = run_strategies_on_board(
    board=board,
    strategy_names=["random", "greedy", "max_future_moves"],
    device="cpu",
    H=16,
    W=10,
    target_sum=10,
)
for r in results:
    mark = " ★ best" if r["is_best"] else ""
    print(r["strategy_name"], r["total_reward"], r["steps"], r["total_cleared"], mark)

Multi-game comparison

from rs10env import run_strategies

def on_progress(current, total, message):
    print(f"[{current}/{total}] {message}")

summary = run_strategies(
    strategy_names=["random", "greedy", "center_bias"],
    num_games=20,
    base_seed=42,
    device="cpu",
    progress_callback=on_progress,
)
for s in summary:
    print(s["strategy_name"], s["avg_cleared"], s["avg_time_sec"], "★ best" if s["is_best"] else "")

Low-level: env + strategy step-by-step

from rs10env import RS10Env, create_strategy

env = RS10Env(device="cpu", H=16, W=10, target_sum=10)
obs, info = env.reset()
mask = info["action_mask"]
strategy = create_strategy("greedy")
action = strategy.get_action(env, mask)
obs, reward, terminated, truncated, info = env.step(action.item())

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API reference

Environment (RS10Env)

Board and parameters

Observation space

Action space

Valid actions (mask)

Transition and reward

API summary

API usage

Single board, multiple strategies

Multi-game comparison

Low-level: env + strategy step-by-step

FilesExpand file tree

API.md

Latest commit

History

API.md

File metadata and controls

API reference

Environment (RS10Env)

Board and parameters

Observation space

Action space

Valid actions (mask)

Transition and reward

API summary

API usage

Single board, multiple strategies

Multi-game comparison

Low-level: env + strategy step-by-step