2 changes: 1 addition & 1 deletion docs/open-api-docs.yaml
@@ -2,7 +2,7 @@ openapi: 3.0.3
info:
title: The Agent's user-facing API
description: The user-facing parts of The Agent's API service (excluding system-level endpoints, chat completion, maintenance endpoints, etc.)
- version: 5.8.0
+ version: 5.8.1
license:
name: MIT
url: https://opensource.org/licenses/MIT
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-04-12
@@ -0,0 +1,57 @@
## Context

`resize_file` in `src/features/images/image_size_utils.py` is called from `platform_bot_sdk.py` when a downloaded image exceeds the platform's size limit. The current implementation uses a linear walk-down: first reducing JPEG quality (95→85 in steps of 5), then reducing scale factor (0.9→0.1 in steps of 0.02). Each iteration performs a full PIL resize + encode. For a 20 MB image targeting 5 MB, this can take 15-20 iterations.

The function's public contract is simple: `(input_path, max_size_bytes) → output_path`. No callers depend on internal behavior — only that the returned file is under the limit.

## Goals / Non-Goals

**Goals:**

- Reduce iteration count from ~20 to ~7 for typical large-photo resizing
- Land output size in a predictable target band (90-100% of limit)
- Maintain the same public API (`resize_file(input_path, max_size_bytes) → str`)
- Add test coverage for `resize_file` and other untested functions in the module

**Non-Goals:**

- Changing the image format handling (PNG/JPEG/WEBP support stays the same)
- Optimizing PIL operations themselves (resize + encode are the fixed cost per iteration)
- Supporting additional output formats
- Changing the caller in `platform_bot_sdk.py`

## Decisions

### Binary search on scale factor with informed initial guess

The core change. Instead of decrementing scale by a fixed step, binary search between `lo=0.1` and `hi=1.0`. Start with `guess = sqrt(max_size_bytes / original_size)` clamped to `[0.1, 1.0]`.

**Why sqrt**: File size scales roughly with pixel area (width × height), and area scales with the square of the scale factor. So `sqrt(target_ratio)` gives a good first approximation.
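
A minimal sketch of the initial guess described above, assuming plain byte counts as inputs. The helper name `initial_scale` and the `LO`/`HI` constants are illustrative, not taken from the implementation.

```python
import math

# Search bounds from the decision above; the constant names are hypothetical.
LO, HI = 0.1, 1.0


def initial_scale(original_size: int, max_size_bytes: int) -> float:
    """Return sqrt(target / original), clamped to [LO, HI]."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    guess = math.sqrt(max_size_bytes / original_size)
    return max(LO, min(HI, guess))


# A 20 MB file targeting 5 MB needs a quarter of the bytes,
# so the first guess is half the width and half the height.
print(initial_scale(20_000_000, 5_000_000))  # 0.5
```

The clamp matters at both ends: a file only slightly over the limit starts near 1.0, while an extreme ratio starts at the 0.1 floor instead of below it.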

**Alternative considered**: Linear search with adaptive step size. Rejected because binary search is simpler, converges faster, and doesn't require tuning step parameters.
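
The search loop can be sketched as follows. This is an illustration under stated assumptions, not the code in `image_size_utils.py`: the PIL resize + encode step is replaced by a hypothetical `encoded_size(scale)` callable, the function name `search_scale` is invented, and the minimum-dimension guard and the over-limit fallback are left out for brevity.

```python
import math
from typing import Callable, Optional, Tuple


def search_scale(
    encoded_size: Callable[[float], int],  # stand-in for resize + encode
    original_size: int,
    max_size_bytes: int,
    max_iterations: int = 10,
    band_lo: float = 0.90,
) -> Tuple[Optional[float], int]:
    """Binary search on scale; returns (scale or None, iterations used)."""
    lo, hi = 0.1, 1.0
    scale = max(lo, min(hi, math.sqrt(max_size_bytes / original_size)))
    best_under: Optional[float] = None  # latest (largest) under-limit scale
    for iteration in range(1, max_iterations + 1):
        size = encoded_size(scale)
        if size > max_size_bytes:
            hi = scale  # too big: search smaller scales
        else:
            best_under = scale
            if size >= band_lo * max_size_bytes:
                return scale, iteration  # landed in the target band
            lo = scale  # too small: search larger scales
        scale = (lo + hi) / 2
    return best_under, max_iterations


# Fake encoder: size grows with area, plus a 30% content-dependent bias
# so that the sqrt guess is deliberately wrong.
def fake_size(scale: float) -> int:
    return int(20_000_000 * scale * scale * 1.3)


scale, iterations = search_scale(fake_size, 20_000_000, 5_000_000)
print(scale, iterations, fake_size(scale))
```

Even with a biased first guess, the loop only needs a handful of probes, because each probe halves the remaining scale interval.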

### Fixed quality at 90 for lossy formats

JPEG/WEBP quality fixed at 90 instead of iterating from 95→85. Quality 90 is visually near-lossless while providing significant compression. This eliminates the quality iteration phase entirely and keeps dimensions as large as possible.
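
One way to express the per-format encoder settings is a small lookup that yields keyword arguments for Pillow's `Image.save`. The helper name `save_kwargs` is hypothetical, and whether the real implementation also passes `optimize` for WEBP is not stated here, so this sketch passes only `quality` for that format.

```python
QUALITY = 90  # fixed quality for lossy formats, per the decision above


def save_kwargs(image_format: str) -> dict:
    """Keyword arguments for PIL's Image.save, keyed by image format."""
    fmt = image_format.upper()
    if fmt in ("JPEG", "JPG"):
        return {"format": "JPEG", "quality": QUALITY, "optimize": True}
    if fmt == "WEBP":
        # Assumption: quality only; the optimize flag for WEBP is unspecified.
        return {"format": "WEBP", "quality": QUALITY}
    if fmt == "PNG":
        # Lossless: no quality parameter, only optimize.
        return {"format": "PNG", "optimize": True}
    raise ValueError(f"unsupported image format: {image_format}")


# Intended use (not executed here): image.save(buffer, **save_kwargs("JPEG"))
print(save_kwargs("png"))
```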

**Alternative considered**: Binary search on both quality and scale simultaneously. Rejected because it adds complexity with minimal benefit — quality 90 is the right tradeoff for chat-context images, and a single fixed value keeps the algorithm one-dimensional.

### Target band: 90-100% of max_size_bytes

Accept results where `max_size_bytes * 0.90 <= output_size <= max_size_bytes`. The band spans 10% of the limit: wide enough for binary search to land inside it within ~7 iterations, tight enough not to waste space.

**Previous behavior**: Accepted anything under limit, but preferred within 3%. The new band is wider on the low end (10% vs 3%) which allows faster convergence.
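
The band test reduces to a three-way classification that tells the search which way to move. A sketch, assuming the `TARGET_RATIO_LO` constant from the task list; the function name `classify` and its return labels are illustrative.

```python
TARGET_RATIO_LO = 0.90  # lower edge of the band, as a fraction of the limit


def classify(output_size: int, max_size_bytes: int) -> str:
    """Classify an encoded size relative to the 90-100% target band."""
    if output_size > max_size_bytes:
        return "over"       # shrink further
    if output_size >= max_size_bytes * TARGET_RATIO_LO:
        return "in_band"    # accept and stop
    return "too_small"      # under the limit, but room to grow


print(classify(4_600_000, 5_000_000))  # in_band
```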

### MAX_ITERATIONS = 10, MIN_DIMENSION = 64

Binary search on `[0.1, 1.0]` narrows the scale interval to a width of 0.01 in `log2(0.9/0.01) ≈ 6.5` halvings, i.e. 7 iterations. The cap of 10 is a safety net that should not normally trigger. Minimum dimension is raised from 32px to 64px, since anything smaller is useless in a chat context.
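
Both constants can be sanity-checked with a few lines. This sketch assumes scaled dimensions are truncated to integers (the real rounding rule is not specified here); the helper names are hypothetical.

```python
import math

MAX_ITERATIONS = 10
MIN_DIMENSION = 64


def halvings_needed(lo: float = 0.1, hi: float = 1.0,
                    precision: float = 0.01) -> int:
    """Whole halvings needed to shrink [lo, hi] to the given width."""
    return math.ceil(math.log2((hi - lo) / precision))


def below_min_dimension(width: int, height: int, scale: float) -> bool:
    """True if either scaled side would fall below MIN_DIMENSION."""
    # Assumption: dimensions are truncated, as int() does.
    return min(int(width * scale), int(height * scale)) < MIN_DIMENSION


print(halvings_needed())                     # 7, leaving headroom under the cap
print(below_min_dimension(2000, 2000, 0.1))  # False: 200px per side
print(below_min_dimension(500, 2000, 0.1))   # True: the short side is 50px
```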

### Best-effort fallback strategy

Track the best under-limit result seen during the search. If the loop exits without landing in the target band (due to the iteration cap or the minimum dimension), return the best under-limit result. If nothing was under the limit, return the closest result overall, i.e. the smallest one seen. Only raise `ValidationError` if no result was produced at all.
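
The preference order can be sketched as a selection over the results collected during the search. Assumptions: each result is an `(output_path, size_bytes)` pair, "best under-limit" means the largest size that still fits, and the `ValidationError` class below is a stand-in for the project's own exception type.

```python
from typing import List, Tuple


class ValidationError(Exception):
    """Stand-in for the project's ValidationError (assumption)."""


Result = Tuple[str, int]  # (output_path, size_bytes)


def pick_fallback(results: List[Result], max_size_bytes: int) -> str:
    """Choose the path to return when no result landed in the target band."""
    if not results:
        raise ValidationError("INVALID_IMAGE_SIZE")
    under = [r for r in results if r[1] <= max_size_bytes]
    if under:
        # Prefer the largest file that still fits under the limit.
        return max(under, key=lambda r: r[1])[0]
    # Nothing fits: return the smallest result, which is closest to the limit.
    return min(results, key=lambda r: r[1])[0]


print(pick_fallback([("a.jpg", 3), ("b.jpg", 4), ("c.jpg", 7)], 5))  # b.jpg
```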

## Risks / Trade-offs

- **PNG compression unpredictability**: PNG file size depends heavily on image content (gradients compress well, noise doesn't). Binary search may take more iterations for PNG than JPEG. → Mitigation: the 10% wide target band and 10-iteration cap handle this; worst case returns best effort.
- **Quality 90 may over-compress for some use cases**: Fixed quality means no adaptation to image content. → Mitigation: 90 is the standard web-quality sweet spot; for chat-context images this is more than sufficient.
- **Informed guess assumes quadratic relationship**: The `sqrt` estimate is approximate — actual compression ratios vary by content. → Mitigation: it's just the starting point; binary search corrects from there regardless of guess accuracy.
@@ -0,0 +1,28 @@
## Why

The current image resizing algorithm in `image_size_utils.py` uses a linear walk-down approach: it decreases JPEG quality in fixed steps (95 to 85), then decreases scale factor by 0.02 per iteration. For large photos needing significant size reduction, this burns 15-20+ iterations — each doing a full resize + encode cycle. A binary search approach converges logarithmically, reducing this to ~7 iterations.

## What Changes

- Replace the linear iterative resizing loop in `resize_file` with a binary search on scale factor
- Use an informed initial guess (`sqrt(target / original_size)`) to start near the right answer
- Fix compression quality at 90 for lossy formats (JPEG/WEBP) instead of iterating quality
- Target band of 90-100% of `max_size_bytes` (previously accepted anything under limit, preferring within 3%)
- Increase minimum dimension guard from 32px to 64px
- Reduce `MAX_ITERATIONS` from 30 to 10 (binary search converges in ~7)
- Add comprehensive test coverage for `resize_file` and other untested functions in the module

## Capabilities

### New Capabilities

- `binary-search-resize`: Binary search image resizing algorithm with informed initial guess, fixed quality, and target band convergence

### Modified Capabilities

## Impact

- `src/features/images/image_size_utils.py` — rewrite of `resize_file` function (public API unchanged)
- `test/features/images/test_image_size_utils.py` — new test file
- No API changes, no dependency changes, no breaking changes
- Callers (`platform_bot_sdk.py`) unaffected — same function signature, same return type
@@ -0,0 +1,63 @@
## ADDED Requirements

### Requirement: Binary search convergence on scale factor
The system SHALL use binary search on scale factor to find an output size within the target band. The initial guess SHALL be computed as `sqrt(max_size_bytes / original_file_size)`, clamped to `[0.1, 1.0]`. The search space SHALL be `lo=0.1`, `hi=1.0`.

#### Scenario: Large JPEG resized into target band
- **WHEN** a 2000x2000 JPEG image exceeding the size limit is resized with a given max_size_bytes
- **THEN** the output file size SHALL be between 90% and 100% of max_size_bytes

#### Scenario: Large PNG resized into target band
- **WHEN** a 2000x2000 PNG image exceeding the size limit is resized with a given max_size_bytes
- **THEN** the output file size SHALL be between 90% and 100% of max_size_bytes

#### Scenario: Large WEBP resized into target band
- **WHEN** a 2000x2000 WEBP image exceeding the size limit is resized with a given max_size_bytes
- **THEN** the output file size SHALL be between 90% and 100% of max_size_bytes

### Requirement: Fixed compression quality for lossy formats
The system SHALL use a fixed quality of 90 for JPEG and WEBP encoding. PNG encoding SHALL NOT use a quality parameter (lossless).

#### Scenario: JPEG encoded at quality 90
- **WHEN** a JPEG image is resized
- **THEN** the output SHALL be encoded with quality=90 and optimize=True

#### Scenario: PNG encoded without quality parameter
- **WHEN** a PNG image is resized
- **THEN** the output SHALL be encoded with optimize=True and no quality parameter

### Requirement: Early return for under-limit files
The system SHALL return the original file path unchanged when the original file size is already within the size limit.

#### Scenario: Small file returns original path
- **WHEN** resize_file is called with an image whose file size is already under max_size_bytes
- **THEN** the original input_path SHALL be returned without re-encoding

### Requirement: Minimum dimension guard
The system SHALL stop searching and return best effort when either dimension of the scaled image would fall below 64 pixels.

#### Scenario: Image hits minimum dimension during search
- **WHEN** binary search reaches a scale factor where either width or height would be below 64px
- **THEN** the system SHALL stop the search and return the best under-limit result seen so far, or the closest result overall

### Requirement: Iteration safety cap
The system SHALL stop after a maximum of 10 iterations and return the best result available.

#### Scenario: Safety cap reached
- **WHEN** the binary search has not converged after 10 iterations
- **THEN** the system SHALL return the best under-limit result, or the closest result overall, or raise a ValidationError if no result was produced

### Requirement: Best-effort fallback
The system SHALL track the best under-limit result and the best overall result during the search. When the search ends without landing in the target band, the system SHALL prefer the best under-limit result, then the best overall result, and only raise ValidationError as a last resort.

#### Scenario: No result in target band but under-limit result exists
- **WHEN** the search ends and no result landed in the 90-100% band but a result under the limit was found
- **THEN** the system SHALL return the best under-limit result

#### Scenario: No under-limit result exists
- **WHEN** the search ends and no result was under the limit
- **THEN** the system SHALL return the smallest result seen overall

#### Scenario: No result produced at all
- **WHEN** the search ends with no results (e.g., image too small to encode)
- **THEN** the system SHALL raise a ValidationError with error code INVALID_IMAGE_SIZE
@@ -0,0 +1,25 @@
## 1. Rewrite resize_file

- [x] 1.1 Replace constants: `MAX_ITERATIONS=10`, `QUALITY=90`, `MIN_DIMENSION=64`, `TARGET_RATIO_LO=0.90`
- [x] 1.2 Implement informed initial guess: `scale = sqrt(max_size_bytes / original_size)` clamped to `[0.1, 1.0]`
- [x] 1.3 Implement binary search loop on scale factor with `lo=0.1`, `hi=1.0`, converging into the 90-100% target band
- [x] 1.4 Use fixed quality 90 for JPEG/WEBP, optimize-only for PNG (no quality iteration)
- [x] 1.5 Add minimum dimension guard at 64px (break search if either dimension would go below)
- [x] 1.6 Implement best-effort fallback: prefer best under-limit, then closest overall, then ValidationError

## 2. Test suite for image_size_utils

- [x] 2.1 Create `test/features/images/test_image_size_utils.py` with test class
- [x] 2.2 Test: under-limit file returns original path unchanged
- [x] 2.3 Test: large JPEG resized into 90-100% target band
- [x] 2.4 Test: large PNG resized into 90-100% target band
- [x] 2.5 Test: large WEBP resized into 90-100% target band
- [x] 2.6 Test: minimum dimension guard returns best effort without crashing
- [x] 2.7 Test: iteration safety cap returns best effort
- [x] 2.8 Test: `normalize_image_size_category` variants
- [x] 2.9 Test: `calculate_image_size_category` thresholds and error case

## 3. Verify

- [x] 3.1 Run existing tests to confirm no regressions
- [x] 3.2 Run pre-commit linting
63 changes: 63 additions & 0 deletions openspec/specs/binary-search-resize/spec.md
@@ -0,0 +1,63 @@
## Requirements

### Requirement: Binary search convergence on scale factor
The system SHALL use binary search on scale factor to find an output size within the target band. The initial guess SHALL be computed as `sqrt(max_size_bytes / original_file_size)`, clamped to `[0.1, 1.0]`. The search space SHALL be `lo=0.1`, `hi=1.0`.

#### Scenario: Large JPEG resized into target band
- **WHEN** a 2000x2000 JPEG image exceeding the size limit is resized with a given max_size_bytes
- **THEN** the output file size SHALL be between 90% and 100% of max_size_bytes

#### Scenario: Large PNG resized into target band
- **WHEN** a 2000x2000 PNG image exceeding the size limit is resized with a given max_size_bytes
- **THEN** the output file size SHALL be between 90% and 100% of max_size_bytes

#### Scenario: Large WEBP resized into target band
- **WHEN** a 2000x2000 WEBP image exceeding the size limit is resized with a given max_size_bytes
- **THEN** the output file size SHALL be between 90% and 100% of max_size_bytes

### Requirement: Fixed compression quality for lossy formats
The system SHALL use a fixed quality of 90 for JPEG and WEBP encoding. PNG encoding SHALL NOT use a quality parameter (lossless).

#### Scenario: JPEG encoded at quality 90
- **WHEN** a JPEG image is resized
- **THEN** the output SHALL be encoded with quality=90 and optimize=True

#### Scenario: PNG encoded without quality parameter
- **WHEN** a PNG image is resized
- **THEN** the output SHALL be encoded with optimize=True and no quality parameter

### Requirement: Early return for under-limit files
The system SHALL return the original file path unchanged when the original file size is already within the size limit.

#### Scenario: Small file returns original path
- **WHEN** resize_file is called with an image whose file size is already under max_size_bytes
- **THEN** the original input_path SHALL be returned without re-encoding

### Requirement: Minimum dimension guard
The system SHALL stop searching and return best effort when either dimension of the scaled image would fall below 64 pixels.

#### Scenario: Image hits minimum dimension during search
- **WHEN** binary search reaches a scale factor where either width or height would be below 64px
- **THEN** the system SHALL stop the search and return the best under-limit result seen so far, or the closest result overall

### Requirement: Iteration safety cap
The system SHALL stop after a maximum of 10 iterations and return the best result available.

#### Scenario: Safety cap reached
- **WHEN** the binary search has not converged after 10 iterations
- **THEN** the system SHALL return the best under-limit result, or the closest result overall, or raise a ValidationError if no result was produced

### Requirement: Best-effort fallback
The system SHALL track the best under-limit result and the best overall result during the search. When the search ends without landing in the target band, the system SHALL prefer the best under-limit result, then the best overall result, and only raise ValidationError as a last resort.

#### Scenario: No result in target band but under-limit result exists
- **WHEN** the search ends and no result landed in the 90-100% band but a result under the limit was found
- **THEN** the system SHALL return the best under-limit result

#### Scenario: No under-limit result exists
- **WHEN** the search ends and no result was under the limit
- **THEN** the system SHALL return the smallest result seen overall

#### Scenario: No result produced at all
- **WHEN** the search ends with no results (e.g., image too small to encode)
- **THEN** the system SHALL raise a ValidationError with error code INVALID_IMAGE_SIZE
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "the-agent"
- version = "5.8.0"
+ version = "5.8.1"

[tool.setuptools]
package-dir = {"" = "src"}