
updating from Dev #1

Closed
Rryvern wants to merge 92 commits into Rryvern:main from Mirrowel:dev

Conversation

@Rryvern
Owner

@Rryvern Rryvern commented Mar 6, 2026

Description

Testing Done

Checklist

  • I have tested these changes locally
  • I have added license headers to new files (LGPL for library, MIT for proxy)
  • I have updated documentation (README/DOCUMENTATION.md) if needed
  • Related issue: #

This commit executes a major architectural refactor, decomposing the monolithic `client.py` and `usage_manager.py` files into modular, domain-specific packages to improve maintainability and separation of concerns.

- **Client Refactor**: `RotatingClient` is now a lightweight facade delegating to:
  - `RequestExecutor`: Unified retry and rotation logic.
  - `StreamingHandler`: Stream processing and error detection.
  - `CredentialFilter`: Tier and priority compatibility filtering.
  - `ModelResolver`: Model name resolution and whitelisting.
  - `ProviderTransforms`: Provider-specific request mutations.

- **Usage Manager Refactor**: `UsageManager` logic is now distributed across:
  - `TrackingEngine`: Usage recording and window management.
  - `LimitEngine`: Enforcement of cooldowns, caps, and fair cycle limits.
  - `SelectionEngine`: Credential selection strategies (balanced/sequential).
  - `CredentialRegistry`: Stable identity management for credentials.
  - `UsageStorage`: Resilient async JSON persistence.

- **Core Infrastructure**: Added `src/rotator_library/core` for shared types, error definitions, and centralized configuration loading.

BREAKING CHANGE: The files `src/rotator_library/client.py` and `src/rotator_library/usage_manager.py` have been deleted. `client.py` is replaced by the `client` package. `usage_manager.py` is replaced by the `usage` package. Direct imports from `rotator_library.usage_manager` must be updated to `rotator_library.usage` or top-level exports.
…mplementation

- Preserve the original monolithic implementation in `_client_legacy.py` and `_usage_manager_legacy.py`.
- Update `RequestExecutor` to support transaction logging, request sanitization, and consecutive quota failure detection.
- Implement `ConcurrentLimitChecker` in the usage limit engine to enforce `max_concurrent` constraints.
- Improve `StreamingHandler` with robust error buffering for fragmented JSON responses.
- Add fair cycle reset logic and quota baseline synchronization to the new `UsageManager`.
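
The error buffering for fragmented JSON responses mentioned for `StreamingHandler` could look roughly like the sketch below. The class name `JsonFragmentBuffer` and its `feed` method are hypothetical and are not taken from the library; the sketch only shows the general idea of accumulating chunks until they parse as one JSON document.

```python
import json
from typing import Any, List, Optional


class JsonFragmentBuffer:
    """Hypothetical helper: collects chunks until they form valid JSON."""

    def __init__(self) -> None:
        self._parts: List[str] = []

    def feed(self, chunk: str) -> Optional[Any]:
        """Add a chunk; return the parsed payload once it is complete."""
        self._parts.append(chunk)
        try:
            parsed = json.loads("".join(self._parts))
        except json.JSONDecodeError:
            # Still incomplete: keep buffering and wait for more chunks.
            return None
        self._parts.clear()
        return parsed
```

A caller would feed each raw chunk and act only when a non-`None` value comes back. A real handler would presumably also cap the buffer size, which this sketch omits.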
This overhaul introduces smart queuing for credential acquisition and shared quota tracking.

- Implement async waiting in `UsageManager`: calls to `acquire_credential` now block (up to a deadline) using asyncio primitives when keys are busy or on cooldown.
- Add Quota Group synchronization: request counts are now synced across models that share a specific quota pool (e.g., Antigravity variants).
- Add support for cached prompt token tracking in usage statistics.
- Refactor `RequestExecutor` to reuse a shared `httpx.AsyncClient` for better performance.
- Correct token counting for Antigravity models by including preprompt overhead.

BREAKING CHANGE: `UsageManager.acquire_credential` is now `async` and must be awaited. `RequestExecutor` now requires an `http_client` argument during initialization.
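
The deadline-bounded waiting described above can be sketched with an `asyncio.Condition`. This is a minimal illustration, not the library's implementation: `CredentialPool`, `release`, and the use of a monotonic deadline are assumptions, and the real `acquire_credential` signature may differ.

```python
import asyncio
import time
from typing import List, Optional


class CredentialPool:
    """Hypothetical pool: hands out free credentials, waits when all are busy."""

    def __init__(self, credentials: List[str]) -> None:
        self._free = list(credentials)
        self._cond = asyncio.Condition()

    async def acquire_credential(self, deadline: float) -> Optional[str]:
        """Wait for a free credential until the monotonic deadline passes."""
        async with self._cond:
            while not self._free:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return None
                try:
                    # wait() releases the lock while sleeping and reacquires
                    # it before returning or being cancelled by the timeout.
                    await asyncio.wait_for(self._cond.wait(), timeout=remaining)
                except asyncio.TimeoutError:
                    return None
            return self._free.pop()

    async def release(self, credential: str) -> None:
        """Return a credential and wake one waiter."""
        async with self._cond:
            self._free.append(credential)
            self._cond.notify()
```

Callers must `await` the acquisition, which is the practical effect of the breaking change noted above. The pool should be created inside a running event loop so the condition binds to the right loop on older Python versions.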
- **error_handler**: Implement logic to extract `quotaValue` and `quotaId` from Google/Gemini error responses for better rate limit observability.
- **streaming**: Remove legacy `UsageManager` support and the `__init__` method from `StreamingHandler`; usage recording is now delegated to `CredentialContext`.
- **client**: Handle `_parent_log_dir` internal parameter to configure transaction logger output directory.
This commit introduces a comprehensive overhaul of the usage tracking system to support more complex quota management scenarios and integration capabilities.

- **Granular Usage Scopes**: Usage windows can now be scoped to specific models, quota groups, or credentials via `WindowDefinition.applies_to`, enabling precise limit enforcement (e.g., shared model quotas vs. individual key limits).
- **Cost Calculation**: Integrated `litellm` cost calculation for both standard and streaming requests. `approx_cost` is now tracked and persisted in usage statistics.
- **Provider Hooks**: Added `HookDispatcher` and `on_request_complete` interface, allowing provider plugins to intercept request results, override usage counts, or trigger exhaustion based on custom logic.
- **Usage API**: Introduced `UsageAPI` facade (`src/rotator_library/usage/integration/api.py`) to provide a stable interface for external components to query state and manage cooldowns.
- **Fair Cycle Enhancements**: Refined fair cycle tracking with support for global state persistence and configurable tracking modes (credential-level vs. group-level).
- **Configuration**: Expanded environment variable support for custom caps, fair cycle settings, and sequential fallback multipliers.
- **Persistence**: Updated storage schema to handle nested `model_usage` and `group_usage` statistics.

Also in this commit:
- feat(client): add automatic request validation via provider plugins
- fix(streaming): correctly track cached vs. uncached tokens in usage stats
- refactor: add backward compatibility shim in `src/rotator_library/usage_manager.py`
Refines the request executor and client to support deep observability and dynamic quota management within the modular architecture.

- **Observability**:
  - Implement a sanitized LiteLLM logging callback to safely pipe provider logs to the library logger.
  - Capture response headers in `UsageManager`, `CredentialContext`, and failure logs to aid debugging.
  - Pass request headers to the failure logger for better context.
- **Usage Management**:
  - Implement `_apply_usage_reset_config` to dynamically generate rolling window definitions (e.g., daily limits) based on provider settings.
  - Fix `fair_cycle_key` resolution logic in the tracking engine.
- **Client & Executor**:
  - Support passing provider-specific LiteLLM parameters via `RequestExecutor`.
  - Update `BackgroundRefresher` to support the new usage manager registry and prevent crashes when managers are missing.
…ization

This commit overhauls how usage statistics are aggregated and initialized to support shared quotas and improve stability.

- **Quota Groups**: Implemented `_backfill_group_usage` to derive shared group statistics (e.g., tiered limits) from individual model windows. Updated `CooldownChecker` to enforce limits at both model and group levels.
- **Initialization**: Added `initialize_usage_managers` with async locking to `RotatingClient`. Updated `main.py` and `background_refresher.py` to invoke this explicitly, ensuring state is loaded before traffic.
- **Persistence**: Switched storage mechanisms to `ResilientStateWriter` and `safe_read_json` to prevent data corruption during atomic writes.
- **Providers**: Refined quota calculations (calculating `quota_used` from fractions) for Chutes, Firmware, and NanoGPT.
- **Antigravity**: Updated system prompts to strictly enforce parallel tool calling behavior.
- **Logging**: Implemented batched logging for quota exhaustion events to reduce noise.
…t execution

This commit introduces comprehensive usage tracking and refactors the client execution flow for better observability and stability.

- Refactor `RotatingClient.acompletion` to be explicitly `async`, ensuring proper execution in `proxy_app`.
- Implement detailed usage summaries in `get_usage_stats`, including token caching percentages, approximate costs, and detailed provider states.
- Add granular logging in `RequestExecutor` to trace credential availability (displaying blocks by cooldowns, fair cycle, or caps) and current quota window saturation.
- Introduce debounced state saving in `UsageManager` to optimize storage I/O and add logic for backfilling model usage data.

BREAKING CHANGE: `RotatingClient.acompletion` is now an `async` function and must be awaited by the caller.
…window manager

- Update `RequestExecutor` to await `usage_manager.get_availability_stats`, ensuring non-blocking execution during availability checks.
- Expose `window_manager` directly as a property on `UsageManager`.
- Refactor `RequestExecutor` to access `window_manager` directly instead of traversing via `limits.windows`.
Expanded the usage tracking system to capture and persist detailed token metrics, specifically reasoning (thinking) tokens and cache creation tokens.

- Updated client executors to extract `reasoning_tokens` and `cache_creation_tokens` from provider responses.
- Extended `UsageStats` and `WindowStats` models to store granular token breakdowns and explicit success/failure counts.
- Adapted storage and aggregation logic to persist and calculate these new metrics across quota windows.
…tions

- Archive the existing `RotatingClient` and `UsageManager` logic into new `_legacy` modules (`_client_legacy.py` and `_usage_manager_legacy.py`). This preserves the stable baseline implementation to facilitate reference and comparison during the ongoing core architecture refactor.
- Add `src/rotator_library/providers/example_provider.py` as a comprehensive reference template. This file documents the standard patterns for implementing providers with advanced usage management, quota groups, custom token counting, and background refresh jobs.
…ntation

This introduces a thread-safe mechanism using `contextvars` to accurately track and report internal API retries within providers.

- Implement `ContextVar` retry counting in `AntigravityProvider` to capture hidden API costs from internal retries (e.g., on empty responses or malformed calls).
- Update `ExampleProvider` with comprehensive patterns for custom usage handling, including retry counting and error-based cooldowns.
- Expand documentation for `UsageAPI` and `HookDispatcher` with detailed usage examples and architectural context.
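
The `ContextVar` retry counting can be illustrated as follows. This is a sketch under stated assumptions: the variable name `_internal_retries` and the function `call_with_retries` are hypothetical, and the fake response lists stand in for real API calls.

```python
import asyncio
import contextvars
from typing import List, Tuple

# Hypothetical counter; each asyncio task sees its own copy of the value.
_internal_retries = contextvars.ContextVar("internal_retries", default=0)


async def call_with_retries(responses: List[str]) -> Tuple[str, int]:
    """Consume fake responses until one is non-empty; count hidden retries."""
    _internal_retries.set(0)
    for response in responses:
        if response:
            return response, _internal_retries.get()
        # An empty response triggers an internal retry that still costs quota.
        _internal_retries.set(_internal_retries.get() + 1)
        await asyncio.sleep(0)  # yield so concurrent requests interleave
    raise RuntimeError("all attempts returned empty responses")


async def main() -> List[Tuple[str, int]]:
    # gather() runs each coroutine in its own task with a copied context,
    # so the two counters do not interfere with each other.
    return await asyncio.gather(
        call_with_retries(["", "", "ok"]),
        call_with_retries(["ok"]),
    )
```

Because each task gets a copy of the context, the retry count reported for one request is unaffected by other requests in flight.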
Updates the usage manager to prevent lower (stale) API usage values from overwriting higher (current) local counters during background synchronizations. API providers often return cached data or update in increments, causing local state regression.

- Implement `max(local, api)` logic for request counts to ensure monotonic growth
- Add `force` parameter to `UsageManager` and quota trackers to allow manual overrides
- Preserve accurate local tracking while allowing forced resets
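
The merge rule above reduces to a few lines. The function name `merge_request_count` is hypothetical; only the `max(local, api)` rule and the `force` parameter come from the commit message.

```python
def merge_request_count(local: int, api: int, force: bool = False) -> int:
    """Merge a local counter with an API-reported one.

    Without force, the counter never moves backwards, so a stale or cached
    API value cannot undo locally recorded requests. With force, the API
    value wins, which allows manual resets.
    """
    if force:
        return api
    return max(local, api)
```

For example, a local count of 12 merged with a stale API count of 10 stays at 12, while a forced merge resets it to 10.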
This introduces a compatibility layer that allows the `RotatingClient` to accept Anthropic-format requests, translating them to the internal OpenAI format for processing and converting responses back.

- Implement `AnthropicHandler` with support for `messages` and `count_tokens`.
- Integrate handler into `RotatingClient` to enable direct Anthropic SDK usage.

Also in this commit:
- feat(usage): add `force_refresh_quota` and `reload_usage_from_disk` for manual state management
- refactor(usage): implement `reload_from_disk` to sync local state without external calls
…counts

- Extract `_aggregate_model_windows` to unify how usage stats are summed across quota groups and credentials, reducing code duplication.
- Implement `_reconcile_window_counts` to ensure success and failure counters remain consistent (mathematically valid) when total request counts are updated from external quota sources.
- Enable synchronization and backfilling of credential-level windows to reflect aggregated model usage.
- Simplify `update_usage` and `_backfill_group_usage` logic by leveraging the new shared aggregation helpers.
This commit extracts the initialization, credential filtering, and validation steps into a reusable `_prepare_execution` helper method within `RequestExecutor`. This ensures consistency and reduces code duplication between streaming and non-streaming request handlers.

Also in this commit:
- refactor(usage): remove unused `release` method from `TrackingEngine`
Refactors the usage tracking system to remove the monolithic `UsageStats` structure in favor of distinct `ModelStats`, `GroupStats`, and `TotalStats` containers. This eliminates complex window aggregation and backfilling logic by recording usage directly to relevant scopes via a unified `UsageUpdate` mechanism.

- Replace `UsageStats` with `TotalStats`, `ModelStats`, and `GroupStats` in `CredentialState`
- Remove `_backfill_group_usage`, `_sync_quota_group_counts`, and window aggregation logic
- Centralize request recording in `TrackingEngine` using a new `UsageUpdate` dataclass
- Update storage persistence to serialize/deserialize the new schema
- Adapt limit checkers (`WindowLimitChecker`, `CustomCapChecker`) to access specific stats scopes
- Update client executor and API integration to reflect the new data model
- Add `window_limits_enabled` configuration (default `False`) to allow disabling local window quota blocking.
- Update `LimitEngine` to only include `WindowLimitChecker` when explicitly enabled in config.
- Restore legacy-style logging for quota exhaustion events, including reset time calculations.
- Re-implement fair cycle exhaustion logging to track credential usage status.
- Modify window exhaustion logic to only apply cooldowns when forced.
- Update `UsageManager` logic to apply quota updates to model windows only when no `group_key` is provided.
- Ensure usage is attributed to the correct scope (group vs. model), preventing model window updates when API-level group quotas are active where specific model attribution may be unknown or irrelevant.
…imestamps

- Standardize default configuration by replacing the specific `daily` window with a generic 24h `rolling` window.
- Remove the deprecated `total` window definition and add logic to ignore legacy "total" windows during storage parsing.
- Add `*_human` date string fields (e.g., `reset_at_human`, `last_updated_human`) alongside Unix timestamps in storage dumps to improve debugging and observability.
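
As a rough illustration of the `*_human` fields, the sketch below adds a readable string next to each Unix timestamp. The helper name `add_human_fields`, the UTC timezone, and the date format are assumptions; the commit message does not specify them.

```python
from datetime import datetime, timezone
from typing import Iterable


def add_human_fields(
    record: dict, keys: Iterable[str] = ("reset_at", "last_updated")
) -> dict:
    """Return a copy of record with '<key>_human' strings for set timestamps."""
    out = dict(record)
    for key in keys:
        value = record.get(key)
        if value is not None:
            moment = datetime.fromtimestamp(value, tz=timezone.utc)
            out[f"{key}_human"] = moment.strftime("%Y-%m-%d %H:%M:%S UTC")
    return out
```

The numeric timestamps stay in place, so code that reads the storage file is unaffected and the extra strings serve only debugging.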
The quota display logic in `RequestExecutor` now implements a fallback strategy. It prioritizes shared group limits but falls back to model-specific limits if no group window is found or if the group window lacks a limit.

Changes to defaults:
- Remove `WindowDefinition.total` and the default infinite tracking window.
- Simplify default window configuration to a single primary daily window (removed 5h window).

Also in this commit:
- chore: update `.gitignore` to anchor paths and exclude `oauth_creds/` and `usage/` directories
The tracking engine now checks `self._config.window_limits_enabled` before evaluating window exhaustion for groups and models. This ensures that usage tracking does not mark resources as exhausted based on window limits when the feature is explicitly disabled.
Inject a shared `provider_instances` cache into core client sub-components to prevent duplicate plugin instantiation.

- Update `CredentialFilter`, `ModelResolver`, `ProviderTransforms`, and `RequestExecutor` to accept an optional `provider_instances` dict.
- Configure `RotatingClient` to pass its central instance registry to these components during initialization.
- Ensure plugin instances are reused across the client lifecycle, improving state consistency and reducing overhead.
This change introduces a `SingletonABCMeta` metaclass to ensure that all `ProviderInterface` implementations behave as singletons. This resolves state fragmentation issues where different components (UsageManager, Hooks, etc.) held separate instances with distinct caches.

- Implement `SingletonABCMeta` and apply to `ProviderInterface`.
- Remove redundant local instance caching in `HookDispatcher` as class instantiation now guarantees a singleton return.
- Add debug logging in `CredentialFilter` to trace instance creation vs. cache hits.
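
A singleton metaclass built on `ABCMeta` might look like the sketch below. Only the names `SingletonABCMeta` and `ProviderInterface` come from the commit message; the locking, the instance dictionary, and `DemoProvider` are assumptions made for illustration.

```python
import threading
from abc import ABCMeta, abstractmethod


class SingletonABCMeta(ABCMeta):
    """Metaclass sketch: one instance per concrete class."""

    _instances: dict = {}
    # Reentrant, so a provider may construct another provider in __init__.
    _lock = threading.RLock()

    def __call__(cls, *args, **kwargs):
        with SingletonABCMeta._lock:
            if cls not in SingletonABCMeta._instances:
                SingletonABCMeta._instances[cls] = super().__call__(*args, **kwargs)
        return SingletonABCMeta._instances[cls]


class ProviderInterface(metaclass=SingletonABCMeta):
    @abstractmethod
    def name(self) -> str:
        """Return the provider name."""


class DemoProvider(ProviderInterface):  # hypothetical provider
    def __init__(self) -> None:
        self.cache: dict = {}

    def name(self) -> str:
        return "demo"
```

With this in place, `DemoProvider() is DemoProvider()` holds, so every component that instantiates the provider shares one cache. Abstract classes still cannot be instantiated, because the `ABCMeta` checks run before anything is stored.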
This commit overhauls how quota exhaustion is determined by delegating the decision to individual providers rather than relying on a generic check in the usage manager.

- Introduce `is_initial_fetch` flag to distinguish between startup synchronization and background refreshes.
- Update `AntigravityQuotaTracker` to only apply API-based exhaustion during initial fetch (due to coarse 20% API update increments), relying on local tracking for subsequent updates.
- Update `BaseQuotaTracker` to strictly validate `reset_timestamp`, ignoring it if the quota is full (1.0 remaining), and applying exhaustion immediately if remaining is 0.
- Refactor `UsageManager` to accept an explicit `apply_exhaustion` boolean from providers.
- Add idempotency check to `TrackingEngine` to prevent duplicate logging of exhaustion events.
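
The `BaseQuotaTracker` validation rules above can be summarized in a small function. This is a sketch only: `interpret_quota` and its return shape are hypothetical, and the real tracker likely considers more inputs, such as `is_initial_fetch`.

```python
from typing import Optional, Tuple


def interpret_quota(
    remaining_fraction: float, reset_timestamp: Optional[float]
) -> Tuple[Optional[float], bool]:
    """Return (usable reset timestamp, apply_exhaustion flag)."""
    # A full quota (1.0 remaining) means the window has not started,
    # so any reported reset time is ignored.
    if remaining_fraction >= 1.0:
        reset_timestamp = None
    # Nothing remaining means the credential is exhausted right away.
    apply_exhaustion = remaining_fraction <= 0.0
    return reset_timestamp, apply_exhaustion
```

The boolean in the result corresponds to the explicit `apply_exhaustion` value that providers now pass to `UsageManager`.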
…ion parsing

- Introduce `quota_threshold` in Fair Cycle configuration to control exhaustion based on window limit multipliers.
- Implement robust duration string parsing (e.g., "1d2h30m") and flexible cooldown modes (offsets, percentages) for Custom Caps.
- Update `FairCycleChecker` to utilize `WindowManager` for dynamic quota resolution.
- Allow Custom Caps to define limits independent of window clamps and improve cooldown calculation logic.
- Remove redundant explicit exhaustion marking in `TrackingEngine` in favor of centralized checker logic.
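
Duration strings such as "1d2h30m" can be parsed along these lines. The function name `parse_duration` and the supported unit set (`d`, `h`, `m`, `s`) are assumptions; the library's parser may accept other forms.

```python
import re

_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)([dhms])")
_WHOLE = re.compile(r"(?:\d+[dhms])+")


def parse_duration(text: str) -> int:
    """Convert a string like '1d2h30m' into a number of seconds."""
    text = text.strip().lower()
    # Reject anything that is not purely a sequence of <number><unit> pairs.
    if not _WHOLE.fullmatch(text):
        raise ValueError(f"invalid duration: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in _TOKEN.findall(text))
```

For "1d2h30m" this yields 86400 + 7200 + 1800 = 95400 seconds.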
The custom cap logic has been overhauled to independently verify both model-level and group-level limits. Previously, the system resolved a single "most specific" cap, which could allow requests to bypass group limits if a model-specific cap was present (or vice versa).

- Implemented `_find_all_caps` to retrieve both priority/default model caps and group caps simultaneously.
- Updated `check` method to validate every applicable cap against its respective usage scope (model stats vs. group stats).
- Extracted validation logic into `_check_single_cap` for better readability and reuse.
- Added `get_all_caps_for` to expose all active limits for a given context.
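
The difference from the previous "most specific cap" behavior can be sketched as a loop over every applicable cap. The `Cap` dataclass and `check_caps` function below are hypothetical simplifications of `_find_all_caps` and `_check_single_cap`, whose real signatures are not shown in this PR.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Cap:
    scope: str         # "model" or "group" (assumed scope labels)
    max_requests: int  # requests allowed in that scope


def check_caps(caps: Iterable[Cap], usage_by_scope: Dict[str, int]) -> Optional[Cap]:
    """Return the first violated cap, or None when the request is allowed."""
    for cap in caps:
        # Each cap is compared against usage from its own scope.
        if usage_by_scope.get(cap.scope, 0) >= cap.max_requests:
            return cap
    return None
```

With a model cap of 10 and a group cap of 5, a credential at 3 model requests and 5 group requests is blocked by the group cap, which the single-cap resolution could have missed.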
This introduces a "high-water mark" tracking mechanism for usage windows. Even after a window resets (e.g., a daily limit reset), the system now retains the maximum number of requests ever recorded for that specific window definition.

- Add `max_recorded_requests` and `max_recorded_at` fields to `WindowStats`.
- Update `TrackingEngine` to capture and update peak usage in real-time.
- Ensure `WindowManager` carries forward historical max values when rotating or recreating expired windows.
- Update storage persistence and manager output to include these new metrics.
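
A minimal sketch of the high-water mark behavior follows. The fields `max_recorded_requests` and `max_recorded_at` are named in the commit message; `request_count`, `record_request`, and `rotate_window` are assumptions used to keep the example self-contained.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class WindowStats:
    request_count: int = 0  # assumed name for the live counter
    max_recorded_requests: int = 0
    max_recorded_at: Optional[float] = None


def record_request(stats: WindowStats, now: Optional[float] = None) -> None:
    """Count one request and raise the high-water mark when exceeded."""
    now = time.time() if now is None else now
    stats.request_count += 1
    if stats.request_count > stats.max_recorded_requests:
        stats.max_recorded_requests = stats.request_count
        stats.max_recorded_at = now


def rotate_window(expired: WindowStats) -> WindowStats:
    """Start a fresh window but carry the historical peak forward."""
    return WindowStats(
        request_count=0,
        max_recorded_requests=expired.max_recorded_requests,
        max_recorded_at=expired.max_recorded_at,
    )
```

After a reset the live counter returns to zero while the peak and its timestamp survive, which is what lets the stored statistics show the busiest window ever observed.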
…and sorting logic

- Implement support for displaying multiple quota windows (e.g., daily, monthly) per group in dashboard and detail views.
- Refactor `UsageManager` to aggregate quota stats by window structure instead of flattening totals.
- Fix token usage calculations to correctly distinguish between cached and uncached input tokens.
- Add intelligent sorting strategies: providers ordered by attention needed (quota status), and quota groups by lowest remaining percentage.
- Add configuration toggle to show/hide model-level usage details per provider.
- Improve UI layout with adjustable column widths, better cooldown grouping, and natural sort order for credentials.
- Defer initialization of window timers (`started_at`, `reset_at`) until the first request is actually recorded.
- Prevent the display of misleading reset times for windows that have not yet been utilized.
- Update quota synchronization logic to only apply remote reset times and timestamps when usage is detected.
Mirrowel and others added 29 commits January 25, 2026 01:49
…lities

- Extract project ID extraction, credential loading, and environment variable generation into `gemini_shared_utils`.
- Remove duplicated logic from `AntigravityAuthBase` and `GeminiAuthBase`.
- Promote `_persist_project_metadata` to `GoogleOAuthBase` to reduce code duplication.
This change introduces the ability to capture, persist, and display descriptive tier names (e.g., "Google One AI PRO") alongside canonical tier identifiers used for internal logic.

- Add `get_tier_full_name` utility and mapping for various Google One and Code Assist subscription types.
- Update `GeminiAuthBase` and `AntigravityAuthBase` to cache and persist the full tier name in credential metadata.
- Enhance discovery and onboarding logs to display the specific subscription source instead of generic tier IDs.
- Ensure `tier_full` is preserved across sessions via `_persist_project_metadata`.
The `check_expired_windows` method has been removed from the `WindowManager` class as it is no longer utilized in the tracking logic.

Also in this commit:
- refactor(client): remove extraneous `cache_id` reference from `CredentialFilter` logging
Rich 14.0+ lazy-loads unicode data via dynamic imports, which PyInstaller fails to detect automatically. This ensures the required `rich._unicode_data` modules are included in the build artifact to prevent runtime errors.

Also in this commit:
- chore: bump rotator_library version from 1.5 to 1.7
…dential expiry

- Remove the complex re-authentication queue system (`_reauth_queue`) and associated background processors to prevent blocking operations.
- Implement a permanent expiry mechanism where credentials returning invalid grant or unauthorized errors (HTTP 400/401/403) are immediately removed from rotation.
- Disable attempts to trigger interactive OAuth flows during proxy operation; re-authentication now explicitly requires manual intervention via `credential_tool.py` and a service restart.
- Standardize "fail-fast" error handling and logging across Google, iFlow, and Qwen authentication providers.
Implement a thread-safe `_get_lock` method in auth providers to handle the retrieval and creation of refresh locks. This ensures that the `_refresh_locks` dictionary is modified under a master lock (`_locks_lock`), preventing time-of-check to time-of-use (TOCTOU) bugs where multiple coroutines could simultaneously create duplicate locks for the same path.
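
The lock-creation pattern can be sketched as below. The names `_get_lock`, `_refresh_locks`, and `_locks_lock` come from the commit message; the surrounding `RefreshLocks` class and the choice of `asyncio.Lock` for both levels are assumptions.

```python
import asyncio
from typing import Dict


class RefreshLocks:
    """Hypothetical holder for per-path refresh locks."""

    def __init__(self) -> None:
        self._refresh_locks: Dict[str, asyncio.Lock] = {}
        self._locks_lock = asyncio.Lock()  # master lock guarding the dict

    async def _get_lock(self, path: str) -> asyncio.Lock:
        """Return the lock for path, creating it at most once."""
        async with self._locks_lock:
            lock = self._refresh_locks.get(path)
            if lock is None:
                lock = asyncio.Lock()
                self._refresh_locks[path] = lock
            return lock
```

Because the lookup and the insertion happen under the same master lock, concurrent callers asking for the same path always receive the same lock object.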
Removes the interactive global re-authentication coordinator logic from Google and Qwen providers. Instead of attempting to launch an interactive OAuth flow (which blocks proxy operations), the system now immediately marks the credential as permanently expired and raises an error.

- Remove `get_reauth_coordinator` usage in `GoogleOAuthBase` and `QwenAuthBase`.
- Implement immediate credential expiration upon token refresh failure.
- Raise `ValueError` with specific instructions to run `credential_tool.py` for recovery.
- Simplify `IFlowAuthBase` expiration logic to strictly enforce manual re-authentication.
Update authentication providers to allow interactive OAuth when running manually (tool context), while preserving non-blocking expiry behavior during proxy operations.

Also in this commit:
- fix(providers): lazily start the refresh queue processor in iFlow and Qwen to ensure background updates run
Check credential metadata for `tier_full` to display a more descriptive tier name (e.g., "Google One AI PRO") instead of the raw tier code.
- Replace the Electron-based User-Agent with the native client format (`antigravity/1.15.8 windows/amd64`).
- Preserve the previous User-Agent as `ANTIGRAVITY_USER_AGENT_LEGACY` for reference.
This commit replaces the legacy device profile system with a robust `DeviceFingerprint` implementation to improve rate-limit mitigation and mimic authentic clients.

- Replace `DeviceProfile` with `DeviceFingerprint` to track `User-Agent`, `X-Goog-Api-Client`, `X-Goog-QuotaUser`, and `X-Client-Device-Id`.
- Implement realistic OS version simulation for Windows, macOS, and Linux, along with architecture and SDK client randomization.
- Add silent upgrade logic to migrate existing legacy profiles to full fingerprints while preserving hardware IDs.
- Update `AntigravityProvider` and quota trackers to use credential-specific headers for all requests.
Introduces a `_get_active_states` helper method to filter the internal states dictionary against active stable IDs. This ensures that priority calculations, credential selection, and availability statistics strictly use active credentials, excluding stale or historical data.
Filters out providers that report zero `total_requests` during stats aggregation. This ensures that unused or invalid providers are not included in the final report.
The retry loop was previously limited strictly to `EMPTY_RESPONSE_MAX_ATTEMPTS`. This logic was flawed as it could prematurely terminate the loop even if `CAPACITY_EXHAUSTED_MAX_ATTEMPTS` allowed for more retries.

- Calculate the maximum of all retry constants to determine loop range.
- Ensure the loop iterates sufficiently for whichever error type requires the most retries.
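
The fix can be illustrated with a loop whose range is the maximum of all retry limits. The constant names come from the commit message, but their values, the per-type failure counters, and `run_with_retries` are assumptions made for this sketch.

```python
from typing import Callable

EMPTY_RESPONSE_MAX_ATTEMPTS = 3      # illustrative value
CAPACITY_EXHAUSTED_MAX_ATTEMPTS = 5  # illustrative value

LIMITS = {
    "empty": EMPTY_RESPONSE_MAX_ATTEMPTS,
    "capacity": CAPACITY_EXHAUSTED_MAX_ATTEMPTS,
}


def run_with_retries(attempt_fn: Callable[[int], str]) -> int:
    """Call attempt_fn until it returns 'ok'; return the attempts used.

    attempt_fn returns 'ok', 'empty', or 'capacity' for each attempt.
    """
    failures = {kind: 0 for kind in LIMITS}
    # The loop bound is the largest limit, so no error type is cut short.
    for attempt in range(max(LIMITS.values())):
        outcome = attempt_fn(attempt)
        if outcome == "ok":
            return attempt + 1
        failures[outcome] += 1
        if failures[outcome] >= LIMITS[outcome]:
            break
    raise RuntimeError("retries exhausted")
```

With these values, four capacity errors followed by a success complete on the fifth attempt, whereas a loop bounded by `EMPTY_RESPONSE_MAX_ATTEMPTS` alone would have stopped after three.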
… for iFlow

This update introduces comprehensive support for reasoning models on the iFlow provider and hardens streaming stability.

- **Thinking Mode**: Adds support for `reasoning_effort` and maps it to provider-specific configurations (e.g., `enable_thinking` for DeepSeek/Qwen, `reasoning_split` for MiniMax, and `clear_thinking` logic for GLM).
- **Reasoning Preservation**: Implements a caching mechanism (`ProviderCache`) to store and reinject `reasoning_content` into message history, ensuring context continuity for multi-turn conversations with reasoning models.
- **Streaming Stability**:
  - Updates `StreamingHandler` to detect final chunks using multiple usage keys and source finish reasons.
  - Refactors `IFlowProvider` chunk conversion to robustly track `finish_reason`, prioritizing `tool_calls` and handling edge cases where usage objects are empty or partial.
- **Model Support**: Updates model definitions to include DeepSeek V3/R1, Qwen 2.5/3, and new GLM/MiniMax variants.
Adds the ability to authenticate with iFlow using browser session cookies (BXAuth) as a persistent alternative to OAuth tokens.

- Add interactive setup flow for cookie input in `credential_tool`.
- Implement automatic API key extraction, validation, and lifecycle management from cookies.
- Add proactive refreshing logic for cookie-based API keys (48h buffer).
- Refactor `IFlowAuthBase` to handle both OAuth and Cookie credential strategies transparently.
- Update credential list UI to display authentication type.
- Update User-Agent and library version headers to match `gemini-cli` v0.28.0 signatures.
- Align `google-auth-library` headers in OAuth flow with native CLI behavior.
Dedaluslabs API returns HTTP 422 when tool_choice is passed as a string
('auto') instead of an object. Since 'auto' is the default behavior anyway,
removing it from the request fixes the issue.

This was causing LiteLLM to silently return None after exhausting retries,
which then caused AttributeError: 'NoneType' object has no attribute '__aiter__'
when attempting to iterate the stream.
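
A request-side workaround of this kind might look like the sketch below. The helper name `strip_default_tool_choice` is hypothetical, and the actual fix in the provider may be implemented differently.

```python
def strip_default_tool_choice(payload: dict) -> dict:
    """Return a copy of the request without tool_choice='auto'.

    The string form is dropped because 'auto' is already the default;
    object-form tool_choice values are left untouched.
    """
    cleaned = dict(payload)
    if cleaned.get("tool_choice") == "auto":
        del cleaned["tool_choice"]
    return cleaned
```

The original payload is not mutated, so the same request can still be sent unchanged to providers that accept the string form.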
…e-422

fix(dedaluslabs): Remove tool_choice=auto to avoid 422 error
- Register `claude-opus-4.6` and its aliases in `AntigravityProvider`.
- Enforce mandatory `-thinking` variant usage for Opus 4.6, mirroring Opus 4.5 behavior.
- Add default quota and rate limit configurations for the new model in `AntigravityQuotaTracker`.
This prevents 'NoneType' object has no attribute 'get' errors for new accounts.
fix(iflow): add signed headers for chat requests
- Add 'coder-model' model ID which maps to Qwen 3.5 Plus
- This model is an efficient hybrid model with leading coding performance
- Discovered from CLIProxyAPI codebase model definitions
feat(qwen): add coder-model (Qwen 3.5 Plus) to hardcoded models
@Rryvern Rryvern closed this Mar 6, 2026

4 participants