Skip to content

[aw][code health] SessionSummary is a Pydantic BaseModel used as internal intermediate state — should be a frozen dataclass [Content truncated due to length] #1067

@microsasa

Description

@microsasa

Root Cause

SessionSummary in src/copilot_usage/models.py is a Pydantic BaseModel used as internal intermediate state rather than as an external-data validator. The coding guideline is explicit:

Pydantic at the Boundary, Plain Python Internally — Internal intermediate state uses frozen dataclasses with slots=True. External data (JSON files, API responses) is validated with Pydantic models.

SessionSummary is NOT parsed directly from JSON. Per architecture.md:

Populated by a parser that walks the events.jsonl file; not parsed directly from JSON.

It is constructed incrementally by build_session_summary in parser.py from already-validated SessionEvent objects. The only reason it uses Pydantic is for one @model_validator(mode="after") that checks two field invariants at construction time. This invariant check can be replicated with __post_init__ in a frozen dataclass at zero cost.

The consequence is that _CachedSession.summary: SessionSummary (in _SESSION_CACHE) holds an unfrozen object even though _CachedSession is itself a frozen dataclass. Any code path that obtains a SessionSummary from the cache and mutates its fields directly (instead of using model_copy(update=...)) silently corrupts the cache without raising an error. The current codebase is disciplined, but the type system provides no guard.

Scope

File Change
src/copilot_usage/models.py Convert SessionSummary from BaseModel to @dataclasses.dataclass(frozen=True, slots=True); move @model_validator logic to __post_init__; update total_output_tokens, shutdown_output_tokens, has_active_period_stats, session_sort_key, and merge_session_model_metrics (which already work with the plain attributes — no logic change needed)
src/copilot_usage/parser.py Replace SessionSummary(...) Pydantic constructors with dataclass constructors (same syntax); replace summary.model_copy(update={"name": fresh_name}) with dataclasses.replace(summary, name=fresh_name) (appears ~3 times)
src/copilot_usage/vscode_parser.py Same model_copydataclasses.replace replacements if applicable
tests/ Replace any session.model_dump() calls in tests with dataclasses.asdict(session) or direct field access; replace SessionSummary.model_fields introspection with dataclasses.fields

Spec

  1. Convert SessionSummary to a frozen dataclass:
`@dataclasses`.dataclass(frozen=True, slots=True)
class SessionSummary:
    session_id: str
    start_time: datetime | None = None
    # ... all fields as dataclass fields with defaults ...
    model_metrics: dict[str, ModelMetrics] = dataclasses.field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.active_model_calls > self.model_calls:
            raise ValueError(
                f"active_model_calls ({self.active_model_calls}) must be <= "
                f"model_calls ({self.model_calls})"
            )
        if self.active_user_messages > self.user_messages:
            raise ValueError(
                f"active_user_messages ({self.active_user_messages}) must be <= "
                f"user_messages ({self.user_messages})"
            )
  1. Replace all summary.model_copy(update={...}) with dataclasses.replace(summary, ...).

  2. Note: model_metrics: dict[str, ModelMetrics] is still a mutable dict field. To make SessionSummary truly deep-immutable this field should use a mapping type (e.g. types.MappingProxyType), but converting the dict usage is a separate concern. At minimum, making the dataclass frozen=True prevents accidental re-assignment of the field itself.

Testing Requirement

  • Add a test in test_models.py that verifies SessionSummary raises on construction when active_model_calls > model_calls — equivalent to the existing @model_validator coverage.
  • Verify that attempting to assign a field on a constructed SessionSummary raises FrozenInstanceError (dataclasses equivalent of Pydantic's frozen model).
  • Run the full test suite to confirm no regressions. Pay particular attention to tests that call .model_dump(), .model_fields, or .model_copy() on SessionSummary instances — these must be updated.

Generated by Code Health Analysis · ● 4.8M ·

Metadata

Metadata

Assignees

No one assigned

    Labels

    awCreated by agentic workflowaw-dispatchedIssue has been dispatched to implementercode-healthCode cleanup and maintenance

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions