Skip to content

Add Session API Design#1

Open
markturansky wants to merge 1 commit intomainfrom
feat/session-api-design
Open

Add Session API Design#1
markturansky wants to merge 1 commit intomainfrom
feat/session-api-design

Conversation

@markturansky
Copy link
Copy Markdown
Owner

Summary

Proposes a unified Session API that abstracts over different session providers (tmux, Paude, Ambient) while maintaining existing Agent Boss coordination capabilities.

Key Features

  • 🎯 SessionProvider Interface: Standardized create/start/stop/delete/message operations
  • 📋 Unified Session Model: Provider-agnostic status, configuration, and metadata
  • 🔗 REST API: Complete HTTP endpoints that integrate with existing architecture
  • 🔄 Backward Compatibility: Existing agent coordination unchanged
  • ⚙️ Multi-Provider Support: tmux (local), Paude (containers), Ambient (cloud)
  • 📝 Declarative Configuration: Space-specific overrides and agent templates

Implementation Plan

Phase 1: Core abstraction + TmuxProvider
Phase 2: PaudeProvider integration
Phase 3: Enhanced management + monitoring
Phase 4: AmbientProvider + cloud sessions

Benefits

  • Unified Management: Same API across local/container/cloud
  • Provider Choice: Optimal session type per use case
  • Clean Architecture: Session management separated from coordination
  • Future-Proof: Easy to add new providers (K8s, Lambda, etc.)

This design enables Agent Boss to scale from local development to production cloud deployment while keeping the coordination model that makes it effective.

🤖 Generated with Claude Code

Proposes unified session abstraction over tmux, Paude, and Ambient providers.

Key features:
- SessionProvider interface for create/start/stop/delete/message operations
- REST API endpoints for session management
- Provider-agnostic Session model with unified status/config
- Backward compatibility with existing agent coordination
- Configuration system with space-specific overrides
- 4-phase implementation plan

Enables consistent session management across local, container, and cloud
environments while maintaining Agent Boss coordination capabilities.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@markturansky
Copy link
Copy Markdown
Owner Author

Thanks for this PR! 🤖 Comment left by github-agent.

@markturansky
Copy link
Copy Markdown
Owner Author

Another comment from github-agent! 🚀

@markturansky
Copy link
Copy Markdown
Owner Author

Synchronous comment posted! Testing notification flow. 🔔

Copy link
Copy Markdown
Owner Author

@markturansky markturansky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this thoughtful and comprehensive design document! The Session API proposal is well-structured and addresses a clear need for provider abstraction. I've reviewed the design thoroughly and have some feedback that I hope will be helpful.

Strengths

Excellent architecture decisions:

  • The provider abstraction is clean and well-designed, with clear separation of concerns
  • Backward compatibility strategy is well thought out
  • The phased implementation plan is realistic and incremental
  • Great use of examples and code snippets throughout

Comprehensive coverage:

  • All three provider types (tmux, Paude, Ambient) are well documented
  • REST API design follows good conventions
  • Security considerations are addressed

Design Feedback

1. Provider Interface & Error Handling

The SessionProvider interface looks solid, but consider adding explicit error types for better handling:

type SessionError struct {
    Code    ErrorCode
    Message string
    Cause   error
}

type ErrorCode string
const (
    ErrSessionNotFound    ErrorCode = "session_not_found"
    ErrSessionAlreadyExists ErrorCode = "session_already_exists"
    ErrProviderUnavailable ErrorCode = "provider_unavailable"
    ErrInvalidConfiguration ErrorCode = "invalid_configuration"
)

This would help API clients handle different failure modes appropriately and make provider implementations more consistent.

2. Session Lifecycle State Transitions

The SessionStatus enum is good, but the document doesn't specify valid state transitions. Consider documenting:

  • Can a stopped session be restarted, or must it be deleted and recreated?
  • What happens if a delete is called on a running session—does it auto-stop first?
  • How do you handle a session that crashes (transitions from running to error)?

A state transition diagram would be valuable here.

3. API Design Considerations

Endpoint consistency: You have both /api/v1/sessions/{id} and /spaces/{space}/sessions/{name} endpoints. Consider:

  • Should the space-scoped endpoints be under /api/v1/spaces/{space}/sessions for consistency?
  • The dual addressing (by id vs by space/name) is useful but could be confusing. Maybe clarify when to use which.

Missing endpoints:

  • Batch operations (start/stop multiple sessions)
  • Session template CRUD endpoints (based on your config examples)
  • Provider health/capability discovery (GET /api/v1/providers)

4. Messaging Model

The SendMessage operation is interesting but underdeveloped:

  • What's the message format? Just plain strings, or structured data?
  • How do responses work? Is messaging one-way or request/response?
  • Should there be a message queue or pub/sub model for reliability?
  • How do you handle message delivery failures across different providers?

Consider whether the coordination layer (existing broadcast functionality) should be separate from the provider-specific messaging.

5. Resource Limits & Validation

The ResourceLimits are provider-specific but presented as universal. Consider:

  • How does tmux handle CPU/memory limits? (It can't natively)
  • Should there be provider capability negotiation?
  • Validation: what happens if you request 100 CPUs on a machine with 4?

Maybe add a ValidateConfig(config SessionConfig) error method to the provider interface.

6. Configuration Complexity

The configuration model is very flexible but potentially complex:

  • Multiple layers of defaults (provider → space → agent)
  • How are conflicts resolved? (e.g., space says tmux, agent template says paude)
  • Consider adding examples of the final resolved configuration

The configuration merging/precedence rules should be explicitly documented.

7. Testing Strategy

The document doesn't address testing. Consider adding a section on:

  • Mock provider implementation for testing coordination logic
  • Integration tests for each provider
  • Contract tests to ensure all providers implement the interface correctly
  • How to test provider-specific features vs common features

8. Observability & Monitoring

Great start with GetLogs and IsHealthy, but consider:

  • Metrics: session count by status/provider, session duration, failure rates
  • Tracing: distributed tracing across session creation → message delivery
  • Events: session lifecycle events for audit logs
  • Should there be a webhook/notification system for session status changes?

9. Provider Implementation Details

Tmux:

  • How do you prevent session name collisions?
  • What happens if a tmux session exists but Agent Boss doesn't know about it?
  • Session persistence across Agent Boss restarts—how is state recovered?

Paude:

  • Container cleanup on failures?
  • How do you handle container image pulls (slow/failing)?
  • Volume mount conflicts?

Ambient:

  • API rate limiting and retry logic?
  • Credential rotation?
  • How to handle API version changes?

10. Migration & Rollback

The migration strategy is mentioned but light on details:

  • How do you migrate existing tmux sessions to the new API without disruption?
  • What's the rollback plan if Phase 2 or 3 has issues?
  • Can old and new systems run simultaneously during migration?

Minor Issues

  1. Line 52: SessionConfig.Image - should this be ImageURI to be more specific?
  2. Line 96: SupportedFeatures() []string - consider returning a structured feature set rather than strings
  3. Line 221: The tmux Start comment says "Session starts on creation" but the interface has separate Create and Start methods—clarify this impedance mismatch
  4. Line 386: Agent templates shown but not explained how they're stored/loaded

Questions

  1. How do you handle provider-specific features that don't fit the common interface (e.g., tmux window splitting, Ambient's advanced networking)?
  2. Is there a plan for session affinity or scheduling (e.g., "run this agent on the same host as that one")?
  3. How does this interact with the existing agent registration system? Do agents need to be aware of sessions?
  4. What's the expected session lifetime? Minutes? Hours? Days? This affects cleanup strategies.

Recommendations

  1. Add a testing section to the implementation plan
  2. Document state transitions and lifecycle more explicitly
  3. Consider adding a provider registry/discovery mechanism
  4. Add error handling and retry strategy documentation
  5. Include failure mode analysis (what happens when a provider becomes unavailable?)
  6. Add metrics/observability to the design

Summary

This is a solid foundation for a session abstraction layer. The design is well-structured and the problem is clearly understood. The main areas for enhancement are:

  • More detailed error handling and state management
  • Explicit testing strategy
  • Provider capability negotiation
  • Configuration precedence rules
  • Observability and operational concerns

I'm excited to see this implemented! Please let me know if you'd like me to elaborate on any of these points or if you have questions about the feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant