# SOHAM Digital Life Companion
A comprehensive AI agent system that combines memory, vision, voice, automation, and proactive monitoring capabilities through a modular architecture.
## Overview
SOHAM (Digital Life Companion) is designed as a locally running AI agent that:
- **Remembers Everything**: Persistent memory using vector and structured databases
- **Sees Your Screen**: Advanced vision capabilities with OCR and AI analysis
- **Speaks and Listens**: Natural voice interaction with wake word detection
- **Controls Your Computer**: Intelligent automation with vision-guided interactions
- **Monitors Proactively**: Background health and activity monitoring
- **Thinks Intelligently**: Multi-provider AI with automatic failover
## Architecture
The system consists of six core modules coordinated by a central Context Orchestrator:
- **Memory Module**: ChromaDB (vector) + SQLite (structured) storage
- **Brain Module**: Multi-provider AI management (Groq, Cerebras, Ollama)
- **Voice Module**: Faster-Whisper + Edge TTS for speech I/O
- **Vision Module**: OCR + Moondream vision model for screen analysis
- **Automation Module**: PyAutoGUI + JSON protocols for system control
- **Observer Module**: Proactive monitoring and notifications
## Quick Start
### Prerequisites
- Python 3.9+
- Git
- (Optional) Ollama for local AI models
### Installation
1. Clone the repository:
```bash
git clone https://github.com/soham-ai/digital-companion.git
cd digital-companion
```
2. Create virtual environment:
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
4. Set up environment variables:
```bash
cp .env.example .env
# Edit .env with your API keys and preferences
```
5. Initialize the system:
```bash
python -m soham init
```
### Basic Usage
```python
import asyncio

from soham import ContextOrchestrator


async def main():
    # Initialize SOHAM
    orchestrator = ContextOrchestrator()
    await orchestrator.initialize()

    # Process user input
    response = await orchestrator.process_user_input(
        "Hello SOHAM, what's on my screen?"
    )
    print(response.data.get("response"))

    await orchestrator.shutdown()


if __name__ == "__main__":
    asyncio.run(main())
```
## Configuration
SOHAM uses YAML configuration files with environment variable support:
```yaml
# config/default.yaml
brain:
  providers:
    groq:
      api_key: "${GROQ_API_KEY}"
      models: ["llama-3.1-8b-instant"]
      priority: 1
```
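The `${GROQ_API_KEY}` placeholder above is resolved from the environment at load time. A minimal sketch of how that expansion can work (the helper name `expand_env_vars` is illustrative, not SOHAM's actual API):

```python
import os
import re


def expand_env_vars(text: str) -> str:
    """Replace ${VAR} placeholders with values from the environment."""
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), text)


# Set a demo value so the placeholder resolves
os.environ["GROQ_API_KEY"] = "gsk-demo-key"
raw = 'api_key: "${GROQ_API_KEY}"'
print(expand_env_vars(raw))  # api_key: "gsk-demo-key"
```

Unset variables expand to an empty string here; a real loader might instead raise an error so missing keys fail fast.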
## Modules
### Memory Module
- **Vector Storage**: Semantic search using ChromaDB
- **Structured Storage**: Relational data using SQLite
- **Context Injection**: Intelligent context retrieval for queries
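Context injection ranks stored memories by relevance to the current query. The real module uses ChromaDB embeddings; the sketch below substitutes simple word overlap as a stand-in for semantic similarity (function names are illustrative):

```python
def overlap_score(query: str, doc: str) -> float:
    """Crude relevance score: fraction of query words present in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def retrieve_context(query: str, memory: list, k: int = 2) -> list:
    """Return the k most relevant stored items for context injection."""
    return sorted(memory, key=lambda doc: overlap_score(query, doc), reverse=True)[:k]


memory = [
    "user prefers dark mode in all apps",
    "meeting with alex scheduled for friday",
    "favorite editor is VS Code",
]
print(retrieve_context("when is my meeting with alex", memory, k=1))
# ['meeting with alex scheduled for friday']
```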
### Brain Module
- **Multi-Provider**: Groq, Cerebras, Ollama support
- **Intelligent Routing**: Automatic provider selection based on task complexity
- **Failover**: Seamless fallback between providers
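The failover logic amounts to trying providers in priority order and returning the first success. A self-contained sketch with fake providers (SOHAM's real interface is async and catches narrower errors):

```python
def ask_with_failover(prompt, providers):
    """Try providers in priority order; return (name, reply) from the first success."""
    errors = {}
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real system would catch specific errors
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {errors}")


def groq(prompt):
    raise ConnectionError("rate limited")


def ollama(prompt):
    return f"local reply to: {prompt}"


name, reply = ask_with_failover("hello", [("groq", groq), ("ollama", ollama)])
print(name, "->", reply)  # ollama -> local reply to: hello
```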
### Voice Module
- **Speech-to-Text**: Faster-Whisper for accurate transcription
- **Text-to-Speech**: Edge TTS for natural voice synthesis
- **Wake Word**: Always-listening activation with "Hey SOHAM"
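Once Faster-Whisper produces a transcript, wake-word activation can reduce to a prefix check on the text. A minimal sketch (the phrase list and function name are assumptions; production wake-word engines usually match on audio, not text):

```python
WAKE_PHRASES = ("hey soham", "ok soham")


def detect_wake_word(transcript: str) -> bool:
    """Check whether a transcribed utterance begins with a wake phrase."""
    return transcript.lower().strip().startswith(WAKE_PHRASES)


print(detect_wake_word("Hey SOHAM, what's the weather?"))  # True
print(detect_wake_word("nothing to see here"))             # False
```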
### Vision Module
- **Screen Capture**: Multi-monitor screenshot capabilities
- **OCR**: Text extraction using Tesseract
- **AI Vision**: UI understanding using Moondream model
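OCR and vision results feed automation as bounding boxes, which must be converted to click coordinates. Tesseract reports boxes as (left, top, width, height); a sketch of the conversion (helper name is illustrative):

```python
def bbox_center(left: int, top: int, width: int, height: int):
    """Center point of an OCR bounding box, e.g. as a click target."""
    return (left + width // 2, top + height // 2)


# Tesseract-style box for a "Submit" button found on screen
print(bbox_center(100, 200, 80, 30))  # (140, 215)
```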
### Automation Module
- **Protocol Engine**: JSON-based workflow execution
- **System Control**: Application and window management
- **Vision-Guided**: Dynamic coordinate detection for interactions
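A JSON protocol is a named list of steps, each dispatched to a registered handler. The sketch below logs actions instead of calling PyAutoGUI; the step schema and decorator are assumptions about the protocol format, not SOHAM's exact one:

```python
import json

ACTIONS = {}


def action(name):
    """Register a handler for a protocol step type."""
    def register(fn):
        ACTIONS[name] = fn
        return fn
    return register


@action("open_app")
def open_app(params, log):
    log.append(f"open:{params['name']}")


@action("type_text")
def type_text(params, log):
    log.append(f"type:{params['text']}")


def run_protocol(spec: str) -> list:
    """Execute each step of a JSON protocol in order, returning an action log."""
    log = []
    for step in json.loads(spec)["steps"]:
        ACTIONS[step["action"]](step.get("params", {}), log)
    return log


protocol = '''{"name": "write-note", "steps": [
    {"action": "open_app", "params": {"name": "notes"}},
    {"action": "type_text", "params": {"text": "buy milk"}}
]}'''
print(run_protocol(protocol))  # ['open:notes', 'type:buy milk']
```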
### Observer Module
- **Health Monitoring**: CPU, memory, battery, disk usage
- **Activity Detection**: User behavior and pattern analysis
- **Proactive Alerts**: Calendar reminders and system notifications
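Health monitoring boils down to sampling metrics and comparing them against thresholds. A sketch using a fake metrics dict (a real implementation would likely sample via `psutil`; the threshold values here are illustrative):

```python
THRESHOLDS = {"cpu_percent": 90, "memory_percent": 85, "disk_percent": 95}


def health_alerts(metrics: dict) -> list:
    """Compare sampled metrics against thresholds and return alert messages."""
    return [
        f"{name} at {metrics[name]}% exceeds {limit}%"
        for name, limit in THRESHOLDS.items()
        if metrics.get(name, 0) > limit
    ]


sample = {"cpu_percent": 96, "memory_percent": 60, "disk_percent": 40}
print(health_alerts(sample))  # ['cpu_percent at 96% exceeds 90%']
```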
## Testing
SOHAM uses pytest with Hypothesis for comprehensive testing:
```bash
# Run all tests
pytest

# Run specific module tests
pytest tests/test_memory/ -v

# Run property-based tests only
pytest -m property

# Run with coverage
pytest --cov=soham --cov-report=html
```
### Property-Based Testing
Each correctness property is implemented as a property-based test:
```python
import pytest
from hypothesis import given, strategies as st


@pytest.mark.asyncio  # requires pytest-asyncio (or a comparable async plugin)
@given(st.text(min_size=1))
async def test_property_conversation_storage_completeness(conversation_text):
    """Feature: soham-core-memory, Property 1: Conversation Storage Completeness"""
    # Test implementation
    pass
```
## Development
### Setting Up Development Environment
```bash
# Install development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

# Run code formatting and linting
black soham/
flake8 soham/

# Type checking
mypy soham/
```
### Adding New Modules
1. Create module directory: `soham/your_module/`
2. Implement interface: `class YourModule(BaseModule)`
3. Register with orchestrator: `orchestrator.register_module(your_module)`
4. Add tests: `tests/test_your_module/`
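The steps above can be sketched end to end. This is an assumed shape for `BaseModule` (the real abstract base lives in `soham`); it shows the async lifecycle a new module must implement:

```python
import abc
import asyncio


class BaseModule(abc.ABC):
    """Illustrative sketch of the module interface."""

    name: str = "base"

    @abc.abstractmethod
    async def initialize(self) -> None: ...

    @abc.abstractmethod
    async def shutdown(self) -> None: ...


class GreeterModule(BaseModule):
    """A trivial module implementing the required lifecycle hooks."""

    name = "greeter"

    async def initialize(self) -> None:
        self.ready = True

    async def shutdown(self) -> None:
        self.ready = False


module = GreeterModule()
asyncio.run(module.initialize())
print(module.name, module.ready)  # greeter True
```

After this, the instance would be passed to `orchestrator.register_module(module)` as in step 3.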
## API Reference
### Core Interfaces
- `BaseModule`: Abstract base for all modules
- `MemoryInterface`: Memory storage and retrieval
- `BrainInterface`: AI provider management
- `VoiceInterface`: Speech input/output
- `VisionInterface`: Screen analysis
- `AutomationInterface`: System control
- `ObserverInterface`: Monitoring and notifications
### Data Models
- `ConversationItem`: Chat history storage
- `VisionAnalysisResult`: Screen analysis results
- `ProtocolDefinition`: Automation workflow scripts
- `SystemState`: Overall system status
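As one example of the data-model layer, `ConversationItem` can be imagined as a small dataclass; the field names below are illustrative, not the exact schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConversationItem:
    """Sketch of a chat-history record (field names are illustrative)."""

    role: str  # e.g. "user" or "assistant"
    text: str
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


item = ConversationItem(role="user", text="Hello SOHAM")
print(item.role, item.text)  # user Hello SOHAM
```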
## Contributing
1. Fork the repository
2. Create feature branch: `git checkout -b feature/amazing-feature`
3. Make changes and add tests
4. Run test suite: `pytest`
5. Commit changes: `git commit -m 'Add amazing feature'`
6. Push to branch: `git push origin feature/amazing-feature`
7. Open Pull Request
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- ChromaDB for vector database capabilities
- Faster-Whisper for speech recognition
- Edge TTS for speech synthesis
- Moondream for vision understanding
- PyAutoGUI for system automation
- Hypothesis for property-based testing
## Support
- Documentation: [docs.soham.ai](https://docs.soham.ai)
- Issues: [GitHub Issues](https://github.com/soham-ai/digital-companion/issues)
- Discussions: [GitHub Discussions](https://github.com/soham-ai/digital-companion/discussions)
- Email: support@soham.ai

SOHAM stands for Self-Organizing Hyper-Adaptive Machine.