diff --git a/Changelog.md b/Changelog.md new file mode 100644 index 0000000..77ced94 --- /dev/null +++ b/Changelog.md @@ -0,0 +1,65 @@ +# Changelog + +All notable changes to speak will be documented in this file. + +The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/). + +## [Unreleased] + +### Added +- Hardware detection via /proc/meminfo (RAM, CPU cores, AVX2) +- Disk-based KV cache (ds4-style) with SHA1 token keys +- Persistent user memory across sessions via user.md +- Agent loop for multi-step tool calls (up to 10 iterations) +- Tool system with read_file, search_web, remember, finish +- Web search via DuckDuckGo (30s timeout, 10 results max) +- Model registry with 13+ pre-configured models +- Interactive model selection on first run +- Auto-setup mode for headless installation +- Resumable model downloads with progress bar +- Multi-threaded downloads via aria2c (falls back to HTTP) +- Streaming chat interface with Readline support +- System prompt customization via embedded system_prompt.txt +- Hardware-aware config.json with detected and active sections +- Memory mapping (mmap) for low-RAM systems (<8GB) +- LRU cache cleanup (max 50 cache files) +- JSON serialization for all config structures +- Crystal spec tests for core functionality +- GitHub Actions CI workflow + +### Changed +- N/A (initial development) + +### Fixed +- N/A (initial development) + +### Removed +- N/A (initial development) + +### Security +- Path traversal protection in read_file tool +- File size limit (13MB) for read operations +- Working directory restriction for file access + +## [0.12.0-beta] - 2026-05-27 + +### Added +- First public beta release +- Nanbeige 4.1 3B model support (Q2_K, Q4_K_M, Q6_K) +- Basic chat functionality +- Command history with Readline +- Save and load conversation history +- Memory commands (memory, clearmemory) +- Clear screen command +- Help text in interface + +### Known Issues +- macOS support is experimental +- Windows not yet 
supported +- Web search may be slow on first query +- Large files (>13MB) cannot be read +- Model download requires stable internet connection + +### Notes +This is a beta release. Expect bugs and breaking changes. +Please report issues on GitHub. diff --git a/README.md b/README.md index a157fe9..a668290 100644 --- a/README.md +++ b/README.md @@ -9,6 +9,8 @@ [![Crystal](https://img.shields.io/badge/Crystal-1.12-000000?logo=crystal)](https://crystal-lang.org/) [![License](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE) +[![Security Policy](https://img.shields.io/badge/Security-Policy-blue)](Security.md) +[![Changelog](https://img.shields.io/badge/Change-Log-white)](Changelog.md) [![CI](https://github.com/zendrx/speak/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/zendrx/speak/actions/workflows/ci.yml) [![Lines of Code](https://img.shields.io/badge/Lines-1689-blue)](https://github.com/zendrx/speak) [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](CONTRIBUTING.md) diff --git a/Security.md b/Security.md new file mode 100644 index 0000000..968ce38 --- /dev/null +++ b/Security.md @@ -0,0 +1,128 @@ + +# Security Policy + +## Supported Versions + +Only the latest stable version of speak receives security updates. + +| Version | Supported | +|---------|-----------| +| latest | ✅ | +| < latest | ❌ | + +## Reporting a Vulnerability + +If you discover a security vulnerability in speak, please report it privately. + +**Do NOT report security issues through public GitHub issues.** + +### How to Report + +1. Email the maintainer directly at [ynwghosted@icloud.com] +2. Include detailed steps to reproduce the issue +3. Include your system information (OS, Crystal version) +4. 
Allow up to 48 hours for an initial response + +### What to Expect + +- You will receive acknowledgment of your report within 48 hours +- The maintainer will investigate and confirm the vulnerability +- A fix will be developed and tested +- A security advisory will be published after the fix is released + +## Security Measures in speak + +### File System Protection + +- Path traversal attacks are blocked (paths containing `..` are rejected) +- File reading is restricted to the current working directory +- Maximum file size is limited to 13MB +- Directory reading is not allowed + +### Memory Safety + +- speak is written in Crystal, a memory-safe language +- No unsafe pointers or manual memory management +- Bounds checking is performed on all array accesses + +### Network Security + +- Web search uses DuckDuckGo (no API key required) +- No telemetry or data collection +- All network requests use HTTPS +- Model downloads are checked against the expected file size + +### Input Validation + +- All user input is sanitized before processing +- Tool call arguments are validated before execution +- JSON parsing includes error handling + +## Known Limitations + +| Area | Limitation | Mitigation | +|------|------------|------------| +| Model files | Downloaded from Hugging Face over HTTPS | Verify downloaded file size | +| Web search | DuckDuckGo HTML scraping | No API key, only used when user requests | +| Dependencies | llama.cpp is C++ code | Upstream library, security updates tracked | + +## Responsible Disclosure + +We follow responsible disclosure practices: + +1. Vulnerability is reported privately +2. Maintainer confirms and fixes the issue +3. Fix is tested and released +4. Security advisory is published +5. Public announcement is made after the fix is available + +## Cryptographic Measures + +speak does not implement any cryptographic functions directly. It relies on Crystal's standard library for SHA1 hashing (used for KV cache keys). 
The SHA1 algorithm is used only for cache key generation, not for security-critical purposes. + +## Third-Party Dependencies + +| Dependency | Purpose | Security Notes | +|------------|---------|----------------| +| llama.cr | Crystal bindings to llama.cpp | Upstream library, monitor for updates | +| llama.cpp | Inference engine | C++ library, monitor for CVEs | +| readline | Command line input | Standard system library | + +## Reporting Format + +When reporting a vulnerability, please include: + +```yaml +# Example report format +version: "0.12.0-beta" +os: "Ubuntu 24.04" +crystal: "1.12.0" + +description: | + Detailed description of the issue + +steps_to_reproduce: | + 1. Run ./speak + 2. Type specific command + 3. Observe unexpected behavior + +impact: | + What an attacker could potentially do + +proposed_fix: | + Optional: suggested solution +``` + +## Security Contact + +- Email: [ynwghosted@icloud.com] +- GitHub: @zendrx +- Response time: 24-48 hours + +## Acknowledgments + +We thank the following people for reporting security issues: + +- List will be updated with contributors who report vulnerabilities + + diff --git a/src/speak.cr b/src/speak.cr index c5aae79..736a112 100644 --- a/src/speak.cr +++ b/src/speak.cr @@ -1,51 +1,115 @@ +# speak.cr - Main entry point for speak +# Integrates hardware detection, model selection, configuration, and chat + require "llama" require "./speak/system" require "./speak/config" +require "./speak/model" require "./speak/install" +require "./speak/disk" +require "./speak/tool" +require "./speak/memory" require "./speak/launch" +CONFIG_PATH = "./speak/config.json" + def main - config = Speak::Config.load_or_create - settings = config.apply_overrides + # Parse command line arguments + auto_setup = ARGV.includes?("--auto-setup") + force_setup = ARGV.includes?("--setup") + use_case = ARGV.includes?("--coding") ? 
"coding" : "general" - # Get available RAM to decide mmap - available_ram = Speak::System.available_ram_mb + # Check if config exists and we're not forcing setup + if !File.exists?(CONFIG_PATH) || force_setup + puts "speak - First time setup" + puts "=" * 40 + + manager = Speak::ModelManager.new(use_case) + + if auto_setup || force_setup + success = manager.auto_setup + else + success = manager.setup + end + + unless success + puts "Setup failed. Exiting." + exit 1 + end + + puts "\nSetup complete. Run ./speak again to start chatting." + exit 0 + end - # Use mmap only if RAM is tight (< 8GB) - use_mmap = available_ram < 8000 + # Load existing configuration + config = Speak::Config.load? + unless config + puts "Error: Config file exists but cannot be loaded." + puts "Please run: ./speak --setup" + exit 1 + end - model_path = "./speak/models/#{settings.model_file}" + settings = config.active + detected = config.detected + + puts "speak - Local AI Assistant" + puts "=" * 40 + puts "Hardware: #{detected.total_ram_mb} MB RAM, #{settings.cpu_cores} cores" + puts "Model: #{settings.model_file}" + puts "Context: #{settings.context_size} tokens" + puts "=" * 40 - model = if File.exists?(model_path) - puts "Loading model: #{model_path}" - puts "Available RAM: #{available_ram} MB" - puts "mmap: #{use_mmap} #{use_mmap ? "(RAM saving mode)" : "(Full RAM mode)"}" - Llama::Model.new(model_path, use_mmap: use_mmap) - else - puts "Model file not found: #{model_path}, installing..." 
- installer = Speak::Install.new - installer.install_model(settings.model_quant) - - if File.exists?(model_path) - puts "Model installed successfully: #{model_path}" - Llama::Model.new(model_path, use_mmap: use_mmap) - else - puts "Failed to install model: #{model_path}" - exit(1) - end - end - - begin - context = Llama::Context.new( - model: model, - n_ctx: settings.context_size.to_u32 - ) - launcher = Speak::Launch.new(context, model, settings) - launcher.run - rescue ex : Exception - puts "Error: #{ex.message}" - exit(1) + # Check if model file exists + model_path = "./speak/models/#{settings.model_file}" + + unless File.exists?(model_path) + puts "Model file not found: #{model_path}" + puts "Downloading model..." + + installer = Speak::Install.new + success = installer.install_model(settings.model_quant) + + unless success && File.exists?(model_path) + puts "Failed to download model. Please check your internet connection." + puts "You can also download the model manually and place it in: #{model_path}" + exit 1 + end + + puts "Model downloaded successfully." end + + # Determine if we should use mmap based on available RAM + use_mmap = settings.use_mmap && detected.available_ram_mb < 8000 + + # Load the model + puts "Loading model..." + model = Llama::Model.new(model_path, use_mmap: use_mmap) + + # Create context + context = Llama::Context.new( + model: model, + n_ctx: settings.context_size + ) + + # Launch chat interface + puts "Starting chat interface..." + puts "Type 'exit' to quit, 'help' for commands" + puts "-" * 40 + + launcher = Speak::Launch.new(context, model, settings) + launcher.run +end + +# Handle interrupt signals gracefully +Signal::INT.trap do + puts "\n\nInterrupted. Goodbye." + exit 0 +end + +Signal::TERM.trap do + puts "\n\nTerminated. Goodbye." 
+ exit 0 end +# Run main function main diff --git a/src/speak/config.cr b/src/speak/config.cr index 7642518..3d7ec39 100644 --- a/src/speak/config.cr +++ b/src/speak/config.cr @@ -1,130 +1,62 @@ require "json" -require "./system" module Speak + # Represents the detected hardware (read from config.json) struct DetectedRam include JSON::Serializable property total_ram_mb : UInt64 property available_ram_mb : UInt64 property os_reserved_ram_mb : UInt64 - - def initialize(@total_ram_mb, @available_ram_mb, @os_reserved_ram_mb) - end end + # Represents the active settings (read from config.json) struct ActiveSettings include JSON::Serializable property cpu_cores : Int32 + property has_avx2 : Bool property free_disk_space_mb : UInt64 property context_size : Int32 property kv_cache_type : String property model_quant : String property model_file : String - property temperature : Float32 + property temperature : Float64 property max_tokens : Int32 - - def initialize(@cpu_cores, @free_disk_space_mb, @context_size, @kv_cache_type, @model_quant, @model_file, @temperature, @max_tokens) - end - end - - struct UserOverrides - include JSON::Serializable - - property os_reserved_ram_mb : UInt64? - property context_size : Int32? - property kv_cache_type : String? - property model_quant : String? - property max_tokens : Int32? - property temperature : Float32? - property model_file : String? 
- - def initialize(@os_reserved_ram_mb, @context_size, @kv_cache_type, @model_quant, @max_tokens, @temperature) - end + property use_mmap : Bool end + # Configuration loader class Config include JSON::Serializable property detected : DetectedRam property active : ActiveSettings - property user_overrides : UserOverrides - def initialize(@detected, @active, @user_overrides) + def initialize(@detected, @active) end - def self.load_or_create(path : String = "./speak/config.json") : Config - if File.exists?(path) - json = File.read(path) - config = Config.from_json(json) - config.refresh_detected - config.save(path) - config - else - config = Config.detect_and_create - dir = File.dirname(path) - Dir.mkdir_p(dir) unless Dir.exists?(dir) - config.save(path) - config + # Load config from JSON file + def self.load(path : String = "./speak/config.json") : Config + unless File.exists?(path) + raise "Config file not found: #{path}" end - end - - def self.detect_and_create : Config - detected = DetectedRam.new( - System.total_ram_mb, - System.available_ram_mb, - System.os_reserved_ram_mb - ) - - active = ActiveSettings.new( - System.cpu_cores, - System.free_disk_space_mb("/"), - System.recommended_context_size, - System.kv_cache_type, - System.recommended_quant, - System.model_file, - 0.7, - 512 - ) - - user_overrides = UserOverrides.new( - os_reserved_ram_mb = nil, - context_size = nil, - kv_cache_type = nil, - model_quant = nil, - max_tokens = nil, - temperature = nil - ) - - Config.new(detected, active, user_overrides) - end - - def refresh_detected - @detected.total_ram_mb = System.total_ram_mb - @detected.available_ram_mb = System.available_ram_mb - @detected.os_reserved_ram_mb = System.os_reserved_ram_mb - end - def apply_overrides : ActiveSettings - result = ActiveSettings.new( - @active.cpu_cores, - @active.free_disk_space_mb, - @user_overrides.context_size || @active.context_size, - @user_overrides.kv_cache_type || @active.kv_cache_type, - @user_overrides.model_quant 
|| @active.model_quant, - @user_overrides.model_file || @active.model_file, - @user_overrides.temperature || @active.temperature, - @user_overrides.max_tokens || @active.max_tokens - ) - result + json = File.read(path) + Config.from_json(json) end - def save(path : String = "./speak/config.json") - dir = File.dirname(path) - Dir.mkdir_p(dir) unless Dir.exists?(dir) - json_string = self.to_pretty_json - File.write(path, json_string) + # Load config or return nil if not found + def self.load?(path : String = "./speak/config.json") : Config? + return nil unless File.exists?(path) + + begin + json = File.read(path) + Config.from_json(json) + rescue ex + puts "Error loading config: #{ex.message}" + nil + end end end end diff --git a/src/speak/model.cr b/src/speak/model.cr index de22a53..cd618d5 100644 --- a/src/speak/model.cr +++ b/src/speak/model.cr @@ -1,8 +1,8 @@ -# model.cr - Model registry and selection for speak -# Handles displaying available models, user selection, and saving to config +# model.cr - Model registry, hardware detection, and configuration for speak require "json" require "file_utils" +require "./system" module Speak # Represents a single model in the registry @@ -33,8 +33,6 @@ module Speak # Recommendation for a user struct ModelRecommendation - include JSON::Serializable - property model : ModelInfo property fit_score : Float64 property speed_score : Float64 @@ -46,17 +44,24 @@ module Speak end class ModelManager - # Models are embedded at compile time using read_file macro MODELS_JSON = {{ read_file("#{__DIR__}/models.json").chomp.stringify }} + CONFIG_PATH = "./speak/config.json" @registry : Array(ModelInfo) @available_ram_gb : Float64 @total_ram_gb : Float64 + @cpu_cores : Int32 + @has_avx2 : Bool + @free_disk_gb : Float64 @use_case : String - def initialize(available_ram_mb : UInt64, total_ram_mb : UInt64, use_case : String = "general") - @available_ram_gb = available_ram_mb.to_f / 1024.0 - @total_ram_gb = total_ram_mb.to_f / 1024.0 + def 
initialize(use_case : String = "general") + # Detect hardware using system.cr + @total_ram_gb = System.total_ram_mb.to_f / 1024.0 + @available_ram_gb = System.available_ram_mb.to_f / 1024.0 + @cpu_cores = System.cpu_cores + @has_avx2 = System.cpu_has_avx2 + @free_disk_gb = System.free_disk_space_mb("/").to_f / 1024.0 @use_case = use_case @registry = load_registry end @@ -65,11 +70,50 @@ module Speak private def load_registry : Array(ModelInfo) Array(ModelInfo).from_json(MODELS_JSON) rescue ex - puts "Error loading embedded model registry: #{ex.message}" - puts "Falling back to empty registry" + puts "Error loading model registry: #{ex.message}" [] of ModelInfo end + # Create config directory if it doesn't exist + private def ensure_config_dir + dir = File.dirname(CONFIG_PATH) + Dir.mkdir_p(dir) unless Dir.exists?(dir) + end + + # Write detected hardware to config.json + def write_detected_to_config + ensure_config_dir + + # Create detected section + detected = { + "total_ram_mb" => System.total_ram_mb, + "available_ram_mb" => System.available_ram_mb, + "os_reserved_ram_mb" => System.os_reserved_ram_mb + } + + # Create empty active section (to be filled later) + active = { + "cpu_cores" => @cpu_cores, + "has_avx2" => @has_avx2, + "free_disk_space_mb" => System.free_disk_space_mb("/"), + "context_size" => 2048, + "kv_cache_type" => "standard", + "model_quant" => "", + "model_file" => "", + "temperature" => 0.7, + "max_tokens" => 512, + "use_mmap" => true + } + + config = { + "detected" => detected, + "active" => active + } + + File.write(CONFIG_PATH, config.to_pretty_json) + puts "Hardware detection written to #{CONFIG_PATH}" + end + # Get models that fit in available RAM def get_compatible_models : Array(ModelInfo) @registry.select { |m| m.min_ram_gb <= @available_ram_gb } @@ -135,21 +179,37 @@ module Speak recommendations.first(limit) end + # Auto-select the best model based on hardware + def auto_select : ModelInfo? 
+ recommendations = get_recommendations(1) + if recommendations.size > 0 + return recommendations.first.model + end + nil + end + # Display models to user and get selection def interactive_selection : ModelInfo? recommendations = get_recommendations(8) if recommendations.empty? - puts "No models found that fit your system (available RAM: #{@available_ram_gb.round(1)} GB)" - puts "You can still try downloading a model manually to ./speak/models/" + puts "No models found that fit your system" + puts "Available RAM: #{@available_ram_gb.round(1)} GB" + puts "Total RAM: #{@total_ram_gb.round(1)} GB" + puts "Free disk space: #{@free_disk_gb.round(1)} GB" + puts "CPU cores: #{@cpu_cores}" + puts "AVX2 support: #{@has_avx2}" return nil end puts "\n" + "=" * 70 - puts "Model Selection for speak" + puts "speak - Model Selection" puts "=" * 70 - puts "Your system: #{@total_ram_gb.round(1)} GB total, #{@available_ram_gb.round(1)} GB available" - puts "Use case: #{@use_case}" + puts "Your system:" + puts " RAM: #{@total_ram_gb.round(1)} GB total, #{@available_ram_gb.round(1)} GB available" + puts " Disk: #{@free_disk_gb.round(1)} GB free" + puts " CPU: #{@cpu_cores} cores, AVX2: #{@has_avx2 ? "Yes" : "No"}" + puts " Use case: #{@use_case}" puts "-" * 70 puts "" @@ -160,6 +220,7 @@ module Speak puts "#{i + 1}. #{model.name}" puts " Size: #{model.weight_gb.round(1)} GB | Quality: #{quality_stars} | Speed: #{speed_stars}" + puts " Context: #{model.max_context_k} tokens" puts " #{model.description}" puts " Use cases: #{model.use_cases.join(", ")}" puts " License: #{model.license} | Author: #{model.author}" @@ -184,39 +245,92 @@ module Speak end end - # Save selected model to config - def save_to_config(model : ModelInfo, config_path : String = "./speak/config.json") - unless File.exists?(config_path) - puts "Config file not found. Run speak once to generate it first." 
- return false + # Write selected model to active section of config.json + def write_model_to_config(model : ModelInfo) + unless File.exists?(CONFIG_PATH) + puts "Config file not found. Writing detected hardware first..." + write_detected_to_config end begin - config_data = File.read(config_path) + config_data = File.read(CONFIG_PATH) config = JSON.parse(config_data) - # Update active settings with selected model + # Calculate optimal context size based on available RAM + max_context_by_ram = ((@available_ram_gb - model.weight_gb) / model.kv_per_1k_gb * 1000).to_i + context_size = [model.max_context_k, max_context_by_ram].min + context_size = [context_size, 512].max + + # Update active section with selected model config.as_h["active"] = { - "context_size" => [model.max_context_k, 4096].min, + "cpu_cores" => @cpu_cores, + "has_avx2" => @has_avx2, + "free_disk_space_mb" => System.free_disk_space_mb("/"), + "context_size" => context_size, + "kv_cache_type" => "standard", "model_quant" => model.quantization, "model_file" => model.filename, "temperature" => 0.7, "max_tokens" => 512, - "use_mmap" => true, - "cpu_cores" => config.as_h["active"]["cpu_cores"], - "has_avx2" => config.as_h["active"]["has_avx2"], - "free_disk_space_mb" => config.as_h["active"]["free_disk_space_mb"], - "kv_cache_type" => "standard" + "use_mmap" => true } - File.write(config_path, config.to_pretty_json) - puts "\nModel saved to config: #{model.name}" - puts "You can now run ./speak" + File.write(CONFIG_PATH, config.to_pretty_json) + puts "\n Configuration saved to #{CONFIG_PATH}" + puts " Model: #{model.name}" + puts " Context size: #{context_size} tokens" + puts " File: #{model.filename}" + puts "\nYou can now run ./speak" return true rescue ex puts "Error updating config: #{ex.message}" return false end end + + # Full setup flow: detect hardware, let user pick model, write to config + def setup + puts "speak - First time setup" + puts "=" * 40 + puts "Detecting hardware..." 
+ + # Write detected hardware to config + write_detected_to_config + + puts " Hardware detection complete" + puts "" + + # Let user pick a model + model = interactive_selection + if model + write_model_to_config(model) + return true + else + puts "Setup failed. No model selected." + return false + end + end + + # Quick setup: auto-select best model without user interaction + def auto_setup + puts "speak - Automatic setup" + puts "=" * 40 + puts "Detecting hardware..." + + write_detected_to_config + + puts " Hardware detection complete" + puts "Selecting best model for your system..." + + model = auto_select + if model + puts "Selected: #{model.name}" + write_model_to_config(model) + return true + else + puts "Setup failed. No compatible model found." + return false + end + end end end