Releases · mann1x/osync

13 Apr 01:32

mann1x

v1.3.0

3b8eaf8

osync v1.3.0 Latest


What's New

Code Quality & Reliability

Fixed HttpClient leaks in QcCommand and BenchCommand: both now use a shared static instance instead of creating a new client per request
Fixed race conditions in BufferedPipeStream with proper locking
Fixed blocking async calls — converted to proper async/await patterns
Fixed resource disposal — added IDisposable to ChatSession and StreamHelpers
Extracted JudgeArgumentParser from Program.cs to reduce god-class complexity
Fixed LogFileWriter Spectre markup sanitization regex for correct bracket handling
Fixed ManageCommand TUI showing hardcoded v1.1.6 instead of actual app version
Single-file publish — binaries are now self-contained single executables

Comprehensive Test Suite (120 unit tests)

First proper test coverage for internal logic:

BenchScoring — scoring algorithms, formatting, statistics, rankings (45 tests)
QcScoring — weighted formula, judgment blending, edge cases (14 tests)
CloudJudgeProviderFactory — all 15 provider aliases, parsing, env vars (20 tests)
JudgeArgumentParser — local/remote/cloud URL parsing (11 tests)
LogFileWriter — ANSI escape and Spectre markup sanitization (14 tests)
BufferedPipeStream — async pipe stream with backpressure (8 tests)
ThrottledStream — bandwidth limiting and constructor validation (8 tests)

SpecFlow Integration Test Infrastructure

Base test model changed to tinyllama for faster test runs
Auto-download of test model in BeforeTestRun hook
AfterScenario cleanup for models created during tests
New step definitions: ServerSteps, ChatSteps
Feature files normalized to use {model} and {RemoteServer} variables
Stdin support in OsyncRunner for interactive test scenarios

Downloads

osync-windows-x64.exe — Windows x64 single-file executable
osync-linux-x64 — Linux x64 single-file executable (run chmod +x osync-linux-x64 after download)


18 Jan 20:12

mann1x

v1.2.9

0e83baa

v1.2.9

Real-Time Monitor Improvements - Enhanced monitoring dashboard
- Added -Hi shortcut for --history argument (e.g., osync monitor -Hi 10)
- Plain integers in --history are now treated as minutes (e.g., -Hi 10 = 10 minutes)
- Moved date/time from header to status bar (right-justified), reclaiming one line of vertical space
- Status bar now always pinned to bottom of terminal window
- Added osync version with build number in Ollama panel (e.g., osync v1.2.9 (b20260116-1814))
- Reduced screen flicker during refresh by overwriting content in place
- Fixed display artifacts when models load/unload (consistent table structure)
- Fixed graph scaling to properly fill the graph width based on history duration
- Time axis shows seconds when history < 5 minutes (avoids repeated minute labels)
- Fixed model expiration time calculation (was showing wrong time due to timezone handling)
- Ollama process metrics now aggregate ALL related processes (ollama serve, runners, llama-server)
- Shows process count in Ollama panel when multiple processes exist (e.g., "28 procs, 3.9 GB")
- Dynamic graph headers: Shows "Ollama" (blue/orange) when per-process data available, "System" (yellow/steel) when using system-wide fallback
- VRAM fallback: Uses Ollama API SizeVram when nvidia-smi can't report per-process VRAM (Windows/WDDM)
- GPU utilization fallback: Uses total GPU utilization when per-process utilization unavailable
- NvAPIWrapper integration (Windows only): Additional NVIDIA GPU data source as fallback
- D3DKMT per-process GPU monitoring (Windows only): Native Windows kernel API for accurate per-process GPU utilization and VRAM tracking, works with any GPU vendor (NVIDIA, AMD, Intel)
- Improved process name matching for Ollama detection (supports ollama_llama_server, llama-server, runner)
New Bench Command - Context tracking benchmark with dynamic story-based tests
- Generates stories with embedded facts across multiple categories
- Tests model's ability to track context through conversation
- Question types: New (current category facts), Old (retrieval from previous categories)
- Tool calling support for function execution tests
- Judge evaluation for answer quality assessment
- --enablethinking and --thinklevel arguments for thinking models (qwen3, deepseek-r1)
- --no-unloadall argument to skip unloading all models before testing
- --overwrite argument to skip file overwrite prompts
- --generate-suite to create custom test suite JSON files with -T and -O options
- Configurable context length, temperature, seed, and other parameters
- Progress bar during testing with timing statistics (Last/Avg/Max response times)
- Improved pull progress display with download speed and ETA
- Thinking token tracking: separate tracking for model thinking/reasoning tokens with verbose output
- Character consistency: story generator maintains consistent name-to-animal mapping across all chapters
- Optimized message flow: instructions and context combined with first question (avoids model confusion)
- Context length management: auto-detects model max context, configurable overhead (2K normal, 4K thinking)
- Two-phase HuggingFace rate limit retry: 50 quick retries (2s delay), then 50 slow retries (30s/API delay)
- Auto-backup: creates .backup.zip of existing results file before continuing (protects against data loss)
- --mode=parallel for parallel judgment - judges answers in background while testing continues
- Parallel mode dual progress bars: test progress bar shows test model metrics (Avg/Max time, p:/e: tok/s), judge progress bar shows judge model metrics separately
- Pre-flight check caching: thinking detection, context settings, and tools validation cached per model (SHA256 verified)
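
The two-phase rate limit retry listed above (50 quick retries at 2s, then 50 slow retries at 30s) can be sketched as follows. This is a minimal illustration, not osync's implementation: `RateLimited` and `two_phase_retry` are hypothetical names, and the option of using a delay reported by the API instead of the fixed 30s is not modelled.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised when the server reports a rate limit."""


def two_phase_retry(
    operation: Callable[[], T],
    quick_retries: int = 50,
    quick_delay: float = 2.0,
    slow_retries: int = 50,
    slow_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on RateLimited with a quick then a slow phase."""
    delays = [quick_delay] * quick_retries + [slow_delay] * slow_retries
    for delay in delays:
        try:
            return operation()
        except RateLimited:
            # Wait before the next attempt; the delay grows once the
            # quick phase is used up.
            sleep(delay)
    # Last attempt after the final delay; any error now propagates.
    return operation()
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting.
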
New BenchView Command - View and export context benchmark results
- Multiple output formats: table (console), json, md (markdown), html, pdf
- Category breakdown with accuracy percentages
- Question type analysis (New vs Old fact retrieval)
- Tool usage statistics when applicable
- --overwrite argument to skip file overwrite prompts
Increased num_predict Limits - Larger token limits for improved response quality
- Bench pre-flight check: 512 → 2048 (fixes issues with some models)
- Bench test responses: 2048 → 16384 (configurable via test suite numPredict)
- Bench judge: added 8192 limit (was missing)
- QC judge: 800 → 8192 (reduces truncated judge responses)
Bench Test Suite Configuration - New numPredict field in bench test suite JSON
- Controls maximum tokens generated per test response
- Default: 16384 tokens
- Can be customized per test suite for different use cases
Enhanced Process Status (ps) - Extended system monitoring
- Shows Ollama process CPU and memory usage when running locally
- GPU monitoring for NVIDIA cards (uses nvidia-smi): utilization, memory, temperature, power
- GPU monitoring for AMD cards (uses rocm-smi): utilization, memory, temperature, power
- Ollama-specific VRAM usage per GPU (shows percentage of total GPU memory)
- Color-coded output: green (0-50%), yellow (50-75%), orange (75-90%), red (90-100%)
- Temperature color coding: green (<60°C), yellow (60-70°C), orange (70-80°C), red (>80°C)
- Automatically detects available GPU monitoring tools
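
The color bands above can be read as simple threshold functions. The sketch below is illustrative only: the function names are hypothetical, and because the listed ranges share their endpoints, the handling of exact boundary values (50%, 75%, 90%, 70°C) is an assumption.

```python
def usage_color(percent: float) -> str:
    """Map a utilization percentage to one of the listed color bands."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Assumption: a value sitting on a boundary falls into the higher band.
    if percent < 50:
        return "green"
    if percent < 75:
        return "yellow"
    if percent < 90:
        return "orange"
    return "red"


def temperature_color(celsius: float) -> str:
    """Map a GPU temperature in degrees Celsius to a color band."""
    if celsius < 60:
        return "green"
    if celsius < 70:  # assumption: exactly 70 counts as orange
        return "yellow"
    if celsius <= 80:  # red is listed as strictly above 80
        return "orange"
    return "red"
```
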
Load/Unload Command Improvements
- Load command now shows proper "Model not found" error instead of misleading connection error
- Both commands verify status using /api/ps after operation completes
- Better error messages for different HTTP status codes (404, 500, etc.)
CLI Improvements
- Shortened osync -h output to show only global options and available commands
- Use osync <command> -h for detailed help on specific commands
- Fixed ANSI color bleed on Linux/macOS - terminal no longer stays green after osync exits
- Explicit ANSI reset sequence (\x1b[0m) on exit prevents color leakage to shell prompt
Bug Fixes
- Fixed nvidia-smi parsing for power and memory values on systems with non-English locales
- Fixed GPU stats display to properly match ollama processes to their respective GPUs by UUID
- Fixed copy command not detecting IP addresses as remote servers (e.g., osync cp model 192.168.0.100)
- Fixed copy to remote destination requiring model name - now uses source model name when destination has no model
- Fixed HuggingFace model copy: now uses the source model name when the destination has no explicit name
- Fixed qc and bench commands silently exiting when remote test server is unreachable - now shows clear "Could not connect to server" error message
- Fixed spurious ANSI escape characters (←[0m) appearing after command output on Windows
QC Command Updates
- --enablethinking and --thinklevel arguments for thinking models (qwen3, deepseek-r1)
- --no-unloadall argument to skip unloading all models before testing
- --overwrite argument to overwrite existing output file without prompting
- Fixed HttpClient timeout modification error ("This instance has already started one or more requests")
- Per-request timeout handling allows dynamic timeout extension during retries
- Improved pull progress display with download speed and ETA for --ondemand mode
- Fixed OutOfMemoryException during JSON serialization of large results (uses streaming)
- Fixed model preloading hanging issue - switched from /api/chat to lightweight /api/generate call
- Smart model loading: detects if model is already loaded, skips unload and just resets keep_alive timer
- Two-phase HuggingFace rate limit retry: 50 quick retries (2s delay), then 50 slow retries (30s/API delay)
- --fix argument to recover corrupted/malformed JSON results files (outputs to .fixed.json)
  - Multi-strategy recovery: structural analysis finds last valid QuestionResult, then rebuilds proper JSON closures
  - Handles corrupted closing sequences (e.g., missing array brackets, extra braces)
  - Reports recovery statistics: truncated arrays/objects, fixed closures, removed bytes
- Auto-backup: creates .backup.zip of existing results file before continuing (protects against data loss)
- Atomic file saves: write to temp file then rename, prevents corruption on cancellation
- Force exit (double Ctrl+C) now saves results before exiting
- Fixed Spectre.Console markup errors when loading corrupted JSON files (proper escape of exception messages)
- Fixed logprobs detection after model preload by using separate HTTP connections
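
The atomic save pattern mentioned above (write to a temp file, then rename) looks roughly like this in Python. It is a generic sketch of the technique under the assumption that results are JSON; `save_json_atomically` is a hypothetical name and does not reflect osync's actual code.

```python
import json
import os
import tempfile


def save_json_atomically(path: str, data: object) -> None:
    """Write data as JSON so readers never observe a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace swaps the file in one step, overwriting any old copy.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure or cancellation, drop the temp file and leave the
        # previous results file untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process is interrupted mid-write, only the temp file is affected and the previous results file remains readable.
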
QcView Command Updates
- --overwrite argument to overwrite existing output file without prompting
- Fixed OutOfMemoryException when loading large JSON results files (uses streaming deserialization)
BenchView Command Updates
- Fixed OutOfMemoryException when loading large JSON results files (uses streaming deserialization)
- Multiple results files comparison - Pass comma-separated files to compare different models
- Test suite digest validation ensures all files used identical test suite
- Default output filename computed from input filenames (e.g., file1-file2.html)
- Enhanced HTML output - qcview-style dark theme with toggle, collapsible sections
- Description field spans full width in header
- Q&A details always shown in HTML and PDF (no longer requires --details)
- Full answers, model thinking, judgment reasons, and tools used
- Subcategory table with two-row header (category spanning subcategories)
- Average speed per category table with response times
- PDF improvements - Header in table format with all metadata including versions
- All tables have proper borders, all three summary tables included
Bench Command Updates
- **Test s...


13 Jan 21:06

mann1x

v1.2.8

ae2995b

v1.2.8

Cloud Provider Support for Judge Models - Use cloud AI providers for --judge and --judgebest
- Support for 9 providers: Anthropic Claude, OpenAI, Google Gemini, Azure OpenAI, Mistral AI, Cohere, Together AI, HuggingFace, and Replicate
- Syntax: @provider[:token]/model (e.g., @claude/claude-sonnet-4-20250514, @openai/gpt-4o)
- API keys loaded from environment variables by default, can be specified explicitly
- Connection and model validation before testing starts
- Cloud provider info (name, API version) recorded in results for traceability
- QcView displays cloud provider badges in HTML/PDF output (only for cloud, not Ollama)
- New --help-cloud option for detailed provider documentation
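
To make the `@provider[:token]/model` syntax concrete, here is a small parsing sketch. The names `CloudJudgeSpec` and `parse_cloud_judge` are hypothetical, and splitting on the first `/` and the first `:` is an assumption about how the pieces are separated; osync's own parser may differ.

```python
from typing import NamedTuple, Optional


class CloudJudgeSpec(NamedTuple):
    provider: str
    token: Optional[str]  # None means "read the key from the environment"
    model: str


def parse_cloud_judge(spec: str) -> CloudJudgeSpec:
    """Parse a judge argument of the form @provider[:token]/model."""
    if not spec.startswith("@"):
        raise ValueError("cloud judge specs start with '@'")
    # Split on the first '/' so model names may themselves contain slashes.
    head, sep, model = spec[1:].partition("/")
    if not sep or not head or not model:
        raise ValueError("expected @provider[:token]/model")
    provider, _, token = head.partition(":")
    if not provider:
        raise ValueError("missing provider name")
    return CloudJudgeSpec(provider.lower(), token or None, model)
```

For example, `parse_cloud_judge("@openai/gpt-4o")` yields provider `openai`, no explicit token, and model `gpt-4o`.
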
PDF Text Rendering Fix - Fixed text corruption in Q&A answers for PDF output
- Resolved character scrambling issue with certain text patterns (e.g., Python format strings)
- Uses line-by-line Text elements to prevent text reordering
- Added Courier monospace font for code content for better readability
- Added text sanitization to handle problematic Unicode characters
QcView File Access Check - Moved file overwrite confirmation before progress bar
- Prevents concurrent display errors when output file already exists
- Applies to all output formats (JSON, Markdown, HTML, PDF)


12 Jan 23:00

mann1x

v1.2.7

e366b97

v1.2.7

Separate Best Answer Judge Model (--judgebest) - New command-line argument for best answer determination
- Use a different model for best answer judgment vs similarity scoring (--judge)
- --judgebest can be used alone or combined with --judge for different models
- Same configuration options as --judge: local model name or http://host:port/model for remote
- New system prompt focused purely on qualitative best answer determination
- Supports both serial and parallel execution modes
- Works with --rejudge to re-run only best answer judgment with new model
Version Tracking in QC Results - Record osync and Ollama versions in test results
- OsyncVersion - Version of osync used for testing
- OllamaVersion - Ollama server version for test quantizations
- OllamaJudgeVersion - Ollama version for judge server (similarity scoring)
- OllamaJudgeBestAnswerVersion - Ollama version for best answer judge server
- Versions captured automatically from Ollama /api/version endpoint
QCView Output Updates - All output formats updated with new information
- Table output shows Best Answer Judge model (when different from Judge) and versions
- JSON output includes all version fields and JudgeModelBestAnswer
- Markdown output includes Best Answer Judge and versions in header
- HTML output shows Best Answer Judge in info grid and versions row
- PDF output includes Best Answer Judge and versions in header tables
Manage TUI Multi-Select Delete Fix - Fixed batch delete for multiple selected models
- Multi-selection delete now works correctly (previously only deleted single model)
- Added batch confirmation dialog showing count and list of models to be deleted
Judge Retry Output Improvements - Better visibility into retry attempts during judgment
- Both judge and judgebest operations now show retry warnings with error codes at each attempt
- Displays retry delay countdown before each retry attempt
Fixed Copy to Remote Server - Resolved HTTP 500 errors when loading copied models
- Fixed stop parameter serialization (now correctly sent as array instead of string)
- Fixed numeric/boolean parameter type conversion (top_k, temperature, seed, etc.)
- New ConvertParameterValue helper ensures correct JSON types for all Ollama model parameters
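
The kind of type conversion described above can be illustrated with a short sketch. This is not the actual `ConvertParameterValue` helper (which lives in osync's C# code); the Python function below is a hypothetical analogue that assumes parameter values arrive as strings and that `stop` is the only list-valued parameter.

```python
from typing import List, Union

ParamValue = Union[bool, int, float, str, List[str]]


def convert_parameter_value(name: str, raw: str) -> ParamValue:
    """Turn a string parameter value into a JSON-friendly Python type."""
    if name == "stop":
        # Stop sequences are sent as an array, even when there is only one.
        return [raw]
    lowered = raw.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(raw)  # e.g. top_k, seed
    except ValueError:
        pass
    try:
        return float(raw)  # e.g. temperature
    except ValueError:
        return raw  # leave anything else as a plain string
```
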
Fixed HuggingFace Model Copy - Correct path resolution for hf.co/... models
- HuggingFace models now use correct manifest path (not under registry.ollama.ai)
- Fixed cross-platform path separator handling for model paths with forward slashes
Load/Unload URL Format Support - Both commands now accept URL format with embedded model name
- Supports osync load http://host:port/modelname in addition to osync load modelname -d host
- Same URL parsing for unload command
Fixed --rejudge Model Pulling - Rejudge mode no longer attempts to download test models
- When using --rejudge with existing results, only the judge model is needed
- Wildcard expansion now filters to only tags present in the results file
- Skips model verification for all existing results in rejudge mode
- Properly queues partial results for re-judgment without resuming tests


12 Jan 10:08

mann1x

v1.2.6

86609f4

v1.2.6

QcView Multiple Output Formats - Export results to various formats
- Markdown (.md) - Tables formatted for GitHub and documentation
- HTML (.html) - Interactive report with dark/light theme toggle, collapsible Q&A sections, color-coded scores
- PDF (.pdf) - Professional report using QuestPDF library with:
  - Summary tables with color-coded scores for all metrics
  - Scores by category table
  - Rankings tables (by score, eval speed, perplexity, best answers)
  - Full Q&A pages for each quantization with judgment details
- Use -Fo md, -Fo html, or -Fo pdf to select format
QcView Repository URL - New --repo argument to specify model source repository
- URL is displayed in output headers and included in JSON export
- Can be saved during qc testing and overridden in qcview
Headless/Background Mode Fix - Fixed console errors when running qc command in background
- Console.WindowWidth and Console.ReadKey now properly handled in headless environments
- Prevents "The handle is invalid" errors when running without a terminal
New Version Command - Added osync version (alias -v) to display version info
- Shows osync version number and build timestamp
- --verbose flag displays detailed info: binary path, installation status, shell type/version, tab completion status
- Detects bash, zsh, PowerShell (Core/Desktop), and cmd shells
- Smart installation detection: when running a different binary, compares version AND build timestamp with installed version
- Reports if installed version is older/newer (e.g., installed v1.2.6 (b20260110-1156) is older)
- Fixed tab completion detection to match actual script markers in profiles
Model Digest Tracking - QC results now include SHA256 digest for each tested model
- Full digest (Digest) and short digest (ShortDigest, first 12 chars) stored in results JSON
- Automatically populated from local Ollama or HuggingFace registry
- Backfill: missing digests are automatically retrieved when loading existing results files
Fixed Model ID Display - IDs now show first 12 chars of manifest SHA256 (matches ollama ls)
- osync ls and manage TUI compute SHA256 of manifest file content
- ID column width increased from 8 to 12 characters
- Consistent with ollama ls output for easy cross-reference
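
As an illustration of the ID computation described above, the following sketch hashes the manifest content and keeps the first 12 hex characters. The function name is hypothetical, and reading the manifest as raw bytes is an assumption.

```python
import hashlib


def manifest_short_id(manifest_bytes: bytes, length: int = 12) -> str:
    """Return the first `length` hex characters of the manifest's SHA256."""
    return hashlib.sha256(manifest_bytes).hexdigest()[:length]
```
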
Improved osync ps Output - Dynamic console width and better model name display
- Detects console width and adjusts column sizes dynamically
- Model names now truncated from beginning to preserve full tag (e.g., ...0B-A3B-Instruct-GGUF:Q4_K_S)
- Better visibility of quantization tags for HuggingFace models with long paths
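
Truncating from the beginning, as described above, keeps the tail of the name where the tag lives. A minimal sketch, with a hypothetical function name and the `...` marker taken from the example above:

```python
def truncate_left(name: str, width: int, marker: str = "...") -> str:
    """Shorten name to width characters, dropping text from the start."""
    if width <= len(marker):
        raise ValueError("width must leave room for at least one character")
    if len(name) <= width:
        return name
    # Keep the end of the string so the tag after ':' stays visible.
    return marker + name[-(width - len(marker)):]
```
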
Load Command Timing - Shows elapsed time and API-reported load duration
- Displays total elapsed time and Ollama's load_duration from response
- Example: ✓ Model 'model:tag' loaded successfully (2m 15s) (API: 2m 5s)
QCView Table Alignment Fix - Tag and Quant columns now left-aligned instead of centered
Timeout Handling Improvements - Better handling of HTTP timeouts during testing
- Timeouts are now properly distinguished from user cancellation (no longer shows "Operation cancelled by user")
- Timeouts trigger retry with exponential backoff instead of immediate failure
- After retry attempts exhausted, prompts user: y=cancel, n=double timeout and retry
- Allows recovery from slow model responses without losing progress
Improved On-Demand Model Cleanup - Fixed critical bug where models were deleted during testing
- Models with incomplete test results are NEVER cleaned up (preserves for resume)
- Fixed cleanup to protect incomplete models regardless of error type (timeout, cancellation, etc.)
- On-demand status tracking is now consistent when resuming interrupted tests
Fixed HuggingFace Wildcard Tag Detection - Now detects all quantization formats
- Added support for XL variants (Q2_K_XL, Q3_K_XL, Q4_K_XL, Q5_K_XL, Q6_K_XL, Q8_K_XL)
- Added support for TQ ternary quantization (TQ1_0)
Fixed HuggingFace Model Quant Column - QC results now correctly show quantization type
- Ollama returns "quantization_level": "unknown" for HuggingFace models
- Now extracts quantization type from model name/tag when API returns "unknown"
Enhanced Quantization Display with Tensor Analysis - Quant column now shows dominant tensor quantization
- Analyzes transformer block weight tensors only (excludes embeddings, output, and norms)
- Calculates weighted percentage by tensor size (elements × bits per weight)
- Displays format like Q4_0 (87%) or Q6_K (81% Q8_0) showing actual tensor distribution
- Uses Ollama API verbose=true to fetch tensor metadata
- Fixed: Extract quant type from model name before tensor analysis for correct formatting
- Fixed: Filter to transformer weights only (Q8_0 embeddings/output were skewing results)
- Fixed: Unknown tensor types shown with "?" suffix (e.g., Q3_K?) to indicate uncertainty
- Supports all quantization types: Q*_K variants, IQ (importance matrix), and TQ (ternary)
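
The weighting described above (elements × bits per weight) can be sketched like this. The function name and input shape are hypothetical; the input is assumed to be already filtered to transformer block weight tensors, and the small bits-per-weight table covers only a few GGUF types for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Bits per weight for a few GGUF tensor types (block overhead included).
BITS_PER_WEIGHT: Dict[str, float] = {
    "Q4_0": 4.5,
    "Q6_K": 6.5625,
    "Q8_0": 8.5,
    "F16": 16.0,
}


def dominant_quant(tensors: Iterable[Tuple[str, int]]) -> Tuple[str, int]:
    """Return the quant type with the largest share of bits, and that share.

    tensors is an iterable of (quant_type, element_count) pairs.
    """
    sizes: Dict[str, float] = defaultdict(float)
    for quant_type, elements in tensors:
        sizes[quant_type] += elements * BITS_PER_WEIGHT[quant_type]
    total = sum(sizes.values())
    if total == 0:
        raise ValueError("no tensor data")
    name, size = max(sizes.items(), key=lambda item: item[1])
    return name, round(100 * size / total)
```
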
Fixed QC Model Validation - Relaxed overly strict parameter size comparison
- Parameter size formatting varies between models (e.g., "999.89M" vs "1,000M" for same model)
- Now only warns on family mismatch instead of blocking testing
- Testing continues even with warnings
Improved Judge API Retry Strategy - More resilient handling of judge server errors
- Increased retry attempts from 5 to 25 for judge API calls
- Delay ramps from 5 seconds to 30 seconds progressively
- Shows warning and skips judgment only after all retries exhausted (instead of failing)
- Better handles overloaded or slow judge servers (HTTP 500 errors)
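
One way to picture a delay that ramps from 5 to 30 seconds over 25 attempts is a linear schedule. The notes above do not state the exact curve, so the linear shape here is an assumption, and `judge_retry_delay` is a hypothetical name.

```python
def judge_retry_delay(
    attempt: int,
    max_attempts: int = 25,
    first: float = 5.0,
    last: float = 30.0,
) -> float:
    """Delay in seconds before retry number `attempt` (1-based)."""
    if not 1 <= attempt <= max_attempts:
        raise ValueError("attempt out of range")
    if max_attempts == 1:
        return first
    # Assumption: the delay grows by the same amount on every attempt.
    step = (last - first) / (max_attempts - 1)
    return first + step * (attempt - 1)
```
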
Fixed Base Model Re-Pull When Adding Quants - Skip base model if results already exist
- When adding new quants to existing test results without -b, no longer tries to pull the base model
- If results file contains any base model results (even partial), the base is skipped entirely
- Improved base model detection: automatically identifies base by common patterns (fp16, f16, bf16, etc.)
- Use --force to re-run the base model if needed
Improved osync ls Wildcard Handling - Better shell expansion handling on Linux/macOS
- Default behavior: osync ls code matches models starting with "code" (prefix match, same as code*)
- Suffix match: osync ls *q4_k_m finds all models ending with "q4_k_m" (useful for finding by quantization)
- Contains match: osync ls *code* finds models containing "code" anywhere in the name
- Shell expansion handling: detects when shell expanded unquoted wildcards and shows helpful warning
- Suggests using quotes to prevent expansion: osync ls 'gemma*'
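
The three matching modes above can be expressed with glob-style matching, where a pattern without any `*` is treated as a prefix. This sketch uses Python's `fnmatch` as a stand-in; `matches_model` is a hypothetical name, case-sensitive matching is an assumption, and `fnmatch` also gives `?` and `[...]` special meaning, which osync may not.

```python
from fnmatch import fnmatchcase


def matches_model(pattern: str, model_name: str) -> bool:
    """Check a model name against an `osync ls`-style pattern."""
    if "*" not in pattern:
        # A bare word behaves like a prefix match, same as "word*".
        pattern += "*"
    return fnmatchcase(model_name, pattern)
```
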
Wildcard Tag Expansion for osync pull - Pull multiple models with tag patterns
- Supports wildcards in tags: osync pull gemma3:1b-it-q* pulls all matching tags
- Works with HuggingFace: osync pull hf.co/unsloth/gemma-3-1b-it-GGUF:IQ2*
- Works with remote servers: osync pull -d http://server:11434 gemma3:1b-it-q*
- Automatically resolves available tags from Ollama registry or HuggingFace API
- Shows list of matching tags before pulling
Judge Best Answer Tracking - QC judge now evaluates which response is qualitatively better
- Judge model returns bestanswer: A (base better), B (quant better), or AB (tie)
- Verbose output shows best answer for each judgment: Score: 75% (27/50 54%) Best: AB
- Handles edge cases: normalizes various formats (ResponseA, Response_A, Tie, identical, etc.)
- Results automatically re-judged if --judge is active and bestanswer is missing
QcView Judge Best Column - New column showing quant win statistics
- Format: 67% (B:10 A:5 =:3) showing quant won 67% of non-tie comparisons
- B = quant better, A = base better, = = tie
- Best percentage excludes ties (only counts decisive wins/losses)
- Color-coded: green (>=60%), yellow (40-60%), red (<40%)
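
The Judge Best column format can be reproduced with a few lines of arithmetic. The function names below are hypothetical, and showing 0% when every comparison is a tie is an assumption.

```python
def best_answer_summary(quant_wins: int, base_wins: int, ties: int) -> str:
    """Format win statistics, excluding ties from the percentage."""
    decisive = quant_wins + base_wins
    # Assumption: report 0% when there are no decisive comparisons.
    percent = round(100 * quant_wins / decisive) if decisive else 0
    return f"{percent}% (B:{quant_wins} A:{base_wins} =:{ties})"


def best_answer_color(percent: float) -> str:
    """Color band for the best-answer percentage."""
    if percent >= 60:
        return "green"
    if percent >= 40:
        return "yellow"
    return "red"
```

With 10 quant wins, 5 base wins, and 3 ties, the percentage is 10 / 15, which rounds to 67%, matching the example format above.
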
Enhanced JSON Output - Additional statistics in JSON export
- Per-question BestAnswer field (A/B/AB)
- Per-quantization: BestCount, WorstCount, TieCount, BestPercentage, WorstPercentage, TiePercentage
- Per-category: CategoryBestStats with counts and percentages
QcView Metrics-Only Mode - New --metricsonly argument to ignore judgment data
- Shows only metrics-based scores (token similarity, logprobs divergence, perplexity, length consistency)
- Useful for comparing pure model output quality without judge influence
- Works with all output formats (table, json, md, html, pdf)
Automatic Judge Context Length - Judge model context is now auto-calculated by default
- When --judge-ctxsize is 0 (new default), calculates: test_ctx × 2 + 2048
- Ensures judge has enough context for both base and quantized responses plus prompt
- Can still be manually overridden with explicit value
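
The automatic calculation above amounts to the following. The function and parameter names are hypothetical; only the formula `test_ctx × 2 + 2048` and the meaning of 0 as "auto" come from the notes.

```python
def judge_context_size(test_ctx: int, judge_ctxsize: int = 0) -> int:
    """Return the judge context length, auto-calculating when set to 0."""
    if judge_ctxsize > 0:
        return judge_ctxsize  # explicit override
    # Room for both responses (base and quant) plus the judge prompt.
    return test_ctx * 2 + 2048
```

For a test context of 4096 tokens, the automatic value is 4096 × 2 + 2048 = 10240.
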
PDF Generation Progress Bar - Visual progress indicator when generating PDF reports
- Shows progress through Q&A pages for each quantization
- Useful for large test results files with many questions
PDF Layout Improvements - Better page break handling in PDF reports
- Ranking tables use ShowEntire() to prevent splitting across pages
- Speed columns simplified to show only percentage (removed tok/s to prevent wrapping)
- Category scores section moves entirely to next page if it won't fit
- Rankings organized into paired rows (Final Score + Eval Speed, Perplexity + Prompt Speed, Best Answers)
- Added Prompt Speed ranking table with vs Base percentage column
Manage TUI Batch Delete Fix - Fixed multi-selection delete not working
- Delete now properly handles multiple selected models (Ctrl+D with checkmarks)
- Confirmation dialog shows ...


09 Jan 17:14

mann1x

v1.2.4

04dacf8

v1.2.4

New --rejudge Argument for QC Command - Re-run judgment without re-testing
- New --rejudge argument to re-run judgment process for existing test results
- Unlike --force which re-runs both testing and judgment, --rejudge only re-runs judgment
- Useful for re-evaluating results with a different judge model or updated prompts
Improved Judge Response Parsing - More robust handling of judge model responses
- Case-insensitive JSON property matching for Reason/reason/REASON fields
- Multiple regex patterns with increasing leniency for fallback parsing
- Truncated JSON repair to handle incomplete responses from models
- Increased num_predict from 200 to 800 to reduce truncated responses
- Full raw JSON output displayed when reason parsing fails (for debugging)
Improved Judge Scoring Accuracy - Fixed score interpretation issues
- Changed JSON schema score type from "integer" to "number" for better model compatibility
- Added explicit prompt instructions for 1-100 integer scoring (not 0.0-1.0 decimal)
- Score normalization to handle both 0.0-1.0 and 1-100 ranges from different models
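
Score normalization across the two ranges might look like the sketch below. The rule used here (values up to 1.0 are treated as fractions and scaled by 100) is an assumption, as is the hypothetical function name; a raw value of exactly 1 is ambiguous and is read as 100% in this sketch.

```python
def normalize_judge_score(raw: float) -> int:
    """Convert a judge score on either scale to an integer percentage."""
    if raw < 0 or raw > 100:
        raise ValueError("score outside both supported ranges")
    if raw <= 1.0:
        # Assumption: small values are fractions on the 0.0-1.0 scale.
        raw *= 100
    return round(raw)
```
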
Fixed HuggingFace Model Verification in On-Demand Mode - Registry check now supports HuggingFace models
- On-demand mode (--ondemand) now properly verifies HuggingFace models (hf.co/...)
- Checks HuggingFace API to verify repository and GGUF file existence
- Supports various GGUF filename patterns for tag matching
Fixed Base Model Name Handling - Full model names now preserved for base tag
- Base model specified as full model name (e.g., -b qwen3-coder:30b-a3b-fp16 or -b hf.co/namespace/repo:tag) is now used as-is
- Previously, only the tag portion was extracted and combined with -M model name
Wildcard Tag Selection for QC Command - Dynamically select quantizations with patterns
- Support for wildcard patterns (*) in -Q argument (e.g., Q4*, IQ*, *)
- Fetches available tags from HuggingFace API for hf.co/... models
- Scrapes available tags from Ollama website for Ollama registry models
- Case-insensitive pattern matching
- New ModelTagResolver class for reusable tag resolution across commands
Improved On-Demand Cleanup - Models pulled on-demand are now properly cleaned up on failure
- On-demand models are removed when testing fails or is cancelled
- Cleanup happens in exception handlers to ensure no orphaned models remain
- Models tracked at class level for reliable cleanup across failures
- Preload failures now also trigger cleanup of on-demand models
Improved Model Preload - Better error handling and retry logic for model loading
- Added retry logic (3 attempts with exponential backoff) for transient failures
- Shows actual error message when preload fails (HTTP status, error details)
- Uses configurable timeout (--timeout) for model loading
- Handles timeout, connection errors, and server errors gracefully
Fixed Model Name Case Sensitivity - Handle Ollama storing HuggingFace tags with different case
- After pulling, resolves actual model name stored by Ollama (case-insensitive lookup)
- Fixes issue where Q4_0 is stored as q4_0 causing preload to fail


09 Jan 07:28

mann1x

v1.2.3

5009638

v1.2.3

On-Demand Mode for QC Command - Pull models automatically and remove after testing
- New --ondemand argument to enable on-demand model management
- Models missing from the Ollama instance are automatically pulled from the registry
- Models that were already present are NOT removed (only on-demand pulled models)
- After testing and judgment complete, on-demand models are removed to free disk space
- State is persisted in results file for proper cleanup on resume
- Works with both local and remote Ollama servers
- Ideal for testing large models or many quantizations without consuming permanent storage
Context Length Support for QC Command - Configure context length (num_ctx) for testing and judgment
- Default test context length: 4096, default judge context length: 12288
- Suite-level contextLength property in built-in test suites (v1base, v1quick, v1code)
- External JSON test suites support contextLength at suite, category, and question levels
- Hierarchical override system: question > category > suite
- Console output displays context length at startup and when overridden
- New --judge-ctxsize argument to configure judge model context length
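
The override order (question > category > suite) reduces to picking the most specific value that is set. A minimal sketch with a hypothetical function name, using `None` to mean "not specified at this level":

```python
from typing import Optional


def resolve_context_length(
    suite: int,
    category: Optional[int] = None,
    question: Optional[int] = None,
) -> int:
    """Pick the most specific contextLength that is defined."""
    for value in (question, category):
        if value is not None:
            return value
    return suite
```
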
Improved Console Output - Context length settings displayed during test execution
- Shows test and judge context lengths at the beginning of testing
- Displays override notifications when context length changes (e.g., "Context length changed to 8192 (from category Code)")
Fixed Linux Terminal Display Issues - Resolved ANSI color rendering problems
- Fixed white box display issue in interactive REPL mode on Linux terminals
- Downgraded PrettyConsole (3.1.0 → 2.0.0) and Spectre.Console (0.54.0 → 0.49.1) for compatibility
Improved Installer - Streamlined installation process
- Installer now only copies the main executable (no longer copies all directory files)
- Added mechanism for platform-specific optional dependencies
- Removed unnecessary libuv.dylib dependency for macOS (not needed in .NET 8+)
Fixed Bash Completion on Linux
- Fixed tab completion for model names containing colons (e.g., osync ls qwen2:)
- Fixed file tab completion for qcview command


03 Jan 16:58

mann1x

v1.2.2

d600123

v1.2.2

Improved Judgment Prompt Format - Better compatibility with more models
- Instructions now in both system prompt and user message for redundancy
- Clear text markers for RESPONSE A and RESPONSE B instead of JSON encoding
- Question included for context with clear delimiters
- Explicit rules to prevent models from judging quality/correctness instead of similarity
Verbose Judgment Output - New --verbose flag to show judgment details
- Displays question ID, score (color-coded), and first 4 lines of reason
- Works with both serial and parallel judgment modes
- Helps debug and understand judge model scoring
Fixed Parallel Verbose Output - Verbose output now displays during parallel judgment execution
- Previously showed all results after completion; now shows each result as it completes
Fixed Serial Verbose Progress - Progress bar now displays alongside verbose output in serial mode
Improved Cancellation Handling - Ctrl+C now immediately stops judgment without retrying
- Cancellation exceptions are no longer retried 5 times
- Judgment loop checks for cancellation before each question
Missing Reason Retry - Judge API retries up to 5 times when response contains score but no reason
- Warning displayed if reason still missing after all retries


03 Jan 01:13

mann1x

v1.2.1

cd6e7e7

v1.2.1

Bug Fix: Base Model Detection - Fixed issue where base model wasn't correctly identified when using full model names
- Base tag is now properly normalized when specified with full path (e.g., user/model:tag)
- Existing results files with missing IsBase flag are automatically repaired on load
- Judgment now correctly runs for quantizations that need it
Bug Fix: Output Filename Sanitization - Model names with / or \ are now converted to - in default output filename
- Prevents file path issues when model name contains directory separators
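
The sanitization rule above is a straightforward character replacement. The function name below is hypothetical, and only the two separators named in the notes are handled.

```python
def sanitize_model_name(model_name: str) -> str:
    """Replace directory separators so the name is safe in a filename."""
    return model_name.replace("/", "-").replace("\\", "-")
```
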
Improved Startup Output - Output file path is now displayed early in the execution
- Shows right after loading test suite, before judge model verification
Cancellation Improvements - Better handling of Ctrl+C during API calls
- Cancellation token now passed to HTTP requests for immediate cancellation
- Wrapped cancellation exceptions are properly detected


01 Jan 12:13

mann1x

v1.2.0

a4206ff

v1.2.0

Coding Test Suite - New v1code test suite for evaluating code generation quality
- 50 challenging coding questions across 5 languages: Python, C++, C#, TypeScript, Rust
- Double token output limit (8192) for longer code responses
- Questions include instruction to limit response size
- Available as -T v1code or via external v1code.json file
Configurable Token Output - Test suites now support custom numPredict values
- Each test suite can specify its own maximum token output
- External JSON test suites support numPredict property (default: 4096)
- Displayed in test suite info when non-default value is used
Improved Model Existence Check - Pull command now uses Ollama registry API for faster, more reliable model validation
- Uses registry.ollama.ai/v2/ manifest endpoint instead of HTML scraping
- Properly handles both library models and user models
- Faster response times and more accurate error messages
True Independent Parallel Judgment - Testing continues to next quantization while judgment runs in background
- Testing no longer waits for judgment to complete before moving to next quantization
- Background judgment tasks tracked and awaited at the end with progress display
- Progress bars show real-time status for both testing and judgment
Improved Progress Display - Better visibility into parallel operations
- Dual progress bars during testing (Testing + Judging) in parallel mode
- Background judgment status shown after each quantization completes
- Final wait screen shows progress for all pending judgment tasks
Configurable API Timeout - Added --timeout argument for testing and judgment API calls
- Default increased from 300 to 600 seconds for longer code generation
- Configurable via --timeout <seconds> argument
- Applies to both test model and judge model API calls
Resume Support - Gracefully handle interruptions and resume from where you left off
- Press Ctrl+C to save partial results and exit cleanly
- Re-run the same command to resume testing from the last saved question
- Partial quantization results are preserved in the JSON file
- Progress bar shows resumed position when continuing
- Missing judgments are automatically detected and re-run on resume
UI Improvements
- Unified color scheme: lime for good scores (80%+) and performance above 100%
- Orange color for performance below 100%

