AxiumFoundry · DmitrySychev · Apr 5, 2026 · Apr 5, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,81 @@
+# CLAUDE.md
+
+LlmClassifier - Ruby gem for building LLM-powered classifiers with a clean DSL. Supports multiple LLM backends (ruby_llm, OpenAI, Anthropic) and optional Rails integration.
+
+- Ruby >= 3.2, RSpec, RuboCop, Zeitwerk autoloading
+- No Rails dependency in core; Rails integration is opt-in via `lib/llm_classifier/rails/`
+- CI tests against Ruby 3.4 and 4.0
+
+## Development with Docker
+
+Ruby is not installed on the host. Use Docker to run tests and linting:
+
+```bash
+# Run tests and rubocop (Ruby 3.4)
+docker.exe run --rm -v "$(wslpath -w "$(pwd)"):/app" -w /app ruby:3.4-slim \
+  bash -c "apt-get update -qq && apt-get install -y -qq build-essential git 2>/dev/null && \
+  gem install bundler --no-document && bundle install --quiet && \
+  bundle exec rspec && bundle exec rubocop"
+
+# Rubocop only
+docker.exe run --rm -v "$(wslpath -w "$(pwd)"):/app" -w /app ruby:3.4-slim \
+  bash -c "apt-get update -qq && apt-get install -y -qq build-essential git 2>/dev/null && \
+  gem install bundler --no-document && bundle install --quiet && \
+  bundle exec rubocop"
+
+# Single spec file
+docker.exe run --rm -v "$(wslpath -w "$(pwd)"):/app" -w /app ruby:3.4-slim \
+  bash -c "apt-get update -qq && apt-get install -y -qq build-essential git 2>/dev/null && \
+  gem install bundler --no-document && bundle install --quiet && \
+  bundle exec rspec spec/llm_classifier/classifier_spec.rb"
+```
+
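+Docker Desktop must be running on Windows. The commands call `docker.exe` because Docker runs through WSL2 integration, and `wslpath -w` converts the working directory to a Windows path for the bind mount. The `-v` flag bind-mounts the project, so edits on the host are immediately visible in the container.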
+
+A `.devcontainer/` setup also exists for VS Code Dev Containers.
+
+## Quick Commands (inside Docker)
+
+```bash
+bundle exec rspec                                    # all tests
+bundle exec rspec spec/llm_classifier/classifier_spec.rb  # single file
+bundle exec rubocop                                  # all files
+bundle exec rubocop -a                               # auto-correct
+gem build llm_classifier.gemspec                     # build gem
+```
+
+## Project Structure
+
+All sibling projects are located in `/home/axium/projects/`. The `prospector` gem depends on `llm_classifier`.
+
+## Code Standards
+
+- Double-quoted strings (enforced by RuboCop)
+- Max line length: 120 characters
+- Max method length: 20 lines
+- RSpec examples: max 15 lines and max 6 expectations per example
+- `Style/HashExcept` disabled (requires ActiveSupport)
+- `Metrics/ClassLength` exempted for `classifier.rb` and `content_fetchers/web.rb`
+
+## Git Workflow
+
+- Never push directly to main. Always create a feature branch and PR.
+- Run the full test suite and rubocop before creating a PR.
+- Version bumps in `lib/llm_classifier/version.rb` go in the feature PR, not separately.
+
+## Key Classes
+
+- `LlmClassifier::Classifier` - Core DSL and classification pipeline
+- `LlmClassifier::Result` - Value object returned from every classification
+- `LlmClassifier::Knowledge` - Domain knowledge DSL container (`method_missing`-based)
+- `LlmClassifier::Configuration` - Global config (adapter, model, API keys)
+- `LlmClassifier::Adapters::Base` - Abstract adapter interface
+- `LlmClassifier::ContentFetchers::Web` - HTTP fetcher with SSRF protection
+- `LlmClassifier::Rails::Concerns::Classifiable` - ActiveRecord integration
+
+## Component Documentation
+
+- [lib/llm_classifier/adapters/CLAUDE.md](lib/llm_classifier/adapters/CLAUDE.md) - LLM adapter contract and implementations
+- [lib/llm_classifier/content_fetchers/CLAUDE.md](lib/llm_classifier/content_fetchers/CLAUDE.md) - Content fetchers and SSRF protection
+- [lib/llm_classifier/rails/CLAUDE.md](lib/llm_classifier/rails/CLAUDE.md) - Rails integration (Zeitwerk-excluded)
+- [spec/CLAUDE.md](spec/CLAUDE.md) - Testing conventions
diff --git a/lib/llm_classifier/adapters/CLAUDE.md b/lib/llm_classifier/adapters/CLAUDE.md
@@ -0,0 +1,22 @@
+# Adapters
+
+LLM provider adapters. All inherit from `Adapters::Base` and implement `#chat(model:, system_prompt:, user_prompt:)`.
+
+## Inventory
+
+- `Base` - Abstract interface. Provides `#config` helper for accessing `LlmClassifier.configuration`
+- `RubyLlm` - Delegates to the `ruby_llm` gem. Returns a Hash with `:content`, `:input_tokens`, `:output_tokens`
+- `OpenAI` - Direct `Net::HTTP` POST to OpenAI API. Returns a String
+- `Anthropic` - Direct `Net::HTTP` POST to Anthropic API. Returns a String
+
+## Conventions
+
+- `#chat` returns either a String (raw content) or a Hash with `:content` plus optional token metadata
+- `Classifier#extract_response_data` handles both return types, so new adapters can use either
+- API keys come from `LlmClassifier.configuration`, not hardcoded or passed as arguments
+- A custom adapter can be an object (a class instance) passed directly to `config.adapter`
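+
+The two return shapes described above can be reconciled with a small normalization step. The following is a minimal sketch, not the gem's implementation: `normalize_adapter_response` and `EchoAdapter` are hypothetical names used only for illustration, and the real logic lives in `Classifier#extract_response_data`.
+
+```ruby
+# Hypothetical helper: turns either documented return shape into one Hash.
+# This is an illustration, not the body of Classifier#extract_response_data.
+def normalize_adapter_response(response)
+  case response
+  when String
+    # Raw content only; no token metadata is available.
+    { content: response, input_tokens: nil, output_tokens: nil }
+  when Hash
+    {
+      content: response.fetch(:content),
+      input_tokens: response[:input_tokens],
+      output_tokens: response[:output_tokens]
+    }
+  else
+    raise ArgumentError, "unsupported adapter response: #{response.class}"
+  end
+end
+
+# Hypothetical custom adapter: it only needs to respond to #chat
+# with the documented keyword arguments.
+class EchoAdapter
+  def chat(model:, system_prompt:, user_prompt:)
+    { content: "echo: #{user_prompt}", input_tokens: 3, output_tokens: 1 }
+  end
+end
+```
+
+Under these assumptions, an instance such as `EchoAdapter.new` is the kind of object that could be assigned to `config.adapter`.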
+
+## Related
+
+- [../content_fetchers/CLAUDE.md](../content_fetchers/CLAUDE.md) - Content fetchers
+- [../../spec/CLAUDE.md](../../spec/CLAUDE.md) - Testing conventions
diff --git a/lib/llm_classifier/content_fetchers/CLAUDE.md b/lib/llm_classifier/content_fetchers/CLAUDE.md
@@ -0,0 +1,27 @@
+# Content Fetchers
+
+Utilities for fetching external content to use as classification input. Not wired into `Classifier` automatically -- callers fetch content and pass it in.
+
+## Inventory
+
+- `Base` - Abstract interface. Subclasses implement `#fetch(source)`
+- `Web` - HTTP fetcher with SSRF protection, redirect following, and HTML text extraction
+- `Null` - No-op fetcher, always returns `nil`
+
+## SSRF Protection (`Web`)
+
+- Validates resolved IPs against private/loopback CIDR ranges before connecting
+- Follows up to 3 redirects, re-validating each redirect target
+- `normalize_redirect_url` handles relative and absolute redirect URLs
+- Uses `nil? || empty?` guards (not ActiveSupport `.blank?`) to avoid the dependency
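+
+The IP validation step above can be sketched with the standard library's `IPAddr`. This is a minimal sketch under stated assumptions: `BLOCKED_RANGES` and `blocked_ip?` are hypothetical names, and the CIDR list is an illustrative guess, not the list `Web` actually uses. DNS resolution and redirect following are omitted.
+
+```ruby
+require "ipaddr"
+
+# Assumed block list for illustration; the gem's actual ranges may differ.
+BLOCKED_RANGES = %w[
+  127.0.0.0/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 169.254.0.0/16
+  ::1/128 fc00::/7 fe80::/10
+].map { |cidr| IPAddr.new(cidr) }.freeze
+
+# Hypothetical helper: returns true when the address must not be contacted.
+# Missing or unparseable input is treated as blocked (fail closed).
+def blocked_ip?(address)
+  return true if address.nil? || address.empty?
+
+  ip = IPAddr.new(address)
+  BLOCKED_RANGES.any? { |range| range.include?(ip) }
+rescue IPAddr::InvalidAddressError
+  true
+end
+```
+
+In this sketch a caller would run the check on every resolved address, and again on each redirect target, before opening a connection.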
+
+## HTML Processing (`Web`)
+
+- Nokogiri is lazily loaded (`require "nokogiri"` inside the method) since it's an optional dependency
+- Strips `<script>`, `<style>`, `<nav>`, `<footer>`, `<header>` elements
+- Truncates extracted text to 2000 characters
+
+## Related
+
+- [../adapters/CLAUDE.md](../adapters/CLAUDE.md) - LLM adapters
+- [../../spec/CLAUDE.md](../../spec/CLAUDE.md) - Testing conventions
diff --git a/lib/llm_classifier/rails/CLAUDE.md b/lib/llm_classifier/rails/CLAUDE.md
@@ -0,0 +1,25 @@
+# Rails Integration
+
+This entire subtree is excluded from Zeitwerk autoloading (`loader.ignore` in `lib/llm_classifier.rb`) and loaded manually only when `Rails::Railtie` is defined. This keeps Rails as an optional dependency.
+
+## Inventory
+
+- `Railtie` - Sets `Rails.logger` as the default LlmClassifier logger
+- `Concerns::Classifiable` - ActiveRecord concern adding a `classifies` macro
+- `Generators::InstallGenerator` - `rails g llm_classifier:install` scaffolds an initializer
+- `Generators::ClassifierGenerator` - `rails g llm_classifier:classifier Name cat1 cat2` scaffolds a classifier and spec
+
+## Classifiable Concern
+
+The `classifies` macro defines the following instance methods for each classified attribute:
+
+- `classify_<attr>!` - Runs classification and stores the result
+- `<attr>_category` / `<attr>_categories` - Reads stored category data
+- `<attr>_classification` - Returns the full stored classification hash
+
+Results are written to a JSONB column (set via `store_in:`) or, if no column is specified, to a transient instance variable.
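+
+The way such a macro can generate per-attribute methods is sketched below in plain Ruby, without Rails. This is a hedged analogue, not the concern's code: `ClassifiableSketch`, the `with:` callable, and `Ticket` are hypothetical, and only the transient instance-variable path is shown (the `store_in:` column path is omitted).
+
+```ruby
+# Plain-Ruby analogue of a `classifies` macro. All names except the
+# generated method names are hypothetical.
+module ClassifiableSketch
+  def self.included(base)
+    base.extend(ClassMethods)
+  end
+
+  module ClassMethods
+    # `with:` is a hypothetical callable standing in for a classifier.
+    def classifies(attr, with:)
+      # classify_<attr>! runs the callable and stores the result.
+      define_method(:"classify_#{attr}!") do
+        result = with.call(public_send(attr))
+        instance_variable_set(:"@#{attr}_classification", result)
+      end
+
+      # <attr>_classification returns the stored hash (nil if not yet run).
+      define_method(:"#{attr}_classification") do
+        instance_variable_get(:"@#{attr}_classification")
+      end
+
+      # <attr>_category reads the category out of the stored hash.
+      define_method(:"#{attr}_category") do
+        public_send(:"#{attr}_classification")&.fetch(:category, nil)
+      end
+    end
+  end
+end
+
+class Ticket
+  include ClassifiableSketch
+
+  attr_reader :body
+
+  def initialize(body)
+    @body = body
+  end
+
+  classifies :body, with: ->(text) { { category: text.include?("refund") ? "billing" : "other" } }
+end
+```
+
+With this sketch, `Ticket.new("refund please").classify_body!` stores a hash whose category is then readable through `body_category`.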
+
+## Related
+
+- [../adapters/CLAUDE.md](../adapters/CLAUDE.md) - LLM adapters
+- [../content_fetchers/CLAUDE.md](../content_fetchers/CLAUDE.md) - Content fetchers
diff --git a/spec/CLAUDE.md b/spec/CLAUDE.md
@@ -0,0 +1,34 @@
+# Testing Conventions
+
+RSpec 3.x with WebMock and VCR.
+
+## Setup
+
+- All external HTTP is blocked by WebMock in tests
+- VCR cassettes go in `spec/fixtures/vcr_cassettes/` (tag examples with `:vcr` for auto-recording)
+- API keys are filtered from cassettes (`<OPENAI_API_KEY>`, `<ANTHROPIC_API_KEY>`)
+- `LlmClassifier.reset_configuration!` runs before every example to isolate global state
+- `verify_partial_doubles: true` enforces strict mocking
+
+## Running Tests
+
+```bash
+bundle exec rspec                                          # all tests
+bundle exec rspec spec/llm_classifier/classifier_spec.rb   # single file
+bundle exec rspec --only-failures                           # re-run failures
+```
+
+## Spec Files
+
+- `llm_classifier_spec.rb` - Module-level configure/reset
+- `llm_classifier/classifier_spec.rb` - DSL attributes, `.classify` pipeline, token data, code-fence stripping, callbacks
+- `llm_classifier/result_spec.rb` - Result factory methods, `#to_h`, `#multi_label?`, token fields
+- `llm_classifier/knowledge_spec.rb` - Dynamic storage, `#to_prompt` formatting
+
+No specs exist yet for adapters, content fetchers, the Rails concern, or generators.
+
+## Related
+
+- [../lib/llm_classifier/adapters/CLAUDE.md](../lib/llm_classifier/adapters/CLAUDE.md) - LLM adapters
+- [../lib/llm_classifier/content_fetchers/CLAUDE.md](../lib/llm_classifier/content_fetchers/CLAUDE.md) - Content fetchers
+- [../lib/llm_classifier/rails/CLAUDE.md](../lib/llm_classifier/rails/CLAUDE.md) - Rails integration