Merged
81 changes: 81 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,81 @@
# CLAUDE.md

LlmClassifier - Ruby gem for building LLM-powered classifiers with a clean DSL. Supports multiple LLM backends (ruby_llm, OpenAI, Anthropic) and optional Rails integration.

- Ruby >= 3.2, RSpec, RuboCop, Zeitwerk autoloading
- No Rails dependency in core; Rails integration is opt-in via `lib/llm_classifier/rails/`
- CI tests against Ruby 3.4 and 4.0

## Development with Docker

Ruby is not installed on the host. Use Docker to run tests and linting:

```bash
# Run tests and rubocop (Ruby 3.4)
docker.exe run --rm -v "$(wslpath -w "$(pwd)"):/app" -w /app ruby:3.4-slim \
bash -c "apt-get update -qq && apt-get install -y -qq build-essential git 2>/dev/null && \
gem install bundler --no-document && bundle install --quiet && \
bundle exec rspec && bundle exec rubocop"

# Rubocop only
docker.exe run --rm -v "$(wslpath -w "$(pwd)"):/app" -w /app ruby:3.4-slim \
bash -c "apt-get update -qq && apt-get install -y -qq build-essential git 2>/dev/null && \
gem install bundler --no-document && bundle install --quiet && \
bundle exec rubocop"

# Single spec file
docker.exe run --rm -v "$(wslpath -w "$(pwd)"):/app" -w /app ruby:3.4-slim \
bash -c "apt-get update -qq && apt-get install -y -qq build-essential git 2>/dev/null && \
gem install bundler --no-document && bundle install --quiet && \
bundle exec rspec spec/llm_classifier/classifier_spec.rb"
```

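Docker Desktop must be running on Windows. The commands call `docker.exe` because Docker is reached from WSL2 through Docker Desktop's WSL2 integration, and `wslpath -w` converts the working directory to a Windows path for the mount. The `-v` flag bind-mounts the project, so edits on the host are immediately visible inside the container.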

A `.devcontainer/` setup also exists for VS Code Dev Containers.

## Quick Commands (inside Docker)

```bash
bundle exec rspec # all tests
bundle exec rspec spec/llm_classifier/classifier_spec.rb # single file
bundle exec rubocop # all files
bundle exec rubocop -a # auto-correct
gem build llm_classifier.gemspec # build gem
```

## Project Structure

All sibling projects are located in `/home/axium/projects/`. The `prospector` gem depends on `llm_classifier`.

## Code Standards

- Double-quoted strings (enforced by RuboCop)
- Max line length: 120 characters
- Max method length: 20 lines
- RSpec examples: max 15 lines and max 6 expectations each
- `Style/HashExcept` disabled (requires ActiveSupport)
- `Metrics/ClassLength` exempted for `classifier.rb` and `content_fetchers/web.rb`

## Git Workflow

- Never push directly to main. Always create a feature branch and PR.
- Run the full test suite and rubocop before creating a PR.
- Version bumps in `lib/llm_classifier/version.rb` go in the feature PR, not separately.

## Key Classes

- `LlmClassifier::Classifier` - Core DSL and classification pipeline
- `LlmClassifier::Result` - Value object returned from every classification
- `LlmClassifier::Knowledge` - Domain knowledge DSL container (`method_missing`-based)
- `LlmClassifier::Configuration` - Global config (adapter, model, API keys)
- `LlmClassifier::Adapters::Base` - Abstract adapter interface
- `LlmClassifier::ContentFetchers::Web` - HTTP fetcher with SSRF protection
- `LlmClassifier::Rails::Concerns::Classifiable` - ActiveRecord integration

## Component Documentation

- [lib/llm_classifier/adapters/CLAUDE.md](lib/llm_classifier/adapters/CLAUDE.md) - LLM adapter contract and implementations
- [lib/llm_classifier/content_fetchers/CLAUDE.md](lib/llm_classifier/content_fetchers/CLAUDE.md) - Content fetchers and SSRF protection
- [lib/llm_classifier/rails/CLAUDE.md](lib/llm_classifier/rails/CLAUDE.md) - Rails integration (Zeitwerk-excluded)
- [spec/CLAUDE.md](spec/CLAUDE.md) - Testing conventions
22 changes: 22 additions & 0 deletions lib/llm_classifier/adapters/CLAUDE.md
@@ -0,0 +1,22 @@
# Adapters

LLM provider adapters. All inherit from `Adapters::Base` and implement `#chat(model:, system_prompt:, user_prompt:)`.

## Inventory

- `Base` - Abstract interface. Provides `#config` helper for accessing `LlmClassifier.configuration`
- `RubyLlm` - Delegates to the `ruby_llm` gem. Returns a Hash with `:content`, `:input_tokens`, `:output_tokens`
- `OpenAI` - Direct `Net::HTTP` POST to OpenAI API. Returns a String
- `Anthropic` - Direct `Net::HTTP` POST to Anthropic API. Returns a String

## Conventions

- `#chat` returns either a String (raw content) or a Hash with `:content` plus optional token metadata
- `Classifier#extract_response_data` handles both return types, so new adapters can use either
- API keys come from `LlmClassifier.configuration`, not hardcoded or passed as arguments
- A custom adapter can be supplied as an object instance passed directly to `config.adapter`
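
As a rough illustration of the two return shapes described above, the sketch below defines a hypothetical `EchoAdapter` and a hypothetical `normalize_response` helper. Neither name comes from the gem, and the real `Classifier#extract_response_data` may work differently. It only shows one way a caller could accept either a String or a Hash from `#chat`.

```ruby
# Hypothetical adapter, for illustration only. It does not inherit from the
# gem's Adapters::Base so that the snippet stays self-contained.
class EchoAdapter
  # Same keyword signature as the adapter contract described above.
  def chat(model:, system_prompt:, user_prompt:)
    {
      content: "#{model}: #{user_prompt}",
      # Fake token counts, just to populate the optional metadata keys.
      input_tokens: system_prompt.length + user_prompt.length,
      output_tokens: user_prompt.length
    }
  end
end

# Hypothetical helper: accepts a String or a Hash and returns a uniform Hash.
def normalize_response(response)
  case response
  when String
    { content: response, input_tokens: nil, output_tokens: nil }
  when Hash
    {
      content: response[:content],
      input_tokens: response[:input_tokens],
      output_tokens: response[:output_tokens]
    }
  else
    raise ArgumentError, "unsupported adapter response: #{response.class}"
  end
end
```

With this shape, `normalize_response("raw text")` and `normalize_response(EchoAdapter.new.chat(...))` both yield a Hash with the same three keys, so downstream code does not need to care which style an adapter chose.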

## Related

- [../content_fetchers/CLAUDE.md](../content_fetchers/CLAUDE.md) - Content fetchers
- [../../spec/CLAUDE.md](../../spec/CLAUDE.md) - Testing conventions
27 changes: 27 additions & 0 deletions lib/llm_classifier/content_fetchers/CLAUDE.md
@@ -0,0 +1,27 @@
# Content Fetchers

Utilities for fetching external content to use as classification input. They are not wired into `Classifier` automatically; callers fetch content and pass it in.

## Inventory

- `Base` - Abstract interface. Subclasses implement `#fetch(source)`
- `Web` - HTTP fetcher with SSRF protection, redirect following, and HTML text extraction
- `Null` - No-op fetcher, always returns `nil`

## SSRF Protection (`Web`)

- Validates resolved IPs against private/loopback CIDR ranges before connecting
- Follows up to 3 redirects, re-validating each redirect target
- `normalize_redirect_url` handles relative and absolute redirect URLs
- Uses `nil? || empty?` guards (not ActiveSupport `.blank?`) to avoid the dependency
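
The checks above can be sketched with the Ruby standard library alone. This is a minimal illustration, not the gem's code: the `BLOCKED_RANGES` list and the helper names `blocked_ip?`, `safe_host?`, and `resolve_redirect` are assumptions, and the actual ranges and method names in `Web` may differ.

```ruby
require "ipaddr"
require "resolv"
require "uri"

# Assumed block list for illustration; the gem's actual ranges may differ.
BLOCKED_RANGES = %w[
  127.0.0.0/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 169.254.0.0/16
  ::1/128 fc00::/7
].map { |cidr| IPAddr.new(cidr) }.freeze

# True when the address falls inside any blocked range.
# Unparseable addresses are treated as blocked (fail closed).
def blocked_ip?(address)
  ip = IPAddr.new(address)
  BLOCKED_RANGES.any? { |range| range.include?(ip) }
rescue IPAddr::InvalidAddressError
  true
end

# Resolves the host and accepts it only if every resolved address is public.
# Uses nil?/empty? guards rather than ActiveSupport's blank?.
def safe_host?(host)
  return false if host.nil? || host.empty?

  addresses = Resolv.getaddresses(host)
  return false if addresses.nil? || addresses.empty?

  addresses.none? { |address| blocked_ip?(address) }
end

# Turns a relative or absolute Location header into an absolute URL.
def resolve_redirect(current_url, location)
  URI.join(current_url, location).to_s
end
```

A fetcher built on these helpers would call `safe_host?` on the original URL's host and again on the host of each `resolve_redirect` result, stopping after the redirect limit is reached.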

## HTML Processing (`Web`)

- Nokogiri is lazily loaded (`require "nokogiri"` inside the method) since it's an optional dependency
- Strips `<script>`, `<style>`, `<nav>`, `<footer>`, `<header>` elements
- Truncates extracted text to 2000 characters
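
A minimal sketch of this processing, assuming hypothetical helper names `extract_text` and `truncate_text` (the gem's own method names are not shown here). The whitespace collapsing is an added assumption, not something the list above states.

```ruby
MAX_TEXT_LENGTH = 2000 # character limit stated above

# Hypothetical helper: strips noisy elements and returns the remaining text.
def extract_text(html)
  # Required here, not at file load, so Nokogiri stays an optional dependency.
  require "nokogiri"

  doc = Nokogiri::HTML(html)
  doc.css("script, style, nav, footer, header").remove
  # Assumption: collapse runs of whitespace so the limit is spent on content.
  doc.text.gsub(/\s+/, " ").strip
end

# Hypothetical helper: caps text at the given number of characters.
def truncate_text(text, limit = MAX_TEXT_LENGTH)
  text[0, limit]
end
```

Because the `require` sits inside `extract_text`, code paths that never parse HTML do not need Nokogiri installed; `truncate_text` works on any String.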

## Related

- [../adapters/CLAUDE.md](../adapters/CLAUDE.md) - LLM adapters
- [../../spec/CLAUDE.md](../../spec/CLAUDE.md) - Testing conventions
25 changes: 25 additions & 0 deletions lib/llm_classifier/rails/CLAUDE.md
@@ -0,0 +1,25 @@
# Rails Integration

This entire subtree is excluded from Zeitwerk autoloading (`loader.ignore` in `lib/llm_classifier.rb`) and loaded manually only when `Rails::Railtie` is defined. This keeps Rails as an optional dependency.

## Inventory

- `Railtie` - Sets `Rails.logger` as the default LlmClassifier logger
- `Concerns::Classifiable` - ActiveRecord concern adding a `classifies` macro
- `Generators::InstallGenerator` - `rails g llm_classifier:install` scaffolds an initializer
- `Generators::ClassifierGenerator` - `rails g llm_classifier:classifier Name cat1 cat2` scaffolds a classifier and spec

## Classifiable Concern

The `classifies` macro defines the following instance methods per classification:

- `classify_<attr>!` - Runs classification and stores the result
- `<attr>_category` / `<attr>_categories` - Reads stored category data
- `<attr>_classification` - Returns the full stored classification hash

Results are written into a JSONB column (via `store_in:`) or a transient instance variable if no column is specified.
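
To make the macro pattern concrete, here is a simplified, hypothetical stand-in written in plain Ruby. It covers only the transient instance-variable case, with no ActiveRecord and no JSONB column. The module name `SimpleClassifiable` and the `with:` option are assumptions for illustration and are not the gem's API.

```ruby
# Hypothetical, simplified stand-in for the concern.
module SimpleClassifiable
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # `with:` is an assumed option: any object responding to #call(text)
    # and returning a Hash such as { category: "spam" }.
    def classifies(attr, with:)
      ivar = :"@#{attr}_classification"

      # classify_<attr>! runs the classifier and stores the result.
      define_method(:"classify_#{attr}!") do
        instance_variable_set(ivar, with.call(public_send(attr)))
      end

      # <attr>_classification returns the full stored hash (nil before a run).
      define_method(:"#{attr}_classification") { instance_variable_get(ivar) }

      # <attr>_category reads the category out of the stored hash.
      define_method(:"#{attr}_category") do
        instance_variable_get(ivar)&.fetch(:category, nil)
      end
    end
  end
end

class Message
  include SimpleClassifiable
  attr_reader :body

  classifies :body, with: ->(text) { { category: text.include?("free") ? "spam" : "ham" } }

  def initialize(body)
    @body = body
  end
end
```

Calling `Message.new("free money").classify_body!` stores the hash on the instance, after which `body_category` and `body_classification` read it back without re-running the classifier.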

## Related

- [../adapters/CLAUDE.md](../adapters/CLAUDE.md) - LLM adapters
- [../content_fetchers/CLAUDE.md](../content_fetchers/CLAUDE.md) - Content fetchers
34 changes: 34 additions & 0 deletions spec/CLAUDE.md
@@ -0,0 +1,34 @@
# Testing Conventions

RSpec 3.x with WebMock and VCR.

## Setup

- All external HTTP is blocked by WebMock in tests
- VCR cassettes go in `spec/fixtures/vcr_cassettes/` (tag examples with `:vcr` for auto-recording)
- API keys are filtered from cassettes (`<OPENAI_API_KEY>`, `<ANTHROPIC_API_KEY>`)
- `LlmClassifier.reset_configuration!` runs before every example to isolate global state
- `verify_partial_doubles: true` enforces strict mocking
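
For orientation, a `spec_helper.rb` matching the points above might look roughly like the following. This is a sketch of typical RSpec, WebMock, and VCR configuration, not a copy of the project's file; the environment variable names and fallback key values are assumptions.

```ruby
require "llm_classifier"
require "webmock/rspec"
require "vcr"

# Block all real HTTP in tests.
WebMock.disable_net_connect!

VCR.configure do |config|
  config.cassette_library_dir = "spec/fixtures/vcr_cassettes"
  config.hook_into :webmock
  # Lets examples tagged with :vcr record and replay cassettes automatically.
  config.configure_rspec_metadata!
  # Assumed env var names; the fallbacks keep the filter non-nil locally.
  config.filter_sensitive_data("<OPENAI_API_KEY>") { ENV.fetch("OPENAI_API_KEY", "test-openai-key") }
  config.filter_sensitive_data("<ANTHROPIC_API_KEY>") { ENV.fetch("ANTHROPIC_API_KEY", "test-anthropic-key") }
end

RSpec.configure do |config|
  config.mock_with :rspec do |mocks|
    mocks.verify_partial_doubles = true
  end

  # Isolate global configuration between examples.
  config.before { LlmClassifier.reset_configuration! }
end
```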

## Running Tests

```bash
bundle exec rspec # all tests
bundle exec rspec spec/llm_classifier/classifier_spec.rb # single file
bundle exec rspec --only-failures # re-run failures
```

## Spec Files

- `llm_classifier_spec.rb` - Module-level configure/reset
- `llm_classifier/classifier_spec.rb` - DSL attributes, `.classify` pipeline, token data, code-fence stripping, callbacks
- `llm_classifier/result_spec.rb` - Result factory methods, `#to_h`, `#multi_label?`, token fields
- `llm_classifier/knowledge_spec.rb` - Dynamic storage, `#to_prompt` formatting

No specs exist yet for: adapters, content fetchers, the Rails concern, or generators.

## Related

- [../lib/llm_classifier/adapters/CLAUDE.md](../lib/llm_classifier/adapters/CLAUDE.md) - LLM adapters
- [../lib/llm_classifier/content_fetchers/CLAUDE.md](../lib/llm_classifier/content_fetchers/CLAUDE.md) - Content fetchers
- [../lib/llm_classifier/rails/CLAUDE.md](../lib/llm_classifier/rails/CLAUDE.md) - Rails integration