67 changes: 67 additions & 0 deletions .claude/agents/archivist.md
@@ -0,0 +1,67 @@
---
name: archivist
description: Metadata enrichment and curation. Use to enrich track metadata, review flagged conflicts, and curate album art quality.
model: sonnet
tools:
- Read
- Write
- Edit
- Glob
- Grep
- Bash
permissionMode: acceptEdits
skills:
- enrich
color: amber
maxTurns: 50
memory: project
---
# Archivist

Metadata enrichment and curation agent for the Crate music library.

## Role

Curate and enrich track metadata using external APIs. Run the enrichment pipeline, review flagged conflicts, and help resolve uncertain matches interactively.

## Capabilities

- **Run enrichment**: Invoke `python tools/enrich_metadata.py` with appropriate flags
- **Review queue**: Read and walk through `review_queue.json` flagged tracks with the user
- **Apply corrections**: Edit `metadata_enriched.json` to apply chosen corrections
- **Re-enrich**: Re-run enrichment after manual corrections using `--resume`

## Understanding the Pipeline

### Confidence Scoring
- **>= 0.85**: Auto-accepted — fields updated directly
- **0.50 to < 0.85**: Flagged for manual review
- **< 0.50**: Skipped — original metadata kept
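The thresholding above can be sketched as a small helper. This is an illustrative sketch, not the actual internals of `enrich_metadata.py`; the function and label names are assumptions:

```python
def classify_match(confidence: float) -> str:
    """Map a 0.0-1.0 match confidence to the pipeline's action."""
    if confidence >= 0.85:
        return "auto_accept"  # fields updated directly
    if confidence >= 0.50:
        return "review"       # added to review_queue.json
    return "skip"             # original metadata kept
```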

### Conflict Classifications
- `confirmed`: External data matches existing tags — no action needed
- `supplement`: Existing field was empty, external has data — auto-filled if confidence >= 0.50
- `likely_correction`: Multiple sources disagree with existing tag — flagged with suggested correction
- `alternative`: One source disagrees — noted but existing kept

### Artwork Selection
Album art scored 0–100 on resolution, source, type, and format. Only upgrades when new score exceeds old by > 10 points.
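A minimal sketch of that rule. Only the four criteria and the > 10-point upgrade margin come from the pipeline's documented behavior; the specific weights below are assumptions for illustration:

```python
def score_artwork(width: int, source: str, art_type: str, fmt: str) -> int:
    """Score album art 0-100. Weights are assumed, not the script's actual values."""
    score = min(width, 1500) / 1500 * 50                           # resolution
    score += {"coverartarchive": 25, "itunes": 15}.get(source, 5)  # source
    score += 15 if art_type == "front" else 5                      # type
    score += {"png": 10, "jpg": 8}.get(fmt, 3)                     # format
    return round(score)

def should_upgrade(old_score: int, new_score: int) -> bool:
    """Upgrade only when the new art beats the old by more than 10 points."""
    return new_score - old_score > 10
```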

## Process

1. Check if `metadata_base.json` exists in the metadata directory
2. Run enrichment: `python tools/enrich_metadata.py --input metadata/metadata_base.json --output metadata/`
3. Review `metadata/review_queue.json` — present each flagged item to the user
4. For each flagged track, show existing vs suggested values and let the user choose
5. Apply corrections to `metadata/metadata_enriched.json`
6. If corrections were made, offer to re-run with `--resume` to fill remaining gaps

## On Blockers

If the MusicBrainz API is unreachable, the script falls back to offline mode (copies base metadata as-is with `status: skipped`). Report this and suggest retrying later.

## Constraints

- **Respect rate limits**: Never bypass the 1 req/sec MusicBrainz limit
- **Don't auto-apply review items**: Always present flagged tracks to the user for decision
- **Keep originals**: Never delete or overwrite `metadata_base.json`
140 changes: 140 additions & 0 deletions .claude/skills/enrich.md
@@ -0,0 +1,140 @@
# Enrich — Metadata Enrichment Pipeline

## When to Use

- After adding new music to the library
- When tracks have incomplete or incorrect metadata
- To fetch album art for tracks missing artwork
- Periodically to re-enrich with improved matching

## Pipeline

```
./tools/pipeline.sh [/path/to/new/music]
```

That single command handles everything:

```
[extract] → [upload] → [enrich] → [publish]
```

| Step | What it does | When it runs |
|------|-------------|--------------|
| Extract | Scans audio files for ID3/Vorbis tags | Only with a path argument |
| Upload | Uploads new audio to S3 | Only with a path argument |
| Enrich | Queries MusicBrainz + Cover Art Archive | Always |
| Publish | Uploads artwork to S3, pushes manifest | Always (unless `--skip-publish`) |

## Common Usage

```bash
# Re-enrich entire library (idempotent — skips already-processed tracks)
./tools/pipeline.sh

# Add new music and enrich everything
./tools/pipeline.sh /path/to/new/tracks

# Preview what enrichment would do (writes dry_run_report.json)
./tools/pipeline.sh --dry-run

# Apply a previous dry run (reads cached results, no re-querying)
./tools/pipeline.sh

# Re-process everything from scratch
./tools/pipeline.sh --no-resume

# Limit to first N tracks (useful for testing)
./tools/pipeline.sh --limit 10
```

## Options

| Flag | Effect |
|------|--------|
| `--dry-run` | Preview matches, write `dry_run_report.json`, don't modify anything |
| `--skip-publish` | Enrich locally but don't push to S3 |
| `--skip-upload` | Skip uploading new audio files |
| `--skip-artwork` | Skip album art fetching |
| `--no-resume` | Re-process all tracks from scratch |
| `--limit N` | Only process first N tracks |

## How It Works

### Matching
1. Searches MusicBrainz by `artist + title`, then `artist + album`, then `title only`
2. Scores candidates (0.0–1.0) using weighted field similarity
3. Thresholds: **>= 0.85** auto-accept, **0.50 to < 0.85** flag for review, **< 0.50** skip
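The candidate scoring in step 2 could look like the sketch below. The field weights and the use of `difflib` are assumptions for illustration, not the script's actual implementation:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range 0.0-1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def score_candidate(track: dict, candidate: dict) -> float:
    """Weighted field similarity; fields absent on either side score zero."""
    weights = {"artist": 0.4, "title": 0.4, "album": 0.2}  # assumed weights
    score = 0.0
    for field, weight in weights.items():
        if track.get(field) and candidate.get(field):
            score += weight * similarity(track[field], candidate[field])
    return score
```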

### Dry-Run → Real Run
- `--dry-run` saves all match results to `dry_run_report.json`
- A subsequent real run loads cached results — zero API re-queries
- After applying, the report is deleted
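The handoff can be sketched as follows; the function name is hypothetical, but the consume-and-delete behavior mirrors the description above:

```python
import json
import os

def load_cached_matches(report_path):
    """Return matches from a prior --dry-run, or None if no report exists.

    Deleting the report here mirrors the documented behavior: a real run
    consumes the cached results instead of re-querying the APIs.
    """
    if not os.path.exists(report_path):
        return None
    with open(report_path) as f:
        cached = json.load(f)
    os.remove(report_path)
    return cached
```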

### Resume
- `.enrichment_state.json` tracks processed track IDs
- `--resume` (on by default) skips already-processed tracks
- When resuming, reads from `metadata_enriched.json` to preserve prior work
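A sketch of the checkpoint mechanics. The `"processed"` key is an assumed shape for `.enrichment_state.json`; verify the real file before relying on it:

```python
import json
from pathlib import Path

def load_processed_ids(state_file: Path) -> set:
    """Read the resume checkpoint; a missing file means a fresh run."""
    if state_file.exists():
        return set(json.loads(state_file.read_text()).get("processed", []))
    return set()

def mark_processed(state_file: Path, track_id: str) -> None:
    """Record a track ID so a resumed run will skip it."""
    done = load_processed_ids(state_file)
    done.add(track_id)
    state_file.write_text(json.dumps({"processed": sorted(done)}))
```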

## Output Files

| File | Purpose |
|------|---------|
| `metadata/metadata_enriched.json` | Full metadata with enrichment data per track |
| `metadata/review_queue.json` | Tracks needing human review |
| `metadata/dry_run_report.json` | Dry-run results (consumed by next real run) |
| `metadata/.enrichment_state.json` | Resume checkpoint |
| `metadata/manifest_enriched.json` | Clean manifest built during publish |
| `metadata/artwork/*_enriched.jpg` | Downloaded album art |

## Review Queue

Tracks are flagged for review when:
- Match confidence is between 0.50 and 0.85
- Multiple sources disagree with existing tags (`likely_correction`)
- Multiple high-confidence candidates disagree with each other
- Album art upgrade available when existing art is present
- Track has neither artist nor title

Use the **archivist agent** to walk through flagged tracks interactively.

## Conflict Classifications

| Classification | Meaning | Action |
|---------------|---------|--------|
| `confirmed` | External data matches existing | No change |
| `supplement` | Empty field filled | Auto-filled |
| `likely_correction` | Multiple sources disagree with tag | Flagged |
| `alternative` | One source offers different value | Noted, kept existing |
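The table above suggests a classification driven by how many sources agree. The sketch below is one plausible shape (names and inputs assumed; it also omits the confidence check that gates `supplement` auto-fill):

```python
def classify_conflict(existing, votes):
    """Classify a field conflict.

    `votes` maps each candidate value to how many sources proposed it,
    and is assumed non-empty.
    """
    top = max(votes, key=votes.get)
    if not existing:
        return "supplement"         # empty field, external has data
    if top == existing:
        return "confirmed"          # external agrees with the tag
    if votes[top] > 1:
        return "likely_correction"  # multiple sources disagree with the tag
    return "alternative"            # a single dissenting source
```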

## Individual Scripts

For fine-grained control, run scripts directly:

```bash
# Enrich only
python tools/enrich_metadata.py --input metadata/manifest.json --output metadata/ --resume

# Publish only (after manual edits to metadata_enriched.json)
python tools/publish_manifest.py --metadata-dir metadata/

# Extract only
python tools/extract_metadata.py /path/to/audio --output metadata/
```

## Rate Limits

- MusicBrainz: 1 req/sec (enforced)
- Cover Art Archive: 1 req/sec (enforced)
- Full run: ~2-3 seconds per track
- 118 tracks ≈ 4-6 minutes
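A minimal client-side throttle that enforces a 1 req/sec spacing looks like this; it is a sketch of the idea, not the pipeline's actual limiter:

```python
import time

class RateLimiter:
    """Keep successive calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self) -> None:
        """Sleep just long enough to honor the interval, then stamp the time."""
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()
```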

## Troubleshooting

| Issue | Solution |
|-------|----------|
| "MusicBrainz API is unreachable" | Check internet; falls back to offline mode |
| Many "no match" results | Tracks may have poor/missing metadata |
| Interrupted mid-run | Just re-run — `--resume` is default |
| Want to re-process one track | Remove its ID from `.enrichment_state.json` |
| Artwork not showing in app | Check CloudFront invalidation completed |
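For the "re-process one track" row, a one-off helper like this works, assuming the state file stores processed IDs under a `"processed"` key (verify the actual shape first):

```python
import json
from pathlib import Path

def forget_track(state_file: Path, track_id: str) -> None:
    """Drop one ID from the checkpoint so the next run re-enriches it."""
    state = json.loads(state_file.read_text())
    state["processed"] = [t for t in state.get("processed", []) if t != track_id]
    state_file.write_text(json.dumps(state, indent=2))
```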
8 changes: 8 additions & 0 deletions .gitignore
@@ -159,6 +159,14 @@ fffff.at-archive/
# Artwork is in S3, not git
metadata/artwork/

# Enrichment pipeline output (regenerated by pipeline.sh)
metadata/manifest.json
metadata/metadata_enriched.json
metadata/manifest_enriched.json
metadata/review_queue.json
metadata/.enrichment_state.json
metadata/dry_run_report.json

# Local dev manifest (copy from production for testing)
www/manifest.json

14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,19 @@
# Changelog

## [0.2.0] - 2026-02-15T20:41:35-05:00

### Added
- Metadata enrichment pipeline via MusicBrainz, Cover Art Archive, and iTunes Search API
- Single entrypoint `pipeline.sh` for extract, upload, enrich, and publish steps
- Confidence-based matching with auto-accept, review, and skip thresholds
- Resume and dry-run support for idempotent re-runs
- Publish step uploads artwork to S3 and pushes enriched manifest
- Generative CSS gradient backgrounds for tracks without album artwork
- Archivist agent and enrich skill for future curation workflows

### Changed
- `batch_upload.py` accepts enriched metadata format with `--enriched` flag

## [0.1.0] - 2026-02-14T01:12:36+00:00

### Added
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
0.1.0
0.2.0
38 changes: 34 additions & 4 deletions tools/batch_upload.py
@@ -20,6 +20,7 @@
AWS_PROFILE = os.environ.get('AWS_PROFILE', 'default')
TRACKS_BUCKET = os.environ.get('TRACKS_BUCKET', '')
METADATA_FILE = 'metadata_base.json'
ENRICHED_METADATA_FILE = 'metadata_enriched.json'
MANIFEST_FILE = 'manifest.json'


@@ -59,11 +60,35 @@ def get_content_type(filepath: Path) -> str:
return types.get(ext, 'application/octet-stream')


def load_metadata(metadata_dir: Path) -> dict:
"""Load metadata_base.json."""
def load_metadata(metadata_dir: Path, enriched: bool = False) -> dict:
"""Load metadata JSON. Prefers enriched if --enriched flag is set.

Normalizes tracks to dict format regardless of input shape (list or dict).
"""
if enriched:
enriched_file = metadata_dir / ENRICHED_METADATA_FILE
if enriched_file.exists():
print(f"Using enriched metadata: {enriched_file}")
with open(enriched_file) as f:
data = json.load(f)
return _normalize_tracks(data)
print("Enriched metadata not found, falling back to base")
metadata_file = metadata_dir / METADATA_FILE
with open(metadata_file) as f:
return json.load(f)
data = json.load(f)
return _normalize_tracks(data)


def _normalize_tracks(data: dict) -> dict:
"""Ensure tracks is a dict keyed by path/id (handles manifest list format)."""
tracks = data.get('tracks', {})
if isinstance(tracks, list):
tracks_dict = {}
for track in tracks:
key = track.get('path') or track.get('original_path') or track['id']
tracks_dict[key] = track
data['tracks'] = tracks_dict
return data


def save_metadata(metadata_dir: Path, metadata: dict):
@@ -145,12 +170,17 @@ def main():
action='store_true',
help='Skip uploading artwork files'
)
parser.add_argument(
'--enriched',
action='store_true',
help='Use metadata_enriched.json instead of metadata_base.json'
)

args = parser.parse_args()

# Load metadata
print(f"Loading metadata from {args.metadata_dir}...")
metadata = load_metadata(args.metadata_dir)
metadata = load_metadata(args.metadata_dir, enriched=args.enriched)

total_tracks = len(metadata['tracks'])
print(f"Found {total_tracks} tracks in metadata")