JSON extractor: application/json falls back to HTML (#168 follow-up)

## Background

#168 (slice 2) made `content_media_type` drive extractor dispatch via `ServiceRegistry.get_extractor` (`src/core/registry.py`). Dispatch is **total**: any essence not explicitly mapped falls back to the HTML extractor.

`application/json` has **no dedicated extractor**, so JSON targets are currently run through `HtmlExtractor`. Archiver's declared content-kind family already includes `json` (`jsonpath → json`), but Watcher has no JSON extractor to match.
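
To make the current behavior concrete, here is a minimal sketch of a total dispatch table. It is not the actual code in `src/core/registry.py`: the names `EXTRACTOR_BY_ESSENCE`, `FALLBACK`, and `resolve_extractor` are hypothetical, and the table contents are illustrative only.

```python
# Hypothetical stand-in for the lookup behind ServiceRegistry.get_extractor.
# The real mapping and names in src/core/registry.py may differ.
EXTRACTOR_BY_ESSENCE = {
    "text/html": "HtmlExtractor",
}
FALLBACK = "HtmlExtractor"


def resolve_extractor(essence: str) -> str:
    """Total dispatch: every essence resolves; unmapped ones use the fallback."""
    return EXTRACTOR_BY_ESSENCE.get(essence.strip().lower(), FALLBACK)


# With no JSON entry in the table, JSON targets land on the HTML extractor.
print(resolve_extractor("application/json"))
```

Under this assumption the fix is a new table entry plus the extractor it points to; the fallback path itself stays unchanged.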

## Ask

Add a JSON extractor (e.g. canonicalize/pretty-print + structural chunking, or JSONPath-aware extraction aligned with Archiver's `jsonpath` algorithm) and register it:

- `src/core/extractors/` — new `JsonExtractor` (mirror to `/home/exedev/archiver/src/core/extractors/` per the mirrored-content-acquisition policy).
- `src/core/registry.py` — map `application/json` (and likely `application/*+json`) → `JsonExtractor`.
- Tests: routing (`application/json` → JsonExtractor) + extraction behavior.
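
One possible shape for the canonicalize + structural chunking option is sketched below. It assumes an `extract(raw: bytes)` entry point and a `JsonChunk` result type, both hypothetical; the real extractor interface in `src/core/extractors/` may look different, and nothing here is aligned with Archiver's `jsonpath` algorithm.

```python
import json
from dataclasses import dataclass
from typing import Any, Iterator


@dataclass(frozen=True)
class JsonChunk:
    """Hypothetical result type: one chunk per top-level member."""
    path: str  # simplified JSONPath-like location, e.g. "$.items"
    text: str  # canonical JSON text for the value at that path


def canonicalize(value: Any) -> str:
    # Sorted keys and fixed indentation give stable output for equal documents.
    return json.dumps(value, sort_keys=True, indent=2, ensure_ascii=False)


def iter_chunks(value: Any, path: str = "$", max_depth: int = 1) -> Iterator[JsonChunk]:
    # Descend into non-empty containers until max_depth is used up,
    # then emit the remaining subtree as a single canonical chunk.
    if max_depth > 0 and isinstance(value, dict) and value:
        for key in sorted(value):
            # Simplification: keys are not escaped, so a key containing "."
            # produces an ambiguous path.
            yield from iter_chunks(value[key], f"{path}.{key}", max_depth - 1)
    elif max_depth > 0 and isinstance(value, list) and value:
        for index, item in enumerate(value):
            yield from iter_chunks(item, f"{path}[{index}]", max_depth - 1)
    else:
        yield JsonChunk(path, canonicalize(value))


class JsonExtractor:
    """Sketch only; the extract() signature is an assumption."""

    def extract(self, raw: bytes) -> list[JsonChunk]:
        document = json.loads(raw)  # json.loads accepts UTF-8 encoded bytes
        return list(iter_chunks(document))


chunks = JsonExtractor().extract(b'{"b": 1, "a": {"y": 2, "x": 1}}')
for chunk in chunks:
    print(chunk.path, "->", chunk.text.replace("\n", " "))
```

Sorting keys before chunking means two payloads that differ only in member order produce identical chunks, which is the property a change-detection consumer would likely want. Malformed input raises `json.JSONDecodeError` here; how that should surface is left open.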

## Notes

- HTML fallback is non-crashing today, so this is an enhancement, not a regression.
- Consider `application/*+json` (vendor JSON) essence handling at the same time.
- Related: magic-byte/extension hardening and drift detection were also deferred from #168.
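
For the `application/*+json` note, a suffix check on the subtype may be enough. The helper below is a sketch under that assumption; `is_json_essence` is a hypothetical name, and whether the registry should match by predicate or by explicit entries is an open design question.

```python
def is_json_essence(essence: str) -> bool:
    """Return True for application/json and application/<something>+json."""
    # Defensive: drop any parameters in case a full media type is passed in.
    essence = essence.split(";", 1)[0].strip().lower()
    if essence == "application/json":
        return True
    type_, _, subtype = essence.partition("/")
    # Require a non-empty name before the "+json" structured-syntax suffix.
    return (
        type_ == "application"
        and subtype.endswith("+json")
        and len(subtype) > len("+json")
    )


for candidate in ("application/json", "application/vnd.api+json", "text/html"):
    print(candidate, is_json_essence(candidate))
```

A routing test could then assert that every essence accepted by this predicate resolves to `JsonExtractor`, and that everything else keeps its existing extractor.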


JSON extractor: application/json falls back to HTML (#168 follow-up) #212
