
Replace index-based discovery with ScraperRegistry (#554)#572

Merged
danieldotnl merged 2 commits into master from feature/554-scraper-registry on Mar 14, 2026
Conversation

@danieldotnl (Owner) commented Mar 14, 2026

Summary

  • Replace fragile array-index lookups (SCRAPER_IDX/PLATFORM_IDX) with a type-safe ScraperRegistry using deterministic string IDs derived from config names
  • Scraper/entity lookup is now reload-safe since order no longer matters
  • Duplicate scraper names and entity names are automatically deduplicated with _2, _3 suffixes and a warning log
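The suffixing behavior described above can be sketched roughly as follows; the helper name and exact logic in `__init__.py` may differ, this is only an illustration of the `_2`, `_3` scheme:

```python
def deduplicate_id(base_id: str, existing: set[str]) -> str:
    """Return base_id unchanged, or with a _2/_3/... suffix if it collides.

    Illustrative stand-in for the PR's _deduplicate_id() helper; the real
    implementation may differ (e.g. it also emits a warning log).
    """
    if base_id not in existing:
        return base_id
    counter = 2
    while f"{base_id}_{counter}" in existing:
        counter += 1
    return f"{base_id}_{counter}"
```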

Changes

  • New: registry.py — ScraperInstance dataclass and ScraperRegistry class
  • Modified: __init__.py — uses registry instead of dict-of-lists; discovery passes string IDs
  • Modified: const.py — removed SCRAPER_IDX/PLATFORM_IDX/SCRAPER_DATA, added SCRAPER_ID/ENTITY_KEY
  • New: test_registry.py — unit tests for registry operations
  • Modified: test_init.py — updated for registry-based lookups, added duplicate-name tests
  • No changes to sensor.py, binary_sensor.py, button.py (they pass discovery_info through without inspecting it)

Test plan

  • All 317 tests pass (including 3 new deduplication tests)
  • Manual test with HA: configure two scrapers, reload, verify entities map to correct scrapers
  • Manual test: configure duplicate scraper names, verify warning logged and both work

Closes #554

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Improved auto-generated naming for scrapers without explicit names, now using slugified resources or numeric fallbacks instead of generic labels.
  • Improvements

    • Enhanced deduplication of scraper configurations for improved stability and uniqueness handling.

danieldotnl and others added 2 commits March 14, 2026 07:56
Replace fragile array-index lookups (SCRAPER_IDX/PLATFORM_IDX) with a
type-safe ScraperRegistry using deterministic string IDs derived from
config names. This makes scraper/entity lookup reload-safe since order
no longer matters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add _deduplicate_id() for scraper names that collide (appends _2, _3, etc.)
- Add _deduplicate_entity_key() for entity names that collide within a scraper
- Use enumerate index for unnamed scrapers without a resource URL
- Add proper TYPE_CHECKING type hints for ScraperInstance fields
- Add contains() method to ScraperRegistry
- Add tests for duplicate scraper names and duplicate entity names

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai bot (Contributor) commented Mar 14, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

This pull request replaces index-based scraper discovery with a type-safe ScraperRegistry system. A new ScraperInstance dataclass wraps scrapers with their coordinators, while the registry manages instances by unique IDs. Constants are updated to use SCRAPER_ID and ENTITY_KEY instead of SCRAPER_IDX and PLATFORM_IDX. Discovery now passes IDs rather than indices via load_platform, and config/coordinator retrieval uses registry lookups instead of nested dictionary access.

Changes

Cohort / File(s) — Summary

  • Registry Core — custom_components/multiscrape/registry.py
    New module with the ScraperInstance dataclass and ScraperRegistry class. The registry provides methods to register, retrieve, and manage scraper instances, with deduplication and per-platform configuration tracking.
  • Constants & Public API — custom_components/multiscrape/const.py
    Removed legacy constants (SCRAPER_IDX, PLATFORM_IDX, COORDINATOR, SCRAPER, SCRAPER_DATA). Added new constants SCRAPER_ID and ENTITY_KEY for discovery and registry lookups.
  • Core Integration — custom_components/multiscrape/__init__.py
    Integrated ScraperRegistry into domain data storage. Added _deduplicate_id() and _deduplicate_entity_key() helpers. Updated setup to register ScraperInstance objects and generate names for unnamed scrapers. Modified async_get_config_and_coordinator() to retrieve from the registry using discovery keys instead of indices.
  • Test Updates — tests/test_init.py
    Updated test setup and assertions to use the ScraperRegistry API (get_all(), get(id)) instead of direct dictionary access. Changed discovery/config flow tests to use SCRAPER_ID and ENTITY_KEY keys. Extended tests to verify deduplication and per-scraper isolation.
  • Registry Tests — tests/test_registry.py
    New test module with comprehensive unit tests for ScraperRegistry and ScraperInstance, covering registration, retrieval, deduplication, and error handling.
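The registry lookup described for async_get_config_and_coordinator() can be sketched with a dict-shaped stand-in; the SCRAPER_ID and ENTITY_KEY constant names follow the PR, but the function body and registry layout here are illustrative assumptions:

```python
# Constant names taken from the PR; values are assumed.
SCRAPER_ID = "scraper_id"
ENTITY_KEY = "entity_key"


def get_config_and_coordinator(registry: dict, discovery_info: dict):
    """Resolve a platform's scraper via its string ID instead of a list index."""
    scraper_id = discovery_info[SCRAPER_ID]
    instance = registry.get(scraper_id)
    if instance is None:
        raise KeyError(f"Unknown scraper id: {scraper_id}")
    # Per-entity config is keyed by ENTITY_KEY rather than a platform index.
    entity_conf = instance["entities"][discovery_info[ENTITY_KEY]]
    return entity_conf, instance["coordinator"], instance["scraper"]
```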

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hoppy times ahead!
No more index confusion, the registry's clear,
Each scraper has a name, its identity near,
With IDs and keys, they won't disappear,
Reload-safe discovery brings us good cheer! 🌟

🚥 Pre-merge checks: ✅ 5 passed

  • Description Check — ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed. The title 'Replace index-based discovery with ScraperRegistry (#554)' accurately summarizes the main change: migrating from array-index lookups to a type-safe registry system.
  • Linked Issues Check — ✅ Passed. The PR implements all coding requirements from issue #554: a ScraperRegistry class with register/get/get_all methods, a ScraperInstance dataclass, deterministic string IDs, deduplication logic, and reload-safe lookups replacing fragile index-based discovery.
  • Out of Scope Changes Check — ✅ Passed. All changes are directly aligned with the registry migration objective: registry.py adds the core infrastructure, __init__.py and const.py refactor to use it, tests validate the new behavior, and platform modules remain unchanged.
  • Docstring Coverage — ✅ Passed. Docstring coverage is 97.14%, above the required threshold of 80.00%.



@danieldotnl danieldotnl merged commit 60ea438 into master Mar 14, 2026
6 of 7 checks passed
@danieldotnl danieldotnl deleted the feature/554-scraper-registry branch March 14, 2026 08:05


Development

Successfully merging this pull request may close these issues.

Replace index-based discovery with ScraperRegistry
