
Replace index-based discovery with ScraperRegistry (#554)#572

Merged
danieldotnl merged 2 commits into master from feature/554-scraper-registry on Mar 14, 2026
Conversation

@danieldotnl (Owner) commented Mar 14, 2026

Summary

  • Replace fragile array-index lookups (SCRAPER_IDX/PLATFORM_IDX) with a type-safe ScraperRegistry using deterministic string IDs derived from config names
  • Scraper/entity lookup is now reload-safe since order no longer matters
  • Duplicate scraper names and entity names are automatically deduplicated with _2, _3 suffixes and a warning log
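The suffixing behavior described above can be sketched roughly as follows; the helper name and exact logic in `__init__.py` may differ, this is only an illustration of the `_2`, `_3` scheme:

```python
def deduplicate_id(base_id: str, existing: set[str]) -> str:
    """Return base_id unchanged, or with a _2/_3/... suffix if it collides.

    Illustrative stand-in for the PR's _deduplicate_id() helper; the real
    implementation may differ (e.g. it also emits a warning log).
    """
    if base_id not in existing:
        return base_id
    counter = 2
    while f"{base_id}_{counter}" in existing:
        counter += 1
    return f"{base_id}_{counter}"
```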

Changes

  • New: registry.py — ScraperInstance dataclass and ScraperRegistry class
  • Modified: __init__.py — uses registry instead of dict-of-lists; discovery passes string IDs
  • Modified: const.py — removed SCRAPER_IDX/PLATFORM_IDX/SCRAPER_DATA, added SCRAPER_ID/ENTITY_KEY
  • New: test_registry.py — unit tests for registry operations
  • Modified: test_init.py — updated for registry-based lookups, added duplicate-name tests
  • No changes to sensor.py, binary_sensor.py, button.py (they pass discovery_info through without inspecting it)

Test plan

  • All 317 tests pass (including 3 new deduplication tests)
  • Manual test with HA: configure two scrapers, reload, verify entities map to correct scrapers
  • Manual test: configure duplicate scraper names, verify warning logged and both work

Closes #554

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Improved auto-generated naming for scrapers without explicit names, now using slugified resources or numeric fallbacks instead of generic labels.
  • Improvements

    • Enhanced deduplication of scraper configurations for improved stability and uniqueness handling.

danieldotnl and others added 2 commits March 14, 2026 07:56
Replace fragile array-index lookups (SCRAPER_IDX/PLATFORM_IDX) with a
type-safe ScraperRegistry using deterministic string IDs derived from
config names. This makes scraper/entity lookup reload-safe since order
no longer matters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add _deduplicate_id() for scraper names that collide (appends _2, _3, etc.)
- Add _deduplicate_entity_key() for entity names that collide within a scraper
- Use enumerate index for unnamed scrapers without a resource URL
- Add proper TYPE_CHECKING type hints for ScraperInstance fields
- Add contains() method to ScraperRegistry
- Add tests for duplicate scraper names and duplicate entity names

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai bot (Contributor) commented Mar 14, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

This pull request replaces index-based scraper discovery with a type-safe ScraperRegistry system. A new ScraperInstance dataclass wraps scrapers with their coordinators, while the registry manages instances by unique IDs. Constants are updated to use SCRAPER_ID and ENTITY_KEY instead of SCRAPER_IDX and PLATFORM_IDX. Discovery now passes IDs rather than indices via load_platform, and config/coordinator retrieval uses registry lookups instead of nested dictionary access.

Changes

Cohort / File(s) — Summary

  • Registry Core — custom_components/multiscrape/registry.py
    New module with the ScraperInstance dataclass and ScraperRegistry class. The registry provides methods to register, retrieve, and manage scraper instances, with deduplication and per-platform configuration tracking.
  • Constants & Public API — custom_components/multiscrape/const.py
    Removed legacy constants (SCRAPER_IDX, PLATFORM_IDX, COORDINATOR, SCRAPER, SCRAPER_DATA). Added new constants SCRAPER_ID and ENTITY_KEY for discovery and registry lookups.
  • Core Integration — custom_components/multiscrape/__init__.py
    Integrated ScraperRegistry into domain data storage. Added _deduplicate_id() and _deduplicate_entity_key() helpers. Updated setup to register ScraperInstance objects and generate names for unnamed scrapers. Modified async_get_config_and_coordinator() to retrieve from the registry using discovery keys instead of indices.
  • Test Updates — tests/test_init.py
    Updated test setup and assertions to use the ScraperRegistry API (get_all(), get(id)) instead of direct dictionary access. Changed discovery/config flow tests to use SCRAPER_ID and ENTITY_KEY keys. Extended tests to verify deduplication and per-scraper isolation.
  • Registry Tests — tests/test_registry.py
    New test module with comprehensive unit tests for ScraperRegistry and ScraperInstance, covering registration, retrieval, deduplication, and error handling.
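The registry lookup described for async_get_config_and_coordinator() can be sketched with a dict-shaped stand-in; the SCRAPER_ID and ENTITY_KEY constant names follow the PR, but the function body and registry layout here are illustrative assumptions:

```python
# Constant names taken from the PR; values are assumed.
SCRAPER_ID = "scraper_id"
ENTITY_KEY = "entity_key"


def get_config_and_coordinator(registry: dict, discovery_info: dict):
    """Resolve a platform's scraper via its string ID instead of a list index."""
    scraper_id = discovery_info[SCRAPER_ID]
    instance = registry.get(scraper_id)
    if instance is None:
        raise KeyError(f"Unknown scraper id: {scraper_id}")
    # Per-entity config is keyed by ENTITY_KEY rather than a platform index.
    entity_conf = instance["entities"][discovery_info[ENTITY_KEY]]
    return entity_conf, instance["coordinator"], instance["scraper"]
```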

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hoppy times ahead!
No more index confusion, the registry's clear,
Each scraper has a name, its identity near,
With IDs and keys, they won't disappear,
Reload-safe discovery brings us good cheer! 🌟

🚥 Pre-merge checks: ✅ 5 passed

  • Description Check — ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed. The title 'Replace index-based discovery with ScraperRegistry (#554)' accurately summarizes the main change: migrating from array-index lookups to a type-safe registry system.
  • Linked Issues Check — ✅ Passed. The PR implements all coding requirements from issue #554: a ScraperRegistry class with register/get/get_all methods, a ScraperInstance dataclass, deterministic string IDs, deduplication logic, and reload-safe lookups replacing fragile index-based discovery.
  • Out of Scope Changes Check — ✅ Passed. All changes are directly aligned with the registry migration objective: registry.py adds the core infrastructure, __init__.py and const.py refactor to use it, tests validate the new behavior, and platform modules remain unchanged.
  • Docstring Coverage — ✅ Passed. Docstring coverage is 97.14%, above the required threshold of 80.00%.



@danieldotnl danieldotnl merged commit 60ea438 into master Mar 14, 2026
6 of 7 checks passed
@danieldotnl danieldotnl deleted the feature/554-scraper-registry branch March 14, 2026 08:05


Development

Successfully merging this pull request may close these issues.

Replace index-based discovery with ScraperRegistry
