Multi-phase orchestration hooks for AbstractCollector #256

@snowfox1003

Description

Problem

AbstractCollector in core/collectors/base_collector.py (line 99) defines a structured collector contract with three abstract members (the name property, validate_config(), and collect()) and a concrete run() method that calls validate_config() and then collect(). Shared lifecycle behavior comes from _CollectorLifecycleMixin (line 43): handle_error(exc) logs failures together with the output of classify_failure(), and sync_pinecone() is a no-op override point. Thirteen collectors already inherit from AbstractCollector (verified: BoostGithubActivityCollector, DiscordActivityCollector, CppaSlackTrackerCollector, Wg21PaperTrackerCollector, BoostMailingListTrackerCollector, and 8 others).

However, real collectors need multi-phase orchestration beyond the current validate_config → collect sequence: pre-fetch validation, incremental state checks, collection, post-processing, and reporting. Without hooks for these phases, each collector re-implements lifecycle patterns that belong in the framework. GrimoireLab's Perceval Backend base class (standardized fetch() and fetch_items() methods) and Apache DevLake's Go Plugin interface (Init(), Execute(), and RootPkgPath()) illustrate the industry pattern.

Acceptance Criteria

  • AbstractCollector extended with optional lifecycle hooks: pre_collect(), post_collect(), on_error()
  • Default implementations are no-ops (backward compatible with all 13 existing subclasses)
  • run() method updated to call pre_collect() → validate_config() → collect() → post_collect()
  • At least one existing collector migrated to use the hooks
  • Tests covering hook invocation order and error propagation (extend core/tests/test_collectors_base.py)
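
As a rough illustration of the criteria above, here is a minimal sketch of how the optional hooks could sit on the base class. It is a simplified stand-in and not the code in core/collectors/base_collector.py: the mixin is omitted, the hook signatures, the return value of run(), and the choice to re-raise after on_error() are all assumptions, and LegacyCollector is a hypothetical subclass used only to show backward compatibility.

```python
from abc import ABC, abstractmethod


class AbstractCollector(ABC):
    """Simplified stand-in for the real base class (mixin omitted)."""

    @property
    @abstractmethod
    def name(self) -> str:
        ...

    @abstractmethod
    def validate_config(self) -> None:
        ...

    @abstractmethod
    def collect(self):
        ...

    # Optional hooks: concrete no-ops, so existing subclasses keep working.
    def pre_collect(self) -> None:
        """Runs before validate_config(); default does nothing."""

    def post_collect(self) -> None:
        """Runs after a successful collect(); default does nothing."""

    def on_error(self, exc: Exception) -> None:
        """Runs when any phase raises; default does nothing."""

    def run(self):
        # Assumed order: pre_collect -> validate_config -> collect -> post_collect.
        try:
            self.pre_collect()
            self.validate_config()
            result = self.collect()
            self.post_collect()
        except Exception as exc:
            # Assumption: notify the hook, then let the error propagate.
            self.on_error(exc)
            raise
        return result


class LegacyCollector(AbstractCollector):
    """Hypothetical existing subclass that overrides none of the hooks."""

    @property
    def name(self) -> str:
        return "legacy"

    def validate_config(self) -> None:
        pass

    def collect(self):
        return ["item"]
```

Because the hooks are concrete methods rather than abstract ones, LegacyCollector still instantiates and runs without changes. How on_error() should relate to the existing handle_error(exc) on the mixin is left open here.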

Implementation Notes

  • AbstractCollector is at core/collectors/base_collector.py:99 — current methods: name (abstract property), validate_config() (abstract), collect() (abstract), run() (concrete)
  • _CollectorLifecycleMixin at line 43 provides handle_error() and sync_pinecone() — new hooks can follow this mixin pattern
  • BaseCollectorCommand in core/collectors/command_base.py invokes run() then sync_pinecone() — update if run() signature changes
  • Hooks should be optional (not abstract) to avoid breaking existing subclasses
  • Existing tests in core/tests/test_collectors_base.py already cover lifecycle ordering — extend for new hooks
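
A possible shape for the new tests is sketched below. To stay self-contained it defines a tiny stand-in base class (_StubBase) instead of importing from core/collectors/base_collector.py; a real test would import AbstractCollector. RecordingCollector, the test function names, and the assumption that on_error() fires before the exception propagates are illustrative, not taken from the existing test file.

```python
class _StubBase:
    """Stand-in for AbstractCollector with the proposed hook order (assumed)."""

    def pre_collect(self):
        pass

    def post_collect(self):
        pass

    def on_error(self, exc):
        pass

    def run(self):
        try:
            self.pre_collect()
            self.validate_config()
            result = self.collect()
            self.post_collect()
        except Exception as exc:
            self.on_error(exc)
            raise
        return result


class RecordingCollector(_StubBase):
    """Hypothetical collector that records each lifecycle call in order."""

    def __init__(self, fail=False):
        self.calls = []
        self.fail = fail

    def pre_collect(self):
        self.calls.append("pre_collect")

    def validate_config(self):
        self.calls.append("validate_config")

    def collect(self):
        self.calls.append("collect")
        if self.fail:
            raise RuntimeError("simulated collection failure")
        return []

    def post_collect(self):
        self.calls.append("post_collect")

    def on_error(self, exc):
        self.calls.append("on_error")


def test_hooks_run_in_order():
    collector = RecordingCollector()
    collector.run()
    assert collector.calls == [
        "pre_collect", "validate_config", "collect", "post_collect",
    ]


def test_error_propagates_and_skips_post_collect():
    collector = RecordingCollector(fail=True)
    try:
        collector.run()
    except RuntimeError:
        pass
    else:
        raise AssertionError("expected RuntimeError to propagate")
    # post_collect must not run after a failed collect().
    assert collector.calls == [
        "pre_collect", "validate_config", "collect", "on_error",
    ]
```

Both functions use plain assert statements, so they run under pytest or can be called directly.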

References

  • Horizon item from Week 22 plan
  • Related files: core/collectors/base_collector.py, core/collectors/command_base.py, core/tests/test_collectors_base.py
