Problem
AbstractCollector in core/collectors/base_collector.py (line 99) defines a structured collector contract with three abstract members, name (a property), validate_config(), and collect(), plus a concrete run() method that calls validate_config() and then collect(). Shared lifecycle behavior comes from _CollectorLifecycleMixin (line 43): handle_error(exc) logs failures together with classify_failure() output, and sync_pinecone() is a no-op override point. Thirteen collectors already inherit from AbstractCollector (verified: BoostGithubActivityCollector, DiscordActivityCollector, CppaSlackTrackerCollector, Wg21PaperTrackerCollector, BoostMailingListTrackerCollector, and 8 others). However, real collectors need multi-phase orchestration beyond the current validate_config → collect sequence: pre-fetch validation, incremental state checks, collection, post-processing, and reporting. Without hooks for these phases, each collector re-implements lifecycle patterns that belong in the framework. Comparable frameworks standardize this lifecycle in a base interface: GrimoireLab's Perceval Backend base class (standardized fetch() and fetch_items() methods) and Apache DevLake's Go Plugin interface (Init(), Execute(), and RootPkgPath()).
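For orientation, the current contract described above might look roughly like the sketch below. It is a simplified stand-in, not the actual contents of base_collector.py: the inheritance from the mixin, the return types, the body of handle_error() (the classify_failure() call is omitted), run() returning the result of collect(), and the EchoCollector class are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
import logging

logger = logging.getLogger(__name__)


class _CollectorLifecycleMixin:
    """Simplified stand-in for the shared lifecycle mixin."""

    def handle_error(self, exc: Exception) -> None:
        # The real method also logs classify_failure() output; omitted here.
        logger.error("collector failed: %s", exc)

    def sync_pinecone(self) -> None:
        # No-op override point.
        return None


class AbstractCollector(_CollectorLifecycleMixin, ABC):
    """Current contract: three abstract members plus a concrete run()."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def validate_config(self) -> None: ...

    @abstractmethod
    def collect(self): ...

    def run(self):
        # Current two-step sequence: validate_config, then collect.
        self.validate_config()
        return self.collect()  # returning the result is an assumption


class EchoCollector(AbstractCollector):
    """Hypothetical collector used only to exercise the sketch."""

    def __init__(self) -> None:
        self.calls = []

    @property
    def name(self) -> str:
        return "echo"

    def validate_config(self) -> None:
        self.calls.append("validate_config")

    def collect(self):
        self.calls.append("collect")
        return ["item"]
```

Anything a collector needs before validation or after collection (state checks, post-processing, reporting) has no place in this run() and so ends up inside each collect() implementation.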
Acceptance Criteria
- AbstractCollector extended with optional lifecycle hooks: pre_collect(), post_collect(), on_error()
- run() method updated to call pre_collect() → validate_config() → collect() → post_collect()
- Test coverage for the new hooks (core/tests/test_collectors_base.py)
Implementation Notes
- AbstractCollector is at core/collectors/base_collector.py:99; current methods: name (abstract property), validate_config() (abstract), collect() (abstract), run() (concrete)
- _CollectorLifecycleMixin at line 43 provides handle_error() and sync_pinecone(); new hooks can follow this mixin pattern
- BaseCollectorCommand in core/collectors/command_base.py invokes run() then sync_pinecone(); update it if the run() signature changes
- Hooks should be optional (not abstract) to avoid breaking existing subclasses
- Existing tests in core/tests/test_collectors_base.py already cover lifecycle ordering; extend them for the new hooks
References
- Horizon item from Week 22 plan
- Related files: core/collectors/base_collector.py, core/collectors/command_base.py, core/tests/test_collectors_base.py
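A minimal sketch of the lifecycle proposed in the acceptance criteria, assuming the hooks are concrete no-op methods on the base class so that existing subclasses keep working unchanged. The hook signatures (post_collect() receiving the collected result, on_error() receiving the exception), the choice to re-raise after on_error(), and the LegacyCollector and TracingCollector classes are assumptions for illustration, not the agreed design.

```python
from abc import ABC, abstractmethod


class AbstractCollector(ABC):
    @property
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def validate_config(self) -> None: ...

    @abstractmethod
    def collect(self): ...

    # Optional hooks: concrete no-ops, so existing subclasses need no changes.
    def pre_collect(self) -> None:
        pass

    def post_collect(self, result) -> None:  # 'result' argument is an assumption
        pass

    def on_error(self, exc: Exception) -> None:
        pass

    def run(self):
        # Order from the acceptance criteria:
        # pre_collect -> validate_config -> collect -> post_collect
        try:
            self.pre_collect()
            self.validate_config()
            result = self.collect()
            self.post_collect(result)
            return result
        except Exception as exc:
            self.on_error(exc)
            raise  # re-raising is an assumption; callers still see the failure


class LegacyCollector(AbstractCollector):
    """Hypothetical existing subclass that overrides no hooks."""

    @property
    def name(self) -> str:
        return "legacy"

    def validate_config(self) -> None:
        pass

    def collect(self):
        return [1, 2, 3]


class TracingCollector(LegacyCollector):
    """Hypothetical subclass that records the order of lifecycle calls."""

    def __init__(self, fail: bool = False) -> None:
        self.calls = []
        self.fail = fail

    def pre_collect(self) -> None:
        self.calls.append("pre_collect")

    def validate_config(self) -> None:
        self.calls.append("validate_config")

    def collect(self):
        self.calls.append("collect")
        if self.fail:
            raise RuntimeError("simulated failure")
        return [1, 2, 3]

    def post_collect(self, result) -> None:
        self.calls.append("post_collect")

    def on_error(self, exc: Exception) -> None:
        self.calls.append("on_error")
```

Because every hook has a default body, a subclass such as LegacyCollector runs exactly as before. How on_error() should relate to the existing handle_error() on _CollectorLifecycleMixin is left open here.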