Define push notification protocol so repositories can proactively notify aggregators of new/updated packages #75

@toderash

Currently, aggregators discover packages by crawling repositories. This follows the current AspireCloud model and the protocol design documented in Draft PR #3.

This presents at least three issues:

  1. Crawl latency: new packages aren't discoverable until the next crawl cycle.
  2. Crawler infrastructure: every aggregator must maintain its own crawl scheduler and handle crawl failures, rate limiting, and backpressure.
  3. Ecosystem flexibility: the design assumes all aggregators are gated, and that repositories don't know where they're aggregated.

Proposed: repository-to-aggregator push notification

A simple push protocol would let repositories notify registered aggregators immediately when a package or release is created or updated. The aggregator then fetches the updated record on demand rather than on a schedule. This significantly reduces needless polling of infrequently-updated package repos.

Proposed endpoint on aggregators (POST /intake):

{
  "type": "package.updated",
  "did": "did:plc:ia6vk5krwkcka2nwuzs6l6lq",
  "repository": "https://example.fair.pm/packages/1234",
  "timestamp": "2026-04-26T12:00:00Z"
}
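
As a rough illustration of the repository side, the sketch below builds the payload shown above and prepares a POST to an aggregator's `/intake` endpoint. The helper names `build_notification` and `build_request` and the aggregator base URL are hypothetical; only the four payload fields and the `/intake` path come from this proposal.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_notification(event_type: str, did: str, repository: str) -> dict:
    """Build the hint payload; field names follow the example in this issue."""
    return {
        "type": event_type,
        "did": did,
        "repository": repository,
        # UTC timestamp in the same format as the example payload
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def build_request(aggregator_base: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the aggregator's /intake endpoint."""
    return urllib.request.Request(
        url=aggregator_base.rstrip("/") + "/intake",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending would then be a single `urllib.request.urlopen(req, timeout=5)` call. Retry and failure handling are left out here because the proposal does not define them yet.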

The notification is a hint only: it contains no authoritative data, just the DID and repository URL. The aggregator fetches the actual metadata from the repository to verify and index it. A malicious notification can't inject bad data; it can only cause the aggregator to fetch from the repository, which it would have done on the next crawl anyway. To ensure this, the repository URL in the notification MUST match the one the aggregator previously recorded for that DID, and the aggregator MUST fetch from that canonical repository URL.
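
The hint-only handling described above could look roughly like this on the aggregator side. This is a minimal sketch under assumptions: `handle_notification`, `known_repos` (a DID-to-URL map the aggregator already holds), and the injected `fetch` callable are all hypothetical names. How a notification for a DID the aggregator has never seen should be treated is not settled in this issue; the sketch simply ignores it.

```python
from typing import Callable, Dict, Optional

# Event types proposed in this issue
KNOWN_EVENT_TYPES = {
    "package.created",
    "package.updated",
    "release.created",
    "package.deleted",
    "release.deleted",
}


def handle_notification(
    payload: dict,
    known_repos: Dict[str, str],
    fetch: Callable[[str], dict],
) -> Optional[dict]:
    """Treat the notification as a hint: validate it, then fetch from the recorded URL."""
    did = payload.get("did")
    claimed_url = payload.get("repository")
    if payload.get("type") not in KNOWN_EVENT_TYPES or not isinstance(did, str):
        return None  # malformed or unknown event: ignore

    recorded_url = known_repos.get(did)
    # Assumption: unknown DIDs and mismatched URLs are silently dropped.
    if recorded_url is None or claimed_url != recorded_url:
        return None

    # Never trust data in the payload; fetch from the URL already on record.
    return fetch(recorded_url)
```

Because the fetch always targets the recorded URL, a forged notification can at most trigger an early fetch of data the aggregator would have crawled anyway.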

Aggregator registration:

Repositories need to know which aggregators to notify. We may need more than one option to address different ecosystems. These include:

  1. Repositories go through an application process with aggregators in order to be listed (current assumption).
  2. Repositories notify aggregators via a public API, requesting indexing, which can happen automatically based on set criteria.
  3. Aggregators register a webhook URL with repositories they want notifications from.
  4. Repositories notify all aggregators in the FAIR aggregator directory. This requires that a directory spec exists. Rather than a central registry, this may be designed as a federated protocol like ActivityPub or ATproto (perhaps a better fit here).
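
To make option 3 concrete, here is a minimal in-memory sketch of what a repository might keep if aggregators register webhook URLs with it. `WebhookRegistry`, its methods, and the idea of one intake URL per aggregator ID are assumptions for discussion, not part of any existing spec; authentication and persistence are deliberately omitted.

```python
from typing import Dict, List


class WebhookRegistry:
    """Hypothetical repository-side store of aggregator intake URLs."""

    def __init__(self) -> None:
        # One intake URL per aggregator ID; re-registering replaces the old URL.
        self._hooks: Dict[str, str] = {}

    def register(self, aggregator_id: str, intake_url: str) -> None:
        self._hooks[aggregator_id] = intake_url

    def unregister(self, aggregator_id: str) -> None:
        self._hooks.pop(aggregator_id, None)

    def targets(self) -> List[str]:
        """Return the unique URLs to notify, in a stable order."""
        return sorted(set(self._hooks.values()))
```

On each package or release event, the repository would iterate over `targets()` and send one notification per URL.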

Event types to define now:

  • package.created — new package published
  • package.updated — package metadata updated
  • release.created — new release published
  • package.deleted — package removed (tombstone)
  • release.deleted — release removed
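
For implementers, the five event types above could be modelled as a closed enumeration so that unknown values are rejected early. The names `EventType`, `parse_event_type`, and `is_deletion` are illustrative only; the string values are the ones listed in this issue.

```python
from enum import Enum
from typing import Optional


class EventType(str, Enum):
    PACKAGE_CREATED = "package.created"
    PACKAGE_UPDATED = "package.updated"
    RELEASE_CREATED = "release.created"
    PACKAGE_DELETED = "package.deleted"
    RELEASE_DELETED = "release.deleted"


def parse_event_type(value: str) -> Optional[EventType]:
    """Return the matching EventType, or None for an unrecognised string."""
    try:
        return EventType(value)
    except ValueError:
        return None


def is_deletion(event: EventType) -> bool:
    """True for events that remove a package or a release."""
    return event in (EventType.PACKAGE_DELETED, EventType.RELEASE_DELETED)
```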

Relation to issue #48: This issue extends the scope of #48 from "aggregator-initiated discovery" to "repository-initiated notification." Both mechanisms need to exist. In addition to adding robustness, crawl support ensures that private aggregators can still reach the repo, even if they're behind a firewall that blocks external traffic to the aggregator's API.

Labels: discussion, enhancement
