
WIP: Startlark enricher - for discussion#84

Open
spastai wants to merge 2 commits intomainfrom
PH-83/starlark

Conversation

@spastai
Contributor

@spastai spastai commented Jul 9, 2025

I propose to integrate Starlark as a mapper: NewStarlark takes the filename of a .star file and creates a
func(item map[string]interface{}) map[string]interface{}

This lets Pharos users define their own scripts describing how to map results, which then appear on the Grafana board.

  1. Given that a result contains a namespace, it is possible to create a static file mapping namespaces to owners.
  2. This could be improved if namespace labels are read from Prometheus - then the label could carry specific data such as an inventory id.
  3. Going further, it is possible to create an enricher that derives the owners from the inventory id.

#83

@frwa-enpace
Contributor

I was using https://github.com/1set/starlet - I could be wrong, but it made life a lot simpler.

A simple PoC is here: https://github.com/metraction/scanner-poc/blob/main/integrations/riskanalyzer.go#L53

@techzoom
Contributor

Starlark (or any enrichment plugin)

I agree with the general pattern: func(item map[string]interface{}) map[string]interface{}

  • the Starlark Go package https://github.com/google/starlark-go implements the interpreter as a pure function
  • thus, it cannot access the system time, local config files, or other meta information.

Suggestion

func(item map[string]interface{}, meta map[string]interface{}) map[string]interface{}

  • with meta we can feed in read-only context such as the system time or the contents of a config file
  • the plugin engine would populate meta before calling the plugin

@spastai
Contributor Author

spastai commented Jul 10, 2025

Currently each mapper gets

<additional context 1>
data: 
  <PharosScanResult as map of maps>

We agreed that each mapper will get payload and meta. Since the current structure takes a single map of maps, I suggest it receives the following format:

meta:
   mapper1: <mapper 1 result>
   ...
   mapper3: <mapper N result>
payload:
   <PharosScanResult>

meta is then written to PharosScanResult.ScanTask.Context.

We need to agree on whether we should allow modifying PharosScanResult.
I favour immutability, and the current implementation supports it - contexts are shown on the dashboard. The pros are:

  1. Easy to debug - you see all contexts and can trace the changes (people should be confident about how results are derived)
  2. Flexibility - the context can be modified without breaking e.g. the Grafana dashboard

@techzoom
Contributor

Plugin Function Signature

payload, meta := func(payload, meta)

Payload

  • this is the PharosScanResult
  • this is what we enrich and what will go into the database after the enrichment pipeline has run all plugins

Meta

  • this is metadata (or globals) for the enricher plugins
  • it contains specific Pharos API context, like date, time, IPs, ...
  • plugins can add data to meta, which is then available to all following plugins (e.g. one plugin just loads the "eos-map.yaml", which would then be available to all following plugins simply by reading it from meta)
  • this allows us to separate function from config data, and enables easy reuse of config data (which could also be read from external APIs)
  • meta is only used by the enricher plugins; it will not be written into the payload and is discarded after all plugins have run

@techzoom
Contributor

Payload modifications

I advocate that a plugin can do whatever it wants with the payload, even to the point of breaking the system.

Reason

  • We need to set the "Severity" and "DueDate" in the findings upon data ingest to ensure all future data analytics and read operations have a consistent state
  • We want to give plugin writers the freedom to implement use cases we have not thought of

Why this is not a problem

  • Plugin writers are smart and want working functionality, not to break their system
  • Artificially putting logic into the system and constraining it adds neither benefits nor capabilities to Pharos
  • Splunk plugins work exactly this way and are very successful

@spastai spastai self-assigned this Jul 10, 2025
@spastai
Contributor Author

spastai commented Jul 10, 2025

  • To my understanding, "Severity" and "DueDate" should be set in the context and used during data ingestion. In Candela I was always confused about how the due date was calculated. Having transparency is a strong plus. Data analytics will be set up to read from that context and will be stable.
  • A plugin author is not limited in what he can write - he takes an input and produces additional output.
    The immutable option has the same features as the mutable one, plus a few strong extra points: traceability and composability (each plugin is guaranteed to get the payload).

It is hard to argue with "Splunk plugins work exactly this way", as I don't know the internals and am not sure the Splunk case is applicable to us. https://community.splunk.com/t5/Splunk-Enterprise/Is-there-any-possible-to-modify-raw-data-in-Splunk/m-p/553396 states that ingested data is immutable, which contradicts "exactly this way".

@frwa-enpace
Contributor

> Starlark (or any enrichment plugin)
>
> I agree with the general pattern: func(item map[string]interface{}) map[string]interface{}
>
>   • the Starlark Go package https://github.com/google/starlark-go implements the interpreter as a pure function
>   • thus, it cannot access the system time, local config files, or other meta information.
>
> Suggestion
>
> func(item map[string]interface{}, meta map[string]interface{}) map[string]interface{}
>
>   • with meta we can feed in read-only context such as the system time or the contents of a config file
>   • the plugin engine would populate meta before calling the plugin

With the starlet package you can add functions to the interpreter, so there is no need for this.

@frwa-enpace
Contributor

>   • To my understanding, "Severity" and "DueDate" should be set in the context and used during data ingestion. In Candela I was always confused about how the due date was calculated. Having transparency is a strong plus. Data analytics will be set up to read from that context and will be stable.
>   • A plugin author is not limited in what he can write - he takes an input and produces additional output.
>     The immutable option has the same features as the mutable one, plus a few strong extra points: traceability and composability (each plugin is guaranteed to get the payload).
>
> It is hard to argue with "Splunk plugins work exactly this way", as I don't know the internals and am not sure the Splunk case is applicable to us. https://community.splunk.com/t5/Splunk-Enterprise/Is-there-any-possible-to-modify-raw-data-in-Splunk/m-p/553396 states that ingested data is immutable, which contradicts "exactly this way".

The input must be immutable: it is linked to other objects, so if you change the vuln data for one image, the change applies globally unless you create a copy.

@spastai spastai changed the title Startlark enricher - for discussion WIP: Startlark enricher - for discussion Jul 14, 2025