fix(endpoint-exposer): make install usable and spec tenant-agnostic #11

Merged
jcastiarena merged 7 commits into main from feat/endpoint-exposer-fixes
Apr 21, 2026
@jcastiarena
Contributor

Summary

Three related fixes so any tenant can register and use endpoint-exposer by following install/installation.md verbatim. Today the install flow is broken in two independent ways and the spec locks every tenant to an example FQDN. Details below.

Current broken state

1. install/tofu/main.tf fails `tofu plan` against current tofu-modules

The module call uses input names that were removed from nullplatform/tofu-modules in v1.52.x:

| Old (used by this PR's parent commit) | Current API |
|---|---|
| git_repo | repository_org + repository_name |
| git_ref | repository_branch |
| git_service_path | service_path |
| use_tpl_files | (removed: the module no longer renders templates) |
| git_password | repository_token |

Reproduction (exactly what installation.md tells tenants to do):

git clone https://github.com/nullplatform/services /root/.np/nullplatform/services
git clone https://github.com/nullplatform/tofu-modules /root/.np/nullplatform/tofu-modules
cd /root/.np/nullplatform/services/endpoint-exposer/install/tofu
cp terraform.tfvars.example terraform.tfvars  # fill in values
tofu init -backend-config=...
tofu plan

Result:

Error: Unsupported argument
  on main.tf line 13, in module "service_definition":
  13:   git_repo = var.git_repo
An argument named "git_repo" is not expected here.

…repeated for git_ref, git_service_path, use_tpl_files, git_password. Five errors, zero successful plans.

Additionally, install/tofu/main.tf passed agent_command and workflow_override_path to service_definition_agent_association — neither is accepted by that module today. The module constructs the cmdline automatically as ${base_clone_path}/${repository_service_spec_repo}/${service_path}/entrypoint/entrypoint and exposes agent_arguments for flags.
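To make the composition rule concrete, here is a minimal shell sketch of how that cmdline would be assembled from the three inputs. The function name `build_agent_cmdline` and the sample values are illustrative assumptions (inferred from the clone location in the reproduction steps), not the module's actual implementation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: mirrors the pattern
#   ${base_clone_path}/${repository_service_spec_repo}/${service_path}/entrypoint/entrypoint
# described above. It is not part of tofu-modules.
build_agent_cmdline() {
  local base_clone_path="$1"
  local repository_service_spec_repo="$2"
  local service_path="$3"
  # Strip one trailing slash from the base path so the join stays clean.
  printf '%s/%s/%s/entrypoint/entrypoint\n' \
    "${base_clone_path%/}" "$repository_service_spec_repo" "$service_path"
}

# Assumed sample values, based on the clone location used in the reproduction.
build_agent_cmdline "/root/.np" "nullplatform/services" "endpoint-exposer"
# prints: /root/.np/nullplatform/services/endpoint-exposer/entrypoint/entrypoint
```

Because the path is derived rather than passed in, any extra flags for the entrypoint have to travel through `agent_arguments`.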

2. Hardcoded domain enums in the spec lock every tenant to an example FQDN

install/specs/service-spec.json.tpl had:

"publicDomain": {
  "enum": ["hello.idp.poc.nullapps.io"],
  "type": "string",
  ...
}

…repeated for privateDomain and across the duplicated schemas in each action specification (10 enum blocks total). This FQDN is a leftover example from another tenant's POC. With the enum present, the nullplatform UI offered it as the only selectable value — any other tenant (us for galicia, or anyone else) cannot create a scope through the UI because their real FQDN isn't in the enum.

The reorganization of endpoint-exposer under install/ (#3) carried this over unchanged from the previous branch.

3. {{ env.Getenv }} in the spec is dead code

The spec ships with:

"name": "{{ env.Getenv \"SERVICE_NAME\" | default \"Endpoint Exposer\" }}",
"visible_to": ["{{ env.Getenv \"NRN\" }}"],

suggesting those two fields are parametrizable at tofu apply via env vars. They are not. The current service_definition module reads the spec with data "http" + jsondecode() — no template engine anywhere in the pipeline. These fields "work" only because the module overrides them explicitly from TF variables:

resource "nullplatform_service_specification" "from_template" {
  name       = var.service_name                        # not from spec
  visible_to = concat([var.nrn], var.extra_visibile_to_nrns)  # not from spec
  ...
}

Any future author who adds a {{ env.Getenv "MY_FIELD" }} to a field that is not overridden at the TF layer will ship a spec with the literal template string as its value — silently, until a developer opens the scope creation form and sees {{ env.Getenv "MY_FIELD" }} in the UI.
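A cheap guard against this failure mode is to reject any spec that still contains a template opener before it is applied. The sketch below is one possible check, not something this PR ships; the function name `find_unrendered_templates` and the throwaway file are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical pre-merge check: fail when a spec still contains "{{",
# since nothing in the pipeline would render it.
find_unrendered_templates() {
  local spec_file="$1"
  # -n prints line numbers, -F treats the pattern as a fixed string.
  if grep -nF '{{' "$spec_file"; then
    echo "error: unrendered template expression in $spec_file" >&2
    return 1
  fi
  return 0
}

# Example run against a throwaway file (not the real spec).
tmp_spec="$(mktemp)"
printf '%s\n' '{ "name": "{{ env.Getenv \"MY_FIELD\" }}" }' > "$tmp_spec"
find_unrendered_templates "$tmp_spec" || echo "check failed as expected"
rm -f "$tmp_spec"
```

Run against the spec in CI, a check like this would surface the literal template string at review time instead of in the scope creation form.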

Changes

Spec (install/specs/service-spec.json.tpl)

  • Remove the 10 hardcoded domain enums (5 × publicDomain, 5 × privateDomain across the main schema + action specifications).
  • Add "description" on both domain fields so developers know what kind of value to provide. Fields are now "type": "string" free-text; the FQDN is typed at scope creation time.
  • Replace the {{ env.Getenv "SERVICE_NAME" ... }} template with the literal "Endpoint Exposer" (the TF layer overrides it regardless).
  • Replace "visible_to": ["{{ env.Getenv \"NRN\" }}"] with "visible_to": [] (same reasoning; the TF layer sets it from var.nrn).

Install (install/tofu/)

  • Migrate main.tf to the current nullplatform/tofu-modules API (repository_org/_name/_branch/service_path/repository_token).
  • Drop unsupported args (agent_command, workflow_override_path, service_description) from the agent association module call. Forward --overrides-path=<path> via agent_arguments when overrides are enabled.
  • Rename variables to match: split git_repo into repository_org + repository_name, rename git_branch → repository_branch and git_service_path → spec_path. Add agent_service_path so the specs path (which includes install/ for this service) and the runtime path (just endpoint-exposer) can differ.
  • Make github_token optional (default null) — nullplatform/services is public, token is only needed when pointing at a private fork.
  • Update terraform.tfvars.example to match.
  • Regenerate .terraform.lock.hcl: hashicorp/external and hashicorp/null are no longer transitive dependencies (the old module used them; the new one doesn't).

Docs

  • installation.md: new variables table, note on optional github_token, corrected cmdline example (was missing the trailing /entrypoint), new "Domains" section explaining the free-text contract.
  • prerequisites.md: updated path-override guidance to the new variable names; clarify github_token is optional for the default spec repo.

Test plan

  • tofu fmt -check clean
  • tofu init -backend=false && tofu validate succeeds with local clone of current nullplatform/tofu-modules main
  • jq -e . install/specs/service-spec.json.tpl parses (no unclosed template braces)
  • gomplate -f install/specs/service-spec.json.tpl | jq . produces byte-identical output (idempotent — no templates remain)
  • Full end-to-end: apply in a clean nullplatform account and create a scope through the UI. Will be validated on the Banco Galicia POC once this PR is merged (or via apply against the feature branch if we need to unblock sooner).

Notes for reviewers

  • The three fixes are independent but share the same root cause: the install flow was designed around a nullplatform/tofu-modules generation that is no longer current, and nobody has exercised the install guide end-to-end since the API changed.
  • Commits are split by concern (spec vs. install/docs) to make review easier. Either can be cherry-picked in isolation.
  • Removing the domain enum was discussed with @feature/endpoint-exposer authors; the longer-term direction of "reading hosts dynamically from a provider" stays an open design item — this PR just stops pinning the field to an example value in the meantime.

🤖 Generated with Claude Code

- Remove the hardcoded enum on publicDomain/privateDomain (was
  ["hello.idp.poc.nullapps.io"], an example FQDN from another tenant's
  POC). The enum bound every tenant to that FQDN through the UI
  dropdown; the field is now free-text so each tenant provides its own.
  A description is added on both fields so developers know what kind of
  value is expected.

- Remove the {{ env.Getenv "SERVICE_NAME" }} and {{ env.Getenv "NRN" }}
  templates. These are never rendered: the current
  nullplatform/tofu-modules service_definition module reads the spec
  via `data "http"` + `jsondecode()` (no gomplate) and resolves `name`
  and `visible_to` from TF variables (var.service_name,
  concat([var.nrn], ...)). So the templates "worked" only because these
  two fields happen to be overridden at the TF layer. Leaving them in
  place invites future authors to add their own {{ env.Getenv }} for
  non-overridden fields, which would reach the API as literal strings.

The runtime workflow is unaffected — the Istio manifests are assembled
from the scope instance's attributes at execution time, not from the
spec's example values.

install/tofu/main.tf referenced the old variable names of
nullplatform/tofu-modules' service_definition module:
  - git_repo, git_ref, git_service_path, use_tpl_files, git_password

Those were removed in tofu-modules v1.52.x. Running `tofu plan` against
the current main of tofu-modules (what installation.md tells tenants to
clone) fails with "Unsupported argument" on five lines, blocking any
tenant that follows the guide verbatim.

Alongside, install/tofu was passing `agent_command` and
`workflow_override_path` to service_definition_agent_association —
neither is accepted by the module. The module builds the cmdline
automatically from base_clone_path + repository_service_spec_repo +
service_path + "/entrypoint/entrypoint", and exposes `agent_arguments`
for passing flags to that entrypoint.

Changes:
- main.tf: use repository_org/name/branch/service_path +
  repository_token on service_definition; drop the unsupported
  agent_command and workflow_override_path on the association module;
  forward --overrides-path via agent_arguments.
- variables.tf: split git_repo into repository_org + repository_name;
  rename git_branch → repository_branch and git_service_path →
  spec_path; add agent_service_path so the specs path (which includes
  "install/" for this service) and the runtime path (just
  "endpoint-exposer") can differ. Make github_token optional
  (nullplatform/services is public; the token is only needed for
  private forks).
- terraform.tfvars.example: update to match.
- installation.md: update the variables table, note that github_token is
  optional for the default (public) spec repo, fix the cmdline sample
  (was missing the trailing /entrypoint), and add a short "Domains"
  section explaining the free-text contract introduced in the spec
  cleanup commit.
- prerequisites.md: update path-override guidance to the new variable
  names; clarify github_token is optional.
- .terraform.lock.hcl: regenerated — hashicorp/external and
  hashicorp/null are no longer pulled in (the old module used them;
  the new one doesn't).

Tested: `tofu fmt -check` clean; `tofu init -backend=false && tofu
validate` succeeds against nullplatform/tofu-modules main.

Discovered when applying the migrated install/tofu module against a
clean nullplatform account: `tofu plan` fails with

  Error: Error in function call
  ...Call to function "jsondecode" failed: extraneous data after JSON object.

nullplatform/tofu-modules' `service_definition` module defaults
`available_links = ["connect"]`, which makes it attempt to fetch
`<service_path>/specs/links/connect.json.tpl` via HTTP. Endpoint
Exposer is a `type = "dependency"` service that ships no link spec
(no `install/specs/links/` directory), so the fetch returns a 404
HTML page — `jsondecode()` then aborts the whole plan with the
generic "extraneous data" error, which is hard to map back to the
root cause on first encounter.

Passing an explicit empty list resolves it cleanly. The same
override is now required in any tenant's own Terraform that
registers this service without going through install/tofu — see the
galicia-banco POC for the downstream mirror.

A deeper fix belongs in tofu-modules' `service_definition`
(defaulting `available_links` to `[]` when the spec itself doesn't
declare any, or deriving it from `spec.available_links`); left out
of this PR to keep the scope focused on the endpoint-exposer
install flow.
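Until the module handles this itself, a tenant could catch the problem before `tofu plan` with a local pre-flight check on the cloned repository. The sketch below is only an illustration: the function name `missing_link_specs` and the idea of checking a local clone are assumptions, while the `specs/links/<name>.json.tpl` layout comes from the fetch path described above.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight: report which link specs are absent from a local
# clone, so available_links can be set accordingly before `tofu plan`.
missing_link_specs() {
  local spec_root="$1"; shift   # e.g. <clone>/endpoint-exposer/install
  local link missing=0
  for link in "$@"; do
    if [ ! -f "${spec_root}/specs/links/${link}.json.tpl" ]; then
      echo "missing link spec: ${link}"
      missing=1
    fi
  done
  return "$missing"
}

# Example with a throwaway directory that has no specs/links/ at all,
# like Endpoint Exposer.
spec_root="$(mktemp -d)"
missing_link_specs "$spec_root" connect || echo "pass available_links = [] for this service"
rm -rf "$spec_root"
```

A non-zero exit here points straight at the missing file, which is easier to act on than the generic "extraneous data" error from `jsondecode()`.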
…uired

Each route item's JSON Schema listed "environment" in `required` but did
NOT declare it under `properties` (only `method`, `path`, `scope`,
`visibility`, and `groups` are declared). The creation form only offers
the declared properties, so no input it produces can satisfy the
requirement, and the UI bounces service creation with:

  /routes/0 must have required property 'environment'

…even when the user has filled in every visible field. `environment` is
already a top-level property of the service (populated from the
associated scopes' dimensions); the per-route duplicate is a leftover,
not intended functionality.

Removes "environment" from `routes[].items.required` in all four
schemas that duplicate the routes definition (main service attributes +
the two action specifications, each with parameters/results schemas).

Discovered during the Galicia POC smoke test: registration succeeded
end-to-end, but service creation was blocked for every tenant until
this is fixed.

Two runtime bugs that made the service unusable out-of-the-box:

1) `INGRESS_TYPE` defaulted to `alb`, but the repo only ships
   `workflows/istio/`. Any tenant that didn't explicitly set the env
   var to `istio` saw:

     failed to read workflow file: open /workflows/alb/create.yaml:
     no such file or directory

   There is no `workflows/alb/` to fall back to — the default was
   effectively dead. Switching the default to `istio` matches what the
   repo actually contains; tenants that later add `workflows/alb/`
   and want ALB can export `INGRESS_TYPE=alb` explicitly.

2) `SERVICE_PATH` was only populated if the agent passed
   `--service-path=<abs-path>` as an argument. Without it, the variable
   stays empty and every derived path resolves from the filesystem root
   (`/workflows/…`, `/values.yaml`), missing the service-root prefix
   entirely. The script already computes `WORKING_DIRECTORY` from its
   own location; the service root is `WORKING_DIRECTORY/..`. Using
   that as a fallback keeps `--service-path` as an allowed override
   but removes it as a silent requirement for basic operation.

Both fixes are backward-compatible: tenants currently passing
`INGRESS_TYPE=alb` + `--service-path=…` keep the same behaviour.
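The two defaults can be sketched in shell as follows. This is a minimal illustration under assumptions, not the entrypoint's actual code: the function names `resolve_ingress_type` and `resolve_service_path` are hypothetical, and the working directory is passed in as a parameter so the logic can be exercised on its own.

```bash
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not the entrypoint's own.

# Default to istio, the only workflow directory the repo ships.
resolve_ingress_type() {
  printf '%s\n' "${INGRESS_TYPE:-istio}"
}

# Use --service-path=<path> when given; otherwise fall back to the parent
# of the script's working directory.
resolve_service_path() {
  local working_directory="$1"; shift
  local arg service_path=""
  for arg in "$@"; do
    case "$arg" in
      --service-path=*) service_path="${arg#--service-path=}" ;;
    esac
  done
  if [ -z "$service_path" ]; then
    service_path="$(cd "${working_directory}/.." && pwd)"
  fi
  printf '%s\n' "$service_path"
}

# Possible call site inside a script (WORKING_DIRECTORY derived from the
# script's own location):
#   WORKING_DIRECTORY="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
#   SERVICE_PATH="$(resolve_service_path "$WORKING_DIRECTORY" "$@")"
```

Keeping the explicit flag as the first choice is what preserves the existing behaviour for tenants that already pass it.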

The Kubernetes Gateway API rejects non-absolute values for `Exact` and
`PathPrefix` match types:

  spec.rules[N].matches[M].path: Invalid value: "object":
  value must be an absolute path and start with '/' when type one of
  ['Exact', 'PathPrefix']

Developers entering "health" in the UI reasonably expect it to be
treated as "/health" — the UI doesn't document the requirement, and
the scope UI for other nullplatform services is forgiving on this.
The rejected route surfaces as a failed kubectl apply deep in the
agent workflow, with no hint that the fix is "add a slash".

`detect_path_type` now normalizes the value for the two absolute-only
types (`Exact`, `PathPrefix`), leaving `RegularExpression` untouched
since regex paths are free-form.

Also handles the pre-existing edge case where wildcard inputs like
`users/*` stripped to `users` (no leading slash); now normalized to
`/users`.
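The normalization rule can be illustrated with a small shell function. This is a sketch under assumptions: `normalize_route_path` is a hypothetical name, it takes the already-detected match type as an argument, and the real `detect_path_type` may be structured differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper illustrating the normalization; the real
# detect_path_type may be structured differently.
normalize_route_path() {
  local path_type="$1" path="$2"
  case "$path_type" in
    Exact|PathPrefix)
      # For prefixes, drop a trailing "/*" wildcard first.
      if [ "$path_type" = "PathPrefix" ]; then
        path="${path%/\*}"
      fi
      # Then make sure the value starts with a slash.
      case "$path" in
        /*) ;;
        *) path="/${path}" ;;
      esac
      ;;
    # RegularExpression (and anything else) is passed through untouched.
  esac
  printf '%s\n' "$path"
}

normalize_route_path Exact health                   # /health
normalize_route_path PathPrefix 'users/*'           # /users
normalize_route_path RegularExpression '^users/.*'  # ^users/.*
```

Normalizing at this point means the value is already absolute by the time the manifest reaches `kubectl apply`.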
Comment thread endpoint-exposer/install/specs/service-spec.json.tpl
Contributor

@sebastiancorrea81 left a comment

lgtm

Clarify that `name` and `visible_to` in service-spec.json.tpl are ignored
at apply time because the service_definition module overrides them from
TF variables. Prevents future authors from adding `{{ env.Getenv ... }}`
template expressions to non-overridden fields, which would reach the
nullplatform API as literals (no template engine in the pipeline).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jcastiarena jcastiarena merged commit a70248d into main Apr 21, 2026
2 checks passed

3 participants