[WIP]: defensive production#32

Open
yuvalk wants to merge 10 commits into RHEcosystemAppEng:main from yuvalk:defensive-production

@yuvalk (Collaborator) commented Mar 11, 2026

  • feat: add PRODUCTION mode with 10 defensive security guards
  • feat: add PRODUCTION mode with 10 defensive security guards
  • test: add comprehensive production mode tests
  • fix: update test_settings match string for K_SERVICE error message
  • docs: add production mode to README and test matrix to tests/README

yuvalk and others added 6 commits March 11, 2026 19:41
When PRODUCTION=true, the application enforces security guards at
startup (fail-fast) and runtime: Vertex AI only, JWT validation,
no debug, HTTPS URLs, PostgreSQL, HTTP MCP transport, JWT forwarding
to MCP, no CORS middleware, SSO credentials, and DCR configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Conflicts:
#	src/lightspeed_agent/config/settings.py
Add a single PRODUCTION=true env var that enforces production-only
configuration at startup via Pydantic model validators. All violations
are collected and reported together so operators can fix everything at once.

Guards enforced:
1. Force Vertex AI (disallow GOOGLE_API_KEY, require GOOGLE_CLOUD_PROJECT)
2. Force JWT validation (block SKIP_JWT_VALIDATION=true)
3. Disable debug (block DEBUG=true)
4. Force HTTPS on AGENT_PROVIDER_URL and MCP_SERVER_URL
5. Force PostgreSQL (block sqlite DATABASE_URL)
6. Force MCP http transport (block stdio mode)
7. Force JWT forwarding to MCP (block LIGHTSPEED_CLIENT_ID/SECRET)
8. No CORS middleware in both agent and marketplace handler apps
9. Require SSO credentials (RED_HAT_SSO_CLIENT_ID/SECRET)
10. Require DCR config (DCR_ENABLED, DCR_INITIAL_ACCESS_TOKEN, DCR_ENCRYPTION_KEY)
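The collect-then-raise pattern described above can be illustrated without Pydantic. This is a minimal stdlib-only sketch, not the PR's code: the PR uses Pydantic model validators, while here `__post_init__` plays that role. The field names (`google_api_key`, `mcp_transport`, and so on) and the `ProductionConfigError` type are hypothetical, and only guards 1-6 are shown.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlsplit


class ProductionConfigError(ValueError):
    """Raised once with every production-guard violation (hypothetical name)."""

    def __init__(self, violations: List[str]) -> None:
        self.violations = violations
        details = "\n".join(f"  - {v}" for v in violations)
        super().__init__(
            f"PRODUCTION=true but {len(violations)} guard(s) failed:\n{details}"
        )


@dataclass
class Settings:
    # Hypothetical field names modelled on the env vars in the guard list.
    production: bool = False
    google_api_key: Optional[str] = None
    google_cloud_project: Optional[str] = None
    skip_jwt_validation: bool = False
    debug: bool = False
    agent_provider_url: str = "https://agent.example.com"
    mcp_server_url: str = "https://mcp.example.com"
    database_url: str = "postgresql://db.example.com/agent"
    mcp_transport: str = "http"

    def __post_init__(self) -> None:
        # Stands in for a Pydantic model_validator(mode="after").
        if self.production:
            self._enforce_production_guards()

    def _enforce_production_guards(self) -> None:
        violations: List[str] = []
        # Guard 1: Vertex AI only.
        if self.google_api_key:
            violations.append("GOOGLE_API_KEY must not be set (use Vertex AI)")
        if not self.google_cloud_project:
            violations.append("GOOGLE_CLOUD_PROJECT is required")
        # Guard 2: JWT validation cannot be skipped.
        if self.skip_jwt_validation:
            violations.append("SKIP_JWT_VALIDATION must be false")
        # Guard 3: no debug mode.
        if self.debug:
            violations.append("DEBUG must be false")
        # Guard 4: HTTPS only.
        for name, url in (
            ("AGENT_PROVIDER_URL", self.agent_provider_url),
            ("MCP_SERVER_URL", self.mcp_server_url),
        ):
            if urlsplit(url).scheme.lower() != "https":
                violations.append(f"{name} must use https://")
        # Guard 5: PostgreSQL, not SQLite.
        if self.database_url.lower().startswith("sqlite"):
            violations.append("DATABASE_URL must not use SQLite")
        # Guard 6: HTTP MCP transport, not stdio.
        if self.mcp_transport != "http":
            violations.append("MCP transport must be http, not stdio")
        # Guards 7-10 would append to the same list.
        if violations:
            # Report everything at once instead of failing on the first problem.
            raise ProductionConfigError(violations)


if __name__ == "__main__":
    try:
        Settings(production=True, debug=True, google_api_key="dummy-key")
    except ProductionConfigError as exc:
        print(exc)
```

The demo configuration trips three guards at once (API key set, project missing, debug on), and all three appear in a single error, which is what lets operators fix everything in one pass.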

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
30 tests covering all 10 production security guards:
- Happy path with valid production config
- Individual guard violation tests (guards 1-10)
- Multiple simultaneous violations reported together
- Production=false bypasses all guards
- CORS middleware disabled in production (agent + marketplace apps)
- MCP header provider forwards JWT in production mode

Also adds pythonpath=["src"] to pytest config so the worktree's
source code is loaded instead of the editable install from the
main repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The K_SERVICE validator error message was updated during the production
mode implementation. Update the existing test to match the new wording.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- README.md: add Production Mode section with guard table, example
  error output, and startup vs runtime enforcement explanation
- tests/README.md: new file documenting the test suite with a full
  test matrix for all 30 production mode tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yuvalk (Collaborator, Author) commented Mar 11, 2026

marking as WIP for now, as I need to go over this with @luis5tb

yuvalk added the WIP label on Mar 11, 2026
yuvalk changed the title from "defensive production" to "[WIP]: defensive production" on Mar 11, 2026

# Production Mode
production: bool = Field(
default=False,
Collaborator: Do we want this enabled by default, so that we cannot forget to enable it and disabling it requires an explicit action?

Collaborator: Any comment on this? I would still make this the default.

Collaborator: Good point, I agree.

)

# Guard 7: Force JWT forwarding to MCP (no service-account creds)
if self.lightspeed_client_id:
Collaborator: This code has been removed, so we can drop the self.lightspeed_client_id/secret check.

Collaborator: Rebase needed.

"lightspeed-client-id": settings.lightspeed_client_id,
"lightspeed-client-secret": settings.lightspeed_client_secret,
}
# Skipped in production mode (Guard 7 enforces JWT forwarding)
Collaborator: Same here, not needed anymore.

@luis5tb (Collaborator) commented Mar 12, 2026

Other than the rebase, we may need to update the deploy/cloudrun/README so that the production flag is described and enabled there (or just describe it there and make it enabled by default, so that nothing needs to change in the deployment).
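A minimal sketch of the enable-by-default idea from this discussion, not the PR's current behaviour (the `production` field currently defaults to `False`). It assumes a hypothetical `production_enabled` helper that reads the environment directly: production mode stays on unless `PRODUCTION` is explicitly set to a false-like value, so a deployment that forgets the variable still gets every guard.

```python
import os
from typing import Mapping

# Values that explicitly opt out of production mode (hypothetical choice).
_FALSE_VALUES = {"0", "false", "no", "off"}


def production_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True unless PRODUCTION is explicitly set to a false-like value.

    An unset variable, or an unrecognized value such as a typo, keeps the
    production guards on (fail safe rather than fail open).
    """
    raw = env.get("PRODUCTION")
    if raw is None:
        return True
    return raw.strip().lower() not in _FALSE_VALUES
```

In the Pydantic settings, the equivalent change would be `default=True` on the `production` field, with local development setting `PRODUCTION=false` explicitly.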

yuvalk added 4 commits March 14, 2026 04:03
…hat_sso_issuer

Both URLs are security-sensitive: marketplace_handler_url is used for
DCR redirects and red_hat_sso_issuer carries client credentials for
token introspection. Also sanitize URLs in error messages to prevent
credential leakage.
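The URL sanitization mentioned in this commit could look roughly like the following sketch. The helper names `sanitize_url` and `require_https` are hypothetical; the idea is to drop the userinfo, query string and fragment (places where a client secret or token could appear) before a URL is echoed in a guard's error message.

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit


def sanitize_url(url: str) -> str:
    """Keep scheme, host, port and path; drop userinfo, query and fragment."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literal: urlsplit strips the brackets, so restore them
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, "", ""))


def require_https(name: str, url: str) -> List[str]:
    """Return a (possibly empty) list of violations for one URL setting."""
    if urlsplit(url).scheme.lower() != "https":
        # Only the sanitized form ever reaches the error message.
        return [f"{name} must use https:// (got {sanitize_url(url)})"]
    return []
```

Returning a list rather than raising keeps the helper compatible with the "collect all violations, then report" approach used by the other guards.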
The session database stores ADK sessions and conversation history.
If session_database_url is set, it should also be checked for SQLite
to match the guard's intent of enforcing PostgreSQL in production.
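A sketch of the extended Guard 5 check, assuming a hypothetical `database_violations` helper whose parameters mirror the `database_url` and `session_database_url` settings. A prefix check on `sqlite` also catches driver-qualified, SQLAlchemy-style URLs such as `sqlite+aiosqlite://`.

```python
from typing import List, Optional


def database_violations(
    database_url: str, session_database_url: Optional[str]
) -> List[str]:
    """Report every configured database URL that points at SQLite."""
    violations: List[str] = []
    candidates = [("database_url", database_url)]
    if session_database_url:  # optional setting: only checked when set
        candidates.append(("session_database_url", session_database_url))
    for name, url in candidates:
        if url.lower().startswith("sqlite"):
            violations.append(f"{name} must point to PostgreSQL, not SQLite")
    return violations
```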
When both K_SERVICE and PRODUCTION=true are set,
_block_skip_jwt_in_production and Guard 2 would both fire for
skip_jwt_validation, producing duplicate errors. Skip the K_SERVICE
check when production mode is active since Guard 2 already handles it.
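The de-duplication can be illustrated with a small sketch. In the PR both checks are Pydantic `model_validator(mode="after")` methods that raise; here they return lists of messages instead (a deliberate simplification), which makes it easy to see that only one error is produced when both `K_SERVICE` and `PRODUCTION=true` are set. The class name and message texts are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JwtSettings:
    production: bool = False
    skip_jwt_validation: bool = False
    k_service: Optional[str] = None  # Cloud Run sets the K_SERVICE env var

    def errors(self) -> List[str]:
        """Run both checks and return every error message."""
        return self._block_skip_jwt_in_production() + self._enforce_production_guards()

    def _block_skip_jwt_in_production(self) -> List[str]:
        # Stand down in production mode: Guard 2 already reports this problem.
        if self.production:
            return []
        if self.k_service and self.skip_jwt_validation:
            return ["SKIP_JWT_VALIDATION cannot be true on Cloud Run (K_SERVICE is set)"]
        return []

    def _enforce_production_guards(self) -> List[str]:
        # Guard 2 only; the other guards are omitted from this sketch.
        if self.production and self.skip_jwt_validation:
            return ["Guard 2: SKIP_JWT_VALIDATION must be false in production"]
        return []
```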
In production mode, sending unauthenticated requests to MCP is both a
security concern and an operational problem. Raise RuntimeError instead
of returning empty headers so the user gets a clear auth error rather
than a confusing MCP 401 response.
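A minimal sketch of that runtime behaviour, assuming a hypothetical `build_mcp_headers` function and that the user's JWT is forwarded as a standard `Authorization: Bearer` header (the PR's actual header provider and header name may differ).

```python
from typing import Dict, Optional


def build_mcp_headers(production: bool, user_jwt: Optional[str]) -> Dict[str, str]:
    """Hypothetical MCP header provider that forwards the caller's JWT."""
    if user_jwt:
        return {"Authorization": f"Bearer {user_jwt}"}
    if production:
        # Fail loudly with a clear message instead of letting MCP return a 401.
        raise RuntimeError("No user JWT available to forward to MCP in production mode")
    # Outside production, fall back to unauthenticated requests (e.g. local dev).
    return {}
```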
Review thread on tests/README.md
@@ -0,0 +1,64 @@
# Test Suite
Collaborator: Is this file needed?

@luis5tb (Collaborator) left a comment:

A couple of minor things, but a rebase on main is needed.

)

@model_validator(mode="after")
def _block_skip_jwt_in_production(self) -> Self:
Collaborator: Should we rename this method to _block_skip_jwt_on_cloud_run, since with production=True it returns early and _enforce_production_guards handles the check?
