Skip to content

Comments

Added automated model checks to the model upload workflow, including …#339

Open
shubhangi-raj-meesho wants to merge 6 commits intodevelopfrom
dev/automate-model-checks
Open

Added automated model checks to the model upload workflow, including …#339
shubhangi-raj-meesho wants to merge 6 commits intodevelopfrom
dev/automate-model-checks

Conversation

@shubhangi-raj-meesho
Copy link
Contributor

@shubhangi-raj-meesho shubhangi-raj-meesho commented Feb 18, 2026

Context

Added automated model checks to the model upload workflow, including validation for logger and print statements in Python backend models, and ensuring warmup configuration is present in config.pbtxt.

Describe your changes

5 files changed across horizon/internal/predator/handler/ and horizon/internal/externalcall/.

1. Warmup configuration validation (predator_upload.go)

  • Extended validateModelConfiguration to check that non-ensemble models define at least one model_warmup entry in config.pbtxt.
  • Ensemble models (identified by backend/platform set to "ensemble" or presence of ensemble_scheduling) are excluded from this check because they orchestrate sub-models and don't hold weights that need warming up.
  • Added helper isEnsembleModel that detects ensemble models via backend, platform, or ensemble_scheduling fields.

2. Logger/print statement validation for Python backend models (predator_upload.go)

  • Added validateNoLoggerOrPrintStatements, invoked during full uploads only (partial uploads only sync config.pbtxt and don't touch Python files):
    1. Reads and parses config.pbtxt to determine the backend type.
    2. If the backend is "python", enumerates all .py files in the model source directory using the new ListFilesWithSuffix GCS method.
    3. For each .py file, scans for uncommented logger.info(), logger.debug(), and print() calls that are outside initialize() and finalize() functions (logging in those lifecycle methods is acceptable since they only run during model load/unload, not inference).
    4. Returns a balanced, capped error summary (max 5 violations distributed round-robin across files) via buildBalancedViolationSummary.
  • Added helper isPythonBackendModel and scanning function hasPythonLoggerOrPrintStatements which uses indentation-based scope tracking to determine which Python function each line belongs to.

3. New GCS utility method (gcs_client.go)

  • Added ListFilesWithSuffix to GCSClientInterface and its implementation on GCSClient.
  • Unlike the existing FindFileWithSuffix (which returns only the first match), this method collects all object paths matching the given suffix. Directory markers are skipped.

4. New types and constants (model.go, predator_constants.go)

  • Added fileViolationInfo struct to group per-file violation details for balanced error reporting.
  • Added constants: errWarmupConfigMissing, errLoggerOrPrintStatementsFound, backend identifiers (pythonBackend, ensembleBackend, ensemblePlatform), allowed function names (initialize, finalize), file suffix (.py), and display cap (maxDisplayedViolationsForPythonModel = 5).

5. Minor formatting fixes (predator_constants.go, model.go)

  • Aligned whitespace on fieldDiscoveryConfigID, errNoOutputDefinitions, and ServiceDeployableID struct tag to match surrounding constant/field alignment.

6. Unit tests (predator_upload_test.go)

  • Added unit tests for the new code changes.

Testing

  • Local testing on Custodian: Tested the model validation flow end-to-end locally using the Custodian setup. Verified the following scenarios:
    • Warmup check: Uploaded a non-ensemble model without model_warmup in config.pbtxt- upload was correctly rejected with the errWarmupConfigMissing error. Uploading with warmup config present succeeded. Ensemble models correctly bypassed the warmup check.
    • Logger/print check: Uploaded a Python backend model containing logger.info() and print() statements outside initialize()/finalize()- upload was rejected with a clear violation summary including file names and line numbers. Verified that commented-out statements are ignored. Verified that non-Python backend models skip this check entirely.
    • Partial upload: Confirmed that partial uploads skip the logger/print validation as intended, since partial uploads only sync config.pbtxt.

Monitoring

  • Existing logging is sufficient for tracking these checks post-deployment. Each validation step logs an Info-level message on entry and on pass/skip, and a Warn-level message when violations are detected. These are visible in the existing Horizon service logs.
  • The upload API returns descriptive error messages to the caller, so failures will also surface in the Custodian UI and any upstream monitoring that tracks upload failure rates.

Rollback plan

  • These validations are additive checks within the existing validateSourceModel flow and do not modify any data, schemas, or external state.
  • Rollback is a simple code revert of this commit - previous model upload behavior (no warmup or logger/print checks) is restored immediately.

Checklist before requesting a review

  • [✅] I have reviewed my own changes?
  • [✅] Relevant or critical functionality is covered by tests?
  • [✅] Monitoring needs have been evaluated?
  • [✅] Any necessary documentation updates have been considered?

📂 Modules Affected

  • [✅] horizon (Real-time systems / networking)
  • online-feature-store (Feature serving infra)
  • trufflebox-ui (Admin panel / UI)
  • infra (Docker, CI/CD, GCP/AWS setup)
  • docs (Documentation updates)
  • Other: ___________

✅ Type of Change

  • [✅] Feature addition
  • Bug fix
  • Infra / build system change
  • Performance improvement
  • Refactor
  • Documentation
  • Other: ___________

Summary by CodeRabbit

  • New Features

    • Added cloud-storage file enumeration by suffix.
    • Enhanced full-upload validation: config parsing, warmup checks for non-ensemble models, ensemble- and Python-backend awareness.
    • Python-backend scans to detect logger/print usage outside allowed init/finalize regions, with concise, balanced violation summaries.
  • Tests

    • Added comprehensive unit tests and a mock cloud-storage client covering validation flows, backend detection, Python logging/print detection, and summary formatting.

…validation for logger and print statements in Python backend models, and ensuring warmup configuration is present in config.pbtxt.
@coderabbitai
Copy link

coderabbitai bot commented Feb 18, 2026

Walkthrough

Adds a GCS helper to list objects by suffix and extends Predator upload validation to detect uncommented Python logger.* and print statements outside allowed functions, integrating per-file scanning and balanced violation summaries into the upload validation flow.

Changes

Cohort / File(s) Summary
GCS Client Enhancement
horizon/internal/externalcall/gcs_client.go
Added ListFilesWithSuffix(bucket, folderPath, suffix string) ([]string, error) to the GCS client interface and implementation to enumerate objects under a folder matching a suffix, with validation, logging, and error wrapping.
Model Validation Types & Constants
horizon/internal/predator/handler/model.go, horizon/internal/predator/handler/predator_constants.go
Added unexported fileViolationInfo and funcScopeEntry types and several internal constants (error markers, backend/platform markers, .py suffix, allowed function names, display limits) to support Python-source validation logic.
Upload Validation Logic
horizon/internal/predator/handler/predator_upload.go
Integrated full-upload checks: detect ensemble/Python backends, require warmup for non-ensemble models, list .py files, scan for logger/print usages outside initialize/finalize, aggregate per-file violations, and build balanced, truncated summaries. Added helpers: isEnsembleModel, isPythonBackendModel, validateNoLoggerOrPrintStatements, hasPythonLoggerOrPrintStatements, and buildBalancedViolationSummary.
Tests & Mocks
horizon/internal/predator/handler/predator_upload_test.go
Added comprehensive tests and a mockGCSClient implementing externalcall.GCSClientInterface with configurable ReadFile and ListFilesWithSuffix. Tests cover backend detection, warmup/config validation, Python scanning edge cases, and summary formatting.

Sequence Diagram(s)

sequenceDiagram
    participant UploadHandler as Upload Handler
    participant ConfigValidator as Config Validator
    participant SourceScanner as Source Scanner
    participant GCSClient as GCS Client
    participant PythonScanner as Python Scanner
    participant SummaryBuilder as Summary Builder

    UploadHandler->>ConfigValidator: validateSourceModel(gcsPath)
    ConfigValidator->>ConfigValidator: validateModelConfiguration()
    ConfigValidator->>SourceScanner: validateNoLoggerOrPrintStatements(gcsPath)
    SourceScanner->>GCSClient: Read config.pbtxt
    GCSClient-->>SourceScanner: config content
    SourceScanner->>SourceScanner: parse config, determine backend
    alt isPythonBackend
        SourceScanner->>GCSClient: ListFilesWithSuffix(bucket, folder, ".py")
        GCSClient-->>SourceScanner: Python file list
        loop For each Python file
            SourceScanner->>GCSClient: Read file content
            GCSClient-->>SourceScanner: file content
            SourceScanner->>PythonScanner: hasPythonLoggerOrPrintStatements(content)
            PythonScanner-->>SourceScanner: violations list
        end
        SourceScanner->>SummaryBuilder: buildBalancedViolationSummary(perFileViolations, totalViolations)
        SummaryBuilder-->>SourceScanner: violation summary
    end
    SourceScanner-->>ConfigValidator: violations (if any)
    ConfigValidator-->>UploadHandler: validation result
Loading
🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Krd Checker ⚠️ Warning PR description lacks both a KRD link (starting with 'KRD:' containing a Google Doc URL) and a valid KRD exemption (Pod Type designation and 'Why KRD is not required?' justification). Add either a KRD link or complete KRD exemption section with Pod Type (H0/H1/H2) and justification (minimum 20 characters) to the PR description.
✅ Passed checks (1 passed)
Check name Status Explanation
Dynamic Configuration Validation ✅ Passed No application-dyn-*.yml files were modified in this PR; all changes are limited to horizon internal packages.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
horizon/internal/predator/handler/predator_upload.go (1)

559-593: Redundant GCS read of config.pbtxt — already parsed in validateModelConfiguration.

validateModelConfiguration (called just before at line 159) already reads and parses config.pbtxt into a ModelConfig. This method reads and parses it a second time. Consider passing the parsed ModelConfig (or at least the backend string) from the caller to avoid the extra GCS round-trip.

♻️ Sketch: pass ModelConfig from the caller

Refactor validateModelConfiguration to return the parsed config, then pass it downstream:

-func (p *Predator) validateModelConfiguration(gcsPath string) error {
+func (p *Predator) validateModelConfiguration(gcsPath string) (*ModelConfig, error) {
     ...
-    return nil
+    return &modelConfig, nil
 }

In validateSourceModel:

-    if err := p.validateModelConfiguration(gcsPath); err != nil {
+    modelConfig, err := p.validateModelConfiguration(gcsPath)
+    if err != nil {
         return fmt.Errorf("config.pbtxt validation failed: %w", err)
     }
     ...
-        if err := p.validateNoLoggerOrPrintStatements(gcsPath); err != nil {
+        if err := p.validateNoLoggerOrPrintStatements(gcsPath, modelConfig); err != nil {

Then validateNoLoggerOrPrintStatements can skip the re-read and use isPythonBackendModel(modelConfig) directly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@horizon/internal/predator/handler/predator_upload.go` around lines 559 - 593,
The validator redundantly re-reads and parses config.pbtxt inside
validateNoLoggerOrPrintStatements causing an extra GCS round-trip; change the
call chain so the parsed ModelConfig is propagated from
validateModelConfiguration (or returned from it) through validateSourceModel
into validateNoLoggerOrPrintStatements and use that ModelConfig with
isPythonBackendModel instead of re-reading via p.GcsClient.ReadFile and
prototext.Unmarshal. Update validateModelConfiguration to return (*ModelConfig,
error) (or provide the backend string), adjust validateSourceModel to accept and
forward the parsed ModelConfig to validateNoLoggerOrPrintStatements, and remove
the config read/parse block from validateNoLoggerOrPrintStatements.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@horizon/internal/predator/handler/predator_upload.go`:
- Around line 748-766: The current single-variable scope tracker
(currentFunction/functionIndent) misclassifies nested defs inside allowed
functions because a nested "def" overwrites currentFunction; change to a small
function scope stack (e.g., functionStack storing tuples of (name, indent)) or
at minimum track an outerAllowedFunction alongside the currentFunction: when
funcDefPattern matches push (name, lineIndent) onto functionStack (or set
currentFunction and if the previous top was an allowed function record it as
outerAllowedFunction), on dedent pop entries whose indent >= lineIndent (or when
lineIndent <= topIndent) to restore the previous scope, and when checking
whether to skip logging use a membership check against
allowedFuncInitialize/allowedFuncFinalize on the stack/top rather than equality
with currentFunction; update all uses of currentFunction/functionIndent to use
the stack/top variables (or use outerAllowedFunction) so nested defs inside
initialize/finalize no longer cause false positives.
- Around line 741-778: The logger/print detection currently runs against the
whole line and thus flags inline comments (e.g., x = f()  # logger.info(...));
add a lightweight inline-comment stripper and use its output for loggerPattern
and printPattern checks: implement a helper like stripInlineComment(line string)
string that removes a trailing '#' comment while avoiding obvious '#' inside
quotes (simple state tracking for single/double quotes), then replace the
MatchString calls that reference loggerPattern and printPattern to use the
stripped code portion (instead of the full line) while keeping the existing
logic that skips full-line comments and the function-scope checks (symbols:
funcDefPattern, currentFunction, allowedFuncInitialize, allowedFuncFinalize,
loggerPattern, printPattern).
- Around line 768-778: The code appends the same line to details twice when both
loggerPattern and printPattern match; update the matching logic in the loop
(where loggerPattern.MatchString(line), printPattern.MatchString(line), details,
and found are used) to prevent duplicate reports by making the printPattern
check conditional (use else if printPattern.MatchString(line)) or immediately
continue after a match so only the first match per line appends to details and
sets found.

---

Nitpick comments:
In `@horizon/internal/predator/handler/predator_upload.go`:
- Around line 559-593: The validator redundantly re-reads and parses
config.pbtxt inside validateNoLoggerOrPrintStatements causing an extra GCS
round-trip; change the call chain so the parsed ModelConfig is propagated from
validateModelConfiguration (or returned from it) through validateSourceModel
into validateNoLoggerOrPrintStatements and use that ModelConfig with
isPythonBackendModel instead of re-reading via p.GcsClient.ReadFile and
prototext.Unmarshal. Update validateModelConfiguration to return (*ModelConfig,
error) (or provide the backend string), adjust validateSourceModel to accept and
forward the parsed ModelConfig to validateNoLoggerOrPrintStatements, and remove
the config read/parse block from validateNoLoggerOrPrintStatements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
horizon/internal/predator/handler/predator_upload_test.go (4)

323-331: Round-robin test doesn't verify the display cap is enforced.

The test asserts both file names appear but doesn't confirm the total violations shown is capped at displayCap (5). With 6 total violations across two files, a buggy implementation that shows all 6 would still pass this assertion.

💡 Suggested additional assertion
 		{
 			name: "two files with round-robin distribution",
 			violations: []fileViolationInfo{
 				{fileName: "a.py", details: []string{"line 1: print('a1')", "line 2: print('a2')", "line 3: print('a3')"}},
 				{fileName: "b.py", details: []string{"line 10: print('b1')", "line 20: print('b2')", "line 30: print('b3')"}},
 			},
 			totalViolations: 6,
 			wantContains:    []string{"a.py", "b.py"},
+			// Verify cap: "line 3: print" and "line 30: print" should NOT both appear
+			wantNotContain:  []string{"line 3: print('a3')", "line 30: print('b3')"},
 		},

Note: adjust wantNotContain entries to match whichever items the round-robin algorithm drops last.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@horizon/internal/predator/handler/predator_upload_test.go` around lines 323 -
331, The test case "two files with round-robin distribution" currently only
asserts both filenames appear but doesn't check that the displayed violations
are capped at displayCap (5); update the test in predator_upload_test.go to
assert the rendered output contains at most displayCap violation lines (verify
count of violation detail lines or entries equals 5) and add assertions for
specific entries that should be dropped by the round-robin algorithm (adjust
wantNotContain accordingly) using the existing fileViolationInfo fixtures and
the test's output parsing helpers so the test fails if all 6 violations are
shown.

151-276: Add a test documenting whether logger.warning/logger.error are flagged.

The PR description explicitly scopes detection to logger.info, logger.debug, and print. However, there is no test case confirming that logger.warning or logger.error inside execute() is not flagged. Without this, it's ambiguous whether the exclusion is intentional or a gap.

💡 Suggested additional test cases
+		{
+			name:        "logger.warning in execute - not flagged (only info/debug/print are scanned)",
+			content:     "def execute(requests):\n    logger.warning('warn')\n    return []\n",
+			wantFound:   false,
+			wantDetails: 0,
+		},
+		{
+			name:        "print inside string literal not flagged",
+			content:     "def execute(requests):\n    msg = \"print('debug')\"\n    return []\n",
+			wantFound:   false,
+			wantDetails: 0,
+		},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@horizon/internal/predator/handler/predator_upload_test.go` around lines 151 -
276, Add explicit test cases in TestHasPythonLoggerOrPrintStatements to cover
logger.warning and logger.error inside execute() (and perhaps mixed with
logger.info) so behavior is documented; update the tests slice to include
entries named like "logger.warning inside execute not flagged" and "logger.error
inside execute not flagged" that call hasPythonLoggerOrPrintStatements with
content containing logger.warning(...) and logger.error(...) respectively and
assert wantFound=false and wantDetails=0; reference the test harness around
hasPythonLoggerOrPrintStatements to place the new cases alongside existing ones
(e.g., near "logger.debug inside execute function" and "logger.INFO case
insensitive match") so the intended exclusion of warning/error is covered.

567-594: Missing test for ListFilesWithSuffix returning an error.

TestValidateNoLoggerOrPrintStatements_NoPythonFiles tests an empty list but there is no test for when listFilesWithSuffix itself returns an error. Depending on how the production code handles this (propagate vs. ignore), this path is currently unverified.

💡 Suggested additional test
+func TestValidateNoLoggerOrPrintStatements_ListFilesError(t *testing.T) {
+	configData := `name: "py_model"
+backend: "python"
+max_batch_size: 8
+`
+	gcs := &mockGCSClient{
+		readFile: func(_, objectPath string) ([]byte, error) {
+			if strings.HasSuffix(objectPath, "config.pbtxt") {
+				return []byte(configData), nil
+			}
+			return nil, fmt.Errorf("unexpected read")
+		},
+		listFilesWithSuffix: func(_, _, _ string) ([]string, error) {
+			return nil, fmt.Errorf("GCS permission denied")
+		},
+	}
+
+	p := &Predator{GcsClient: gcs}
+	err := p.validateNoLoggerOrPrintStatements("gs://test-bucket/models/py_model")
+	require.Error(t, err)
+	assert.Contains(t, err.Error(), "GCS permission denied") // or a wrapped message
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@horizon/internal/predator/handler/predator_upload_test.go` around lines 567 -
594, Add a new unit test that exercises the error path when the GCS client's
listFilesWithSuffix returns an error: create a mockGCSClient where readFile
returns the configData for "config.pbtxt" and listFilesWithSuffix returns a
non-nil error, call Predator.validateNoLoggerOrPrintStatements (via a Predator
with GcsClient set to the mock), and assert that the call returns an error (and
optionally that the error message propagates). Reference
validateNoLoggerOrPrintStatements, mockGCSClient, and listFilesWithSuffix when
locating code to modify or add the test.

399-427: Missing warmup test coverage for non-Python non-ensemble backends.

The PR description states warmup is required for all non-ensemble models, but the test suite only exercises Python (fails without warmup) and ensemble (skips check) scenarios. An onnxruntime or tensorrt_llm model without warmup is untested. A regression that silently skips warmup enforcement for non-Python non-ensemble backends would pass this test suite undetected.

💡 Suggested additional test cases
+	onnxWithoutWarmup := `name: "onnx_model"
+backend: "onnxruntime"
+max_batch_size: 8
+`
+	onnxWithWarmup := `name: "onnx_model"
+backend: "onnxruntime"
+max_batch_size: 8
+model_warmup {
+  name: "warmup"
+  batch_size: 1
+}
+`
// In tests slice:
+		{
+			name:       "onnxruntime model without warmup fails",
+			configData: onnxWithoutWarmup,
+			wantErr:    true,
+			errContain: "model_warmup configuration is missing",
+		},
+		{
+			name:       "onnxruntime model with warmup passes",
+			configData: onnxWithWarmup,
+			wantErr:    false,
+		},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@horizon/internal/predator/handler/predator_upload_test.go` around lines 399 -
427, The test table "tests" currently covers Python and ensemble cases but omits
non-Python/non-ensemble backends; add test cases using the same table pattern
for at least one onnxruntime config and one tensorrt_llm config (e.g., use
existing config fixtures like onnxConfigWithoutWarmup/onnxConfigWithWarmup and
tensorrtConfigWithoutWarmup) asserting that missing model_warmup produces
wantErr=true and the errContain message "model_warmup configuration is missing",
and that configs including model_warmup pass (wantErr=false); update the tests
slice in predator_upload_test.go so the test runner exercises these backends via
the same loop that handles pythonWithWarmup, pythonWithoutWarmup, and
ensembleConfig.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@horizon/internal/predator/handler/predator_upload_test.go`:
- Around line 57-106: Add tests and/or normalize case in the ensemble detection:
update TestIsEnsembleModel to include additional cases where ModelConfig.Backend
and ModelConfig.Platform are mixed-case (e.g. "Ensemble", "ENSEMBLE",
"EnSeMbLe") and assert isEnsembleModel returns true; and modify the production
isEnsembleModel implementation to compare case-insensitively (use
strings.EqualFold or strings.ToLower(...) == "ensemble") when checking
ModelConfig.Backend and ModelConfig.Platform while keeping the existing check
for SchedulingChoice / ModelConfig_EnsembleScheduling (EnsembleScheduling /
ModelEnsembling) unchanged.

---

Nitpick comments:
In `@horizon/internal/predator/handler/predator_upload_test.go`:
- Around line 323-331: The test case "two files with round-robin distribution"
currently only asserts both filenames appear but doesn't check that the
displayed violations are capped at displayCap (5); update the test in
predator_upload_test.go to assert the rendered output contains at most
displayCap violation lines (verify count of violation detail lines or entries
equals 5) and add assertions for specific entries that should be dropped by the
round-robin algorithm (adjust wantNotContain accordingly) using the existing
fileViolationInfo fixtures and the test's output parsing helpers so the test
fails if all 6 violations are shown.
- Around line 151-276: Add explicit test cases in
TestHasPythonLoggerOrPrintStatements to cover logger.warning and logger.error
inside execute() (and perhaps mixed with logger.info) so behavior is documented;
update the tests slice to include entries named like "logger.warning inside
execute not flagged" and "logger.error inside execute not flagged" that call
hasPythonLoggerOrPrintStatements with content containing logger.warning(...) and
logger.error(...) respectively and assert wantFound=false and wantDetails=0;
reference the test harness around hasPythonLoggerOrPrintStatements to place the
new cases alongside existing ones (e.g., near "logger.debug inside execute
function" and "logger.INFO case insensitive match") so the intended exclusion of
warning/error is covered.
- Around line 567-594: Add a new unit test that exercises the error path when
the GCS client's listFilesWithSuffix returns an error: create a mockGCSClient
where readFile returns the configData for "config.pbtxt" and listFilesWithSuffix
returns a non-nil error, call Predator.validateNoLoggerOrPrintStatements (via a
Predator with GcsClient set to the mock), and assert that the call returns an
error (and optionally that the error message propagates). Reference
validateNoLoggerOrPrintStatements, mockGCSClient, and listFilesWithSuffix when
locating code to modify or add the test.
- Around line 399-427: The test table "tests" currently covers Python and
ensemble cases but omits non-Python/non-ensemble backends; add test cases using
the same table pattern for at least one onnxruntime config and one tensorrt_llm
config (e.g., use existing config fixtures like
onnxConfigWithoutWarmup/onnxConfigWithWarmup and tensorrtConfigWithoutWarmup)
asserting that missing model_warmup produces wantErr=true and the errContain
message "model_warmup configuration is missing", and that configs including
model_warmup pass (wantErr=false); update the tests slice in
predator_upload_test.go so the test runner exercises these backends via the same
loop that handles pythonWithWarmup, pythonWithoutWarmup, and ensembleConfig.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
horizon/internal/predator/handler/predator_upload.go (1)

148-171: Clarify the validateSourceModel comment vs. behavior.
validateModelConfiguration (including warmup enforcement) runs for both partial and full uploads, but the comment reads as full-only. Consider updating the comment, or moving the warmup check under !isPartial if that was intended.

📝 Suggested comment tweak
-// For full uploads it also validates:
-//   - Model configuration (config.pbtxt is valid and parseable) and warmup configuration is present for non-ensemble models
-//   - Complete model structure (version 1/ folder exists with files)
-//   - Python backend models do not contain logger/print statements outside initialize()/finalize()
+// Validates config.pbtxt (including warmup for non-ensemble models).
+// For full uploads it also validates:
+//   - Complete model structure (version 1/ folder exists with files)
+//   - Python backend models do not contain logger/print statements outside initialize()/finalize()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@horizon/internal/predator/handler/predator_upload.go` around lines 148 - 171,
The comment for validateSourceModel is out-of-sync with behavior:
validateModelConfiguration (including warmup enforcement) is invoked for both
partial and full uploads though the comment implies warmup is full-only; update
the function comment to state that model configuration (config.pbtxt and warmup
enforcement) is validated for both partial and full uploads, referencing
validateModelConfiguration and the isPartial flag, or alternatively move the
warmup-specific check out of validateModelConfiguration and call it only when
!isPartial inside validateSourceModel so that validateModelConfiguration covers
only config parsing while validateCompleteModelStructure and
validateNoLoggerOrPrintStatements remain full-upload only.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@horizon/internal/predator/handler/predator_upload.go`:
- Around line 148-171: The comment for validateSourceModel is out-of-sync with
behavior: validateModelConfiguration (including warmup enforcement) is invoked
for both partial and full uploads though the comment implies warmup is
full-only; update the function comment to state that model configuration
(config.pbtxt and warmup enforcement) is validated for both partial and full
uploads, referencing validateModelConfiguration and the isPartial flag, or
alternatively move the warmup-specific check out of validateModelConfiguration
and call it only when !isPartial inside validateSourceModel so that
validateModelConfiguration covers only config parsing while
validateCompleteModelStructure and validateNoLoggerOrPrintStatements remain
full-upload only.

@pavan-adari-meesho
Copy link
Contributor

@shubhangi-raj-meesho take pull from develop and push again

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
horizon/internal/predator/handler/predator_upload.go (2)

562-572: validateNoLoggerOrPrintStatements re-reads and re-parses config.pbtxt that validateModelConfiguration already consumed.

By the time validateNoLoggerOrPrintStatements is reached, validateModelConfiguration has already successfully read and unmarshalled the same file. This is a redundant GCS round-trip (and a TOCTOU window). Consider threading the already-parsed *ModelConfig into this function, or extracting a shared config-read step in validateSourceModel.

♻️ Sketch: pass parsed config in
-func (p *Predator) validateNoLoggerOrPrintStatements(gcsPath string) error {
+func (p *Predator) validateNoLoggerOrPrintStatements(gcsPath string, modelConfig *ModelConfig) error {
 	srcBucket, srcPath := extractGCSPath(gcsPath)
 	if srcBucket == "" || srcPath == "" {
 		return fmt.Errorf("invalid GCS path format: %s", gcsPath)
 	}
 
-	// Read and parse config.pbtxt to determine backend type.
-	configPath := path.Join(srcPath, configFile)
-	configData, err := p.GcsClient.ReadFile(srcBucket, configPath)
-	if err != nil {
-		return fmt.Errorf("failed to read config.pbtxt for logger/print check: %w", err)
-	}
-
-	var modelConfig ModelConfig
-	if err := prototext.Unmarshal(configData, &modelConfig); err != nil {
-		return fmt.Errorf("failed to parse config.pbtxt for logger/print check: %w", err)
-	}
-
-	if !isPythonBackendModel(&modelConfig) {
+	if !isPythonBackendModel(modelConfig) {

In validateSourceModel, thread the config through after parsing it once in validateModelConfiguration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@horizon/internal/predator/handler/predator_upload.go` around lines 562 - 572,
validateNoLoggerOrPrintStatements currently re-reads and re-parses config.pbtxt
after validateModelConfiguration already unmarshalled it; change the call sites
so validateModelConfiguration or validateSourceModel parses config.pbtxt once
into a ModelConfig and pass that parsed *ModelConfig into
validateNoLoggerOrPrintStatements (or extract a shared read step in
validateSourceModel) instead of calling GcsClient.ReadFile/prototext.Unmarshal
again, updating the function signature of validateNoLoggerOrPrintStatements and
all callers accordingly to remove the redundant GCS round-trip and TOCTOU
window.

685-687: Move compiled regexes to package-level vars — they're rebuilt on every .py file scan.

funcDefPattern, loggerPattern, and printPattern are recompiled on each call to hasPythonLoggerOrPrintStatements, which is invoked once per .py file. Since these patterns are immutable, they should be declared as package-level variables.

♻️ Proposed refactor

Add these alongside the other var blocks in predator_constants.go or at the top of this file:

+var (
+	funcDefPattern = regexp.MustCompile(`^def\s+(\w+)\s*\(`)
+	loggerPattern  = regexp.MustCompile(`(?i)logger\.(info|debug)\s*\(`)
+	printPattern   = regexp.MustCompile(`\bprint\s*\(`)
+)

Then remove the three regexp.MustCompile calls inside hasPythonLoggerOrPrintStatements:

-	funcDefPattern := regexp.MustCompile(`^def\s+(\w+)\s*\(`)
-	loggerPattern := regexp.MustCompile(`(?i)logger\.(info|debug)\s*\(`)
-	printPattern := regexp.MustCompile(`\bprint\s*\(`)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@horizon/internal/predator/handler/predator_upload.go` around lines 685 - 687,
Declare funcDefPattern, loggerPattern, and printPattern as package-level
variables (e.g., in the existing var block in predator_constants.go or at the
top of predator_upload.go) using regexp.MustCompile so they are compiled once at
init; then remove the regexp.MustCompile calls inside
hasPythonLoggerOrPrintStatements and update that function to reference the
package-level funcDefPattern, loggerPattern, and printPattern variables
directly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@horizon/internal/predator/handler/predator_upload.go`:
- Around line 683-729: The function hasPythonLoggerOrPrintStatements is
producing false positives because it doesn't track triple-quoted string spans;
update the main line loop to maintain boolean state flags (e.g., inTripleDouble,
inTripleSingle) that toggle when encountering unescaped triple double-quotes
(""") or triple single-quotes (''') across lines, skip
loggerPattern/printPattern checks while those flags are true, and ensure
transitions respect existing single/double-quote handling in stripInlineComment;
integrate this alongside the existing functionStack/funcScopeEntry logic so
loggerPattern and printPattern only append to details when not inside a
triple-quoted string and not inside allowed functions.

---

Nitpick comments:
In `@horizon/internal/predator/handler/predator_upload.go`:
- Around line 562-572: validateNoLoggerOrPrintStatements currently re-reads and
re-parses config.pbtxt after validateModelConfiguration already unmarshalled it;
change the call sites so validateModelConfiguration or validateSourceModel
parses config.pbtxt once into a ModelConfig and pass that parsed *ModelConfig
into validateNoLoggerOrPrintStatements (or extract a shared read step in
validateSourceModel) instead of calling GcsClient.ReadFile/prototext.Unmarshal
again, updating the function signature of validateNoLoggerOrPrintStatements and
all callers accordingly to remove the redundant GCS round-trip and TOCTOU
window.
- Around line 685-687: Declare funcDefPattern, loggerPattern, and printPattern
as package-level variables (e.g., in the existing var block in
predator_constants.go or at the top of predator_upload.go) using
regexp.MustCompile so they are compiled once at init; then remove the
regexp.MustCompile calls inside hasPythonLoggerOrPrintStatements and update that
function to reference the package-level funcDefPattern, loggerPattern, and
printPattern variables directly.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a7e4692 and c1a4184.

📒 Files selected for processing (1)
  • horizon/internal/predator/handler/predator_upload.go

Comment on lines +683 to +729
lines := strings.Split(string(content), "\n")

funcDefPattern := regexp.MustCompile(`^def\s+(\w+)\s*\(`)
loggerPattern := regexp.MustCompile(`(?i)logger\.(info|debug)\s*\(`)
printPattern := regexp.MustCompile(`\bprint\s*\(`)

var functionStack []funcScopeEntry

for lineNum, line := range lines {
strippedLine := strings.TrimSpace(line)

if strippedLine == "" {
continue
}

if strings.HasPrefix(strippedLine, "#") {
continue
}

lineIndent := len(line) - len(strings.TrimLeft(line, " \t"))

for len(functionStack) > 0 && functionStack[len(functionStack)-1].indent >= lineIndent {
functionStack = functionStack[:len(functionStack)-1]
}

if funcMatch := funcDefPattern.FindStringSubmatch(strippedLine); len(funcMatch) > 1 {
functionStack = append(functionStack, funcScopeEntry{name: funcMatch[1], indent: lineIndent})
continue
}

if isInsideAllowedFunction(functionStack) {
continue
}

codePortion := stripInlineComment(line)

if loggerPattern.MatchString(codePortion) {
details = append(details, fmt.Sprintf("line %d: %s", lineNum+1, strippedLine))
found = true
} else if printPattern.MatchString(codePortion) {
details = append(details, fmt.Sprintf("line %d: %s", lineNum+1, strippedLine))
found = true
}
}

return found, details
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Triple-quoted multi-line strings cause false positives.

hasPythonLoggerOrPrintStatements has no state machine for """ / ''' spans. A logger call that appears only inside a docstring (outside initialize/finalize) will be falsely flagged:

def execute(self, requests):
    """
    Example: logger.info("request received")   # ← flagged as violation
    """

The line logger.info("request received") is not empty, does not start with #, and matches loggerPattern after stripInlineComment, so it gets reported.

Consider tracking inTripleDouble/inTripleSingle state similarly to how stripInlineComment tracks single/double quotes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@horizon/internal/predator/handler/predator_upload.go` around lines 683 - 729,
The function hasPythonLoggerOrPrintStatements is producing false positives
because it doesn't track triple-quoted string spans; update the main line loop
to maintain boolean state flags (e.g., inTripleDouble, inTripleSingle) that
toggle when encountering unescaped triple double-quotes (""") or triple
single-quotes (''') across lines, skip loggerPattern/printPattern checks while
those flags are true, and ensure transitions respect existing
single/double-quote handling in stripInlineComment; integrate this alongside the
existing functionStack/funcScopeEntry logic so loggerPattern and printPattern
only append to details when not inside a triple-quoted string and not inside
allowed functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants