Added automated model checks to the model upload workflow, including …#339
Added automated model checks to the model upload workflow, including …#339shubhangi-raj-meesho wants to merge 6 commits intodevelopfrom
Conversation
…validation for logger and print statements in Python backend models, and ensuring warmup configuration is present in config.pbtxt.
WalkthroughAdds a GCS helper to list objects by suffix and extends Predator upload validation to detect uncommented Python Changes
Sequence Diagram(s)sequenceDiagram
participant UploadHandler as Upload Handler
participant ConfigValidator as Config Validator
participant SourceScanner as Source Scanner
participant GCSClient as GCS Client
participant PythonScanner as Python Scanner
participant SummaryBuilder as Summary Builder
UploadHandler->>ConfigValidator: validateSourceModel(gcsPath)
ConfigValidator->>ConfigValidator: validateModelConfiguration()
ConfigValidator->>SourceScanner: validateNoLoggerOrPrintStatements(gcsPath)
SourceScanner->>GCSClient: Read config.pbtxt
GCSClient-->>SourceScanner: config content
SourceScanner->>SourceScanner: parse config, determine backend
alt isPythonBackend
SourceScanner->>GCSClient: ListFilesWithSuffix(bucket, folder, ".py")
GCSClient-->>SourceScanner: Python file list
loop For each Python file
SourceScanner->>GCSClient: Read file content
GCSClient-->>SourceScanner: file content
SourceScanner->>PythonScanner: hasPythonLoggerOrPrintStatements(content)
PythonScanner-->>SourceScanner: violations list
end
SourceScanner->>SummaryBuilder: buildBalancedViolationSummary(perFileViolations, totalViolations)
SummaryBuilder-->>SourceScanner: violation summary
end
SourceScanner-->>ConfigValidator: violations (if any)
ConfigValidator-->>UploadHandler: validation result
🚥 Pre-merge checks | ✅ 1 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
horizon/internal/predator/handler/predator_upload.go (1)
559-593: Redundant GCS read ofconfig.pbtxt— already parsed invalidateModelConfiguration.
validateModelConfiguration(called just before at line 159) already reads and parsesconfig.pbtxtinto aModelConfig. This method reads and parses it a second time. Consider passing the parsedModelConfig(or at least the backend string) from the caller to avoid the extra GCS round-trip.♻️ Sketch: pass ModelConfig from the caller
Refactor
validateModelConfigurationto return the parsed config, then pass it downstream:-func (p *Predator) validateModelConfiguration(gcsPath string) error { +func (p *Predator) validateModelConfiguration(gcsPath string) (*ModelConfig, error) { ... - return nil + return &modelConfig, nil }In
validateSourceModel:- if err := p.validateModelConfiguration(gcsPath); err != nil { + modelConfig, err := p.validateModelConfiguration(gcsPath) + if err != nil { return fmt.Errorf("config.pbtxt validation failed: %w", err) } ... - if err := p.validateNoLoggerOrPrintStatements(gcsPath); err != nil { + if err := p.validateNoLoggerOrPrintStatements(gcsPath, modelConfig); err != nil {Then
validateNoLoggerOrPrintStatementscan skip the re-read and useisPythonBackendModel(modelConfig)directly.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@horizon/internal/predator/handler/predator_upload.go` around lines 559 - 593, The validator redundantly re-reads and parses config.pbtxt inside validateNoLoggerOrPrintStatements causing an extra GCS round-trip; change the call chain so the parsed ModelConfig is propagated from validateModelConfiguration (or returned from it) through validateSourceModel into validateNoLoggerOrPrintStatements and use that ModelConfig with isPythonBackendModel instead of re-reading via p.GcsClient.ReadFile and prototext.Unmarshal. Update validateModelConfiguration to return (*ModelConfig, error) (or provide the backend string), adjust validateSourceModel to accept and forward the parsed ModelConfig to validateNoLoggerOrPrintStatements, and remove the config read/parse block from validateNoLoggerOrPrintStatements.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@horizon/internal/predator/handler/predator_upload.go`:
- Around line 748-766: The current single-variable scope tracker
(currentFunction/functionIndent) misclassifies nested defs inside allowed
functions because a nested "def" overwrites currentFunction; change to a small
function scope stack (e.g., functionStack storing tuples of (name, indent)) or
at minimum track an outerAllowedFunction alongside the currentFunction: when
funcDefPattern matches push (name, lineIndent) onto functionStack (or set
currentFunction and if the previous top was an allowed function record it as
outerAllowedFunction), on dedent pop entries whose indent >= lineIndent (or when
lineIndent <= topIndent) to restore the previous scope, and when checking
whether to skip logging use a membership check against
allowedFuncInitialize/allowedFuncFinalize on the stack/top rather than equality
with currentFunction; update all uses of currentFunction/functionIndent to use
the stack/top variables (or use outerAllowedFunction) so nested defs inside
initialize/finalize no longer cause false positives.
- Around line 741-778: The logger/print detection currently runs against the
whole line and thus flags inline comments (e.g., x = f() # logger.info(...));
add a lightweight inline-comment stripper and use its output for loggerPattern
and printPattern checks: implement a helper like stripInlineComment(line string)
string that removes a trailing '#' comment while avoiding obvious '#' inside
quotes (simple state tracking for single/double quotes), then replace the
MatchString calls that reference loggerPattern and printPattern to use the
stripped code portion (instead of the full line) while keeping the existing
logic that skips full-line comments and the function-scope checks (symbols:
funcDefPattern, currentFunction, allowedFuncInitialize, allowedFuncFinalize,
loggerPattern, printPattern).
- Around line 768-778: The code appends the same line to details twice when both
loggerPattern and printPattern match; update the matching logic in the loop
(where loggerPattern.MatchString(line), printPattern.MatchString(line), details,
and found are used) to prevent duplicate reports by making the printPattern
check conditional (use else if printPattern.MatchString(line)) or immediately
continue after a match so only the first match per line appends to details and
sets found.
---
Nitpick comments:
In `@horizon/internal/predator/handler/predator_upload.go`:
- Around line 559-593: The validator redundantly re-reads and parses
config.pbtxt inside validateNoLoggerOrPrintStatements causing an extra GCS
round-trip; change the call chain so the parsed ModelConfig is propagated from
validateModelConfiguration (or returned from it) through validateSourceModel
into validateNoLoggerOrPrintStatements and use that ModelConfig with
isPythonBackendModel instead of re-reading via p.GcsClient.ReadFile and
prototext.Unmarshal. Update validateModelConfiguration to return (*ModelConfig,
error) (or provide the backend string), adjust validateSourceModel to accept and
forward the parsed ModelConfig to validateNoLoggerOrPrintStatements, and remove
the config read/parse block from validateNoLoggerOrPrintStatements.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
horizon/internal/predator/handler/predator_upload_test.go (4)
323-331: Round-robin test doesn't verify the display cap is enforced.The test asserts both file names appear but doesn't confirm the total violations shown is capped at
displayCap(5). With 6 total violations across two files, a buggy implementation that shows all 6 would still pass this assertion.💡 Suggested additional assertion
{ name: "two files with round-robin distribution", violations: []fileViolationInfo{ {fileName: "a.py", details: []string{"line 1: print('a1')", "line 2: print('a2')", "line 3: print('a3')"}}, {fileName: "b.py", details: []string{"line 10: print('b1')", "line 20: print('b2')", "line 30: print('b3')"}}, }, totalViolations: 6, wantContains: []string{"a.py", "b.py"}, + // Verify cap: "line 3: print" and "line 30: print" should NOT both appear + wantNotContain: []string{"line 3: print('a3')", "line 30: print('b3')"}, },Note: adjust
wantNotContainentries to match whichever items the round-robin algorithm drops last.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@horizon/internal/predator/handler/predator_upload_test.go` around lines 323 - 331, The test case "two files with round-robin distribution" currently only asserts both filenames appear but doesn't check that the displayed violations are capped at displayCap (5); update the test in predator_upload_test.go to assert the rendered output contains at most displayCap violation lines (verify count of violation detail lines or entries equals 5) and add assertions for specific entries that should be dropped by the round-robin algorithm (adjust wantNotContain accordingly) using the existing fileViolationInfo fixtures and the test's output parsing helpers so the test fails if all 6 violations are shown.
151-276: Add a test documenting whetherlogger.warning/logger.errorare flagged.The PR description explicitly scopes detection to
logger.info,logger.debug, andlogger.warningorlogger.errorinsideexecute()is not flagged. Without this, it's ambiguous whether the exclusion is intentional or a gap.💡 Suggested additional test cases
+ { + name: "logger.warning in execute - not flagged (only info/debug/print are scanned)", + content: "def execute(requests):\n logger.warning('warn')\n return []\n", + wantFound: false, + wantDetails: 0, + }, + { + name: "print inside string literal not flagged", + content: "def execute(requests):\n msg = \"print('debug')\"\n return []\n", + wantFound: false, + wantDetails: 0, + },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@horizon/internal/predator/handler/predator_upload_test.go` around lines 151 - 276, Add explicit test cases in TestHasPythonLoggerOrPrintStatements to cover logger.warning and logger.error inside execute() (and perhaps mixed with logger.info) so behavior is documented; update the tests slice to include entries named like "logger.warning inside execute not flagged" and "logger.error inside execute not flagged" that call hasPythonLoggerOrPrintStatements with content containing logger.warning(...) and logger.error(...) respectively and assert wantFound=false and wantDetails=0; reference the test harness around hasPythonLoggerOrPrintStatements to place the new cases alongside existing ones (e.g., near "logger.debug inside execute function" and "logger.INFO case insensitive match") so the intended exclusion of warning/error is covered.
567-594: Missing test forListFilesWithSuffixreturning an error.
TestValidateNoLoggerOrPrintStatements_NoPythonFilestests an empty list but there is no test for whenlistFilesWithSuffixitself returns an error. Depending on how the production code handles this (propagate vs. ignore), this path is currently unverified.💡 Suggested additional test
+func TestValidateNoLoggerOrPrintStatements_ListFilesError(t *testing.T) { + configData := `name: "py_model" +backend: "python" +max_batch_size: 8 +` + gcs := &mockGCSClient{ + readFile: func(_, objectPath string) ([]byte, error) { + if strings.HasSuffix(objectPath, "config.pbtxt") { + return []byte(configData), nil + } + return nil, fmt.Errorf("unexpected read") + }, + listFilesWithSuffix: func(_, _, _ string) ([]string, error) { + return nil, fmt.Errorf("GCS permission denied") + }, + } + + p := &Predator{GcsClient: gcs} + err := p.validateNoLoggerOrPrintStatements("gs://test-bucket/models/py_model") + require.Error(t, err) + assert.Contains(t, err.Error(), "GCS permission denied") // or a wrapped message +}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@horizon/internal/predator/handler/predator_upload_test.go` around lines 567 - 594, Add a new unit test that exercises the error path when the GCS client's listFilesWithSuffix returns an error: create a mockGCSClient where readFile returns the configData for "config.pbtxt" and listFilesWithSuffix returns a non-nil error, call Predator.validateNoLoggerOrPrintStatements (via a Predator with GcsClient set to the mock), and assert that the call returns an error (and optionally that the error message propagates). Reference validateNoLoggerOrPrintStatements, mockGCSClient, and listFilesWithSuffix when locating code to modify or add the test.
399-427: Missing warmup test coverage for non-Python non-ensemble backends.The PR description states warmup is required for all non-ensemble models, but the test suite only exercises Python (fails without warmup) and ensemble (skips check) scenarios. An onnxruntime or tensorrt_llm model without warmup is untested. A regression that silently skips warmup enforcement for non-Python non-ensemble backends would pass this test suite undetected.
💡 Suggested additional test cases
+ onnxWithoutWarmup := `name: "onnx_model" +backend: "onnxruntime" +max_batch_size: 8 +` + onnxWithWarmup := `name: "onnx_model" +backend: "onnxruntime" +max_batch_size: 8 +model_warmup { + name: "warmup" + batch_size: 1 +} +` // In tests slice: + { + name: "onnxruntime model without warmup fails", + configData: onnxWithoutWarmup, + wantErr: true, + errContain: "model_warmup configuration is missing", + }, + { + name: "onnxruntime model with warmup passes", + configData: onnxWithWarmup, + wantErr: false, + },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@horizon/internal/predator/handler/predator_upload_test.go` around lines 399 - 427, The test table "tests" currently covers Python and ensemble cases but omits non-Python/non-ensemble backends; add test cases using the same table pattern for at least one onnxruntime config and one tensorrt_llm config (e.g., use existing config fixtures like onnxConfigWithoutWarmup/onnxConfigWithWarmup and tensorrtConfigWithoutWarmup) asserting that missing model_warmup produces wantErr=true and the errContain message "model_warmup configuration is missing", and that configs including model_warmup pass (wantErr=false); update the tests slice in predator_upload_test.go so the test runner exercises these backends via the same loop that handles pythonWithWarmup, pythonWithoutWarmup, and ensembleConfig.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@horizon/internal/predator/handler/predator_upload_test.go`:
- Around line 57-106: Add tests and/or normalize case in the ensemble detection:
update TestIsEnsembleModel to include additional cases where ModelConfig.Backend
and ModelConfig.Platform are mixed-case (e.g. "Ensemble", "ENSEMBLE",
"EnSeMbLe") and assert isEnsembleModel returns true; and modify the production
isEnsembleModel implementation to compare case-insensitively (use
strings.EqualFold or strings.ToLower(...) == "ensemble") when checking
ModelConfig.Backend and ModelConfig.Platform while keeping the existing check
for SchedulingChoice / ModelConfig_EnsembleScheduling (EnsembleScheduling /
ModelEnsembling) unchanged.
---
Nitpick comments:
In `@horizon/internal/predator/handler/predator_upload_test.go`:
- Around line 323-331: The test case "two files with round-robin distribution"
currently only asserts both filenames appear but doesn't check that the
displayed violations are capped at displayCap (5); update the test in
predator_upload_test.go to assert the rendered output contains at most
displayCap violation lines (verify count of violation detail lines or entries
equals 5) and add assertions for specific entries that should be dropped by the
round-robin algorithm (adjust wantNotContain accordingly) using the existing
fileViolationInfo fixtures and the test's output parsing helpers so the test
fails if all 6 violations are shown.
- Around line 151-276: Add explicit test cases in
TestHasPythonLoggerOrPrintStatements to cover logger.warning and logger.error
inside execute() (and perhaps mixed with logger.info) so behavior is documented;
update the tests slice to include entries named like "logger.warning inside
execute not flagged" and "logger.error inside execute not flagged" that call
hasPythonLoggerOrPrintStatements with content containing logger.warning(...) and
logger.error(...) respectively and assert wantFound=false and wantDetails=0;
reference the test harness around hasPythonLoggerOrPrintStatements to place the
new cases alongside existing ones (e.g., near "logger.debug inside execute
function" and "logger.INFO case insensitive match") so the intended exclusion of
warning/error is covered.
- Around line 567-594: Add a new unit test that exercises the error path when
the GCS client's listFilesWithSuffix returns an error: create a mockGCSClient
where readFile returns the configData for "config.pbtxt" and listFilesWithSuffix
returns a non-nil error, call Predator.validateNoLoggerOrPrintStatements (via a
Predator with GcsClient set to the mock), and assert that the call returns an
error (and optionally that the error message propagates). Reference
validateNoLoggerOrPrintStatements, mockGCSClient, and listFilesWithSuffix when
locating code to modify or add the test.
- Around line 399-427: The test table "tests" currently covers Python and
ensemble cases but omits non-Python/non-ensemble backends; add test cases using
the same table pattern for at least one onnxruntime config and one tensorrt_llm
config (e.g., use existing config fixtures like
onnxConfigWithoutWarmup/onnxConfigWithWarmup and tensorrtConfigWithoutWarmup)
asserting that missing model_warmup produces wantErr=true and the errContain
message "model_warmup configuration is missing", and that configs including
model_warmup pass (wantErr=false); update the tests slice in
predator_upload_test.go so the test runner exercises these backends via the same
loop that handles pythonWithWarmup, pythonWithoutWarmup, and ensembleConfig.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
horizon/internal/predator/handler/predator_upload.go (1)
148-171: Clarify thevalidateSourceModelcomment vs. behavior.
validateModelConfiguration(including warmup enforcement) runs for both partial and full uploads, but the comment reads as full-only. Consider updating the comment, or moving the warmup check under!isPartialif that was intended.📝 Suggested comment tweak
-// For full uploads it also validates: -// - Model configuration (config.pbtxt is valid and parseable) and warmup configuration is present for non-ensemble models -// - Complete model structure (version 1/ folder exists with files) -// - Python backend models do not contain logger/print statements outside initialize()/finalize() +// Validates config.pbtxt (including warmup for non-ensemble models). +// For full uploads it also validates: +// - Complete model structure (version 1/ folder exists with files) +// - Python backend models do not contain logger/print statements outside initialize()/finalize()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@horizon/internal/predator/handler/predator_upload.go` around lines 148 - 171, The comment for validateSourceModel is out-of-sync with behavior: validateModelConfiguration (including warmup enforcement) is invoked for both partial and full uploads though the comment implies warmup is full-only; update the function comment to state that model configuration (config.pbtxt and warmup enforcement) is validated for both partial and full uploads, referencing validateModelConfiguration and the isPartial flag, or alternatively move the warmup-specific check out of validateModelConfiguration and call it only when !isPartial inside validateSourceModel so that validateModelConfiguration covers only config parsing while validateCompleteModelStructure and validateNoLoggerOrPrintStatements remain full-upload only.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@horizon/internal/predator/handler/predator_upload.go`:
- Around line 148-171: The comment for validateSourceModel is out-of-sync with
behavior: validateModelConfiguration (including warmup enforcement) is invoked
for both partial and full uploads though the comment implies warmup is
full-only; update the function comment to state that model configuration
(config.pbtxt and warmup enforcement) is validated for both partial and full
uploads, referencing validateModelConfiguration and the isPartial flag, or
alternatively move the warmup-specific check out of validateModelConfiguration
and call it only when !isPartial inside validateSourceModel so that
validateModelConfiguration covers only config parsing while
validateCompleteModelStructure and validateNoLoggerOrPrintStatements remain
full-upload only.
|
@shubhangi-raj-meesho take pull from develop and push again |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
horizon/internal/predator/handler/predator_upload.go (2)
562-572:validateNoLoggerOrPrintStatementsre-reads and re-parsesconfig.pbtxtthatvalidateModelConfigurationalready consumed.By the time
validateNoLoggerOrPrintStatementsis reached,validateModelConfigurationhas already successfully read and unmarshalled the same file. This is a redundant GCS round-trip (and a TOCTOU window). Consider threading the already-parsed*ModelConfiginto this function, or extracting a shared config-read step invalidateSourceModel.♻️ Sketch: pass parsed config in
-func (p *Predator) validateNoLoggerOrPrintStatements(gcsPath string) error { +func (p *Predator) validateNoLoggerOrPrintStatements(gcsPath string, modelConfig *ModelConfig) error { srcBucket, srcPath := extractGCSPath(gcsPath) if srcBucket == "" || srcPath == "" { return fmt.Errorf("invalid GCS path format: %s", gcsPath) } - // Read and parse config.pbtxt to determine backend type. - configPath := path.Join(srcPath, configFile) - configData, err := p.GcsClient.ReadFile(srcBucket, configPath) - if err != nil { - return fmt.Errorf("failed to read config.pbtxt for logger/print check: %w", err) - } - - var modelConfig ModelConfig - if err := prototext.Unmarshal(configData, &modelConfig); err != nil { - return fmt.Errorf("failed to parse config.pbtxt for logger/print check: %w", err) - } - - if !isPythonBackendModel(&modelConfig) { + if !isPythonBackendModel(modelConfig) {In
validateSourceModel, thread the config through after parsing it once invalidateModelConfiguration.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@horizon/internal/predator/handler/predator_upload.go` around lines 562 - 572, validateNoLoggerOrPrintStatements currently re-reads and re-parses config.pbtxt after validateModelConfiguration already unmarshalled it; change the call sites so validateModelConfiguration or validateSourceModel parses config.pbtxt once into a ModelConfig and pass that parsed *ModelConfig into validateNoLoggerOrPrintStatements (or extract a shared read step in validateSourceModel) instead of calling GcsClient.ReadFile/prototext.Unmarshal again, updating the function signature of validateNoLoggerOrPrintStatements and all callers accordingly to remove the redundant GCS round-trip and TOCTOU window.
685-687: Move compiled regexes to package-level vars — they're rebuilt on every.pyfile scan.
funcDefPattern,loggerPattern, andprintPatternare recompiled on each call tohasPythonLoggerOrPrintStatements, which is invoked once per.pyfile. Since these patterns are immutable, they should be declared as package-level variables.♻️ Proposed refactor
Add these alongside the other
varblocks inpredator_constants.goor at the top of this file:+var ( + funcDefPattern = regexp.MustCompile(`^def\s+(\w+)\s*\(`) + loggerPattern = regexp.MustCompile(`(?i)logger\.(info|debug)\s*\(`) + printPattern = regexp.MustCompile(`\bprint\s*\(`) +)Then remove the three
regexp.MustCompilecalls insidehasPythonLoggerOrPrintStatements:- funcDefPattern := regexp.MustCompile(`^def\s+(\w+)\s*\(`) - loggerPattern := regexp.MustCompile(`(?i)logger\.(info|debug)\s*\(`) - printPattern := regexp.MustCompile(`\bprint\s*\(`)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@horizon/internal/predator/handler/predator_upload.go` around lines 685 - 687, Declare funcDefPattern, loggerPattern, and printPattern as package-level variables (e.g., in the existing var block in predator_constants.go or at the top of predator_upload.go) using regexp.MustCompile so they are compiled once at init; then remove the regexp.MustCompile calls inside hasPythonLoggerOrPrintStatements and update that function to reference the package-level funcDefPattern, loggerPattern, and printPattern variables directly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@horizon/internal/predator/handler/predator_upload.go`:
- Around line 683-729: The function hasPythonLoggerOrPrintStatements is
producing false positives because it doesn't track triple-quoted string spans;
update the main line loop to maintain boolean state flags (e.g., inTripleDouble,
inTripleSingle) that toggle when encountering unescaped triple double-quotes
(""") or triple single-quotes (''') across lines, skip
loggerPattern/printPattern checks while those flags are true, and ensure
transitions respect existing single/double-quote handling in stripInlineComment;
integrate this alongside the existing functionStack/funcScopeEntry logic so
loggerPattern and printPattern only append to details when not inside a
triple-quoted string and not inside allowed functions.
---
Nitpick comments:
In `@horizon/internal/predator/handler/predator_upload.go`:
- Around line 562-572: validateNoLoggerOrPrintStatements currently re-reads and
re-parses config.pbtxt after validateModelConfiguration already unmarshalled it;
change the call sites so validateModelConfiguration or validateSourceModel
parses config.pbtxt once into a ModelConfig and pass that parsed *ModelConfig
into validateNoLoggerOrPrintStatements (or extract a shared read step in
validateSourceModel) instead of calling GcsClient.ReadFile/prototext.Unmarshal
again, updating the function signature of validateNoLoggerOrPrintStatements and
all callers accordingly to remove the redundant GCS round-trip and TOCTOU
window.
- Around line 685-687: Declare funcDefPattern, loggerPattern, and printPattern
as package-level variables (e.g., in the existing var block in
predator_constants.go or at the top of predator_upload.go) using
regexp.MustCompile so they are compiled once at init; then remove the
regexp.MustCompile calls inside hasPythonLoggerOrPrintStatements and update that
function to reference the package-level funcDefPattern, loggerPattern, and
printPattern variables directly.
| lines := strings.Split(string(content), "\n") | ||
|
|
||
| funcDefPattern := regexp.MustCompile(`^def\s+(\w+)\s*\(`) | ||
| loggerPattern := regexp.MustCompile(`(?i)logger\.(info|debug)\s*\(`) | ||
| printPattern := regexp.MustCompile(`\bprint\s*\(`) | ||
|
|
||
| var functionStack []funcScopeEntry | ||
|
|
||
| for lineNum, line := range lines { | ||
| strippedLine := strings.TrimSpace(line) | ||
|
|
||
| if strippedLine == "" { | ||
| continue | ||
| } | ||
|
|
||
| if strings.HasPrefix(strippedLine, "#") { | ||
| continue | ||
| } | ||
|
|
||
| lineIndent := len(line) - len(strings.TrimLeft(line, " \t")) | ||
|
|
||
| for len(functionStack) > 0 && functionStack[len(functionStack)-1].indent >= lineIndent { | ||
| functionStack = functionStack[:len(functionStack)-1] | ||
| } | ||
|
|
||
| if funcMatch := funcDefPattern.FindStringSubmatch(strippedLine); len(funcMatch) > 1 { | ||
| functionStack = append(functionStack, funcScopeEntry{name: funcMatch[1], indent: lineIndent}) | ||
| continue | ||
| } | ||
|
|
||
| if isInsideAllowedFunction(functionStack) { | ||
| continue | ||
| } | ||
|
|
||
| codePortion := stripInlineComment(line) | ||
|
|
||
| if loggerPattern.MatchString(codePortion) { | ||
| details = append(details, fmt.Sprintf("line %d: %s", lineNum+1, strippedLine)) | ||
| found = true | ||
| } else if printPattern.MatchString(codePortion) { | ||
| details = append(details, fmt.Sprintf("line %d: %s", lineNum+1, strippedLine)) | ||
| found = true | ||
| } | ||
| } | ||
|
|
||
| return found, details | ||
| } |
There was a problem hiding this comment.
Triple-quoted multi-line strings cause false positives.
hasPythonLoggerOrPrintStatements has no state machine for """ / ''' spans. A logger call that appears only inside a docstring (outside initialize/finalize) will be falsely flagged:
def execute(self, requests):
"""
Example: logger.info("request received") # ← flagged as violation
"""The line logger.info("request received") is not empty, does not start with #, and matches loggerPattern after stripInlineComment, so it gets reported.
Consider tracking inTripleDouble/inTripleSingle state similarly to how stripInlineComment tracks single/double quotes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@horizon/internal/predator/handler/predator_upload.go` around lines 683 - 729,
The function hasPythonLoggerOrPrintStatements is producing false positives
because it doesn't track triple-quoted string spans; update the main line loop
to maintain boolean state flags (e.g., inTripleDouble, inTripleSingle) that
toggle when encountering unescaped triple double-quotes (""") or triple
single-quotes (''') across lines, skip loggerPattern/printPattern checks while
those flags are true, and ensure transitions respect existing
single/double-quote handling in stripInlineComment; integrate this alongside the
existing functionStack/funcScopeEntry logic so loggerPattern and printPattern
only append to details when not inside a triple-quoted string and not inside
allowed functions.
Context
Added automated model checks to the model upload workflow, including validation for logger and print statements in Python backend models, and ensuring warmup configuration is present in
config.pbtxt.Describe your changes
5 files changed across
horizon/internal/predator/handler/andhorizon/internal/externalcall/.1. Warmup configuration validation (
predator_upload.go)validateModelConfigurationto check that non-ensemble models define at least onemodel_warmupentry inconfig.pbtxt."ensemble"or presence ofensemble_scheduling) are excluded from this check because they orchestrate sub-models and don't hold weights that need warming up.isEnsembleModelthat detects ensemble models via backend, platform, orensemble_schedulingfields.2. Logger/print statement validation for Python backend models (
predator_upload.go)validateNoLoggerOrPrintStatements, invoked during full uploads only (partial uploads only syncconfig.pbtxtand don't touch Python files):config.pbtxtto determine the backend type."python", enumerates all.pyfiles in the model source directory using the newListFilesWithSuffixGCS method..pyfile, scans for uncommentedlogger.info(),logger.debug(), andprint()calls that are outsideinitialize()andfinalize()functions (logging in those lifecycle methods is acceptable since they only run during model load/unload, not inference).buildBalancedViolationSummary.isPythonBackendModeland scanning functionhasPythonLoggerOrPrintStatementswhich uses indentation-based scope tracking to determine which Python function each line belongs to.3. New GCS utility method (
gcs_client.go)ListFilesWithSuffixtoGCSClientInterfaceand its implementation onGCSClient.FindFileWithSuffix(which returns only the first match), this method collects all object paths matching the given suffix. Directory markers are skipped.4. New types and constants (
model.go,predator_constants.go)fileViolationInfostruct to group per-file violation details for balanced error reporting.errWarmupConfigMissing,errLoggerOrPrintStatementsFound, backend identifiers (pythonBackend,ensembleBackend,ensemblePlatform), allowed function names (initialize,finalize), file suffix (.py), and display cap (maxDisplayedViolationsForPythonModel = 5).5. Minor formatting fixes (
predator_constants.go,model.go)fieldDiscoveryConfigID,errNoOutputDefinitions, andServiceDeployableIDstruct tag to match surrounding constant/field alignment.6. Unit tests (
predator_upload_test.go)Testing
model_warmupinconfig.pbtxt- upload was correctly rejected with theerrWarmupConfigMissingerror. Uploading with warmup config present succeeded. Ensemble models correctly bypassed the warmup check.logger.info()andprint()statements outsideinitialize()/finalize()- upload was rejected with a clear violation summary including file names and line numbers. Verified that commented-out statements are ignored. Verified that non-Python backend models skip this check entirely.config.pbtxt.Monitoring
Info-level message on entry and on pass/skip, and aWarn-level message when violations are detected. These are visible in the existing Horizon service logs.Rollback plan
validateSourceModelflow and do not modify any data, schemas, or external state.Checklist before requesting a review
📂 Modules Affected
horizon(Real-time systems / networking)online-feature-store(Feature serving infra)trufflebox-ui(Admin panel / UI)infra(Docker, CI/CD, GCP/AWS setup)docs(Documentation updates)___________✅ Type of Change
___________Summary by CodeRabbit
New Features
Tests