
Conversation

@alexmanle

Summary

Adds a simple interface to override agent hyperparameters.

Test Plan

Tested using the AIConfigurator workload with the GA agent.

Test scenario:

Example TOML:

[agent_config]
population_size = 0
n_offsprings = 0
crossover_prob = 0.0
mutation_prob = 0.0
random_seed = 0

New terminal output:

2026-01-27 13:05:11,633 - INFO - Applying agent config overrides for 'ga': {'n_offsprings': 0, 'crossover_prob': 0.0, 'mutation_prob': 0.0, 'random_seed': 0}

@coderabbitai
Contributor

coderabbitai bot commented Jan 27, 2026

📝 Walkthrough

Added Pydantic agent configuration models, extended TestDefinition with an optional agent_config field, and implemented runtime validation of agent overrides in the CLI handlers: per-agent config types are checked and validated kwargs are conditionally supplied to agent constructors.

Changes

• Agent configuration models (src/cloudai/models/agent_config.py): New Pydantic module introducing AgentConfig (base, extra="forbid") and the specialized models GeneticAlgorithmConfig, BayesianOptimizationConfig, and MultiArmedBanditConfig with typed, constrained optional fields and metadata.
• Workload model update (src/cloudai/models/workload.py): Added agent_config: Optional[dict[str, Any]] = None to TestDefinition to carry agent override data.
• Runtime validation & handler changes (src/cloudai/cli/handlers.py): Added validate_agent_overrides(agent_type, agent_config) -> dict[str, Any], imported ValidationError and the agent config models, and updated the agent construction flow to validate overrides per agent type, log field-level ValidationError details, skip invalid overrides, and pass validated kwargs when present.
• Miscellaneous (manifest_file, requirements.txt): Updated the manifest and requirements to reflect the new module and dependencies (lines added).
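To make the described pattern concrete (a base model with extra="forbid" plus constrained optional fields), here is a hedged sketch assuming Pydantic v2; the field names and bounds are illustrative, not copied from agent_config.py:

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class AgentConfig(BaseModel):
    """Base config: extra="forbid" rejects unknown keys, catching config typos."""

    model_config = ConfigDict(extra="forbid")


class GeneticAlgorithmConfig(AgentConfig):
    # Illustrative constraints in the spirit of the PR (population_size >= 2,
    # probabilities in [0.0, 1.0]); see the real module for actual bounds.
    population_size: Optional[int] = Field(default=None, ge=2)
    crossover_prob: Optional[float] = Field(default=None, ge=0.0, le=1.0)


cfg = GeneticAlgorithmConfig.model_validate({"population_size": 10, "crossover_prob": 0.5})
print(cfg.model_dump(exclude_none=True))  # {'population_size': 10, 'crossover_prob': 0.5}

try:
    GeneticAlgorithmConfig.model_validate({"population_size": 1})  # violates ge=2
except ValidationError:
    print("rejected: population_size must be >= 2")
```

model_dump(exclude_none=True) is what turns a partially-filled config into the kwargs dict handed to the agent constructor.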

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐇 I nibble on configs, tidy and neat,
Seeds for GA, BO, and arms that compete,
Pydantic keeps rules, no surprises today,
Validated overrides guide each agent's play,
Hop, ship, and repeat — carrots all the way! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 inconclusive)
  • Title check (❓ Inconclusive): The title 'Agent params' is vague and generic, using non-descriptive language that doesn't convey the specific nature of the changeset. Resolution: use a more specific title that describes the main feature, such as 'Add agent hyperparameter override configuration' or 'Support agent config overrides in DSE jobs'.

✅ Passed checks (1 passed)
  • Description check (✅ Passed): The description clearly explains the purpose, implementation, and testing of the agent hyperparameter override feature with concrete examples.


@greptile-apps
Contributor

greptile-apps bot commented Jan 27, 2026

Greptile Overview

Greptile Summary

This PR adds a clean interface for overriding agent hyperparameters through TOML configuration files. The implementation introduces Pydantic models (GeneticAlgorithmConfig, BayesianOptimizationConfig, MultiArmedBanditConfig) to validate agent-specific parameters with proper constraints, and integrates validation into the DSE job handler with helpful error messages.

Key changes:

  • Added agent_config field to TestDefinition model to store hyperparameter overrides
  • Created typed config classes with validation constraints (e.g., population_size >= 2, probability ranges 0.0-1.0)
  • Implemented validate_agent_overrides() function that validates configs and logs applied overrides
  • Added proper error handling with detailed validation messages when configs are invalid

Notable behaviors:

  • Unknown agent types (like grid_search) with agent_config silently ignore the config with only debug-level logging
  • Empty override dicts are correctly handled to avoid passing kwargs to agents that don't accept them
  • The conditional logic if agent_overrides else relies on empty dict being falsy, which works but could be more explicit
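The empty-dict behavior called out above can be seen in isolation. A stdlib-only sketch with a stand-in agent class (DummyAgent and build_agent are hypothetical, not CloudAI code):

```python
class DummyAgent:
    # Stand-in for an agent whose constructor takes the environment plus
    # optional keyword hyperparameters.
    def __init__(self, env, random_seed=None):
        self.env = env
        self.random_seed = random_seed


def build_agent(agent_class, env, agent_overrides):
    # An empty dict is falsy, so agents are constructed without kwargs when
    # there is nothing to override (the conditional greptile calls subtle).
    return agent_class(env, **agent_overrides) if agent_overrides else agent_class(env)


a = build_agent(DummyAgent, env="env", agent_overrides={})
b = build_agent(DummyAgent, env="env", agent_overrides={"random_seed": 7})
print(a.random_seed, b.random_seed)  # None 7
```

Note that `agent_class(env, **{})` would behave identically for agents that accept kwargs; the falsy check only matters for constructors that take no keyword arguments at all.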

Confidence Score: 4/5

  • This PR is safe to merge with minor style improvements recommended
  • The implementation is well-structured with proper validation and error handling. Pydantic models provide type safety and constraint validation. The integration is straightforward and the previous thread's concerns about GridSearchAgent are addressed by the empty dict check. Minor points for improvement include making the conditional logic more explicit and improving user feedback for ignored configs.
  • No files require special attention - all changes are straightforward with good validation

Important Files Changed

• src/cloudai/models/agent_config.py: New file defining Pydantic config models for GA, BO, and MAB agent hyperparameter validation
• src/cloudai/models/workload.py: Added agent_config field to TestDefinition to store agent hyperparameter overrides
• src/cloudai/cli/handlers.py: Added validate_agent_overrides function and validation logic to apply agent config overrides with error handling

Sequence Diagram

sequenceDiagram
    participant User
    participant Handler as handle_dse_job
    participant Validator as validate_agent_overrides
    participant ConfigClass as GeneticAlgorithmConfig/<br/>BayesianOptimizationConfig/<br/>MultiArmedBanditConfig
    participant Registry
    participant Agent as GA/BO/MAB/GridSearch Agent
    participant Env as CloudAIGymEnv

    User->>Handler: Run DSE job with agent_config in TOML
    Handler->>Registry: Get agent_class for agent_type
    Registry-->>Handler: Return agent class
    Handler->>Env: Create CloudAIGymEnv(test_run, runner)
    Env-->>Handler: Environment instance
    Handler->>Validator: validate_agent_overrides(agent_type, agent_config)
    alt agent_config is None/empty
        Validator-->>Handler: Return {}
    else agent_type not in map (e.g., grid_search)
        Validator->>Validator: Log debug message
        Validator-->>Handler: Return {}
    else agent_type in map (ga/bo/mab)
        Validator->>ConfigClass: model_validate(agent_config)
        ConfigClass->>ConfigClass: Validate constraints (ge, le, etc.)
        alt Validation fails
            ConfigClass-->>Validator: Raise ValidationError
            Validator-->>Handler: Propagate ValidationError
            Handler->>Handler: Log detailed error messages
            Handler-->>User: Return error code, skip agent
        else Validation succeeds
            ConfigClass-->>Validator: Validated config object
            Validator->>Validator: model_dump(exclude_none=True)
            Validator->>Validator: Log info with overrides
            Validator-->>Handler: Return agent_kwargs dict
        end
    end
    alt agent_overrides is empty dict
        Handler->>Agent: agent_class(env)
    else agent_overrides has values
        Handler->>Agent: agent_class(env, **agent_overrides)
    end
    Agent-->>Handler: Agent instance with configured parameters
    Handler->>Agent: Run agent steps (select_action, update_policy)


@greptile-apps greptile-apps bot left a comment

3 files reviewed, 3 comments


config_class = config_class_map.get(agent_type)
if not config_class:
    logging.debug(f"No config validation available for agent type '{agent_type}', using defaults.")

If agent_type doesn't match ga/bo/mab, the function silently returns an empty dict. Users won't know if they made a typo in their config.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In src/cloudai/cli/handlers.py:
- Around line 198-201: The code silently drops a user-supplied agent_config when
no validator exists; update the branch that checks config_class =
config_class_map.get(agent_type) to emit a logging.warning (not debug) when an
agent_config was provided and will be ignored, referencing agent_type and
agent_config in the message so callers understand overrides were ignored; keep
returning {} but ensure the warning clearly states the agent_type and that
agent_config will be ignored (optionally include a truncated/summary of
agent_config) to aid user visibility.

In src/cloudai/models/agent_config.py:
- Around line 51-54: The algorithm Field currently accepts any string which
risks invalid runtime values; restrict and validate it by replacing the loose
Optional[str] with a strict set of allowed values (use a Python Enum or
typing.Literal for "ucb1", "ts" (thompson_sampling), "epsilon_greedy",
"softmax", "random") or add a Pydantic validator on the algorithm field in the
AgentConfig class to raise a clear validation error when an unsupported
algorithm is provided, and update the Field description to match the enforced
choices.
- Around line 42-45: The botorch_num_trials field currently allows any integer
but should only accept -1 or integers >= 1 (matching the semantic described and
the sibling sobol_num_trials). Add a Pydantic validator for botorch_num_trials
on the AgentConfig model (e.g., a method named validate_botorch_num_trials
decorated with @validator("botorch_num_trials")) that returns the value if it's
None or equals -1 or is >= 1, and raises a ValueError for other values (e.g., 0
or < -1).

Comment on lines +198 to +201
config_class = config_class_map.get(agent_type)
if not config_class:
    logging.debug(f"No config validation available for agent type '{agent_type}', using defaults.")
    return {}

⚠️ Potential issue | 🟡 Minor

Warn when overrides are ignored for unsupported agent types.
Right now a user-provided agent_config is silently dropped (debug log only). Consider elevating to a warning to avoid confusion.

🛠 Proposed change
-    if not config_class:
-        logging.debug(f"No config validation available for agent type '{agent_type}', using defaults.")
-        return {}
+    if not config_class:
+        logging.warning(
+            f"agent_config provided for agent type '{agent_type}', but no validation model is registered; "
+            "overrides are ignored."
+        )
+        return {}

Comment on lines +42 to +45
sobol_num_trials: Optional[int] = Field(default=None, ge=1, description="Number of SOBOL initialization trials")
botorch_num_trials: Optional[int] = Field(
    default=None, description="Number of BoTorch trials (-1 for unlimited until max_steps)"
)
@coderabbitai coderabbitai bot Jan 27, 2026

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed (repository: NVIDIA/cloudai):

# First, let's examine the actual file structure and find agent_config.py
fd -t f "agent_config.py" | head -20

# Read the agent_config.py file to see current state
cat -n src/cloudai/models/agent_config.py

# Search for botorch_num_trials usage throughout the codebase
rg "botorch_num_trials" -B 2 -A 2

# Search for files that use BayesianOptimizationConfig or botorch_num_trials
rg "BayesianOptimizationConfig|botorch_num_trials" -t py --files-with-matches

# Look for the BO agent implementation
fd -t f ".*agent.*" -path "*/bayesian*" -o -path "**/optimization*" | grep -i agent

# Broader search for agent implementations
fd -t f "agent" -type f

# Search more broadly for agent implementations
fd -e py "agent" | head -30

# Also search for where BayesianOptimizationConfig is instantiated
rg "BayesianOptimizationConfig" -B 3 -A 10 -t py

# Search for botorch usage in the codebase
rg "botorch" -i -B 3 -A 3

# Find the base_agent.py file and search for BO implementation
cat src/cloudai/configurator/base_agent.py | head -100

# Search for where botorch_num_trials is actually used in the agent
rg "botorch_num_trials" -A 5 -B 5 src/cloudai/configurator/

# Look for any BO agent or related files
fd -e py | xargs rg -l "class.*Bayesian|botorch_num_trials"

# Search for actual agent implementations/classes
rg "class.*Agent.*:" -A 5 | head -100

# Look at test_agents.py to see how these configs are tested
cat tests/test_agents.py

# Check if there are any actual BO agent implementations elsewhere
fd -e py -x rg -l "num_trials" {} \;

# Look for any BO agent implementations or references
fd -e py | xargs rg -l "BayesianOptimization|BoTorch" 2>/dev/null

# Check if there's a bayesian agent file
fd -e py -path "*/agent*" -o -path "*/bayesian*"

# Search for where the agent configs are actually used
rg "botorch_num_trials|sobol_num_trials" -B 5 -A 5

# Check what agent types are actually implemented
ls -la src/cloudai/configurator/

# Check if BO agent exists or is planned
fd "\.py$" | xargs rg -l "class.*Bayesian" 2>/dev/null

# Look at handlers.py more carefully to see how config is used
cat src/cloudai/cli/handlers.py | grep -A 20 "BayesianOptimizationConfig"

# Check if there's external documentation or plugin system for agents
rg "entry_points\|entrypoint" -A 5 -B 5

# Search for any test or documentation mentioning BO agent
rg "bayesian|bo agent" -i -B 2 -A 2


Add validation constraint to botorch_num_trials.
The field currently accepts any integer, but the description specifies "(-1 for unlimited until max_steps)", implying only -1 or ≥1 are valid. Without validation, invalid values like 0 or -2 pass Pydantic validation. This is inconsistent with the sibling field sobol_num_trials (line 42), which has ge=1 constraint.

🛠 Proposed fix
-from pydantic import BaseModel, ConfigDict, Field
+from pydantic import BaseModel, ConfigDict, Field, field_validator
@@
 class BayesianOptimizationConfig(AgentConfig):
     """Configuration overrides for Bayesian Optimization agent."""
@@
     botorch_num_trials: Optional[int] = Field(
         default=None, description="Number of BoTorch trials (-1 for unlimited until max_steps)"
     )
+
+    @field_validator("botorch_num_trials")
+    @classmethod
+    def _validate_botorch_num_trials(cls, v: Optional[int]) -> Optional[int]:
+        if v is None:
+            return v
+        if v != -1 and v < 1:
+            raise ValueError("botorch_num_trials must be -1 or >= 1")
+        return v
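Stripped of Pydantic, the rule the bot is asking for reduces to one predicate. A stdlib sketch (the function name is illustrative):

```python
def check_botorch_num_trials(v):
    # Accept None (field unset), -1 (unlimited until max_steps), or any
    # count >= 1; reject 0 and anything below -1.
    if v is None or v == -1 or v >= 1:
        return v
    raise ValueError("botorch_num_trials must be -1 or >= 1")


print(check_botorch_num_trials(-1))  # -1
```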


@alexmanle would be good to consider this. wdyt?


Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!


Also would be good to test it with BO agent w/ AIConfigurator along with GA.

Comment on lines +51 to +54
algorithm: Optional[str] = Field(
    default=None,
    description="MAB algorithm: ucb1, ts (thompson_sampling), epsilon_greedy, softmax, or random",
)

🧹 Nitpick | 🔵 Trivial

Validate algorithm against supported values (or clarify extensibility).
Description enumerates allowed algorithms, but any string is accepted; this can lead to runtime errors or silent fallback.

♻️ Suggested tightening (if only these values are supported)
-from typing import Any, Optional
+from typing import Any, Optional, Literal
@@
-    algorithm: Optional[str] = Field(
+    algorithm: Optional[Literal["ucb1", "ts", "thompson_sampling", "epsilon_greedy", "softmax", "random"]] = Field(
         default=None,
         description="MAB algorithm: ucb1, ts (thompson_sampling), epsilon_greedy, softmax, or random",
     )
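The same whitelist can also be enforced without Pydantic. A stdlib sketch using typing.get_args to derive the allowed set from the Literal (the value list copies the field description; the names MabAlgorithm and validate_algorithm are illustrative):

```python
from typing import Literal, Optional, get_args

MabAlgorithm = Literal["ucb1", "ts", "thompson_sampling", "epsilon_greedy", "softmax", "random"]


def validate_algorithm(value: Optional[str]) -> Optional[str]:
    # Derive the allowed values from the Literal so there is a single
    # source of truth for the whitelist.
    allowed = get_args(MabAlgorithm)
    if value is not None and value not in allowed:
        raise ValueError(f"algorithm must be one of {allowed}, got {value!r}")
    return value


print(validate_algorithm("ucb1"))  # ucb1
```

With Pydantic, annotating the field as Optional[MabAlgorithm] gives the same rejection behavior at model-validation time.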

@greptile-apps greptile-apps bot left a comment

2 files reviewed, 2 comments

err = 1
continue

agent = agent_class(env, **agent_overrides) if agent_overrides else agent_class(env)

Conditional passes kwargs only when non-empty, but empty dict is falsy. If agent_overrides = {}, the condition evaluates False and takes the else branch. Consider using is not None or explicit length check for clarity:

Suggested change
agent = agent_class(env, **agent_overrides) if agent_overrides else agent_class(env)
agent = agent_class(env, **agent_overrides) if agent_overrides is not None else agent_class(env)

However, given validate_agent_overrides always returns a dict, the current logic works but is subtle.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +200 to +201
logging.debug(f"No config validation available for agent type '{agent_type}', using defaults.")
return {}

Silent fallback for unknown agent types could hide config typos. Users won't get any indication their config was ignored except a debug log. Consider logging at warning level when agent_config is provided but ignored:

Suggested change
logging.debug(f"No config validation available for agent type '{agent_type}', using defaults.")
return {}

if not config_class:
    if agent_config:
        logging.warning(f"Agent config provided but no validation available for agent type '{agent_type}'. Config will be ignored. Available types: ga, bo, mab")
    return {}


3 participants