
Parameter validation for kernel factories#776

Open
AdrianSosic wants to merge 8 commits into dev/gp from feature/parameter_support

Conversation

@AdrianSosic
Collaborator

DevPR, parent is #745

Adds a validation mechanism to ensure kernel factories only produce kernels for search spaces they are intended for.
This is achieved via a new ParameterKind flag enum that factories can use to signal which parameter types they support.
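The mechanism can be sketched roughly as follows. This is a minimal illustration of how such a flag enum could work, not the actual BayBE implementation; the member names are assumptions.

```python
from enum import Flag, auto


class ParameterKind(Flag):
    """Hypothetical sketch of a parameter-kind flag enum."""

    REGULAR = auto()
    TASK = auto()


# A factory could declare the kinds it supports as a bitwise combination
# of flags and check each parameter's kind against that set:
supported = ParameterKind.REGULAR | ParameterKind.TASK
regular_only = ParameterKind.REGULAR

print(ParameterKind.TASK in supported)     # membership check succeeds
print(ParameterKind.TASK in regular_only)  # membership check fails
```

The `Flag` base (rather than plain `Enum`) is what makes the bitwise combination and containment checks possible.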

@AdrianSosic AdrianSosic self-assigned this Apr 2, 2026
@AdrianSosic AdrianSosic requested a review from Scienfitz as a code owner April 2, 2026 11:55
@AdrianSosic AdrianSosic added the new feature New functionality label Apr 2, 2026
@AdrianSosic AdrianSosic requested a review from AVHopp as a code owner April 2, 2026 11:55
@AdrianSosic AdrianSosic added the dev label Apr 2, 2026
Copilot AI review requested due to automatic review settings April 2, 2026 11:55
@AdrianSosic AdrianSosic changed the title Feature/parameter support Parameter validation for kernel factories Apr 2, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a parameter-kind validation mechanism for Gaussian process kernel factories and updates kernel translation to infer dimensions from a SearchSpace, enabling factories to explicitly declare and enforce which parameter roles (e.g., task vs. regular) they support.

Changes:

  • Add ParameterKind (flag enum) + Parameter.kind and enforce supported parameter kinds in KernelFactory.
  • Introduce parameter sub-selection via parameter_selector / parameter_names, and refactor Kernel.to_gpytorch to take a SearchSpace for automatic active_dims/ard_num_dims.
  • Add a deprecation guard that raises a DeprecationError when using a custom kernel in multi-task GP contexts unless suppressed via env var.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
baybe/kernels/base.py Refactors to_gpytorch to use SearchSpace-derived dimensions; adds parameter_names to basic kernels.
baybe/parameters/enum.py Introduces ParameterKind flag enum.
baybe/parameters/base.py Adds Parameter.kind property derived from ParameterKind.
baybe/parameters/__init__.py Exposes ParameterKind in public parameters API.
baybe/parameters/selector.py Adds parameter selector protocol + concrete selectors (e.g., TypeSelector).
baybe/surrogates/gaussian_process/components/generic.py Renames factory protocol type to GPComponentFactoryProtocol and updates conversion helper typing.
baybe/surrogates/gaussian_process/components/kernel.py Adds kernel factory base class with parameter-kind validation and introduces ICMKernelFactory.
baybe/surrogates/gaussian_process/components/mean.py Switches to protocol-based mean factory typing.
baybe/surrogates/gaussian_process/components/likelihood.py Switches to protocol-based likelihood factory typing.
baybe/surrogates/gaussian_process/components/__init__.py Exposes new *Protocol factory types.
baybe/surrogates/gaussian_process/presets/baybe.py Replaces alias with explicit default kernel/task-kernel factories (incl. multitask handling).
baybe/surrogates/gaussian_process/presets/edbo.py Updates EDBO kernel factory to use parameter_names selection; adjusts likelihood factory typing.
baybe/surrogates/gaussian_process/presets/edbo_smoothed.py Same as above for smoothed EDBO.
baybe/surrogates/gaussian_process/core.py Updates GP surrogate to use protocol factories, SearchSpace-based kernel conversion, and adds multitask custom-kernel deprecation guard.
baybe/settings.py Whitelists and validates BAYBE_DISABLE_CUSTOM_KERNEL_WARNING.
tests/test_kernels.py Updates kernel assembly test to build a SearchSpace and validate inferred dims / mapping.
tests/hypothesis_strategies/kernels.py Extends kernel strategies to optionally generate parameter_names.
tests/test_deprecations.py Adds deprecation test for multitask custom-kernel behavior and env-var suppression.
CHANGELOG.md Documents new features, breaking changes, and the new deprecation.
Comments suppressed due to low confidence (2)

baybe/surrogates/gaussian_process/presets/edbo.py:76

  • effective_dims is now train_x.shape[-1], which will include task columns in multi-task settings even though this factory can be used with parameter_selector to exclude TaskParameters. Since effective_dims controls prior/initialization regime selection, it should reflect the dimensionality of the kernel’s active (selected) inputs (e.g., based on self.get_parameter_names(searchspace) / BasicKernel._get_dimensions(searchspace)), otherwise priors will shift when adding a task parameter.
    @override
    def _make(
        self, searchspace: SearchSpace, train_x: Tensor, train_y: Tensor
    ) -> Kernel:
        effective_dims = train_x.shape[-1]

        switching_condition = _contains_encoding(
            searchspace.discrete, _EDBO_ENCODINGS
        ) and (effective_dims >= 50)

        # low D priors
        if effective_dims < 5:
            lengthscale_prior = GammaPrior(1.2, 1.1)
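The reviewer's point can be illustrated with a toy example (column and parameter names here are made up):

```python
# In a multi-task setting, train_x carries an extra task column.
train_x_columns = ["temperature", "pressure", "task"]

# Names a parameter selector would pick for this factory (TaskParameter excluded):
selected_names = ["temperature", "pressure"]

dims_from_shape = len(train_x_columns)     # counts the task column too
dims_from_selection = len(selected_names)  # only the kernel's active inputs
```

Since the prior regime switches on `effective_dims`, the shape-based count shifts the priors as soon as a task parameter is added, even though the kernel's active inputs are unchanged.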

baybe/surrogates/gaussian_process/presets/edbo_smoothed.py:60

  • Same as in EDBOKernelFactory: effective_dims=train_x.shape[-1] will count task dimensions even when a parameter_selector excludes them. Since the interpolated priors depend on effective_dims, compute it from the selected/active dimensions instead of the raw train_x width to keep behavior stable in multi-task setups.
    @override
    def _make(
        self, searchspace: SearchSpace, train_x: Tensor, train_y: Tensor
    ) -> Kernel:
        effective_dims = train_x.shape[-1]

        # Interpolate prior moments linearly between low D and high D regime.
        # The high D regime itself is the average of the EDBO OHE and Mordred regime.
        # Values outside the dimension limits will get the border value assigned.
        lengthscale_prior = GammaPrior(
            np.interp(effective_dims, _DIM_LIMITS, [1.2, 2.5]),
            np.interp(effective_dims, _DIM_LIMITS, [1.1, 0.55]),
        )
        lengthscale_initial_value = np.interp(effective_dims, _DIM_LIMITS, [0.2, 6.0])
        outputscale_prior = GammaPrior(
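The border-value behavior mentioned in the code comment ("values outside the dimension limits will get the border value assigned") comes from `np.interp`, which clamps to the endpoint values outside the given x-range. A quick illustration with made-up limits:

```python
import numpy as np

_DIM_LIMITS = [5, 50]  # illustrative limits; the actual values may differ

low = float(np.interp(2, _DIM_LIMITS, [1.2, 2.5]))     # below range: border value 1.2
mid = float(np.interp(27.5, _DIM_LIMITS, [1.2, 2.5]))  # linear in between
high = float(np.interp(80, _DIM_LIMITS, [1.2, 2.5]))   # above range: border value 2.5
```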


Comment thread baybe/kernels/base.py Outdated
Comment thread baybe/kernels/base.py Outdated
Comment thread baybe/kernels/base.py Outdated
Comment thread baybe/surrogates/gaussian_process/components/kernel.py Outdated
Comment thread baybe/parameters/enum.py
@AdrianSosic AdrianSosic mentioned this pull request Apr 2, 2026
20 tasks
Collaborator


can we have a tiny test validating that invalid parameter types are correctly rejected?

Collaborator Author


yes: 762b8b9

@AdrianSosic AdrianSosic force-pushed the feature/parameter_support branch from 671e81c to 602ecf3 Compare April 15, 2026 12:01
@AdrianSosic AdrianSosic force-pushed the feature/parameter_support branch from 602ecf3 to 741997b Compare April 15, 2026 12:43
@AdrianSosic AdrianSosic requested a review from Scienfitz April 16, 2026 07:11
Comment thread CHANGELOG.md
Collaborator Author


@Scienfitz: I need you to re-review the PR since I had to change the logic of the selector-mixin from the previous PR back into a (private) base class. If you look at the PR, you'll probably understand why.

Still: if you see a better way of implementation, let me know. We can also quickly discuss at PyCon, perhaps

Collaborator Author


Hey, one more piece of info: I think I just realized a clean and very elegant way to at least get rid of the _uses_parameter_names reminder hack and enforce the subselection by contract:

Instead of hoping that the factories internally subselect the parameters from the passed searchspace, we simply let the base class create a reduced version of the searchspace and pass only that to the _make method. This would be super easy to do if we had a proper SearchSpace.drop method or similar that takes care of the reduction, and it would also come with no cost overhead once the discrete Cartesian product is computed lazily, as is planned anyway in my upcoming PRs.

So if you agree, then we can basically consider this part of the problem solved and keep the workaround only until the method is there. What do you think?
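The proposed contract could look roughly like this. `SearchSpace` here is a heavily simplified stand-in for the real class, and `drop` is the hypothetical reduction method described in the comment:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchSpace:
    """Heavily simplified stand-in for baybe's SearchSpace."""

    parameter_names: tuple[str, ...]

    def drop(self, names: set[str]) -> "SearchSpace":
        # Hypothetical reduction method: return a new search space that
        # contains only the remaining parameters.
        return SearchSpace(
            tuple(n for n in self.parameter_names if n not in names)
        )


# The factory base class could then reduce the space *before* calling _make,
# so subclasses only ever see the parameters selected for them:
full = SearchSpace(("temperature", "pressure", "task"))
reduced = full.drop({"task"})
print(reduced.parameter_names)  # the task parameter is gone
```

With this, correct subselection no longer depends on each factory remembering to do it internally.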

Collaborator


Do I understand it correctly that you plan to not do this in the PR or wait for that feature?

Collaborator

@AVHopp AVHopp left a comment


I like the general approach of using the ParameterKind. However, I have some questions/concerns regarding the design, specifically making the _KernelFactory internal and the use of the _uses_parameter_names flag. Also, I think the change made in the __call__ of the `ICMKernelFactory` is incorrect from a type perspective and needs to be fixed.

Comment thread baybe/parameters/enum.py
Comment on lines +15 to +16
Can be used to express compatibility (e.g. Gaussian process kernel factories)
with different parameter types via bitwise combination of flags.
Collaborator


I have problems imagining the usage of this in the future. Do you plan to use this, e.g., for the case of kernels that can only be calculated for SubstanceParameters or NumericalDiscreteParameters? How would the interface work then? Can you elaborate a bit on this?

base_kernel = base_kernel.to_gpytorch(searchspace)
if isinstance(task_kernel, Kernel):
task_kernel = task_kernel.to_gpytorch(searchspace)
return base_kernel * task_kernel
Collaborator


Doesn't this now return a ProductKernel from gpytorch instead of our own as we now potentially multiply two kernels that have been created via .to_gpytorch?

Comment thread CHANGELOG.md



@define
class _KernelFactory(KernelFactoryProtocol, ABC):
Collaborator


Why is this an internal class? As a user, I would now be very confused about how to implement or use my own Kernel or KernelFactory. Since this is internal, I would assume that this is not what I should use. I would thus assume that I simply need to take a look at the KernelFactoryProtocol and implement what is there. Since this is the GPComponentFactoryProtocol, I'd then assume that I only need to implement the corresponding __call__ method. But then, I do not get any of the validation that is now part of this _KernelFactory class. In particular, any sort of validation is then completely the responsibility of the user.

  1. Do I understand this correctly or am I missing something?
  2. Is this the intended behaviour?
  3. Is there any way that we can make this clearer/easier for the user?

To be honest, if I were to say "I just want to test BayBE with a custom kernel", I would probably be completely lost and not really know where to start, since we also have no documentation around this, right?
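The confusion described here stems from how `typing.Protocol` works: any class with a matching signature satisfies the protocol structurally, without inheriting any behavior from a base class. A minimal sketch (the names are placeholders, not the real baybe API):

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class KernelFactoryProtocol(Protocol):
    # Structural type: only the call signature matters; no validation
    # logic or other behavior is shared with implementers.
    def __call__(self, searchspace: Any, train_x: Any, train_y: Any) -> Any: ...


class MyFactory:
    # Satisfies the protocol without inheriting from any base class, and
    # therefore without getting the parameter-kind validation that lives
    # in a (private) _KernelFactory base.
    def __call__(self, searchspace: Any, train_x: Any, train_y: Any) -> str:
        return "my-kernel"


print(isinstance(MyFactory(), KernelFactoryProtocol))  # True
```

This is exactly the trade-off the comment raises: protocols keep the public contract minimal, but users who implement only the protocol get none of the shared validation.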

Comment on lines +114 to +116
f"`_uses_parameter_names = True`. Subclasses that accept a "
f"parameter selector must explicitly set this flag to confirm "
f"they actually use the selected parameter names."
Collaborator


Well, technically setting the flag to True does not confirm that the selector is actually used. To be honest, I currently do not see the value of this field and respective check, given that (i) it can simply be ignored as I can just claim to use the selector by setting the field to True and (ii) you are already now thinking about a way of replacing it with a more robust approach (which I appreciate!). So do we actually need to have it in the first place right now?


Labels

dev new feature New functionality


4 participants