Parameter validation for kernel factories #776
Conversation
Pull request overview
This PR introduces a parameter-kind validation mechanism for Gaussian process kernel factories and updates kernel translation to infer dimensions from a SearchSpace, enabling factories to explicitly declare and enforce which parameter roles (e.g., task vs. regular) they support.
Changes:
- Add `ParameterKind` (flag enum) + `Parameter.kind` and enforce supported parameter kinds in `KernelFactory`.
- Introduce parameter sub-selection via `parameter_selector`/`parameter_names`, and refactor `Kernel.to_gpytorch` to take a `SearchSpace` for automatic `active_dims`/`ard_num_dims`.
- Add a deprecation guard that raises a `DeprecationError` when using a custom kernel in multi-task GP contexts unless suppressed via an environment variable.
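To make the flag-enum idea concrete, here is a minimal sketch of how such a `ParameterKind` could work; the member names are assumptions for illustration and may differ from the actual enum in the PR:

```python
from enum import Flag, auto


class ParameterKind(Flag):
    # Hypothetical members; the PR's actual enum may define different roles.
    REGULAR = auto()
    TASK = auto()


# Factories could declare their supported roles via bitwise combination:
supported = ParameterKind.REGULAR | ParameterKind.TASK
regular_only = ParameterKind.REGULAR

print(ParameterKind.TASK in supported)      # membership test on a combined flag
print(ParameterKind.TASK in regular_only)
```

Because `Flag` supports bitwise `|` and membership tests, a factory's compatibility check reduces to a single `in` expression.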
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `baybe/kernels/base.py` | Refactors `to_gpytorch` to use `SearchSpace`-derived dimensions; adds `parameter_names` to basic kernels. |
| `baybe/parameters/enum.py` | Introduces `ParameterKind` flag enum. |
| `baybe/parameters/base.py` | Adds `Parameter.kind` property derived from `ParameterKind`. |
| `baybe/parameters/__init__.py` | Exposes `ParameterKind` in public parameters API. |
| `baybe/parameters/selector.py` | Adds parameter selector protocol + concrete selectors (e.g., `TypeSelector`). |
| `baybe/surrogates/gaussian_process/components/generic.py` | Renames factory protocol type to `GPComponentFactoryProtocol` and updates conversion helper typing. |
| `baybe/surrogates/gaussian_process/components/kernel.py` | Adds kernel factory base class with parameter-kind validation and introduces `ICMKernelFactory`. |
| `baybe/surrogates/gaussian_process/components/mean.py` | Switches to protocol-based mean factory typing. |
| `baybe/surrogates/gaussian_process/components/likelihood.py` | Switches to protocol-based likelihood factory typing. |
| `baybe/surrogates/gaussian_process/components/__init__.py` | Exposes new `*Protocol` factory types. |
| `baybe/surrogates/gaussian_process/presets/baybe.py` | Replaces alias with explicit default kernel/task-kernel factories (incl. multitask handling). |
| `baybe/surrogates/gaussian_process/presets/edbo.py` | Updates EDBO kernel factory to use `parameter_names` selection; adjusts likelihood factory typing. |
| `baybe/surrogates/gaussian_process/presets/edbo_smoothed.py` | Same as above for smoothed EDBO. |
| `baybe/surrogates/gaussian_process/core.py` | Updates GP surrogate to use protocol factories, `SearchSpace`-based kernel conversion, and adds multitask custom-kernel deprecation guard. |
| `baybe/settings.py` | Whitelists and validates `BAYBE_DISABLE_CUSTOM_KERNEL_WARNING`. |
| `tests/test_kernels.py` | Updates kernel assembly test to build a `SearchSpace` and validate inferred dims / mapping. |
| `tests/hypothesis_strategies/kernels.py` | Extends kernel strategies to optionally generate `parameter_names`. |
| `tests/test_deprecations.py` | Adds deprecation test for multitask custom-kernel behavior and env-var suppression. |
| `CHANGELOG.md` | Documents new features, breaking changes, and the new deprecation. |
Comments suppressed due to low confidence (2)
baybe/surrogates/gaussian_process/presets/edbo.py:76
`effective_dims` is now `train_x.shape[-1]`, which will include task columns in multi-task settings even though this factory can be used with `parameter_selector` to exclude `TaskParameter`s. Since `effective_dims` controls prior/initialization regime selection, it should reflect the dimensionality of the kernel's active (selected) inputs (e.g., based on `self.get_parameter_names(searchspace)` / `BasicKernel._get_dimensions(searchspace)`), otherwise priors will shift when adding a task parameter.
```python
@override
def _make(
    self, searchspace: SearchSpace, train_x: Tensor, train_y: Tensor
) -> Kernel:
    effective_dims = train_x.shape[-1]
    switching_condition = _contains_encoding(
        searchspace.discrete, _EDBO_ENCODINGS
    ) and (effective_dims >= 50)
    # low D priors
    if effective_dims < 5:
        lengthscale_prior = GammaPrior(1.2, 1.1)
```
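The reviewer's fix can be sketched with a small stand-in: count only the columns that belong to the kernel's selected parameters, rather than taking the raw tensor width. The helper name and column layout below are illustrative, not the actual BayBE API:

```python
def effective_dims_from_selection(all_columns, selected_names):
    """Count only columns belonging to the kernel's selected parameters."""
    selected = set(selected_names)
    return sum(1 for c in all_columns if c in selected)


# train_x columns including a task column vs. the selector's choice:
columns = ["x1", "x2", "x3", "task"]
selected = ["x1", "x2", "x3"]  # TaskParameter excluded by the selector

effective_dims = effective_dims_from_selection(columns, selected)
# train_x.shape[-1] would be 4 here, while the selection-based count is 3
```

With this counting, adding a `TaskParameter` leaves the prior regime untouched, which is exactly the stability property the comment asks for.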
baybe/surrogates/gaussian_process/presets/edbo_smoothed.py:60
Same as in `EDBOKernelFactory`: `effective_dims = train_x.shape[-1]` will count task dimensions even when a `parameter_selector` excludes them. Since the interpolated priors depend on `effective_dims`, compute it from the selected/active dimensions instead of the raw `train_x` width to keep behavior stable in multi-task setups.
```python
@override
def _make(
    self, searchspace: SearchSpace, train_x: Tensor, train_y: Tensor
) -> Kernel:
    effective_dims = train_x.shape[-1]
    # Interpolate prior moments linearly between low D and high D regime.
    # The high D regime itself is the average of the EDBO OHE and Mordred regime.
    # Values outside the dimension limits will get the border value assigned.
    lengthscale_prior = GammaPrior(
        np.interp(effective_dims, _DIM_LIMITS, [1.2, 2.5]),
        np.interp(effective_dims, _DIM_LIMITS, [1.1, 0.55]),
    )
    lengthscale_initial_value = np.interp(effective_dims, _DIM_LIMITS, [0.2, 6.0])
    outputscale_prior = GammaPrior(
```
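To see why the prior moments shift with `effective_dims`, here is a pure-Python sketch of the two-point `np.interp` behavior the snippet relies on. The `DIM_LIMITS` values are placeholders for illustration; the real `_DIM_LIMITS` may differ:

```python
def interp_clamped(x, xp, fp):
    """Two-point linear interpolation with border clamping (np.interp-style)."""
    (x0, x1), (f0, f1) = xp, fp
    if x <= x0:
        return f0
    if x >= x1:
        return f1
    return f0 + (f1 - f0) * (x - x0) / (x1 - x0)


DIM_LIMITS = (5, 100)  # placeholder limits; the actual _DIM_LIMITS may differ

low = interp_clamped(3, DIM_LIMITS, (1.2, 2.5))     # clamped to low-D value
high = interp_clamped(500, DIM_LIMITS, (1.2, 2.5))  # clamped to high-D value
mid = interp_clamped(52.5, DIM_LIMITS, (1.2, 2.5))  # halfway between regimes
```

Because the interpolation is continuous in `effective_dims`, every extra column (such as an uncounted task column) shifts all prior moments at once, which is why the reviewer asks to base the count on the selected dimensions.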
Can we have a tiny test validating that invalid parameter types are correctly rejected?
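A minimal sketch of such a test, using stand-in names (none of these are the actual BayBE classes or validation entry points):

```python
from enum import Flag, auto


class ParameterKind(Flag):
    # Stand-in enum; the real members live in baybe/parameters/enum.py.
    REGULAR = auto()
    TASK = auto()


def validate_parameter_kinds(supported, present):
    """Raise if the searchspace contains kinds the factory does not support."""
    unsupported = [k for k in present if k not in supported]
    if unsupported:
        raise ValueError(f"Unsupported parameter kinds: {unsupported}")


def test_invalid_parameter_kind_rejected():
    # A factory supporting only REGULAR parameters must reject a TASK one.
    try:
        validate_parameter_kinds(ParameterKind.REGULAR, [ParameterKind.TASK])
    except ValueError:
        pass  # expected: TASK is not in the supported set
    else:
        raise AssertionError("expected ValueError for unsupported kind")


test_invalid_parameter_kind_rejected()
```

In the real test suite this would presumably construct a `SearchSpace` with a `TaskParameter` and assert that an incompatible factory raises on validation.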
The new parameter kind validation step guards us
@Scienfitz: I need you to re-review the PR since I had to change the logic of the selector-mixin from the previous PR back into a (private) base class. If you look at the PR, you'll probably understand why.
Still: if you see a better way of implementation, let me know. We can also quickly discuss at PyCon, perhaps
Hey, one other info: I think I just realized what is a clean and very elegant way to at least get rid of the _uses_parameter_names reminder hack and enforce the subselection by contract:
Instead of hoping that the factories internally subselect the parameters from the passed searchspace, we simply let the base class create a reduced version of the searchspace and pass only that to the _make method. This would be super easily doable if we had a proper SearchSpace.drop method or similar that takes care of the reduction and would also come with no cost overhead once the discrete cartesian product is computed lazily, as is planned anyway in my upcoming PRs.
So if you agree, then we can basically consider this part of the problem solved and keep the workaround only until the method is there. What do you think?
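The proposed contract can be sketched with stand-in classes. Note that `SearchSpace.drop` does not exist yet; everything below, including the class and function names, is illustrative:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchSpace:
    # Stand-in for baybe's SearchSpace; only parameter names are modeled.
    parameter_names: tuple

    def drop(self, names):
        """Return a copy without the given parameters (assumed future API)."""
        dropped = set(names)
        return SearchSpace(
            tuple(n for n in self.parameter_names if n not in dropped)
        )


def make_kernel(searchspace, selected_names):
    """Base-class wrapper: `_make` only ever sees the reduced searchspace."""
    selected = set(selected_names)
    unselected = [n for n in searchspace.parameter_names if n not in selected]
    return _make(searchspace.drop(unselected))


def _make(searchspace):
    # A factory can no longer accidentally touch unselected parameters.
    return searchspace.parameter_names


result = make_kernel(SearchSpace(("x1", "x2", "task")), ["x1", "x2"])
# result == ("x1", "x2"); the task column never reaches `_make`
```

The appeal of this design is that the subselection is enforced by construction rather than by a reminder flag that subclasses can set without honoring.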
Do I understand it correctly that you plan to not do this in the PR or wait for that feature?
AVHopp left a comment
I like the general approach of using the `ParameterKind`. However, I have some questions/concerns regarding the design, specifically making the `_KernelFactory` internal and the use of `_uses_parameter_names`. Also, I think the change made in the `__call__` of the `ICMKernelFactory` is incorrect from a type perspective and needs to be fixed.
> Can be used to express compatibility (e.g. Gaussian process kernel factories)
> with different parameter types via bitwise combination of flags.
I have problems imagining the usage of this in the future. Do you plan to use this e.g. for the case of kernels that can only be calculated for e.g. `SubstanceParameter`s or `NumericalDiscreteParameter`s? How would the interface work then, can you elaborate a bit on this?
```python
base_kernel = base_kernel.to_gpytorch(searchspace)
if isinstance(task_kernel, Kernel):
    task_kernel = task_kernel.to_gpytorch(searchspace)
return base_kernel * task_kernel
```
Doesn't this now return a `ProductKernel` from gpytorch instead of our own, as we now potentially multiply two kernels that have been created via `.to_gpytorch`?
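The type concern can be illustrated with small stub classes (stand-ins, not the real gpytorch or BayBE classes): once both operands are converted, the `*` operator dispatches to the converted kernel's `__mul__`, so the result type is determined by the library the operands belong to.

```python
class StubGpytorchKernel:
    # Stand-in for a converted (gpytorch-side) kernel; `*` uses its __mul__.
    def __mul__(self, other):
        return StubProductKernel(self, other)


class StubProductKernel(StubGpytorchKernel):
    def __init__(self, left, right):
        self.kernels = (left, right)


base, task = StubGpytorchKernel(), StubGpytorchKernel()
product = base * task  # the result type comes from the converted operands
```

So if both `base_kernel` and `task_kernel` are already gpytorch objects at the point of multiplication, the product is a gpytorch-side object rather than a BayBE kernel, which is what the reviewer is flagging.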
```python
@define
class _KernelFactory(KernelFactoryProtocol, ABC):
```
Why is this an internal class? As a user, I would now be very confused about how to implement or use my own `Kernel` or `KernelFactory`. Since this is internal, I would assume that this is not what I should use. I would thus assume that I simply need to take a look at the `KernelFactoryProtocol` and implement what is there. Since this is the `GPComponentFactoryProtocol`, I'd then assume that I only need to implement the corresponding `__call__` method. But then, I do not get any of the validation that is now part of this `_KernelFactory` class. In particular, any sort of validation is then completely the responsibility of the user.
- Do I understand this correctly or am I missing something?
- Is this the intended behaviour?
- Is there any way that we can make this clearer/easier for the user?
To be honest, if I was to say "I just want to test BayBE with a custom kernel", I would probably be completely lost and not really know where to start since we also have no documentation around this, right?
```python
f"`_uses_parameter_names = True`. Subclasses that accept a "
f"parameter selector must explicitly set this flag to confirm "
f"they actually use the selected parameter names."
```
Well, technically setting the flag to True does not confirm that the selector is actually used. To be honest, I currently do not see the value of this field and respective check, given that (i) it can simply be ignored as I can just claim to use the selector by setting the field to True and (ii) you are already now thinking about a way of replacing it with a more robust approach (which I appreciate!). So do we actually need to have it in the first place right now?
DevPR, parent is #745
Adds a validation mechanism to ensure kernel factories only produce kernels for search spaces they are intended for.
This is achieved via a new `ParameterKind` flag enum that factories can use to signal which parameter types they support.