Fitters.py refactoring by siwreienta · Pull Request #76 · PySATL/pysatl-core

siwreienta · 2026-03-03T22:15:21Z

No description provided.

LeonidElkin

Universal calculation options of a characteristic should be passed to query_method and exposed explicitly to the user as part of the public API, instead of the current **options: Any. @Desiment will describe the use cases in more detail, either in the PR or in person.
The strategy takes responsibility for passing options to ComputationMethod. If the user has not changed anything, we keep the defaults.
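
A minimal sketch of what explicit options could look like, for illustration only. ComputationOptions, Strategy and the tolerance field are hypothetical names and not the pysatl-core API; only limit (default 200) comes from the diff under review.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class ComputationOptions:
    """Hypothetical typed container that replaces ``**options: Any``."""

    limit: int = 200  # taken from the "limit" option in the reviewed diff
    tolerance: float = 1e-8  # assumed second option, purely illustrative


class Strategy:
    """Toy strategy: owns the defaults and forwards options to the method."""

    def __init__(self, defaults: ComputationOptions | None = None) -> None:
        self._defaults = defaults or ComputationOptions()

    def query_method(
        self,
        method: Callable[[ComputationOptions], tuple[int, float]],
        options: ComputationOptions | None = None,
    ) -> tuple[int, float]:
        # No user override: keep the defaults untouched.
        effective = options if options is not None else self._defaults
        return method(effective)


def report_options(options: ComputationOptions) -> tuple[int, float]:
    """Stand-in for a computation method; it only echoes what it received."""
    return (options.limit, options.tolerance)


strategy = Strategy()
default_result = strategy.query_method(report_options)
custom_result = strategy.query_method(
    report_options, replace(ComputationOptions(), limit=50)
)
```

With a typed container, the available options show up in signatures and in the IDE, and a misspelled option name fails immediately instead of being silently swallowed.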

LeonidElkin · 2026-04-13T16:00:21Z

First, add a docstring describing this submodule at the top of the file. Second, to keep the init files from growing, I recommend a scheme that imports everything from the files of this module. For details, see how it's done here

Actually, you need to add docstrings everywhere. I won't mention it further.

LeonidElkin · 2026-04-13T16:02:37Z

@@ -0,0 +1,70 @@
+from __future__ import annotations
+
+__author__ = "Leonid Elkin, Mikhail Mikhailov, Irina Sergeeva"


I think you can list only yourself as the author.

LeonidElkin · 2026-04-13T16:32:26Z

+    tags : frozenset[str]
+        Constraint tags used for matching (e.g. ``{"continuous", "univariate"}``).


The name is not very descriptive. Use at least constraint_tags or something similar.

LeonidElkin · 2026-04-13T16:45:00Z

+    name: str
+    target: GenericCharacteristicName
+    sources: Sequence[GenericCharacteristicName]
+    fitter: FitterFunc


We already have such a type in computations.py. Move it to types.py and use it in both places.

LeonidElkin · 2026-04-13T17:00:16Z

+        from pysatl_core.distributions.computation import ComputationMethod
+
+        if self.cacheable:
+            return ComputationMethod(
+                target=self.target,
+                sources=list(self.sources),
+                fitter=self.fitter,
+            )
+        return ComputationMethod(
+            target=self.target,
+            sources=list(self.sources),
+            evaluator=self.fitter,  # type: ignore[arg-type]
+        )


This is not quite correct: you pass a FitterFunc to evaluator. I think it's worth doing the following: if the fitter is not cacheable, we don't call it a fitter but an evaluator, and it returns the already calculated value rather than a FittedComputationMethod. This will help us keep the API consistent overall.

I think it's possible to rename the fitters submodule to computations and handle both the evaluator and the fitter cases there. You could probably move computation.py there as well.

Once that's done, I think the cacheable field will no longer be needed.
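
To illustrate the distinction, here is a rough sketch under simplifying assumptions. evaluate_cdf and fit_cdf are hypothetical helpers using a plain midpoint rule on scalars, not the project's FitterFunc or ComputationMethod types. The evaluator returns a computed value directly, while the fitter does its work once and returns a reusable callable.

```python
from __future__ import annotations

from typing import Callable

Pdf = Callable[[float], float]


def evaluate_cdf(pdf: Pdf, x: float, lower: float = 0.0, steps: int = 1000) -> float:
    """Evaluator: nothing is cached, the value is recomputed on every call."""
    h = (x - lower) / steps
    # Midpoint rule on [lower, x].
    return sum(pdf(lower + (i + 0.5) * h) for i in range(steps)) * h


def fit_cdf(
    pdf: Pdf, lower: float = 0.0, upper: float = 1.0, steps: int = 1000
) -> Callable[[float], float]:
    """Fitter: precompute a cumulative table once, return a cheap callable."""
    h = (upper - lower) / steps
    cumulative = [0.0]
    for i in range(steps):
        cumulative.append(cumulative[-1] + pdf(lower + (i + 0.5) * h) * h)

    def cdf(x: float) -> float:
        # Nearest grid point, clamped to the fitted interval.
        index = min(max(int(round((x - lower) / h)), 0), steps)
        return cumulative[index]

    return cdf


def uniform_pdf(x: float) -> float:
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


value = evaluate_cdf(uniform_pdf, 0.5)  # a number
fitted = fit_cdf(uniform_pdf)  # a callable that can be cached and reused
```

Since the return type already tells the two cases apart, a separate cacheable flag becomes redundant in this sketch.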

LeonidElkin · 2026-04-13T18:06:53Z

+        raw = kwargs.pop(self.name, self.default)
+        try:
+            value = self.type(raw)
+        except (TypeError, ValueError) as exc:
+            raise TypeError(
+                f"Option '{self.name}': cannot convert {raw!r} to {self.type.__name__}"
+            ) from exc
+        if self.validate is not None and not self.validate(value):
+            raise ValueError(f"Option '{self.name}': value {value!r} failed validation.")
+        return value


As discussed on our online call: the kwargs will have to go, and we will use the descriptors to build kwargs for specific fitters and evaluators.
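
One possible shape of this, as a sketch only. OptionSpec, build_kwargs and the keyword-only fit_pdf_to_cdf below are hypothetical stand-ins. The conversion and validation logic mirrors the diff above, but it runs on the strategy side and the fitter receives plain keyword arguments.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Mapping, Sequence


@dataclass(frozen=True)
class OptionSpec:
    """Hypothetical option descriptor, modelled on FitterOption from the diff."""

    name: str
    type: type
    default: Any
    validate: Callable[[Any], bool] | None = None

    def resolve(self, user_options: Mapping[str, Any]) -> Any:
        # Read without mutating the caller's mapping (no kwargs.pop).
        raw = user_options.get(self.name, self.default)
        try:
            value = self.type(raw)
        except (TypeError, ValueError) as exc:
            raise TypeError(
                f"Option '{self.name}': cannot convert {raw!r} to {self.type.__name__}"
            ) from exc
        if self.validate is not None and not self.validate(value):
            raise ValueError(f"Option '{self.name}': value {value!r} failed validation.")
        return value


def build_kwargs(
    specs: Sequence[OptionSpec], user_options: Mapping[str, Any]
) -> dict[str, Any]:
    """Pick only the options this fitter declares; others are left alone."""
    return {spec.name: spec.resolve(user_options) for spec in specs}


def fit_pdf_to_cdf(*, limit: int) -> str:
    """Stand-in fitter with an explicit signature instead of **kwargs."""
    return f"integrating with limit={limit}"


PDF_TO_CDF_SPECS = (OptionSpec("limit", int, 200, lambda v: v > 0),)

kwargs = build_kwargs(PDF_TO_CDF_SPECS, {"limit": "50", "unrelated": 1})
message = fit_pdf_to_cdf(**kwargs)
```

In this sketch, options that a fitter does not declare are ignored rather than rejected, which is an assumption about how universal options would be shared between several fitters.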

LeonidElkin · 2026-04-13T18:08:24Z

+    cacheable: bool = True
+    options: tuple[FitterOption, ...] = ()
+    tags: frozenset[str] = field(default_factory=frozenset)
+    priority: int = 0


It doesn't seem that priority is used for anything. I think we should either remove it or change the strategy to take it into account when finding a path in the graph.
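
If priority is kept, one way the path search could use it is sketched below. This is not the current strategy: Edge and find_path are hypothetical, characteristics are plain strings, and the rule "fewest hops first, then highest total priority" is an assumption made for the example.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical graph edge: one fitter converting source into target."""

    name: str
    source: str
    target: str
    priority: int = 0


def find_path(edges: list[Edge], start: str, goal: str) -> list[str] | None:
    """Return fitter names along the best path, or None if unreachable."""
    outgoing: dict[str, list[Edge]] = {}
    for edge in edges:
        outgoing.setdefault(edge.source, []).append(edge)

    # Heap entries: (hops, -total priority, tie-breaker, node, path so far).
    heap: list[tuple[int, int, int, str, tuple[str, ...]]] = [(0, 0, 0, start, ())]
    counter = 0
    visited: set[str] = set()
    while heap:
        hops, neg_priority, _, node, path = heapq.heappop(heap)
        if node == goal:
            return list(path)
        if node in visited:
            continue
        visited.add(node)
        for edge in outgoing.get(node, []):
            counter += 1
            heapq.heappush(
                heap,
                (hops + 1, neg_priority - edge.priority, counter,
                 edge.target, path + (edge.name,)),
            )
    return None


edges = [
    Edge("quad", "pdf", "cdf", priority=0),
    Edge("trapezoid", "pdf", "cdf", priority=5),
    Edge("bisect", "cdf", "ppf", priority=0),
]
best = find_path(edges, "pdf", "ppf")
```

Because the hop count grows by one on every edge, the lexicographic cost never decreases along a path, so the first time a node is popped its cost is already optimal.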

LeonidElkin · 2026-04-13T18:10:23Z

+    FittedComputationMethod[NumericArray, NumericArray]
+        Array-semantic ``cdf`` callable.
+    """
+    opts = FITTER_PDF_TO_CDF_1C.resolve_options(kwargs)


This should also go away once the descriptors are moved to the strategy.

LeonidElkin · 2026-04-13T18:14:29Z

+FITTER_PDF_TO_CDF_1C = FitterDescriptor(
+    name="pdf_to_cdf_1C",
+    target=CharacteristicName.CDF,
+    sources=[CharacteristicName.PDF],
+    fitter=fit_pdf_to_cdf_1C,
+    options=(
+        FitterOption(
+            name="limit",
+            type=int,
+            default=200,
+            description="Maximum number of quad subdivisions per integral.",
+            validate=lambda v: v > 0,
+        ),
+    ),
+    tags=frozenset({"continuous", "univariate"}),
+    priority=0,
+    description="PDF -> CDF via segment-wise scipy.integrate.quad with cumsum.",
+)


I think we should fill in the descriptors lazily: either when the strategy does not find an analytically defined characteristic in the graph and goes looking for paths, or when the user wants to get or change options in the descriptors.

That way, working with our built-in distributions in the standard way does not require keeping extra information in memory.

So I think it's worth keeping descriptor population and the calculations themselves in different places.
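
A rough sketch of lazy population, assuming a registry of factories. LazyDescriptorRegistry and the reduced Descriptor class are hypothetical and not part of the PR. Nothing is built at import time, and a descriptor is created only on first lookup.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Descriptor:
    """Reduced stand-in for FitterDescriptor (metadata only)."""

    name: str
    description: str


class LazyDescriptorRegistry:
    """Stores cheap factories; builds each descriptor on first access."""

    def __init__(self) -> None:
        self._factories: dict[str, Callable[[], Descriptor]] = {}
        self._built: dict[str, Descriptor] = {}

    def register(self, name: str, factory: Callable[[], Descriptor]) -> None:
        self._factories[name] = factory

    def get(self, name: str) -> Descriptor:
        if name not in self._built:
            # First lookup (path search or a user touching options): build now.
            self._built[name] = self._factories[name]()
        return self._built[name]

    @property
    def built_count(self) -> int:
        return len(self._built)


registry = LazyDescriptorRegistry()
registry.register(
    "pdf_to_cdf_1C",
    lambda: Descriptor("pdf_to_cdf_1C", "PDF -> CDF via numerical integration."),
)

built_before = registry.built_count  # nothing has been built yet
descriptor = registry.get("pdf_to_cdf_1C")
built_after = registry.built_count
```

The calculation functions can live in their own module, while the registration calls sit next to the strategy, which matches the separation suggested above.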

siwreienta added 2 commits March 4, 2026 01:08

refactor (fitters): first attempt to implement numpy

76e116c

refactor (fitters): fix errors related with incompatible type

ef79f9e

siwreienta linked an issue Mar 4, 2026 that may be closed by this pull request

Fitters.py refactoring #26

Open

siwreienta marked this pull request as draft March 10, 2026 23:49

siwreienta added 6 commits April 10, 2026 11:22

fix: merge remote-tracking branch 'origin/main' into fitters-refactor

a8349e1

refactor (fitters): re-structuring the code, adding classes with meta information, and more

ef70d0a

fix (fitters): adding necessary edits for compatibility

08b202c

fix (fitters): delete old version of fitters

29c0115

test (fitters): add unit and performance tests

096b57d

test (fitters): silenced mypy warnings

bd3c353

siwreienta marked this pull request as ready for review April 10, 2026 14:07

siwreienta requested review from Desiment and LeonidElkin April 10, 2026 21:38

siwreienta self-assigned this Apr 12, 2026

LeonidElkin requested changes Apr 13, 2026
