Skip to content

Fitters.py refactoring#76

Open
siwreienta wants to merge 8 commits intomainfrom
fitters-refactor
Open

Fitters.py refactoring#76
siwreienta wants to merge 8 commits intomainfrom
fitters-refactor

Conversation

@siwreienta
Copy link
Copy Markdown

No description provided.

@siwreienta siwreienta linked an issue Mar 4, 2026 that may be closed by this pull request
@siwreienta siwreienta marked this pull request as draft March 10, 2026 23:49
@siwreienta siwreienta marked this pull request as ready for review April 10, 2026 14:07
@siwreienta siwreienta self-assigned this Apr 12, 2026
Copy link
Copy Markdown
Collaborator

@LeonidElkin LeonidElkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Universal options (calculations) of the characteristic should be passed to query_method and stick out explicitly for the user as a public API instead of the current **options: Any. @Desiment will describe the use cases in more detail in the PR or personally
  • The strategy assumes responsibility for passing options to ComputationMethod. If no changes have been made by the user, we leave the defaults.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firstly, add a docstring with details of this submodule in the top of the file. Secondly, in order not to stretch the init files, I recommend making a scheme with importing all from the files of this module. For details, see how it's done here

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, you need to add docstrings everywhere. I won't mention it farther

@@ -0,0 +1,70 @@
from __future__ import annotations

__author__ = "Leonid Elkin, Mikhail Mikhailov, Irina Sergeeva"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can add only yourself to author

Comment on lines +116 to +117
tags : frozenset[str]
Constraint tags used for matching (e.g. ``{"continuous", "univariate"}``).
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming is not very telling. At least constraint_tags or smth like that

name: str
target: GenericCharacteristicName
sources: Sequence[GenericCharacteristicName]
fitter: FitterFunc
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've already have such type in computations.py. Move it to types.py and use in both places

Comment on lines +150 to +162
from pysatl_core.distributions.computation import ComputationMethod

if self.cacheable:
return ComputationMethod(
target=self.target,
sources=list(self.sources),
fitter=self.fitter,
)
return ComputationMethod(
target=self.target,
sources=list(self.sources),
evaluator=self.fitter, # type: ignore[arg-type]
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit incorrect. You provide FitterFunc type to evaluator. I think it's worth doing the following: if the fitter is not cacheable, then we don't name it a fitter, but an evaluator. And we return from it not the FittedComputationMethod, but the already calculated value. This will help us maintain the overall consistency of the API.

I think it's possible to rename submodule fitters to computations and handle both evaluator and fitters cases. You probably could move computation.py there

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After it's done I think the cacheable field will be gone

Comment on lines +76 to +85
raw = kwargs.pop(self.name, self.default)
try:
value = self.type(raw)
except (TypeError, ValueError) as exc:
raise TypeError(
f"Option '{self.name}': cannot convert {raw!r} to {self.type.__name__}"
) from exc
if self.validate is not None and not self.validate(value):
raise ValueError(f"Option '{self.name}': value {value!r} failed validation.")
return value
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on our dialogue in an online call. The kwargs will have to leave, and we will use the descriptors to form kwargs for specific fitters and evaluators.

cacheable: bool = True
options: tuple[FitterOption, ...] = ()
tags: frozenset[str] = field(default_factory=frozenset)
priority: int = 0
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem that priority is used for smth. I think we should either remove it or change the strategy to consider it while finding the path in the graph

FittedComputationMethod[NumericArray, NumericArray]
Array-semantic ``cdf`` callable.
"""
opts = FITTER_PDF_TO_CDF_1C.resolve_options(kwargs)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also should be gone after moving descriptors to strategy

Comment on lines +94 to +111
FITTER_PDF_TO_CDF_1C = FitterDescriptor(
name="pdf_to_cdf_1C",
target=CharacteristicName.CDF,
sources=[CharacteristicName.PDF],
fitter=fit_pdf_to_cdf_1C,
options=(
FitterOption(
name="limit",
type=int,
default=200,
description="Maximum number of quad subdivisions per integral.",
validate=lambda v: v > 0,
),
),
tags=frozenset({"continuous", "univariate"}),
priority=0,
description="PDF -> CDF via segment-wise scipy.integrate.quad with cumsum.",
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should fill in the descriptors lazily when the strategy does not find an analytically defined characteristic in the graph and goes looking for paths in the graph, or when the user wants to get/change options in the descriptors.

This is necessary so that when working with our builtins distributions in a standard way, you do not need to keep extra information in memory.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think it's worth keeping the descriptor padding and calculations in different places.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fitters.py refactoring

2 participants