Init: KDTree by Engelsgeduld · Pull Request #3 · Engelsgeduld/Spbu-ML-Practice

Engelsgeduld · 2025-03-11T20:34:05Z

No description provided.

maybenotilya

Where are the notebooks...

maybenotilya · 2025-05-01T14:52:22Z

+        self.id = 1
+        self.size = size
+        self.heap: list[tuple[float, int, Optional[PointType]]] = [(-np.inf, 1, None)]
+        heapq.heapify(self.heap)


Why call heapify on a heap that has a single element?
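A minimal sketch of the point above, using only the standard library (`math.inf` stands in for `np.inf`, and the variable names are illustrative). A one-element list already satisfies the heap invariant, so `heapq.heapify` leaves it unchanged and the list can be passed to `heapq.heappush` directly.

```python
import heapq
import math

# A one-element list is already a valid heap.
heap = [(-math.inf, 1, None)]
before = list(heap)

heapq.heapify(heap)  # has no effect on a single element
unchanged = heap == before

# The list works with heapq functions without any prior heapify call.
heapq.heappush(heap, (-2.0, 2, None))
smallest = heap[0]
```

Dropping the `heapify` call in the constructor should therefore not change behaviour.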

maybenotilya · 2025-05-01T14:53:54Z

+
+class Heap:
+    def __init__(self, size: int):
+        self.id = 1


I don't quite understand what id is for

maybenotilya · 2025-05-01T14:54:22Z

+class Heap:
+    def __init__(self, size: int):
+        self.id = 1
+        self.size = size


This is really more of a capacity

maybenotilya · 2025-05-01T15:00:13Z

+            raise ValueError("Leaf size must be positive")
+
+        valid_points = self._validate_points(points)
+        self.dim: int = valid_points.shape[1]


For convenience, it is probably worth declaring this as None in the constructor

maybenotilya · 2025-05-01T15:03:10Z

+
+
+class AbstractScaler(metaclass=ABCMeta):
+    def fit(self, data: np._typing.NDArray) -> None:


This should be wrapped in the @abstractmethod decorator
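A sketch of what the suggestion could look like, not the PR's actual code. The `transform` method and the `IncompleteScaler` subclass are assumptions added for illustration. The sketch also uses the public `numpy.typing.NDArray` alias in place of the private `np._typing.NDArray`.

```python
from abc import ABCMeta, abstractmethod

import numpy.typing as npt


class AbstractScaler(metaclass=ABCMeta):
    @abstractmethod
    def fit(self, data: npt.NDArray) -> None:
        """Learn scaling parameters from data."""

    @abstractmethod
    def transform(self, data: npt.NDArray) -> npt.NDArray:
        """Apply the learned scaling to data."""


# Hypothetical subclass that forgets to implement fit().
class IncompleteScaler(AbstractScaler):
    def transform(self, data: npt.NDArray) -> npt.NDArray:
        return data


try:
    IncompleteScaler()
    instantiated = True
except TypeError:
    # ABCMeta refuses to instantiate a class with abstract methods left.
    instantiated = False
```

Without the decorator, `ABCMeta` alone enforces nothing: a subclass that omits `fit` would instantiate fine and silently inherit the empty base method.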

maybenotilya · 2025-05-01T15:05:17Z

+            raise ValueError("Features and targets must be same lenght")
+        self.model = KDTree(features, self.leaf_size, self.metric)
+        self.classifier = dict((tuple(pair[0]), pair[1]) for pair in zip(features.tolist(), targets.tolist()))
+        self.targets = targets


I don't understand why the targets are stored separately when they are already in self.classifier

maybenotilya · 2025-05-01T15:09:20Z

+            if point in self.classifier:
+                probability.append(
+                    (
+                        np.unique(self.targets),
+                        (self.classifier[point] == np.unique(self.targets)).astype(int),
+                    )
+                )


I don't really follow what is going on here, but it looks incorrect, if only because there can be two identical points with different labels
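The duplicate-point problem can be reproduced with a small sketch. The data is made up, and `labels_by_point` is a hypothetical alternative, not something from the PR. Building a dict keyed by point coordinates keeps only the last label seen for each point.

```python
from collections import defaultdict

# Two identical points carry different labels.
features = [[0.0, 1.0], [0.0, 1.0], [2.0, 3.0]]
targets = [0, 1, 1]

# Same construction as in the diff: later entries overwrite earlier ones.
classifier = dict((tuple(f), t) for f, t in zip(features, targets))

# Hypothetical alternative: keep every label observed for a point.
labels_by_point = defaultdict(list)
for f, t in zip(features, targets):
    labels_by_point[tuple(f)].append(t)
```

Here `classifier` ends up with two entries and label `0` for the point `(0.0, 1.0)` is lost, while `labels_by_point` keeps both labels.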

maybenotilya · 2025-05-01T15:12:22Z

+                    )
+                )
+            else:
+                result = self.model.query([point], self.k)


Your model.query accepts many points at once, so why not pass all of them in together?

maybenotilya · 2025-05-01T15:16:55Z

+                result = self.model.query([point], self.k)
+                target_result = np.array([self.classifier[tuple(neighbors.tolist())] for neighbors in result[0]])
+                counts = np.array([(target_result == val).sum() for val in np.unique(self.targets)])
+                probability.append((self.targets, counts / len(result[0])))


Storing the targets in every prediction looks like memory overhead; they are only used inside the class anyway

Also, why is it np.unique(self.targets) above but just self.targets here?
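One way to address both remarks, sketched under assumptions: the function name `neighbor_probabilities` and the sample data are hypothetical. The idea is to compute the class list once with `np.unique` and return only the probability vector for each query. In the diff, the counts have one entry per unique class, so pairing them with the full `self.targets` array would also give mismatched lengths whenever labels repeat.

```python
import numpy as np


def neighbor_probabilities(neighbor_labels: np.ndarray, classes: np.ndarray) -> np.ndarray:
    """Fraction of neighbors that belong to each class, in the order of classes."""
    counts = np.array([(neighbor_labels == c).sum() for c in classes])
    return counts / len(neighbor_labels)


targets = np.array([2, 0, 1, 0, 2, 2])
classes = np.unique(targets)  # sorted unique labels, computed once

# Labels of the k = 3 nearest neighbors of some query point (made up).
neighbor_labels = np.array([2, 2, 0])
proba = neighbor_probabilities(neighbor_labels, classes)
```

The classes can then live on the classifier as a single attribute, and each prediction only carries a vector aligned with it.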

maybenotilya · 2025-05-01T15:20:40Z

+    def transform(self, data: np._typing.NDArray) -> np._typing.NDArray:
+        if self.median is None or self.iqr is None:
+            raise ValueError("Scaler unfitted")
+        return (data - self.median) / self.iqr


self.iqr can be equal to zero
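A sketch of one possible guard, assuming the scaler works column-wise. The helper name `robust_transform` and the choice to replace a zero IQR with 1 are assumptions, not the PR's code. A constant feature has an IQR of zero, so the division in `transform` would produce `nan` or `inf` for that column.

```python
import numpy as np


def robust_transform(data: np.ndarray, median: np.ndarray, iqr: np.ndarray) -> np.ndarray:
    """Scale by median and IQR, leaving zero-IQR columns only centered."""
    safe_iqr = np.where(iqr == 0, 1.0, iqr)  # avoid division by zero
    return (data - median) / safe_iqr


# The second column is constant, so its IQR is zero.
data = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]])
median = np.median(data, axis=0)
q75, q25 = np.percentile(data, [75, 25], axis=0)
iqr = q75 - q25

scaled = robust_transform(data, median, iqr)
```

With this guard the constant column maps to zeros instead of `nan`. Raising an error for zero-IQR features would be another reasonable choice.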

Engelsgeduld added 2 commits March 11, 2025 23:31

Init: KDTree, KNNClassifier, Metrics and Scalers

c7ede5e

Fix: remove print debuging

518718f

Engelsgeduld changed the title ~~Kd tree~~ Init: KDTree Mar 11, 2025

maybenotilya reviewed May 1, 2025

Engelsgeduld wants to merge 2 commits into main from KD-Tree

2 participants


