Skip to content

Init: KDTree#3

Open
Engelsgeduld wants to merge 2 commits intomainfrom
KD-Tree
Open

Init: KDTree#3
Engelsgeduld wants to merge 2 commits intomainfrom
KD-Tree

Conversation

@Engelsgeduld
Copy link
Copy Markdown
Owner

No description provided.

@Engelsgeduld Engelsgeduld changed the title Kd tree Init: KDTree Mar 11, 2025
Copy link
Copy Markdown

@maybenotilya maybenotilya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

А ноутбуки где...

Comment thread src/knn/kd_tree/heap.py
self.id = 1
self.size = size
self.heap: list[tuple[float, int, Optional[PointType]]] = [(-np.inf, 1, None)]
heapq.heapify(self.heap)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

А зачем делать heapify для кучи из одного элемента?

Comment thread src/knn/kd_tree/heap.py

class Heap:
def __init__(self, size: int):
self.id = 1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Не совсем понимаю предназначение id

Comment thread src/knn/kd_tree/heap.py
class Heap:
def __init__(self, size: int):
self.id = 1
self.size = size
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ну скорее capacity

Comment thread src/knn/kd_tree/kdtree.py
raise ValueError("Leaf size must be positive")

valid_points = self._validate_points(points)
self.dim: int = valid_points.shape[1]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Наверное для удобства стоит в конструкторе объявить через None



class AbstractScaler(metaclass=ABCMeta):
def fit(self, data: np._typing.NDArray) -> None:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

В декоратор @abstractmethod стоило бы обернуть

raise ValueError("Features and targets must be same lenght")
self.model = KDTree(features, self.leaf_size, self.metric)
self.classifier = dict((tuple(pair[0]), pair[1]) for pair in zip(features.tolist(), targets.tolist()))
self.targets = targets
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Не понимаю зачем хранить таргеты отдельно, если они уже есть в self.classifier

Comment on lines +30 to +36
if point in self.classifier:
probability.append(
(
np.unique(self.targets),
(self.classifier[point] == np.unique(self.targets)).astype(int),
)
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Слабо понимаю что тут вообще происходит, но кажется это некорректно, как минимум потому что могут быть две одинаковые точки с разными лейблами

)
)
else:
result = self.model.query([point], self.k)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

У тебя model.query принимает много точек сразу, почему бы их всех туда не передать?

result = self.model.query([point], self.k)
target_result = np.array([self.classifier[tuple(neighbors.tolist())] for neighbors in result[0]])
counts = np.array([(target_result == val).sum() for val in np.unique(self.targets)])
probability.append((self.targets, counts / len(result[0])))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Кажется сохранять в каждом предикте таргеты это оверхед по памяти, у тебя они всё равно используются только в классе

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

А еще почему выше np.unique(self.targets), а тут просто self.targets

def transform(self, data: np._typing.NDArray) -> np._typing.NDArray:
if self.median is None or self.iqr is None:
raise ValueError("Scaler unfitted")
return (data - self.median) / self.iqr
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.iqr может быть равным нулю

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants