Conversation
```python
"""Calculate the distance between two rows."""
dist = 0.0
for i in range(len(row1) - 1):
    dist += (row1[i] - row2[i]) ** 2
```
You're missing out on the power of NumPy (or pandas) here to broadcast mathematical operations. If `row1` and `row2` are numpy arrays, then you could just have

```python
return sqrt(np.sum((row1 - row2) ** 2))
```
It's written this way to account for the difference in length between the rows: test data is submitted without a "classification" column, while the present data has one.
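Both points can be combined: broadcast the subtraction with NumPy while slicing the labeled row down to the test row's length. A minimal sketch (the function name `distance` and the argument layout, with the label as the last entry of `row1`, are assumptions for illustration):

```python
import numpy as np


def distance(row1, row2):
    """Euclidean distance between a labeled row and an unlabeled test row.

    Hypothetical sketch: row1's last entry is the "classification" label,
    so only its first len(row2) entries are compared against row2.
    """
    a = np.asarray(list(row1)[: len(row2)], dtype=float)
    b = np.asarray(list(row2), dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))
```

Slicing before the `asarray` call matters here: the label may be a string, which would make a whole-row conversion to float fail.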
src/knn.py

```python
def predict(self, test_data, tk=None):
    """Given data, categorize the data by its k nearest neighbors."""
    if tk is None:
```
src/knn.py

```python
for row in self.data.iterrows():
    distances.append((row[1][-1], self._distance(row[1], test_data)))
distances.sort(key=lambda x: x[1])
# import pdb; pdb.set_trace()
```
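Echoing the NumPy comment above, the `iterrows()` loop can be vectorized. A sketch under the assumption that `self.data` is a DataFrame whose last column is the class label (the helper name `nearest_distances` is made up for illustration):

```python
import numpy as np
import pandas as pd


def nearest_distances(data, test_row):
    """Hypothetical vectorized version of the iterrows() loop:
    return (label, distance) pairs sorted by ascending distance.

    Assumes `data` is a DataFrame whose last column holds the class
    label and `test_row` holds only feature values.
    """
    features = data.iloc[:, :-1].to_numpy(dtype=float)
    labels = data.iloc[:, -1].to_numpy()
    # Broadcast the test row against every training row at once.
    dists = np.sqrt(((features - np.asarray(test_row, dtype=float)) ** 2).sum(axis=1))
    pairs = list(zip(labels, dists))
    pairs.sort(key=lambda x: x[1])
    return pairs
```

This also sidesteps the leftover `pdb` breakpoint, which should be removed before merging in any case.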
src/knn.py

```python
if my_class:
    return my_class
else:
    self.predict(test_data, tk - 1)
```
Confused as to why this has to be recursive
It's written for the case in which the classification is a "tie" between two classes. In that case, the classify function returns None, so predict is run again with a decreased k value. This is based on my interpretation of the algorithm in the class notes, which may of course be wrong.
https://codefellows.github.io/sea-python-401d5/lectures/k_nearest_neighbors.html?highlight=nearest
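The tie-breaking scheme described above doesn't have to be recursive: since a tie only shrinks k and votes again, a loop expresses the same idea. A standalone sketch (the free function `predict` over (label, distance) pairs is an assumption; the real method works on `self.data`):

```python
from collections import Counter


def predict(distances, k):
    """Majority vote among the k nearest neighbors, retrying on ties.

    Hypothetical sketch: `distances` is a list of (label, distance)
    pairs. On a tie between the top classes the vote is inconclusive,
    so k is decreased and the vote rerun; k == 1 can never tie.
    """
    distances = sorted(distances, key=lambda x: x[1])
    while k >= 1:
        counts = Counter(label for label, _ in distances[:k]).most_common()
        if len(counts) == 1 or counts[0][1] > counts[1][1]:
            return counts[0][0]
        k -= 1  # tie between top classes: shrink the neighborhood
```

If the recursive form is kept, note that the recursive call's result must be propagated with `return self.predict(test_data, tk - 1)`; as quoted in the diff, the tie branch falls through and the caller receives None.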