Skip to content

📣 Optimize the knn and geometric-knn causation entropy network discovery algorithms by using a KD-Tree instead of a distance matrix #6

Description

@kslote1

Feature Request

The current implementation of discover_network computes pairwise distances via a full distance matrix, which scales as O(N^2 T log T).

This becomes a major bottleneck for large datasets. I propose adding an optional kd_tree=True parameter to leverage KD-Tree–based neighbor searches for both kNN and geometric-kNN causation entropy algorithms.

Mathematical Motivation

  • kNN and geometric-kNN causation entropy rely on identifying nearest neighbors in phase space or within a radius r.

  • Using a brute-force distance matrix requires evaluating all pairwise distances even though only local neighborhoods are needed.

  • A KD-Tree (or Ball Tree) data structure can reduce nearest-neighbor queries to approximately O(N log N) in low-to-moderate dimensions.

This optimization is mathematically equivalent: the same neighborhood sets are found, but with much faster query times.

Proposed API Change

discover_network(data, method="knn", k=5, kd_tree=True, ...)

  • kd_tree=True: Use scipy.spatial.KDTree (or sklearn.neighbors.KDTree) for neighbor search.

  • kd_tree=False: Fall back to the existing distance-matrix implementation.

Benefits

Significant speedup for large N, especially when repeatedly computing neighborhoods during causation entropy estimation.

Reduced memory footprint (no need to store the full distance matrix).

Backward compatible: default behavior remains unchanged.

Next Steps

Implement KD-Tree option inside neighbor search routines for kNN and geometric-kNN.

Add unit tests ensuring identical results between kd_tree=True and kd_tree=False.

Benchmark runtime scaling on synthetic datasets.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions