Merged
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,6 +1,11 @@

# CHANGELOG

## 1.6.0
- major package refactor and simplification
- streamlined TopologicalEmbedding pipeline (NumPy computations, parallelism removed)
- fixed a bug caused by incorrect hash usage (thanks to F. Hudrisier)

## 1.5.0
- major computation gains (joblib.Parallel)
- clearer, streamlined README with acknowledgements
2 changes: 0 additions & 2 deletions README.md
@@ -39,7 +39,6 @@
</div>
<br>


---
# TDAAD – Topological Data Analysis for Anomaly Detection

@@ -113,7 +112,6 @@ For more advanced usage (e.g. custom embeddings, parameter tuning), see the [exa
- `window_size` controls the time resolution — larger windows capture slower anomalies, smaller ones detect more localized changes.
- `n_centers_by_dim` controls the number of reference shapes used per homology dimension (e.g. connected components in H0, loops in H1, ...). Increasing this improves sensitivity but adds computation time.
- `tda_max_dim` sets the **maximum topological feature dimension** computed (0 = connected components, 1 = loops, 2 = voids, ...). Higher values increase runtime and memory usage.
- Internally, computations are **parallelized** using `joblib` to scale to larger datasets. Use `n_jobs` to control parallelism.
- Inputs can be `numpy.ndarray` or `pandas.DataFrame`. Column names are preserved in the output when using DataFrames.

⚙️ You can typically handle ~100 sensors and a few hundred time steps per window on a modern machine.
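The window parameters described above determine how the series is sliced before topological features are extracted. A minimal NumPy-only sketch of sliding-window extraction (independent of tdaad; the array shapes and parameter names here are illustrative assumptions, not the package's internals):

```python
import numpy as np

# Toy multivariate series: 10 time steps, 3 sensors.
X = np.arange(30, dtype=float).reshape(10, 3)

window_size, stride = 4, 2
starts = np.arange(0, len(X) - window_size + 1, stride)
windows = np.stack([X[s:s + window_size] for s in starts])

# Each window is a (window_size, n_sensors) slice of the series.
print(windows.shape)  # (4, 4, 3): 4 windows of 4 time steps x 3 sensors
```

With a larger `window_size` (and fixed `stride`), fewer windows cover the series and each one aggregates more time steps, which is why larger windows favor slower anomalies.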
790 changes: 285 additions & 505 deletions examples/oua_tutorial.ipynb

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "tdaad"
version = "1.5.0"
version = "1.6.0"
description = "Tools for anomaly detection in time series based on Topological Data Analysis"
readme = "README.md"
requires-python = ">=3.12"
@@ -21,7 +21,7 @@ dependencies = {file = ["requirements.txt"]}

[tool.setuptools.packages.find]
where = ["."] # list of folders that contain the packages (["."] by default)
include = ["tdaad","tdaad.utils"] # package names should match these glob patterns (["*"] by default)
include = ["tdaad"] # package names should match these glob patterns (["*"] by default)
exclude = [] # exclude packages matching these glob patterns (empty by default)
namespaces = false # to disable scanning PEP 420 namespaces (true by default)

4 changes: 1 addition & 3 deletions tdaad/__init__.py
@@ -7,11 +7,9 @@

"""

__version__ = "1.3.1"
__version__ = "1.6.0"

__all__ = [
"anomaly_detectors",
"persistencediagram_transformer",
"topological_embedding",
"utils",
]
63 changes: 57 additions & 6 deletions tdaad/anomaly_detectors.py
@@ -10,18 +10,70 @@
from sklearn.base import _fit_context, TransformerMixin
from sklearn.utils._param_validation import Interval
from sklearn.utils.validation import check_is_fitted
from sklearn.covariance import EllipticEnvelope

from tdaad.utils.remapping_functions import score_flat_fast_remapping
from tdaad.topological_embedding import TopologicalEmbedding
from tdaad.utils.local_elliptic_envelope import EllipticEnvelope


def score_flat_fast_remapping(scores, window_size, stride, padding_length=0):
"""
Remap window-level anomaly scores to a flat sequence of per-time-step scores.

Parameters
----------
scores : array-like of shape (n_windows,)
Anomaly scores for each window. Can be a pandas Series or NumPy array.

window_size : int
Size of the sliding window.

stride : int
Step size between windows.

padding_length : int, optional (default=0)
Extra length to pad the output array (typically at the end of a signal).

Returns
-------
remapped_scores : np.ndarray of shape (n_timestamps + padding_length,)
Flattened anomaly scores with per-timestep resolution. NaN values (from
positions not covered by any window) are replaced with 0.
"""
# Ensure scores is a NumPy array
if hasattr(scores, "values"):
scores = scores.values

n_windows = len(scores)

# Compute begin and end indices for each window
begins = np.arange(n_windows) * stride
ends = begins + window_size

# Output length based on last window + padding
total_length = ends[-1] + padding_length
remapped_scores = np.full(total_length, np.nan)

# Find all unique intersection points between windows
intersections = np.unique(np.concatenate((begins, ends)))

# For each interval between two intersections, find overlapping windows and sum their scores
for left, right in zip(intersections[:-1], intersections[1:]):
overlapping = (begins <= left) & (right <= ends)
if np.any(overlapping):
remapped_scores[left:right] = np.nansum(scores[overlapping])

# Replace NaNs (unscored positions) with 0
np.nan_to_num(remapped_scores, copy=False)

return remapped_scores
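To see what the remapping does, here is a condensed reimplementation of the function above together with a small worked example (illustration only; the packaged version is the one in the diff):

```python
import numpy as np

def score_flat_fast_remapping(scores, window_size, stride, padding_length=0):
    # Condensed sketch: sum the scores of all windows covering each interval.
    scores = np.asarray(scores, dtype=float)
    begins = np.arange(len(scores)) * stride
    ends = begins + window_size
    out = np.full(ends[-1] + padding_length, np.nan)
    cuts = np.unique(np.concatenate((begins, ends)))
    for left, right in zip(cuts[:-1], cuts[1:]):
        mask = (begins <= left) & (right <= ends)
        if mask.any():
            out[left:right] = np.nansum(scores[mask])
    return np.nan_to_num(out)

# Three windows of size 4 with stride 2 cover timestamps 0..7;
# timestamps covered by two windows get the sum of both scores.
flat = score_flat_fast_remapping([1.0, 2.0, 3.0], window_size=4, stride=2)
print(flat)  # [1. 1. 3. 3. 5. 5. 3. 3.]
```

Timestamps 2-3 lie in windows 0 and 1 (score 1 + 2 = 3), and timestamps 4-5 lie in windows 1 and 2 (score 2 + 3 = 5), which is where the peaks in the flattened output come from.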


class TopologicalAnomalyDetector(EllipticEnvelope, TransformerMixin):
"""
Anomaly detection for multivariate time series using topological embeddings and robust covariance estimation.

This detector extracts topological features from sliding windows of time series data and
uses a robust Mahalanobis distance (via EllipticEnvelope) to score anomalies.
uses a robust Mahalanobis distance (via scikit-learn's EllipticEnvelope) to score anomalies.

Read more in the :ref:`User Guide <topological_anomaly_detection>`.

@@ -160,12 +212,11 @@ def score_samples(self, X, y=None):

def decision_function(self, X):
"""Return the distance to the decision boundary."""
return self.offset_ - self.score_samples(X)
return self.score_samples(X) - self.offset_

def predict(self, X):
"""Predict inliers (1) and outliers (-1) using learned threshold."""
scores = self.score_samples(X)
return np.where(scores < self.offset_, -1, 1)
return np.where(self.decision_function(X) < 0, -1, 1)

def transform(self, X):
"""Alias for score_samples. Returns anomaly scores."""