166 changes: 65 additions & 101 deletions README.md

@@ -19,35 +19,70 @@ Python methods for numerical differentiation of noisy data, including multi-obje
<img src="https://joss.theoj.org/papers/102257ee4b0142bf49bc18d7c810e9d5/status.svg"></a>
</p>

## Table of contents
> **Collaborator (author) comment:** This takes up really valuable real estate at the top of the page, and the readme is short enough that the value of a TOC is unclear to me, so I nixed it.

- [PyNumDiff](#pynumdiff)
- [Table of contents](#table-of-contents)
- [Introduction](#introduction)
- [Structure](#structure)
- [Citation](#citation)
- [PyNumDiff python package:](#pynumdiff-python-package)
- [Optimization algorithm:](#optimization-algorithm)
- [Getting Started](#getting-started)
- [Prerequisite](#prerequisite)
- [Installing](#installing)
- [Usage](#usage)
- [Code snippets](#code-snippets)
- [Notebook examples](#notebook-examples)
- [Important notes](#important-notes)
- [Running the tests](#running-the-tests)
- [License](#license)

## Introduction

PyNumDiff is a Python package that implements various methods for computing numerical derivatives of noisy data, which
can be a critical step in developing dynamic models or designing control. There are four different families of methods
implemented in this repository: smoothing followed by finite difference calculation, local approximation with linear
models, Kalman filtering based methods and total variation regularization methods. Most of these methods have multiple
parameters involved to tune. We take a principled approach and propose a multi-objective optimization framework for
choosing parameters that minimize a loss function to balance the faithfulness and smoothness of the derivative estimate.
For more details, refer to [this paper](https://doi.org/10.1109/ACCESS.2020.3034077).
PyNumDiff is a Python package that implements various methods for computing numerical derivatives of noisy data, which can be a critical step in developing dynamic models or designing controllers. There are seven different families of methods implemented in this repository:

1. convolutional smoothing followed by finite difference calculation
> **Collaborator (author) comment:** You only listed 4 families, which was quite out of date.

2. polynomial-fit-based methods
3. iterated finite differencing
4. total variation regularization of a finite difference derivative
5. Kalman (RTS) smoothing
6. Fourier spectral methods with tricks
7. local approximation with linear models

Most of these methods have multiple parameters, so we take a principled approach and propose a multi-objective optimization framework for choosing parameters that minimize a loss function to balance the faithfulness and smoothness of the derivative estimate. For more details, refer to [this paper](https://doi.org/10.1109/ACCESS.2020.3034077).
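As a rough illustration of what the optimizer balances, here is a minimal sketch of such a loss, assuming a faithfulness term (how well the reintegrated derivative tracks the data) plus a `tvgamma`-weighted total-variation smoothness term; the package's actual implementation may differ in details:

```python
import numpy as np

def loss_sketch(x, dxdt_hat, dt, tvgamma):
    """Hedged sketch of the multi-objective loss from the 2020 IEEE Access paper."""
    x_reint = np.cumsum(dxdt_hat) * dt                 # reintegrate the estimated derivative
    x_reint += np.mean(x) - np.mean(x_reint)           # fix the unknown integration constant
    faithfulness = np.sqrt(np.mean((x - x_reint)**2))  # RMSE against the measured data
    smoothness = np.sum(np.abs(np.diff(dxdt_hat)))     # total variation of the derivative
    return faithfulness + tvgamma * smoothness
```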

## Installing
> **Collaborator (author) comment:** Installation and usage are now prioritized.


Dependencies are listed in [pyproject.toml](https://github.com/florisvb/PyNumDiff/blob/master/pyproject.toml). They include the usual suspects like `numpy` and `scipy`, but also optionally `cvxpy`.

The code is compatible with Python >= 3.10. Install from PyPI with `pip install pynumdiff`, from source with `pip install git+https://github.com/florisvb/PyNumDiff`, or from a local download with `pip install .`. Call `pip install pynumdiff[advanced]` to automatically install optional dependencies from the advanced list, like [CVXPY](https://www.cvxpy.org).

## Usage

For more details, read our [Sphinx documentation](https://pynumdiff.readthedocs.io/master/). The basic pattern of all differentiation methods is:

```python
somethingdiff(x, dt, **kwargs)
```

where `x` is data, `dt` is a step size, and various keyword arguments control the behavior. Some methods support variable step size, in which case `dt` can also receive an array of values to denote sample locations.
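For example, here is a minimal concrete call, assuming the `finite_difference.first_order` method (which takes no tuning parameters) and an illustrative synthetic signal:

```python
import numpy as np
from pynumdiff import finite_difference

dt = 0.01
t = np.arange(0, 4, dt)
x = np.sin(t) + 0.05*np.random.randn(len(t))  # noisy synthetic signal

# simplest instance of the pattern: plain first-order finite differencing
x_hat, dxdt_hat = finite_difference.first_order(x, dt)
```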

You can provide the parameters:
```python
from pynumdiff.submodule import method

x_hat, dxdt_hat = method(x, dt, param1=val1, param2=val2, ...)
```

Or you can find parameters by calling the multi-objective optimization algorithm from the `optimize` module:
> **Collaborator (author) comment:** I simplified this section somewhat. The second two usage examples were slightly repetitive.

```python
import numpy as np
from pynumdiff.optimize import optimize

# estimate cutoff_frequency by (a) counting the number of true peaks per second in the data
# or (b) looking at the power spectrum and choosing a cutoff
tvgamma = np.exp(-1.6*np.log(cutoff_frequency) - 0.71*np.log(dt) - 5.1)  # see https://ieeexplore.ieee.org/abstract/document/9241009

params, val = optimize(somethingdiff, x, dt, tvgamma=tvgamma,  # smoothness hyperparameter; ignored if dxdt_truth is given
                       dxdt_truth=None,  # pass ground-truth derivative data here if available, in which case tvgamma goes unused
                       search_space_updates={'param1': [vals], 'param2': [vals], ...})

print('Optimal parameters: ', params)
x_hat, dxdt_hat = somethingdiff(x, dt, **params)
```
If no `search_space_updates` is given, a default search space is used. See the top of `_optimize.py`.

The following heuristic works well for choosing `tvgamma`, where `cutoff_frequency` is the highest frequency content of the signal in your data and `dt` is the timestep: `tvgamma = np.exp(-1.6*np.log(cutoff_frequency) - 0.71*np.log(dt) - 5.1)`. Larger values of `tvgamma` produce smoother derivatives. The value of `tvgamma` is largely universal across methods, making it easy to compare results between methods. Be aware that the optimization is fairly computationally heavy.
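For instance, a quick sketch of the heuristic with signal content up to roughly 2 Hz sampled at 100 Hz (values are illustrative):

```python
import numpy as np

cutoff_frequency = 2.0  # highest frequency content of the signal, in Hz
dt = 0.01               # timestep for 100 Hz sampling
tvgamma = np.exp(-1.6*np.log(cutoff_frequency) - 0.71*np.log(dt) - 5.1)
print(tvgamma)          # ~0.053
```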

### Notebook examples

Much more extensive usage is demonstrated in Jupyter notebooks:
* Differentiation with different methods: [1_basic_tutorial.ipynb](https://github.com/florisvb/PyNumDiff/blob/master/examples/1_basic_tutorial.ipynb)
* Parameter optimization with known ground truth (for demonstration purposes only): [2a_optimizing_parameters_with_dxdt_known.ipynb](https://github.com/florisvb/PyNumDiff/blob/master/examples/2a_optimizing_parameters_with_dxdt_known.ipynb)
* Parameter optimization with unknown ground truth: [2b_optimizing_parameters_with_dxdt_unknown.ipynb](https://github.com/florisvb/PyNumDiff/blob/master/examples/2b_optimizing_parameters_with_dxdt_unknown.ipynb)
* Automatic method suggestion: [3_automatic_method_suggestion.ipynb](https://github.com/florisvb/PyNumDiff/blob/master/examples/3_automatic_method_suggestion.ipynb)

## Repo Structure

- `.github/workflows` contains `.yaml` files that configure our GitHub Actions continuous integration (CI) runs.
- `docs/` contains `make` files and `.rst` files that govern how `sphinx` builds the documentation, either locally (navigate to this folder and call `make html`) or in the cloud on `readthedocs.io`.
@@ -82,7 +117,6 @@ See CITATION.cff file as well as the following references.
journal = {Journal of Open Source Software}
}


### Optimization algorithm:

@article{ParamOptimizationDerivatives2020,
@@ -93,86 +127,16 @@ See CITATION.cff file as well as the following references.
year={2020}
}

## Getting Started

### Prerequisite

PyNumDiff requires common packages like `numpy`, `scipy`, and `matplotlib`. For a full list, check [pyproject.toml](https://github.com/florisvb/PyNumDiff/blob/master/pyproject.toml).

In addition, certain functions require extra packages, though these are not needed for a successful install of PyNumDiff:
- Total Variation Regularization methods: [`cvxpy`](http://www.cvxpy.org/install/index.html)
- `pytest` for unit tests

### Installing

The code is compatible with Python >= 3.5. It can be installed using pip or directly from the source code. Basic installation options include:

* From PyPI using pip: `pip install pynumdiff`.
* From source using pip git+: `pip install git+https://github.com/florisvb/PyNumDiff`
* From local source code using setup.py: run `pip install .` from inside this directory. See below for an example.

Call `pip install pynumdiff[advanced]` to automatically install [CVXPY](https://www.cvxpy.org) along with PyNumDiff. <em>Note: Some CVXPY solvers require a license, like ECOS and MOSEK. The latter offers a [free academic license](https://www.mosek.com/products/academic-licenses/).</em>

## Usage

**PyNumDiff** uses [Sphinx](http://www.sphinx-doc.org/en/master/) for code documentation, so read more details about the API usage [there](https://pynumdiff.readthedocs.io/master/).

### Code snippets

* Basic Usage: you provide the parameters
```python
from pynumdiff.submodule import method

x_hat, dxdt_hat = method(x, dt, param1=val1, param2=val2, ...)
```
* Intermediate usage: automated parameter selection through multi-objective optimization
```python
from pynumdiff.optimize import optimize

params, val = optimize(method, x, dt, search_space={'param1':[vals], 'param2':[vals], ...},
tvgamma=tvgamma, # hyperparameter, defaults to None if dxdt_truth given
dxdt_truth=None) # or give ground truth data, in which case tvgamma unused
print('Optimal parameters: ', params)
x_hat, dxdt_hat = method(x, dt, **params)
```
If no `search_space` is given, a default one is used.

* Advanced usage: automated parameter selection through multi-objective optimization using a user-defined cutoff frequency
```python
import numpy as np

# cutoff_frequency: estimate by (a) counting the number of true peaks per second in the data
# or (b) looking at the power spectrum and choosing a cutoff
log_gamma = -1.6*np.log(cutoff_frequency) - 0.71*np.log(dt) - 5.1  # see: https://ieeexplore.ieee.org/abstract/document/9241009
tvgamma = np.exp(log_gamma)

params, val = optimize(method, x, dt, search_space={'param1':[options], 'param2':[options], ...},
tvgamma=tvgamma)
print('Optimal parameters: ', params)
x_hat, dxdt_hat = method(x, dt, **params)
```

### Notebook examples

We will frequently update simple examples for demo purposes; here are the currently existing ones:
* Differentiation with different methods: [1_basic_tutorial.ipynb](https://github.com/florisvb/PyNumDiff/blob/master/examples/1_basic_tutorial.ipynb)
* Parameter Optimization with known ground truth (only for demonstration purpose): [2a_optimizing_parameters_with_dxdt_known.ipynb](https://github.com/florisvb/PyNumDiff/blob/master/examples/2a_optimizing_parameters_with_dxdt_known.ipynb)
* Parameter Optimization with unknown ground truth: [2b_optimizing_parameters_with_dxdt_unknown.ipynb](https://github.com/florisvb/PyNumDiff/blob/master/examples/2b_optimizing_parameters_with_dxdt_unknown.ipynb)

### Important notes

* Larger values of `tvgamma` produce smoother derivatives
* The value of `tvgamma` is largely universal across methods, making it easy to compare method results
* The optimization is not fast. Run it on subsets of your data if you have a lot of data. It will also be much faster with faster differentiation methods, like `savgoldiff` and `butterdiff`.
* The following heuristic works well for choosing `tvgamma`, where `cutoff_frequency` is the highest frequency content of the signal in your data, and `dt` is the timestep: `tvgamma=np.exp(-1.6*np.log(cutoff_frequency)-0.71*np.log(dt)-5.1)`

### Running the tests
## Running the tests

We are using GitHub Actions for continuous integration testing.

To run tests locally, type:
Run tests locally by navigating to the repo in a terminal and calling
```bash
> pytest pynumdiff
> pytest -s
```

Add the flag `--plot` to see plots of the methods against test functions. Add the flag `--bounds` to print log error bounds (useful when changing method behavior).
Add the flag `--plot` to see plots of the methods against test functions. Add the flag `--bounds` to print $\log$ error bounds (useful when changing method behavior).

## License

9 changes: 7 additions & 2 deletions pynumdiff/polynomial_fit/_polynomial_fit.py
@@ -10,7 +10,8 @@ def splinediff(x, dt, params=None, options={}, degree=3, s=None, num_iterations=
scipy.interpolate.UnivariateSpline.

:param np.array[float] x: data to differentiate
:param float dt: step size
:param float or array[float] dt: This function supports variable step size. This parameter is either the constant
step size if given as a single float, or the sample locations if given as an array of the same length as :code:`x`.
:param list params: (**deprecated**, prefer :code:`degree`, :code:`cutoff_freq`, and :code:`num_iterations`)
:param dict options: (**deprecated**, prefer :code:`num_iterations`) an empty dictionary or {'iterate': (bool)}
:param int degree: polynomial degree of the spline. A kth degree spline can be differentiated k times.
@@ -29,7 +30,11 @@ def splinediff(x, dt, params=None, options={}, degree=3, s=None, num_iterations=
if 'iterate' in options and options['iterate']:
num_iterations = params[2]

t = np.arange(len(x))*dt
if isinstance(dt, (np.ndarray, list)): # support variable step size for this function
if len(x) != len(dt): raise ValueError("If `dt` is given as array-like, must have same length as `x`.")
t = dt
else:
t = np.arange(len(x))*dt

x_hat = x
for _ in range(num_iterations):
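A hedged usage sketch of the variable-step-size support added above; the irregular sample times, noise level, and parameter values are illustrative assumptions:

```python
import numpy as np
from pynumdiff.polynomial_fit import splinediff

t = np.sort(np.random.uniform(0, 10, 200))    # irregular sample locations (illustrative)
x = np.sin(t) + 0.01*np.random.randn(len(t))  # noisy samples

# pass the sample locations as `dt` to exercise the variable-step-size path
x_hat, dxdt_hat = splinediff(x, t, degree=3, num_iterations=2)
```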