Merged
39 changes: 32 additions & 7 deletions src/.vitepress/routes/sidebar/user.ts
Original file line number Diff line number Diff line change
@@ -12,18 +12,43 @@ export const userRoutes = [
collapsed: true,
items: [
{
text: 'Examples',
text: 'Analysis Coding',
collapsed: true,
items: [
{text: 'Analysis Coding', link: '/analysis-coding'},
{text: 'Aggregation with fedstats', link: '/survival-regression.md'},
{text: 'Federated GLM', link: '/federated-logistic-regression.md'},
{text: 'Introduction', link: '/analysis-coding'},
{
text: 'Examples',
collapsed: true,
items: [
{text: 'Aggregation with fedstats', link: '/coding_examples/survival-regression'},
{text: 'Basic VCF QC', link: '/coding_examples/vcf-qc'},
{text: 'CLI Tools FastQC', link: '/coding_examples/cli-fastqc'},
{text: 'Deep Learning image classification', link: '/coding_examples/deep-learning-image-classifier'},
{text: 'Differential Privacy', link: '/coding_examples/differential-privacy-mvp'},
{text: 'Fedstats GLM', link: '/coding_examples/fedstats-logistic-regression'},
{text: 'Federated Logistic Regression', link: '/coding_examples/federated-logistic-regression'},
{text: 'GeMTeX text scores', link: '/coding_examples/gemtex-text-score-example'},
{text: 'PPRL', link: '/coding_examples/record_linkage'},
]
},
]

},
{text: 'PPRL', link: '/record_linkage'},
{text: 'Basic VCF QC', link: '/vcf-qc'},
{text: 'CLI Tools FastQC', link: '/cli-fastqc'},
{
text: 'Local Testing',
collapsed: true,
items: [
{text: 'Introduction', link: '/local-testing'},
{
text: 'Examples',
collapsed: true,
items: [
{text: 'Logistic Regression', link: '/testing_examples/local-testing-logistic-regression-example'},
{text: 'Differential Privacy', link: '/testing_examples/local-testing-dp-example'},
]
},
]
},
{text: 'FHIR Queries', link: '/fhir-query'},
// {text: 'Homomorphic Encryption', link: '/homomorphic-encryption'},
]
25 changes: 22 additions & 3 deletions src/guide/user/analysis-coding.md
@@ -62,13 +62,12 @@ class MyAggregator(StarAggregator):
total_patient_count = sum(analysis_results)
return total_patient_count

def has_converged(self, result, last_result, num_iterations):
def has_converged(self, result, last_result):
"""
Determines if the aggregation process has converged.

:param result: The current aggregated result.
:param last_result: The aggregated result from the previous iteration.
:param num_iterations: The number of iterations completed so far.
:return: True if the aggregation has converged; False to continue iterations.
"""
# TODO (optional): if the parameter 'simple_analysis' in 'StarModel' is set to False,
@@ -132,7 +131,27 @@ if __name__ == "__main__":
- Input parameters given by ``StarModel``:
- ``result``: Output of the current iteration's ``aggregation_method()``.
- ``last_result``: Output of the previous iteration's ``aggregation_method()``.
- ``num_iterations``: Number of iterations executed. This number is incremented **after** executing the ``has_converged()``-check, i.e. equates to 1 in the second iteration of the analysis.
- ``main()``-function: Instantiates the ``StarModel`` class, which automatically executes the analysis on the node (as either an aggregator or an analyzer node).
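For a multi-iterative analysis (``simple_analysis=False`` in ``StarModel``), ``has_converged()`` must decide when to stop. The following is a minimal tolerance-based sketch, not the FLAME implementation: the threshold, the assumption of numeric results, and the assumption that ``last_result`` is ``None`` in the first iteration are all illustrative.

```python
class MyAggregator:
    # Only the convergence check is sketched here; aggregation_method()
    # and the StarAggregator base class are omitted for brevity.
    TOLERANCE = 1e-6  # illustrative threshold, not a FLAME default

    def has_converged(self, result, last_result):
        if last_result is None:
            return False  # assumed first iteration: nothing to compare against
        # Stop once successive aggregated results stabilize
        return abs(result - last_result) < self.TOLERANCE
```

In practice the convergence criterion depends on the analysis: for iterative model fitting, a norm over parameter differences would replace the simple absolute difference used here.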

This script serves as a basic "Hello World" example for performing federated analysis using FHIR data.

### Utilizing Local Differential Privacy in ``StarModel``
::: warning Info
In its current state, Local Differential Privacy is only supported for analyses that return a single numeric value.
:::
An alternate version of ``StarModel``, ``StarLocalDPModel``, implements a simplified local differential privacy (LocalDP) mechanism to enhance privacy during analysis.
To use it, simply replace the ``StarModel`` import and instantiation in the example above with ``StarLocalDPModel``.
During instantiation, the parameters ``sensitivity`` and ``epsilon`` must be specified in addition to ``StarModel``'s normal parameters.
```python
from flame.star import StarLocalDPModel

StarLocalDPModel(
...
epsilon=1.0, # Privacy budget for differential privacy
sensitivity=1.0, # Sensitivity parameter for differential privacy
...
)
```
Executing an analysis with ``StarLocalDPModel`` adds Laplace noise to the final results sent by the aggregator node to the Hub.
The scale of the Laplace noise distribution is calculated by dividing the given ``sensitivity`` by ``epsilon``.
For more information, [see the 'opendp' docs](https://docs.opendp.org/en/stable/api/python/opendp.measurements.html#opendp.measurements.make_laplace).
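To illustrate the mechanics, here is a minimal plain-Python sketch of this noise addition. It is a simplified stand-in for the 'opendp' Laplace mechanism actually used (sampling via the inverse CDF); the function names are illustrative, not part of the FLAME API.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def add_local_dp(value: float, sensitivity: float, epsilon: float) -> float:
    """Perturb a single numeric result with Laplace noise of scale sensitivity/epsilon."""
    scale = sensitivity / epsilon  # scale of the Laplace distribution
    return value + laplace_noise(scale)

# e.g. perturbing an aggregated patient count of 118
noisy_count = add_local_dp(118.0, sensitivity=1.0, epsilon=1.0)
```

The noise is zero-mean, so repeated analyses would average out to the true value; any single published result, however, only reveals an approximate count.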
@@ -1,7 +1,7 @@
# Using CLI Tools for Federated FASTQ QC

::: warning Assumed Knowledge
This guide assumes you're already familiar with the concepts shown in the **VCF QC** tutorial (federated execution model, analyzer vs. aggregator roles, project / datastore setup, approvals). If not, read that first: see [VCF QC Guide](/guide/user/vcf-qc) plus the background docs on [Coding an Analysis](/guide/user/analysis-coding) and the [Core SDK](/guide/user/sdk-core-doc).
This guide assumes you're already familiar with the concepts shown in the **VCF QC** tutorial (federated execution model, analyzer vs. aggregator roles, project / datastore setup, approvals). If not, read that first: see [VCF QC Guide](/guide/user/coding_examples/vcf-qc) plus the background docs on [Coding an Analysis](/guide/user/analysis-coding) and the [Core SDK](/guide/user/sdk-core-doc).
:::

::: info Summary
@@ -115,7 +115,7 @@ Example real output:
| Node fails with no files | Check extensions & datastore mapping; you may have restricted file keys incorrectly. |

## See Also
* [VCF QC Guide](/guide/user/vcf-qc)
* [VCF QC Guide](/guide/user/coding_examples/vcf-qc)
* [Core SDK Reference](/guide/user/sdk-core-doc)
* [Coding an Analysis](/guide/user/analysis-coding)
* [Admin: Analysis Execution](/guide/admin/analysis-execution)
@@ -1,7 +1,7 @@
# Applying platform SDK and CLI to run a Deep Learning application


The more detailed guide to the deep learning showcase can be read [here](./Guide-showcase-deep-learning-image-classifier.pdf)
The more detailed guide to the deep learning showcase can be read [here](../Guide-showcase-deep-learning-image-classifier.pdf)

::: warning Assumed Knowledge
This guide assumes you're already familiar with the basic concepts of federated learning. If not, read the background docs on [Coding an Analysis](/guide/user/analysis-coding) and the [Core SDK](/guide/user/sdk-core-doc).
@@ -25,9 +25,9 @@ By the end of this tutorial you will learn how to use Star patterns, and how to


::: tip Python is the language of choice here because its ecosystem makes it the best-suited option for this kind of deep learning application
:::


## What does the analysis code?
## What does the analysis code do?
Brief overview:
* The analyzer runs network training for a specified number of epochs, then returns a dictionary with the updated weights and loss value
* The aggregator subclass computes the federated average of the model weights, loss, and metrics received from the analyzer nodes, and checks the convergence criterion each round
108 changes: 108 additions & 0 deletions src/guide/user/coding_examples/differential-privacy-mvp.md
@@ -0,0 +1,108 @@
# Analysis Coding with Local Differential Privacy
::: warning Info
This section demonstrates the use of Local Differential Privacy in distributed analysis. The example is designed to show how to enhance privacy protection while performing federated analysis across multiple nodes.
:::

### Example Analysis using `StarLocalDPModel`: Counting Patients with Differential Privacy
This example demonstrates how to count the total number of patients across multiple nodes holding FHIR data, with differential privacy applied to the aggregated result: the patient counts from each node are summed, and noise is then added to preserve privacy.

```python
from flame.star import StarLocalDPModel, StarAnalyzer, StarAggregator

# MyAnalyzer and MyAggregator classes remain unchanged from the introduction example

def main():
"""
Sets up and initiates the distributed analysis using the FLAME components.

- Defines the custom analyzer and aggregator classes.
- Specifies the type of data and queries to execute.
- Configures analysis parameters like iteration behavior and output format.
- Applies differential privacy to protect the aggregated results.
"""
StarLocalDPModel(
analyzer=MyAnalyzer, # Custom analyzer class (must inherit from StarAnalyzer)
aggregator=MyAggregator, # Custom aggregator class (must inherit from StarAggregator)
data_type='fhir', # Type of data source ('fhir' or 's3')
query='Patient?_summary=count', # Query or list of queries to retrieve data
simple_analysis=True, # True for single-iteration; False for multi-iterative analysis
output_type='str', # Output format for the final result ('str', 'bytes', or 'pickle')
epsilon=1.0, # Privacy budget for differential privacy
sensitivity=1.0, # Sensitivity parameter for differential privacy
analyzer_kwargs=None, # Additional keyword arguments for the custom analyzer constructor (i.e. MyAnalyzer)
aggregator_kwargs=None # Additional keyword arguments for the custom aggregator constructor (i.e. MyAggregator)
)


if __name__ == "__main__":
main()

```

### Explanation
- **`main()`-function**: Instantiates the `StarLocalDPModel` class, which automatically executes the analysis on the node (as either an aggregator or an analyzer node).
`StarLocalDPModel` extends the standard `StarModel` by incorporating Local Differential Privacy mechanisms to enhance privacy during federated analysis.

This script serves as an example for performing privacy-preserving federated analysis using FHIR data with Local Differential Privacy.

### Understanding Local Differential Privacy in `StarLocalDPModel`
::: warning Info
In its current state, Local Differential Privacy is only supported for analyses that return a single numeric value.
:::
`StarLocalDPModel` is an enhanced version of `StarModel` that implements Local Differential Privacy (LocalDP) to strengthen privacy guarantees during distributed analysis. The key difference is the addition of calibrated noise to the final aggregated results before they are sent to the Hub.

#### Key Parameters for Differential Privacy
When using `StarLocalDPModel`, two additional parameters must be specified during instantiation:
```python
StarLocalDPModel(
analyzer=MyAnalyzer,
aggregator=MyAggregator,
data_type='fhir',
query='Patient?_summary=count',
simple_analysis=True,
output_type='str',
epsilon=1.0, # Privacy budget for differential privacy
sensitivity=1.0, # Sensitivity parameter for differential privacy
analyzer_kwargs=None,
aggregator_kwargs=None
)
```

#### Privacy Parameters Explained
- **`epsilon`** (Privacy Budget): Controls the privacy-utility tradeoff. Lower values provide stronger privacy protection but add more noise to the results. Higher values provide more accurate results but weaker privacy guarantees.
- Typical values range from 0.1 (strong privacy) to 10.0 (weak privacy)
- In this example: `epsilon=1.0` provides a moderate level of privacy
- **`sensitivity`**: Represents the maximum amount that any single individual's data can change the analysis result. This is problem-specific and should be determined based on your analysis.
- For counting queries, sensitivity is typically 1.0 (one person can change the count by at most 1)
- In this example: `sensitivity=1.0` is appropriate for patient counting
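The privacy-utility tradeoff can be made concrete by computing the Laplace noise scale, `sensitivity / epsilon`, across the typical epsilon range for a counting query (a quick illustration of the parameters above, not FLAME code):

```python
sensitivity = 1.0  # counting query: one person changes the count by at most 1

for epsilon in (0.1, 1.0, 10.0):
    scale = sensitivity / epsilon
    print(f"epsilon={epsilon}: Laplace noise scale b={scale}")
# Scales are 10.0, 1.0, and 0.1 respectively: stronger privacy
# (smaller epsilon) means proportionally larger noise.
```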

#### Output with Differential Privacy vs Without
- **Without Differential Privacy**: The final aggregated result is the exact sum of patient counts from all nodes.
- Example Output: `Total Patient Count: 118`
- **With Differential Privacy**: The final aggregated result includes added noise, making it an approximate count that protects individual privacy.
- Example Output: `Total Patient Count (with DP): 119.1`

#### How Noise is Applied

Executing an analysis with `StarLocalDPModel` will add Laplace noise to the final results sent by the aggregator node to the Hub. The scale of the noise is calculated as:

```
noise_scale = sensitivity / epsilon
```

The Laplace distribution is then used to sample noise that is added to the aggregated result, ensuring differential privacy while maintaining statistical utility.

For more information, see the [OpenDP documentation on Laplace mechanism](https://docs.opendp.org/en/stable/api/python/opendp.measurements.html#opendp.measurements.make_laplace).

#### Benefits of Local Differential Privacy
- **Privacy Protection**: Even if an adversary has access to the final aggregated results, they cannot determine whether any specific individual's data was included in the analysis.
- **Quantifiable Privacy**: The epsilon parameter provides a mathematically rigorous measure of privacy loss.
- **Regulatory Compliance**: Helps meet privacy requirements in healthcare and other sensitive domains.
- **Trust**: Participants can be assured that their individual data cannot be reverse-engineered from the published results.

#### Considerations When Using Differential Privacy
- **Accuracy vs. Privacy Tradeoff**: Lower epsilon values provide stronger privacy but reduce result accuracy.
- **Result Interpretation**: The added noise means results are approximate. Consider running sensitivity analyses with different epsilon values.
- **Single Numeric Results**: Currently, the implementation only supports single numeric outputs. Complex multi-dimensional results are not yet supported.
- **Sensitivity Calculation**: Properly calculating sensitivity is crucial for meaningful privacy guarantees. Underestimating sensitivity can compromise privacy; overestimating it adds unnecessary noise.
