Add diffusion based noise model by tHarvey303 · Pull Request #35 · synthesizer-project/synference

tHarvey303 · 2026-05-06T14:48:06Z

This PR introduces a new generative machine learning model (ScoreBasedUncertaintyModel) to realistically simulate photometric measurement errors.

When generating mock galaxy catalogs, applying simple, uncorrelated Gaussian noise to true fluxes fails to capture the complex, real-world noise properties of surveys like COSMOS2020. Measurement errors are highly correlated across filters (e.g., due to blending or extraction methods) and exhibit heavy-tailed distributions.

This model uses a continuous-time diffusion framework (Variance Preserving SDE) to learn the full joint probability distribution of flux uncertainties across all bands in a survey, conditioned on the source's true magnitudes. This allows us to sample highly realistic, correlated noise for synthetic data.
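For reference, the VP-SDE of Song et al. (2021) and the quantities the model trains and samples with are written out below. Here $x$ is the vector of per-band flux uncertainties and $m$ the source's true magnitudes used as conditioning. The linear $\beta(t)$ schedule is an assumption, since the PR does not state which schedule it uses.

$$
\mathrm{d}x = -\tfrac{1}{2}\beta(t)\,x\,\mathrm{d}t + \sqrt{\beta(t)}\,\mathrm{d}w,
\qquad \beta(t) = \beta_{\min} + t\,(\beta_{\max} - \beta_{\min}),\quad t \in [0, 1]
$$

$$
p_{0t}(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \alpha_t x_0,\ \sigma_t^2 I\right),
\qquad \alpha_t = \exp\!\Big(-\tfrac{1}{2}\int_0^t \beta(s)\,\mathrm{d}s\Big),\quad \sigma_t^2 = 1 - \alpha_t^2
$$

$$
\mathcal{L}(\theta) = \mathbb{E}_{t,\,(x_0, m),\,\epsilon}\Big[\lambda(t)\,\big\| s_\theta(\alpha_t x_0 + \sigma_t \epsilon,\ t,\ m) + \epsilon/\sigma_t \big\|^2\Big],
\qquad \epsilon \sim \mathcal{N}(0, I)
$$

$$
\frac{\mathrm{d}x}{\mathrm{d}t} = -\tfrac{1}{2}\beta(t)\big[x + \nabla_x \log p_t(x \mid m)\big] \approx -\tfrac{1}{2}\beta(t)\big[x + s_\theta(x, t, m)\big]
$$

The last line is the probability flow ODE, integrated from $t = 1$ (with $x \sim \mathcal{N}(0, I)$) back to $t \approx 0$.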

Core Implementation

VP-SDE Diffusion Framework: Implements the continuous-time diffusion equations from Song et al. (2021). The model learns to reverse a noise-injection process to generate samples from the true uncertainty distribution.
Score-Matching Objective: Trains a neural network to estimate the score of the data distribution ($\nabla_x \log p_t(x)$) using denoising score matching.
Probability Flow ODE Sampler: Includes a fast, deterministic ODE solver for inference. Rather than the 500+ stochastic steps of reverse-SDE sampling, it generates accurate uncertainty samples in ~50 steps, and because the ODE is deterministic, results are reproducible for a fixed initial noise draw.
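A minimal NumPy sketch of the score-matching objective and the probability flow ODE sampler described above. It assumes a linear $\beta(t)$ schedule with Song et al.'s default $\beta_{\min} = 0.1$, $\beta_{\max} = 20$, and it replaces the trained network with the exact score of a Gaussian toy distribution so the sampler can be checked. All function names are hypothetical, this is not the PR's implementation, and Heun's method is used here only because the PR does not say which integrator it uses.

```python
import numpy as np

# Assumed linear noise schedule: beta(t) = BETA_MIN + t * (BETA_MAX - BETA_MIN)
BETA_MIN, BETA_MAX = 0.1, 20.0


def beta(t):
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)


def alpha_sigma(t):
    """Mean scale alpha_t and standard deviation sigma_t of p_0t(x_t | x_0)."""
    integral = BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t**2
    alpha = np.exp(-0.5 * integral)
    return alpha, np.sqrt(1.0 - alpha**2)


def gaussian_score(x, t, mu, s0):
    """Exact score of p_t when the data are N(mu, s0^2) in every dimension.

    Hypothetical stand-in for the trained network s_theta(x, t, m)."""
    alpha, sigma = alpha_sigma(t)
    var_t = alpha**2 * s0**2 + sigma**2
    return -(x - alpha * mu) / var_t


def dsm_loss(score_fn, x0, rng, t_min=1e-3):
    """Denoising score matching loss with weighting lambda(t) = sigma_t^2."""
    t = rng.uniform(t_min, 1.0, size=(x0.shape[0], 1))
    alpha, sigma = alpha_sigma(t)
    noise = rng.standard_normal(x0.shape)
    x_t = alpha * x0 + sigma * noise
    # Target score is -noise / sigma, so sigma * (score - target) = sigma * score + noise
    residual = sigma * score_fn(x_t, t) + noise
    return np.mean(np.sum(residual**2, axis=1))


def pf_ode_sample(score_fn, n, dim, rng, n_steps=50, t_min=1e-3):
    """Integrate dx/dt = -0.5 beta(t) [x + score(x, t)] from t=1 down to t_min with Heun steps."""
    x = rng.standard_normal((n, dim))  # the only randomness: a draw from the prior N(0, I)
    times = np.linspace(1.0, t_min, n_steps + 1)

    def drift(x, t):
        return -0.5 * beta(t) * (x + score_fn(x, t))

    for t_cur, t_next in zip(times[:-1], times[1:]):
        h = t_next - t_cur  # negative: integrating backwards in time
        k1 = drift(x, t_cur)
        x_pred = x + h * k1  # Euler predictor
        k2 = drift(x_pred, t_next)
        x = x + 0.5 * h * (k1 + k2)  # trapezoidal corrector
    return x
```

With `score = lambda x, t: gaussian_score(x, t, mu, s0)`, calling `pf_ode_sample(score, n=20000, dim=3, rng=np.random.default_rng(0))` returns samples whose per-band mean and standard deviation match `mu` and `s0` after 50 steps. Reusing the same generator seed reproduces the samples exactly.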

Neural Network Architecture

The underlying score estimator (_RobustScoreNetwork) is designed for training stability and for capturing high-frequency structure in how the score depends on the diffusion time:

Residual Connections: Prevents signal degradation across deep layers.
Gaussian Fourier Projections: Maps the scalar diffusion time $t$ into a higher-dimensional periodic space, substantially improving the network's ability to condition on time.
SiLU Activations: Used throughout to preserve gradients and prevent dead neurons during the complex score-matching task.
EMA Weight Tracking: Uses an Exponential Moving Average (AveragedModel) of the network weights during training to ensure smooth, artifact-free sampling at inference time.
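The architectural pieces listed above can be sketched in plain NumPy as follows. This is a minimal illustration, not _RobustScoreNetwork itself: the class and function names, the default `scale=16.0` for the Fourier frequencies, and the EMA decay are assumptions. In PyTorch the EMA is what `torch.optim.swa_utils.AveragedModel` maintains when given an exponential averaging function.

```python
import numpy as np


class GaussianFourierProjection:
    """Embed scalar times t as [sin(2*pi*W*t), cos(2*pi*W*t)] with fixed random W."""

    def __init__(self, embed_dim, scale=16.0, seed=0):
        rng = np.random.default_rng(seed)
        # Frequencies are sampled once from N(0, scale^2) and never trained
        self.W = rng.normal(0.0, scale, size=embed_dim // 2)

    def __call__(self, t):
        proj = 2.0 * np.pi * np.asarray(t, dtype=float)[:, None] * self.W[None, :]
        return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)


def silu(x):
    # x * sigmoid(x), with sigmoid written via tanh to avoid exp overflow
    return x * 0.5 * (1.0 + np.tanh(0.5 * x))


def residual_block(h, weight, bias):
    # Skip connection: the block only has to learn a correction to h
    return h + silu(h @ weight + bias)


def ema_update(ema_params, params, decay=0.999):
    # Exponential moving average of the weights, used instead of the raw weights at inference
    return decay * ema_params + (1.0 - decay) * params
```

The random, untrained frequencies let a small MLP resolve rapid changes in the score with $t$, while the residual path means a block initialised near zero starts out as the identity.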

Here's an example showing the true log-flux-error versus magnitude density for the COSMOS2020 catalog in blue (dashed contours enclose 68% of points, solid contours 95%), with the distribution recovered by the trained model shown in red. The network learns the conditional relationships between the flux uncertainties in different filters, even for the HSC filters, whose distribution is especially complex because of striping in the survey depths.
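Contours that "enclose 68% / 95% of points" are commonly drawn by thresholding a 2D histogram of (magnitude, log flux error) at the count level whose above-threshold bins hold that fraction of the sample. A minimal sketch, assuming this is how the figure's contours are defined (the PR does not say); `contour_levels` is a hypothetical helper.

```python
import numpy as np


def contour_levels(counts, fractions=(0.68, 0.95)):
    """Histogram levels whose above-threshold bins contain each fraction of the points."""
    flat = np.sort(counts.ravel())[::-1]  # densest bins first
    cumulative = np.cumsum(flat) / flat.sum()
    idx = np.searchsorted(cumulative, fractions)  # first bin at which each fraction is reached
    return flat[np.minimum(idx, flat.size - 1)]
```

The levels can be passed, in increasing order, to a contour routine over the same histogram, once for the true catalog and once for samples drawn from the trained model.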

To Do

Add tests
Add example to docs
Fully integrate this noise model into the existing training (single band noise model vs one per band).


coderabbitai · 2026-05-06T14:48:16Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 727eefb4-970c-4945-be64-fe02354eee92


tHarvey303 · 2026-05-06T15:37:19Z

@copilot investigate why the tests are failing


Copilot · 2026-05-06T15:45:25Z

@copilot investigate why the tests are failing

The failures were caused by a ValueError ("setting an array element with a sequence") raised in the _scatter method of library.py.

When noise models are used, apply_noise is called with np.atleast_1d(flux) (a 1-element array), so it returns 1-element arrays. These were then assigned to scalar positions in scattered_fluxes[i] and errors[i], which raised the error.

Fixed in f6e0bf6 by extracting the scalar from the returned arrays with [0] indexing:

scattered_fluxes[i] = scattered_flux[0]
errors[i] = sigma[0]
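A minimal sketch of the fixed pattern in context. `apply_noise` here is a hypothetical stand-in with an assumed 10% fractional error, not synference's actual noise model or signature; it only mimics the behaviour described above of returning 1-element arrays. The `[0]` indexing relies on exactly one value being returned per input flux.

```python
import numpy as np


def apply_noise(flux, rng):
    """Hypothetical stand-in: like a noise model fed np.atleast_1d(flux), returns 1-element arrays."""
    flux = np.atleast_1d(flux)
    sigma = 0.1 * np.abs(flux)  # assumed 10% fractional uncertainty
    return flux + rng.normal(0.0, sigma), sigma


def scatter(fluxes, rng):
    """Perturb each flux independently, storing scalars into preallocated arrays."""
    scattered_fluxes = np.empty_like(fluxes, dtype=float)
    errors = np.empty_like(fluxes, dtype=float)
    for i, flux in enumerate(fluxes):
        scattered_flux, sigma = apply_noise(flux, rng)
        # Extract the scalar from each 1-element result before assigning to position i
        scattered_fluxes[i] = scattered_flux[0]
        errors[i] = sigma[0]
    return scattered_fluxes, errors
```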

tHarvey303 · 2026-05-06T17:06:51Z

@copilot There is still an issue relating to the coverage tests in the ltu-ili package. What is causing this?


tHarvey303 and others added 6 commits May 1, 2026 18:56

Initial score-based diffusion uncertainity model, based on the pop-co…

a7a55e1

…smos approach.

Improved noise model architecture.

9f38413

balance validation loss score

0c466ff

Improving noise models

fc63a26

Performance improvements to diffusion noise model

d7a1e43

More reliable training behaviour.

7cdc9de

tHarvey303 marked this pull request as draft May 6, 2026 14:48

tHarvey303 requested a review from Copilot May 6, 2026 14:48

Copilot started reviewing on behalf of tHarvey303 May 6, 2026 14:49 View session

Copilot started work on behalf of tHarvey303 May 6, 2026 15:37 View session

Copilot AI and others added 2 commits May 6, 2026 15:43

Fix ValueError in _scatter when using noise models

aac1aae

Agent-Logs-Url: https://github.com/synthesizer-project/synference/sessions/7874d3da-9020-4367-8fb5-9f51fe2180df Co-authored-by: tHarvey303 <8613717+tHarvey303@users.noreply.github.com>

Use direct [0] indexing for scalar extraction from apply_noise result

f6e0bf6

Agent-Logs-Url: https://github.com/synthesizer-project/synference/sessions/7874d3da-9020-4367-8fb5-9f51fe2180df Co-authored-by: tHarvey303 <8613717+tHarvey303@users.noreply.github.com>

Copilot finished work on behalf of tHarvey303 May 6, 2026 15:46

Copilot started work on behalf of tHarvey303 May 6, 2026 17:07 View session

tHarvey303 and others added 8 commits May 7, 2026 00:42

Wire new noise model into feature array creation, add tests.

45702a9

Fix more unit attributes

6fab0d9

Fix tests

3316f1b

Merge branch 'diffusion_noise_model' of https://github.com/synthesize…

365c5dd

…r-project/synference into diffusion_noise_model

Make mock noise model for testing more stable

4b61542

Inital docs (not finished) for diffusion noise model.

05c0bb2

Add min/max magnitude clipping to diffusion noise model (in future as…

56bd6ea

…inh magnitudes is probably a better strategy.)

Added support for asinh magnitude scaling in the diffusion noise model

d189497
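For context on the last two commits, the standard asinh magnitude of Lupton et al. (1999) stays finite for zero and negative fluxes and reduces to the usual Pogson magnitude for bright sources, which is why it can replace hard min/max clipping. A minimal sketch of the textbook definitions, assuming AB fluxes in Jy; how the PR actually parameterises the softening `b` or the clipping limits is not stated, so the values in the usage below are hypothetical.

```python
import numpy as np

AB_ZEROPOINT_JY = 3631.0


def pogson_mag(flux_jy):
    # Undefined (NaN/inf) for flux <= 0, hence the need for clipping or softening
    return -2.5 * np.log10(flux_jy / AB_ZEROPOINT_JY)


def asinh_mag(flux_jy, b):
    """Lupton et al. (1999) asinh magnitude with dimensionless softening parameter b."""
    x = flux_jy / AB_ZEROPOINT_JY
    return -(2.5 / np.log(10.0)) * (np.arcsinh(x / (2.0 * b)) + np.log(b))


def clip_mags(mags, mag_min, mag_max):
    # Simple alternative: bound magnitudes to the range seen during training
    return np.clip(mags, mag_min, mag_max)
```

For example, with `b=1e-10` a zero flux maps to magnitude 25 rather than infinity, while a 20th-magnitude source agrees with `pogson_mag` to within about 1e-4 mag.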

tHarvey303 wants to merge 16 commits into main from diffusion_noise_model
