Conversation
Codecov Report ❌ Patch coverage is
Additional details and impacted files

| Coverage Diff | main | #70 | +/- |
|---|---|---|---|
| Coverage | 88.36% | 87.08% | -1.28% |
| Files | 24 | 24 | |
| Lines | 3583 | 3980 | +397 |
| Hits | 3166 | 3466 | +300 |
| Misses | 417 | 514 | +97 |
…e hi_res_feature parameter handling
grantbuster left a comment
Generally I think it looks good, but I kind of want to hear your plan for training positions vs. inference global/chunk positions. This is also pretty experimental and I just want to see how it goes. I've been having a tough time getting this to work, haha.
| else tf.keras.layers.AveragePooling2D(**kwargs) |
| ) |
| pos = tf.range(x.shape[dim_index]) / x.shape[dim_index] |
I'd prefer linspace for closed boundaries, but this is fine too.
So we're going with relative positional encodings, right? Have you thought about position 0-1 in each small training example vs. 0-1 in the inference space vs. 0-1 in each fwp chunk? Even if you have a plan, we'll have to verify this works. I think there could be some issues with your encoding frequencies working well at the training sample size in 0-1 but poorly when a much larger inference size is also mapped to 0-1, because the spatial resolution that the transformer effectively sees changes. Not confident about this, but I think it's something we might have to experiment with.
Agreed, linspace is better.
Good points here. I hadn't thought about this potential issue with effective resolution. I'd expected that as long as fwp chunk shapes are a similar size to batch shapes this will be fine, but maybe lat / lon / time features should actually be required inputs for this.
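To make the effective-resolution concern concrete, here is a minimal sketch of the two normalization options under discussion, with hypothetical domain sizes (not the PR's actual code). The same 0-1 range is divided into many more steps at a larger inference size, so a frequency set tuned to the training step size effectively sees a different resolution:

```python
import tensorflow as tf

n_train, n_infer = 12, 60  # hypothetical training sample vs. fwp chunk size

# Half-open normalization (as in the diff): positions span [0, 1 - 1/n]
pos_train = tf.range(n_train, dtype=tf.float32) / n_train
pos_infer = tf.range(n_infer, dtype=tf.float32) / n_infer

# Closed-boundary alternative: positions span [0, 1] exactly
pos_train_closed = tf.linspace(0.0, 1.0, n_train)

# The normalized step size shrinks as the domain grows, so sinusoidal
# frequencies chosen for the training step may be too coarse at inference.
print(float(pos_train[1] - pos_train[0]))  # ~0.083
print(float(pos_infer[1] - pos_infer[0]))  # ~0.017
```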
| Dropout rate for attention weights. Default is 0 (no dropout). |
| patch_size : int |
| Height, width, and depth of patches. Default is 1 for pixel-wise |
| tokenization. |
Patch size of the Sup3r hypercube, right? Not of the key/value obs?
If the key data weren't sparse, the patch size would be used for the key data too. When the key data is sparse, the patch size is only used for the query data and patch_size = 1 is used for the key data.
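A rough sketch of that distinction, assuming a simple reshape-based tokenizer (names and shapes here are illustrative, not the PR's actual implementation): the dense query hypercube is split into patches, while the sparse key/value observations keep patch_size = 1, i.e. one token per pixel.

```python
import tensorflow as tf

def tokenize_2d(x, patch_size):
    """Split (batch, height, width, channels) into flattened patch tokens."""
    b, h, w, c = x.shape
    x = tf.reshape(x, (b, h // patch_size, patch_size, w // patch_size, patch_size, c))
    x = tf.transpose(x, (0, 1, 3, 2, 4, 5))
    # one token per patch, with the patch pixels flattened into the feature dim
    return tf.reshape(
        x, (b, (h // patch_size) * (w // patch_size), patch_size**2 * c)
    )

query = tf.random.normal((2, 8, 8, 4))  # dense Sup3r hypercube (hypothetical)
obs = tf.random.normal((2, 8, 8, 1))    # sparse obs field (hypothetical)

q_tok = tokenize_2d(query, patch_size=2)  # (2, 16, 16): 16 tokens of 2x2x4
k_tok = tokenize_2d(obs, patch_size=1)    # (2, 64, 1): one token per pixel
```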
| ) |
| q_enc += self._pos_encoding( |
|     x, patch_size=self.patch_size, dim_index=2, embed_dim=embed_dim |
| ) |
I think traditionally the row/column positional encodings are each computed for embed_dim // 2 channels, concatenated, and then added to the tensor x. Yes, just confirmed this is at least how SatMAE does it.
Hmm, then how would that change with additional depth/time encoding? Seems simplest to use embed_dim channels for each encoding.
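For reference, a minimal sketch of the two options being weighed, assuming a standard 1D sinusoidal helper (the sinusoid_1d name and the shapes are illustrative): the SatMAE-style scheme gives each axis embed_dim // 2 channels and concatenates, while the alternative above gives each axis the full embed_dim and sums.

```python
import tensorflow as tf

def sinusoid_1d(positions, dim):
    """Standard interleaved sin/cos encoding: (n,) positions -> (n, dim)."""
    i = tf.range(dim // 2, dtype=tf.float32)
    theta = positions[:, None] / tf.pow(10000.0, 2 * i / dim)[None, :]
    return tf.reshape(tf.stack([tf.sin(theta), tf.cos(theta)], axis=-1), (-1, dim))

h, w, d = 8, 8, 16
rows = tf.range(h, dtype=tf.float32)
cols = tf.range(w, dtype=tf.float32)

# SatMAE-style: d // 2 channels per axis, concatenated on the channel dim
row_enc = tf.tile(sinusoid_1d(rows, d // 2)[:, None, :], (1, w, 1))  # (h, w, d/2)
col_enc = tf.tile(sinusoid_1d(cols, d // 2)[None, :, :], (h, 1, 1))  # (h, w, d/2)
enc_concat = tf.concat([row_enc, col_enc], axis=-1)                  # (h, w, d)

# Alternative discussed above: full d channels per axis, summed
enc_sum = (tf.tile(sinusoid_1d(rows, d)[:, None, :], (1, w, 1))
           + tf.tile(sinusoid_1d(cols, d)[None, :, :], (h, 1, 1)))   # (h, w, d)
```

A third (depth/time) axis extends either scheme: split the channels three ways and concatenate, or add another full-width term to the sum.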
| self.key_dim = key_dim |
| self.patch_size = patch_size |
| self.embed_dim = embed_dim |
| self.dropout = tf.keras.layers.Dropout(dropout) |
I think standard attention mechanisms, including tf.keras.layers.MultiHeadAttention, might include the "output" projection and dropout as part of the MHA class. You can confirm this by checking the MHA attribute "output_dense".
Good call on dropout. I included the extra projection since you can change the attention output shape with the output_shape arg, but maybe that flexibility isn't worth it.
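For reference, a quick sketch of the built-in behavior being discussed: tf.keras.layers.MultiHeadAttention accepts dropout and output_shape arguments and applies a final output projection internally (the shapes below are just illustrative).

```python
import tensorflow as tf

# Keras MHA applies attention-weight dropout and an output projection itself
mha = tf.keras.layers.MultiHeadAttention(
    num_heads=4,
    key_dim=16,
    dropout=0.1,      # dropout applied to the attention weights
    output_shape=32,  # size of the final projection (defaults to query feature dim)
)

query = tf.random.normal((2, 10, 64))
value = tf.random.normal((2, 20, 64))
out = mha(query, value)  # (2, 10, 32) after the internal output projection
```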
Yeah, I thought about the inference issue a bit before opting for the relative encoding. I decided that adding obs doesn't change the currently operative assumption that a fwp chunk includes everything that significantly influences the output (if not, then we should increase padding).
…size parameters and refactor encoding methods
Pull request overview
This PR introduces a new cross-attention layer intended for observation assimilation, adds unit tests around tokenization/positional encoding behavior, and modernizes linting by switching from Flake8/super-linter to Ruff.
Changes:
- Added `TokenizeEncodeBase` + `Sup3rCrossAttention` layers for tokenization + positional encoding + cross-attention.
- Expanded tests to cover cross-attention (2D/3D), patch sizes > 1, and positional encoding pooling correctness.
- Updated dev tooling/CI linting to Ruff and adjusted Pixi workspace config.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| phygnn/layers/custom_layers.py | Adds tokenization/positional encoding base + new cross-attention layer; removes older attention layers. |
| tests/test_layers.py | Adds tests for cross-attention behavior and positional encoding pooling with patch sizes > 1. |
| .github/workflows/linter.yml | Switches linting to Ruff action and runs on PRs. |
| pyproject.toml | Replaces Flake8 with Ruff in dev deps; updates Pixi config section name. |
| pixi.lock | Updates locked deps to reflect tooling changes. |
| phygnn/model_interfaces/base_model.py | Minor docstring capitalization + small style tweak. |
| phygnn/base.py | Formatting-only changes. |
| .github/linters/.python-lint / .github/linters/.flake8 | Removes legacy linter config files. |
| i = tf.range(d // 2, dtype=tf.float32) |
| theta = k / tf.pow(omega, (2 * i / d)) |
| enc[..., ::2] = tf.math.sin(theta) |
| enc[..., 1::2] = tf.math.cos(theta) |
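One aside on this hunk: the sliced assignments work on a mutable buffer such as a NumPy array, but not on a tf.Tensor, which doesn't support item assignment. A sketch of an assignment-free equivalent, assuming k holds the positions and omega the base frequency as the snippet suggests:

```python
import tensorflow as tf

def sinusoidal_encoding(k, d, omega=10000.0):
    """Interleaved sin/cos encoding built without item assignment.

    k : (n,) float tensor of positions; d : even embedding dimension.
    """
    i = tf.range(d // 2, dtype=tf.float32)
    theta = k[:, None] / tf.pow(omega, 2 * i / d)[None, :]  # (n, d // 2)
    # stack sin/cos on a trailing axis, then flatten so even channels hold sin
    # and odd channels hold cos, matching enc[..., ::2] / enc[..., 1::2] above
    return tf.reshape(tf.stack([tf.sin(theta), tf.cos(theta)], axis=-1), (-1, d))

enc = sinusoidal_encoding(tf.range(8, dtype=tf.float32), d=16)  # (8, 16)
```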
| slices = [] |
| for pad in pads: |
|     slices.append(slice(pad[0], -pad[1] if pad[1] > 0 else None)) |
| return out[tuple(slices)] |
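A quick illustration of what this slicing does, with hypothetical pad widths (the surrounding code isn't shown here, so this usage is an assumption):

```python
import tensorflow as tf

out = tf.reshape(tf.range(5 * 6), (5, 6))  # padded output (hypothetical)
pads = [(1, 2), (0, 0)]                    # (before, after) pad widths per dim

# slice(1, -2) trims the first dim; slice(0, None) leaves the second intact
slices = [slice(p[0], -p[1] if p[1] > 0 else None) for p in pads]
trimmed = out[tuple(slices)]               # shape (2, 6)
```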
| x_enc += self._pos_encode(x_pad, dim_index=2) |
| if self.rank == 5: |
|     x_enc += self._pos_encode(x_pad, dim_index=3) |
| return x_tok, x_enc, x_pad.shape |
…ic padding; enhance positional encoding methods
Pull request overview
This PR introduces a cross-attention block intended for observation (sparse NaN) assimilation by adding tokenization + positional encoding utilities, and updates the repo’s linting/tooling to use Ruff.
Changes:
- Added `TokenizeEncodeBase` and `Sup3rCrossAttention` layers (including patch-based tokenization and positional encoding) and removed older attention layer implementations.
- Expanded layer tests to cover cross-attention behavior (2D/3D, patch sizes, and positional encoding pooling).
- Migrated dev lint tooling from Flake8/super-linter to Ruff (pyproject + GitHub Actions), and adjusted Pixi config section naming.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| phygnn/layers/custom_layers.py | Adds tokenization/positional-encoding base and new cross-attention layer; removes prior attention classes. |
| tests/test_layers.py | Adds cross-attention and positional-encoding tests. |
| .github/workflows/linter.yml | Switches CI linting to Ruff GitHub Action and adds PR trigger. |
| pyproject.toml | Replaces Flake8 with Ruff in dev deps; updates Pixi section header. |
| pixi.lock | Lockfile updates reflecting dependency/tooling changes. |
| phygnn/model_interfaces/base_model.py | Minor docstring and small style change (`in {1, 2}`). |
| phygnn/base.py | Formatting-only changes. |
| .github/linters/.python-lint / .github/linters/.flake8 | Removes legacy linter configuration files. |
… build method to accept separate input shapes
- Introduced MultiHeadAttention class for custom multi-head attention with bias input.
- Added Sup3rCrossAlibi for cross attention with distance bias instead of absolute position encoding.
…or query, key, and value inputs
… parameter; update tests for PositionEncoder and Tokenizer integration
…rCrossAttention attention call
…erence Shaw et al. (2018)
…per position of attention inputs and pos encoding. Shaw and Alibi layers use alternative approaches.
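For readers unfamiliar with the ALiBi approach these commits refer to, a generic sketch of the idea (not the repo's Sup3rCrossAlibi implementation): instead of adding absolute position encodings to the tokens, a distance-proportional bias is added to the attention logits before the softmax.

```python
import tensorflow as tf

def alibi_bias(n_query, n_key, num_heads):
    """Generic ALiBi-style bias: -slope * |i - j| per head (illustrative)."""
    # geometric per-head slopes as in Press et al. (2022)
    slopes = tf.pow(2.0, -8.0 * tf.range(1, num_heads + 1, dtype=tf.float32) / num_heads)
    q_pos = tf.range(n_query, dtype=tf.float32)[:, None]
    k_pos = tf.range(n_key, dtype=tf.float32)[None, :]
    dist = tf.abs(q_pos - k_pos)                      # (n_query, n_key)
    return -slopes[:, None, None] * dist[None, :, :]  # (heads, n_query, n_key)

# the bias would be added to the attention scores before the softmax, e.g.
# scores = q @ k^T / sqrt(key_dim) + alibi_bias(n_q, n_k, heads)
bias = alibi_bias(n_query=16, n_key=8, num_heads=4)   # (4, 16, 8)
```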
…nd add utility to compute day of year and second of year from timestamps
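As a rough illustration of what such a timestamp utility might look like (a hedged sketch with assumed names, not the code added in this commit):

```python
import pandas as pd

def day_and_second_of_year(timestamps):
    """Day of year (1-366) and elapsed seconds since Jan 1 (illustrative)."""
    ts = pd.DatetimeIndex(timestamps)
    year_start = pd.to_datetime(ts.year.astype(str) + '-01-01')
    return ts.dayofyear, (ts - year_start).total_seconds()

doy, soy = day_and_second_of_year(['2024-01-01 00:30', '2024-07-01 12:00'])
```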
No description provided.