Merged
Ease model weight loading.
Add tests for SpectrumCodec
… into add-spectrum-tokenizer
Pull Request Overview
This PR implements a new autoencoder‐based codec for processing DESI spectra in the Aion framework. Key changes include introducing a Spectrum modality with dedicated fields, implementing the SpectrumCodec along with its encoding/decoding logic using ConvNeXt-based modules and quantizers, and adding supporting test data and dependency updates.
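The PR adds LFQ and scalar quantizers for the codec's latent space. As an illustration only, here is a minimal NumPy sketch of the lookup-free quantization (LFQ) idea: each latent dimension is snapped to ±1 and the resulting bit pattern is read as an integer token. The function names are hypothetical and this is not the PR's code, which presumably builds on the newly added `vector_quantize_pytorch` dependency; the sketch also omits the straight-through gradient and the entropy regularization used when training LFQ.

```python
import numpy as np


def lfq_quantize(z: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Lookup-free quantization sketch: snap each latent dim to -1 or +1.

    z: array of shape (..., d). Returns the quantized latents (same shape)
    and integer token indices of shape (...), each in [0, 2**d).
    """
    # Sign-based binarization; exact zeros are mapped to +1.
    q = np.where(z >= 0, 1.0, -1.0)
    # Read the bits (+1 -> 1, -1 -> 0) as a little-endian binary number.
    bits = (q > 0).astype(np.int64)
    weights = 2 ** np.arange(z.shape[-1], dtype=np.int64)
    indices = (bits * weights).sum(axis=-1)
    return q, indices


def lfq_dequantize(indices: np.ndarray, d: int) -> np.ndarray:
    """Invert lfq_quantize: map token indices back to +-1 latent vectors."""
    bits = (indices[..., None] >> np.arange(d, dtype=np.int64)) & 1
    return np.where(bits == 1, 1.0, -1.0)
```

Because the codebook is implicit (all 2**d sign patterns), no nearest-neighbour search over a learned codebook is needed, which is the main appeal of LFQ.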
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/tokenizers/test_spectrum_tokenizer.py | Added tests for the new spectrum codec using a Hugging Face–pretrained model. |
| tests/test_data/SPECTRUM_input_batch.pt | Added sample input data for spectrum modality. |
| tests/test_data/SPECTRUM_encoded_batch.pt | Added sample encoded output data for spectrum codec verification. |
| tests/test_data/SPECTRUM_decoded_batch.pt | Added sample decoded output data for spectrum codec verification. |
| pyproject.toml | Updated dependencies to include vector_quantize_pytorch. |
| aion/modalities.py | Introduced the Spectrum modality with fields for flux, ivar, mask, and wavelength. |
| aion/codecs/tokenizers/spectrum.py | Implemented a Spectrum codec class with autoencoder logic and quantization integration. |
| aion/codecs/quantizers/__init__.py | Added new LFQ and scalar quantizers for handling the latent space in the codec. |
| aion/codecs/modules/utils.py | Added custom LayerNorm and GRN utility modules. |
| aion/codecs/modules/spectrum.py | Provided interpolation functions and a latent spectral grid for converting between grids. |
| aion/codecs/modules/convnext.py | Added 1D ConvNeXt-based encoder and decoder modules for processing spectral data. |
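`aion/codecs/modules/spectrum.py` is described as providing interpolation functions and a latent spectral grid. The sketch below shows one plausible way to move a spectrum onto a fixed grid and back with linear interpolation (`np.interp`). The grid bounds and size, the function names, and the choice to interpolate `ivar` linearly are assumptions for illustration, not values or code taken from the PR.

```python
import numpy as np

# Placeholder latent grid: bounds and size are hypothetical (Angstrom).
LATENT_GRID = np.linspace(3600.0, 9800.0, 1024)


def to_latent_grid(wavelength, flux, ivar, mask, grid=LATENT_GRID):
    """Resample one spectrum onto `grid` by linear interpolation.

    Assumes `wavelength` is sorted in increasing order. Pixels with
    mask == True are treated as bad and dropped before interpolating.
    Grid points outside the observed range get flux 0 and ivar 0 and
    are flagged in the returned mask.
    """
    good = ~np.asarray(mask, dtype=bool)
    w, f, iv = wavelength[good], flux[good], ivar[good]
    out_flux = np.interp(grid, w, f, left=0.0, right=0.0)
    out_ivar = np.interp(grid, w, iv, left=0.0, right=0.0)
    out_mask = (grid < w[0]) | (grid > w[-1])
    return out_flux, out_ivar, out_mask


def from_latent_grid(grid_flux, wavelength, grid=LATENT_GRID):
    """Interpolate values defined on `grid` back onto observed wavelengths."""
    return np.interp(wavelength, grid, grid_flux)
```

Linearly interpolating inverse variances is a simplification; a noise-propagating resampler would be more rigorous if the codec needs calibrated uncertainties on the latent grid.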
This PR implements the codec for DESI spectra (the checkpoint is stored at
/mnt/ceph/users/polymathic/MMOMA/outputs/mmoma_codec_sdss+desi/6kzi0iz9/checkpoints/last.pt). I checked that I could reproduce the same encoded data as the original codec from the same random input.
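The reproducibility check described above can be expressed as a small helper. `encode_ref` and `encode_new` are hypothetical callables standing in for the original codec and the ported `SpectrumCodec`; with the actual PyTorch modules you would seed with `torch.manual_seed` and compare tensors with `torch.equal` rather than NumPy.

```python
import numpy as np


def encodings_match(encode_ref, encode_new, shape, seed=0, atol=0.0):
    """Feed the same seeded random input to two encoders and compare outputs.

    With atol=0 the outputs must be identical (what one expects for discrete
    tokens); a positive atol tolerates floating-point drift in continuous
    latents.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)
    # Pass copies so an encoder that mutates its input cannot affect the other.
    ref = np.asarray(encode_ref(x.copy()))
    new = np.asarray(encode_new(x.copy()))
    if ref.shape != new.shape:
        return False
    return bool(np.allclose(ref, new, rtol=0.0, atol=atol))
```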
Reflecting on this PR in relation to the other one, we may want to reorganize the codecs a bit. For instance, removing the pytorch-lightning dependencies makes the codecs plain Python classes, whereas we would like them to be
`torch.nn.Module` subclasses and ultimately offer Hugging Face (HF) support.
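A minimal sketch of what that reorganization could look like: the codec as a plain `torch.nn.Module` whose weights are restored from a `state_dict`, with no Lightning dependency at load time. `ToySpectrumCodec` and `load_codec` are hypothetical, and a single `Conv1d` stands in for the ConvNeXt encoder and decoder; only the structure is the point.

```python
import io

import torch
from torch import nn


class ToySpectrumCodec(nn.Module):
    """Hypothetical codec skeleton written as a plain nn.Module."""

    def __init__(self, in_channels: int = 1, latent_dim: int = 8):
        super().__init__()
        # Stand-ins for the ConvNeXt encoder/decoder stacks.
        self.encoder = nn.Conv1d(in_channels, latent_dim, kernel_size=3, padding=1)
        self.decoder = nn.Conv1d(latent_dim, in_channels, kernel_size=3, padding=1)

    def encode(self, flux: torch.Tensor) -> torch.Tensor:
        # flux: (batch, channels, length) -> latents: (batch, latent_dim, length)
        return self.encoder(flux)

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        return self.decoder(z)

    def forward(self, flux: torch.Tensor) -> torch.Tensor:
        return self.decode(self.encode(flux))


def load_codec(checkpoint) -> ToySpectrumCodec:
    """Build the codec and restore weights from a path or file-like object
    holding a plain state_dict (no Lightning checkpoint wrapper assumed)."""
    codec = ToySpectrumCodec()
    state = torch.load(checkpoint, map_location="cpu", weights_only=True)
    codec.load_state_dict(state)
    return codec.eval()
```

For the HF part, one option is `huggingface_hub`'s `PyTorchModelHubMixin`, which adds `save_pretrained`/`from_pretrained` to an `nn.Module` subclass; whether it fits the project's checkpoint layout would need to be checked.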