Add spectrum codec by LTMeyer · Pull Request #10 · PolymathicAI/AION

LTMeyer · 2025-04-10T11:25:55Z

This PR implements the codec for DESI spectra (checkpoint stored at /mnt/ceph/users/polymathic/MMOMA/outputs/mmoma_codec_sdss+desi/6kzi0iz9/checkpoints/last.pt).

I checked that, given the same random input, the new codec reproduces the same encoded data as the original codec.
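The consistency check described above can be sketched as a seeded regression test: draw a random input from a fixed seed, encode it, and compare the result with the encoding the original codec produced for the same input. This is a minimal NumPy sketch under stated assumptions: `encode`, `make_random_input`, and the input shape are hypothetical placeholders rather than the PR's code, and the PR's own test compares against stored batches such as tests/test_data/SPECTRUM_encoded_batch.pt.

```python
import numpy as np


def make_random_input(seed: int, n_spectra: int = 4, n_pixels: int = 128) -> np.ndarray:
    """Draw a reproducible batch of random "spectra" (the shape is a placeholder)."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(n_spectra, n_pixels)).astype(np.float32)


def encoding_matches(encode, reference: np.ndarray, seed: int = 0,
                     rtol: float = 1e-5, atol: float = 1e-6) -> bool:
    """Check that encode(random input drawn with `seed`) reproduces `reference`.

    `encode` is a hypothetical stand-in for the new codec's encoding step;
    `reference` is what the original codec produced for the same input.
    """
    encoded = np.asarray(encode(make_random_input(seed)))
    if encoded.shape != reference.shape:
        return False
    if np.issubdtype(reference.dtype, np.integer):
        # Discrete token indices should match exactly.
        return bool(np.array_equal(encoded, reference))
    # Continuous latents: tolerate tiny floating-point differences.
    return bool(np.allclose(encoded, reference, rtol=rtol, atol=atol))
```

For example, with a toy encoder `toy = lambda x: np.round(10 * x.mean(axis=1)).astype(np.int64)`, saving `reference = toy(make_random_input(0))` and later calling `encoding_matches(toy, reference)` returns True, while an encoder whose outputs differ makes it return False. Fixing the seed is what makes "the same random input" well defined across runs.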

Looking at this PR alongside the other one, we may want to reorganize the codecs a bit. For instance, removing the PyTorch Lightning dependencies turns the codecs into plain classes, whereas we would like them to be torch.nn.Module subclasses and ultimately to offer Hugging Face (HF) support.


Copilot

Pull Request Overview

This PR implements a new autoencoder-based codec for processing DESI spectra in the AION framework. Key changes include introducing a Spectrum modality with dedicated fields, implementing SpectrumCodec with encoding/decoding logic built on ConvNeXt-based modules and quantizers, and adding supporting test data and dependency updates.
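To make the Spectrum modality concrete, here is a hypothetical container holding the four fields this PR lists for it (flux, ivar, mask, wavelength). It is a NumPy sketch, not the code in aion/modalities.py: the class name `SpectrumSketch`, the shape check, the `noise_sigma` helper, and the convention that mask is True for pixels to ignore are all assumptions.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SpectrumSketch:
    """Hypothetical container mirroring the Spectrum modality's fields.

    All arrays share shape (batch, n_pixels). `mask` is assumed to be True
    where a pixel should be ignored; the PR may use the opposite convention.
    """

    flux: np.ndarray
    ivar: np.ndarray  # inverse variance of the flux
    mask: np.ndarray
    wavelength: np.ndarray

    def __post_init__(self) -> None:
        shape = self.flux.shape
        for name in ("ivar", "mask", "wavelength"):
            other = getattr(self, name).shape
            if other != shape:
                raise ValueError(f"{name} has shape {other}, expected {shape}")
        if np.any(self.ivar < 0):
            raise ValueError("ivar must be non-negative")

    def noise_sigma(self) -> np.ndarray:
        """Per-pixel standard deviation; inf where ivar == 0 or the pixel is masked."""
        sigma = np.full(self.flux.shape, np.inf)
        good = (self.ivar > 0) & ~self.mask
        sigma[good] = 1.0 / np.sqrt(self.ivar[good])
        return sigma
```

Storing ivar rather than a standard deviation lets zero-weight pixels be represented as ivar = 0 without dividing by zero; the helper only converts to sigma where ivar > 0.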

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.


File	Description
tests/tokenizers/test_spectrum_tokenizer.py	Added tests for the new spectrum codec using a Hugging Face–pretrained model.
tests/test_data/SPECTRUM_input_batch.pt	Added sample input data for spectrum modality.
tests/test_data/SPECTRUM_encoded_batch.pt	Added sample encoded output data for spectrum codec verification.
tests/test_data/SPECTRUM_decoded_batch.pt	Added sample decoded output data for spectrum codec verification.
pyproject.toml	Updated dependencies to include vector_quantize_pytorch.
aion/modalities.py	Introduced the Spectrum modality with fields for flux, ivar, mask, and wavelength.
aion/codecs/tokenizers/spectrum.py	Implemented a Spectrum codec class with autoencoder logic and quantization integration.
aion/codecs/quantizers/__init__.py	Added new LFQ and scalar quantizers for handling the latent space in the codec.
aion/codecs/modules/utils.py	Added custom LayerNorm and GRN utility modules.
aion/codecs/modules/spectrum.py	Provided interpolation functions and a latent spectral grid for converting between grids.
aion/codecs/modules/convnext.py	Added 1D ConvNeXt-based encoder and decoder modules for processing spectral data.
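The grid conversion in aion/codecs/modules/spectrum.py can be illustrated with linear interpolation: a spectrum observed on its native wavelength grid is resampled onto one shared latent grid before encoding, and a decoded spectrum is mapped back onto the observed wavelengths. This is a hedged sketch using np.interp; the log-spaced grid, the function names, and the zero-fill/validity convention outside the observed range are assumptions, and the PR's interpolation functions may work differently.

```python
import numpy as np


def make_latent_grid(lambda_min: float, lambda_max: float, n: int) -> np.ndarray:
    """Hypothetical fixed latent wavelength grid, log-spaced (an assumption)."""
    return np.logspace(np.log10(lambda_min), np.log10(lambda_max), n)


def to_latent_grid(wavelength: np.ndarray, flux: np.ndarray, latent_grid: np.ndarray):
    """Linearly interpolate one spectrum onto `latent_grid`.

    Returns (flux_on_grid, valid); `valid` is False for latent pixels outside
    the observed wavelength range, whose flux is set to 0.
    """
    order = np.argsort(wavelength)  # np.interp needs increasing x values
    wl, fl = wavelength[order], flux[order]
    resampled = np.interp(latent_grid, wl, fl, left=0.0, right=0.0)
    valid = (latent_grid >= wl[0]) & (latent_grid <= wl[-1])
    return resampled, valid


def from_latent_grid(latent_grid: np.ndarray, latent_flux: np.ndarray,
                     wavelength: np.ndarray) -> np.ndarray:
    """Map a spectrum on the latent grid back onto observed wavelengths."""
    return np.interp(wavelength, latent_grid, latent_flux)
```

Returning a validity mask alongside the resampled flux keeps zero-filled pixels distinguishable from real zero flux, which matters if the mask is passed to the encoder as an input channel.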


Add spectrum tokenizer

2ce99b6

LTMeyer changed the base branch from main to add_tokenizers April 10, 2025 11:26

LTMeyer added 6 commits May 12, 2025 14:10

Rename FiniteScaleQuantizer->FiniteScalarQuantizer

9cd08ba

Add channel mask as input

d37fb30

Add test to ensure previous results consistency

1cf702b

Make the tokenizer a pytorch module

8c37cc4

Ease model weight loading.

Update test to load only one model checkpoint

aa71c1e

Merge branch 'add_tokenizers' into add-spectrum-tokenizer

2034934

Base automatically changed from add_tokenizers to main May 22, 2025 13:44

LTMeyer and others added 9 commits May 23, 2025 11:36

Merge branch 'main' into add-spectrum-tokenizer

254b871

Update SpectrumCodec

a6c1574

Add tests for SpectrumCodec

Add vector_quantize_pytorch to dependencies

b8549c1

Specify version of vector_quantize_pytorch

213fad9

Merge remote-tracking branch 'origin/main' into add-spectrum-tokenizer

0ebf3e1

Merge branch 'add-spectrum-tokenizer' of github.com:PolymathicAI/AION into add-spectrum-tokenizer

84250ef

update spectrum

5bba1fb

fixed issues

740b3fc

fixing formatting

1730a01

EiffL requested review from EiffL and Copilot May 23, 2025 18:58

Copilot AI reviewed May 23, 2025


aion/codecs/tokenizers/spectrum.py (resolved)

aion/codecs/modules/spectrum.py (resolved)

EiffL merged commit c2626a4 into main May 23, 2025
2 checks passed

EiffL deleted the add-spectrum-tokenizer branch May 23, 2025 22:49


EiffL merged 16 commits into main from add-spectrum-tokenizer


3 participants
