
Vqgan#49

Merged
arthurdrake1 merged 23 commits into main from vqgan
Oct 27, 2025

Conversation

@arthurdrake1 (Contributor):

Adds VQGAN training and evaluation with tuned hyperparameters, using an Online Clustered Codebook to promote 100% codebook utilization.

Also includes three new functions in transforms.py that could be useful for most/all EngiOpt models:

  • resize_to provides a standardized way of resizing tensors, e.g., to 128x128 prior to model training and back again afterwards
  • normalize normalizes the conditions to zero mean and unit std
  • drop_constant removes any condition columns that remain constant throughout the data (e.g., overhang_constraint = 0 for all of beams2d)

The latter two are optional and can be disabled in the args for VQGAN.
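The three helpers themselves aren't shown in this thread; a minimal sketch of what they might look like follows (signatures and defaults are assumptions for illustration, not the PR's actual code):

```python
import torch as th
import torch.nn.functional as F


def resize_to(x: th.Tensor, size: int = 128) -> th.Tensor:
    """Resize a (N, C, H, W) tensor to (N, C, size, size) via bilinear interpolation."""
    return F.interpolate(x, size=(size, size), mode="bilinear", align_corners=False)


def normalize(conds: th.Tensor) -> tuple[th.Tensor, th.Tensor, th.Tensor]:
    """Shift/scale condition columns to zero mean and unit std; return stats for inversion."""
    mean = conds.mean(dim=0)
    std = conds.std(dim=0).clamp_min(1e-8)  # guard against constant columns
    return (conds - mean) / std, mean, std


def drop_constant(conds: th.Tensor) -> th.Tensor:
    """Remove condition columns that are constant across the whole dataset."""
    keep = conds.std(dim=0) > 0
    return conds[:, keep]
```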

@ffelten (Collaborator) left a comment:

Hey @arthurdrake1, looks very solid, I am honestly impressed by the complexity of the model :).

My comments are essentially small improvements. However, I have a more "meta" comment to make:
I believe the single-file implementation is not going to work well in this case. The file is currently too big to navigate. I have two suggestions to make the code easier to understand:

  • put all the utils, network components, gpt (things you use in the algorithm but are not the core logic of the algorithm itself) in their own file. I'd keep the networks definitions (Encoder, Decoder, etc.) in the training script, though.
  • have the different stages in different files / scripts. Currently, the script does a lot of things and instantiates a lot of pieces. I believe it might be easier to follow if we had separate scripts with well-defined inputs and outputs.

What do you think? :)


# Restores the pytorch model from wandb
if args.wandb_entity is not None:
artifact_path_0 = f"{args.wandb_entity}/{args.wandb_project}/{args.problem_id}_vqgan_cvqgan:seed_{seed}"
@ffelten (Collaborator):

I believe having more explicit names than 0, 1, 2, e.g., artifact_transformer, might help keep track of these

@arthurdrake1 (Contributor, author):

OK, I just used 0, 1, 2 as an easier shorthand while coding, but I agree spelling them out will help understanding
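A hypothetical sketch of the suggested renaming (entity, project, problem, and seed values here are made up for illustration):

```python
entity, project, problem_id, seed = "my-entity", "engiopt", "beams2d", 1  # hypothetical values

# Explicit names make it obvious which stage each artifact belongs to:
artifact_path_vqgan = f"{entity}/{project}/{problem_id}_vqgan_cvqgan:seed_{seed}"
artifact_path_transformer = f"{entity}/{project}/{problem_id}_transformer:seed_{seed}"
```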

)

# Clean up conditions based on model training settings and convert back to tensor
sampled_conditions_new = sampled_conditions.select(range(len(sampled_conditions)))
@ffelten (Collaborator):

I don't get what this does

@arthurdrake1 (Contributor, author):

This copies over the original sampled_conditions into a new dataset that can then be normalized/cleaned prior to feeding into the model. The reason we need this copy is (to the best of my knowledge) the metrics.metrics() call later requires the original conditions for its calculations.

@ffelten (Collaborator):

Fair! The name is a bit cryptic then, though.
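The copy-before-normalizing pattern being discussed (`select(range(len(ds)))` is the Hugging Face datasets idiom for a full copy) can be sketched with plain tensors standing in for the dataset; all names here are illustrative:

```python
import torch as th

original_conditions = th.tensor([[10.0, 0.5], [20.0, 1.5]])

# Copy first: the original values must survive for the metrics computation later.
conditions_for_model = original_conditions.clone()

# Normalize only the copy before feeding it to the model.
mean = conditions_for_model.mean(dim=0)
std = conditions_for_model.std(dim=0).clamp_min(1e-8)
conditions_for_model = (conditions_for_model - mean) / std
```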

entry: pyright
language: node
pass_filenames: false
pass_filenames: true
@ffelten (Collaborator):

Why is this changed?

@arthurdrake1 (Contributor, author):

It was for debugging earlier; I will change it back

pyproject.toml Outdated
pythonVersion = "3.9"
pythonPlatform = "All"
typeshedPath = "typeshed"
# typeshedPath = "typeshed" -> commented out may lead to precommit out of memory error
@ffelten (Collaborator):

Same, why this change?

@arthurdrake1 (Contributor, author):

Also for pre-commit debugging; I will undo this


def __init__(self):
super().__init__()
self.register_buffer("shift", th.tensor([-0.030, -0.088, -0.188])[None, :, None, None])
@ffelten (Collaborator):

These constants are coming from LPIPS?

@arthurdrake1 (Contributor, author):

Yes, they are just constants to normalize the input for the pretrained VGG model
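For context, the buffer in the diff matches the scaling layer from the LPIPS reference implementation, which registers a per-channel shift and scale before the pretrained VGG backbone. A self-contained sketch (the scale constants are quoted from LPIPS and worth double-checking against the actual dependency):

```python
import torch as th
import torch.nn as nn


class ScalingLayer(nn.Module):
    """Shift/scale RGB inputs into the statistics the pretrained VGG expects
    (constants as used in the LPIPS reference implementation)."""

    def __init__(self):
        super().__init__()
        self.register_buffer("shift", th.tensor([-0.030, -0.088, -0.188])[None, :, None, None])
        self.register_buffer("scale", th.tensor([0.458, 0.448, 0.450])[None, :, None, None])

    def forward(self, x: th.Tensor) -> th.Tensor:
        return (x - self.shift) / self.scale
```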



###########################################
########## GPT-2 BASE CODE BELOW ##########
@ffelten (Collaborator):

I'd put this into its own file

@arthurdrake1 (Contributor, author):

Done

th.as_tensor(training_ds["optimal_upsampled"][:]).to(device),
*[th.as_tensor(training_ds[key][:]).to(device) for key in conditions],
)
dataloader_0 = th.utils.data.DataLoader(
@ffelten (Collaborator):

Why 3 times the same thing?

@arthurdrake1 (Contributor, author):

The different model stages could have different batch sizes
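That design (one DataLoader per stage, each with its own batch size, over the same underlying dataset) can be sketched as follows; the toy data and batch sizes are illustrative:

```python
import torch as th

# Toy data standing in for the design/condition tensors.
dataset = th.utils.data.TensorDataset(th.randn(64, 1, 8, 8))

# Each training stage gets its own DataLoader so batch sizes can differ.
dataloader_vqgan = th.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)
dataloader_cvqgan = th.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)
dataloader_transformer = th.utils.data.DataLoader(dataset, batch_size=8, shuffle=True)
```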

name=run_name,
dir="./logs/wandb",
)
wandb.define_metric("0_step", summary="max")
@ffelten (Collaborator):

Do we need these lines?

@arthurdrake1 (Contributor, author):

Yes, in this special multi-stage training case, to have separate training step counts for each stage.
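The per-stage step setup can be sketched with wandb's `define_metric` API; the stage prefixes follow the `0_step` naming seen in the diff, and the logging call is illustrative (this assumes `wandb.init(...)` has already run):

```python
import wandb  # assumes an active wandb run

STAGES = ("0", "1", "2")  # hypothetical prefixes for the vqgan / cvqgan / transformer stages

for stage in STAGES:
    # Each stage gets its own monotonically increasing step counter...
    wandb.define_metric(f"{stage}_step", summary="max")
    # ...and all of that stage's metrics are plotted against that counter.
    wandb.define_metric(f"{stage}_*", step_metric=f"{stage}_step")

# Logging then pairs each metric with its stage counter, e.g.:
# wandb.log({"0_recon_loss": 0.12, "0_step": 100})
```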

@arthurdrake1 (Contributor, author):

@ffelten Going to make the minor edits now. I will also move all the "helper modules" like nanogpt into their own file. As for separating out the different training phases, this would be more complicated, as the current logic is built on the single-script implementation, i.e., it easily reuses the trained vqgan and cvqgan to then train the transformer.

@ffelten (Collaborator) left a comment:

Some minor stuff. I did not read all the details but it looks clean enough to me!

cond_lr: float = 2e-4 # Default: 2e-4
"""learning rate for CVQGAN"""
latent_size: int = 16
"""size of the latent feature map (automatically determined later)"""
@ffelten (Collaborator):

Why is this determined later? Also, is this "automated determination" reflected in the wandb config (hyperparams) ?

@arthurdrake1 (Contributor, author):

The latent feature map dimension is dictated by the number of encoder and decoder layers, so in practice the user has to increase or decrease those to change latent_size. The number of image channels is more straightforward: we just update it based on the number of channels in the data (always 1 in our case, for now).

I checked the wandb config for a test run I did just now with bogus values (--latent_size 123123 and --image_channels 123123). They remained at 16 and 1 respectively in the hyperparams list on wandb.

@arthurdrake1 (Contributor, author):

@ffelten Upon further review of the code, I don't think these args are needed. They're just calculated as normal variables and override any actual user inputs, as I determined above. Should be good to go now.
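The relationship described above (latent size fixed by the architecture, not by a user arg) reduces, under the usual halving-per-downsample convention, to a one-liner; the function name and the assumption of one halving per block are illustrative:

```python
def latent_size_for(image_size: int, num_downsample_layers: int) -> int:
    """Each downsampling block halves the spatial resolution, so the latent
    feature map size follows directly from the encoder depth (assumed scheme)."""
    return image_size // (2 ** num_downsample_layers)
```

For example, a 128x128 input with three downsampling blocks yields a 16x16 latent map, matching the default of 16 in the diff.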

"""number of epochs of training"""
batch_size_vqgan: int = 16
"""size of the batches for Stage 1"""
lr_vqgan: float = 5e-5 # Default: 2e-4
@ffelten (Collaborator):

I'm not sure you want to keep the default comment? There are some others for different hyperparams, too

@arthurdrake1 (Contributor, author):

Yes, thanks; this was for debugging, so I've removed it now

@arthurdrake1 arthurdrake1 merged commit fe036d1 into main Oct 27, 2025
3 checks passed
@arthurdrake1 arthurdrake1 deleted the vqgan branch October 27, 2025 14:59
@markfuge markfuge mentioned this pull request Jan 7, 2026