Question about patch_size

Dear authors,
According to your diagram, your model appears to be adapted from the Vision Transformer (ViT), but I can't find where the patchify step is implemented. Is it done in down_sample_input? In addition, the initialization of many dimension-related parameters does not seem to match the prefetching setting. Could you explain why?
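To make the question concrete, here is a minimal sketch of the ViT-style patchify step I expected to find. It is not taken from your code. It assumes a channels-last H x W x C image given as nested lists and a square patch_size that divides both H and W. The function name `patchify` is my own and is hypothetical.

```python
def patchify(image, patch_size):
    """Split an H x W x C image (nested lists) into non-overlapping patches.

    Hypothetical helper for illustration. Returns the patches in row-major
    order, each flattened to a list of length patch_size * patch_size * C,
    which is the input shape a ViT patch embedding usually projects.
    """
    h = len(image)
    w = len(image[0])
    if h % patch_size or w % patch_size:
        raise ValueError("patch_size must divide both height and width")
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            flat = []
            # Walk the rows of this patch, then the pixels within each row.
            for row in image[top:top + patch_size]:
                for pixel in row[left:left + patch_size]:
                    flat.extend(pixel)  # append the C channel values
            patches.append(flat)
    return patches
```

With the standard ViT setting (224 x 224 x 3 input, patch_size 16), this yields 14 x 14 = 196 patches of length 16 x 16 x 3 = 768. Those are the dimensions I expected the model's parameters to be derived from.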