Description
Question
Hello author, I have two questions:
1. Is there a separate training script or program file for Nextdit, and is Nextdit's training dataset the same as Navdp's?
2. Regarding this part of the code: according to the paper, System 1 is conditioned on two sources: (1) the low-frequency latent goal Z′ from System 2, and (2) high-frequency RGB inputs. Since the dual-system inference runs asynchronously (slow System 2, fast System 1), the latent goal generated at time t remains fixed; at time t+k, System 1 must still interpret this stale latent goal to keep the trajectory accurate, estimate the distance already traveled, and adapt to dynamic changes. To achieve this, System 1 encodes both the RGB frame that System 2 last saw at time t and the current observation at time t+k. My current understanding is that cur_images corresponds to the image paired with traj_hidden_states at the current time, and pix_goal_images may be frames taken a few steps back, i.e., constructed with I_{t+k}. Is this understanding correct?
```python
# Current observations: flatten trajectory and time dims together.
cur_images = traj_images.flatten(0, 1)
# Goal images: the first frame of each trajectory, repeated along the time dim.
pix_goal_images = traj_images[:, 0:1].repeat(1, traj_images.size(1), 1, 1, 1).flatten(0, 1)
bsz = cur_images.size(0)
# Pair (goal, current) per sample and move channels first.
images_dp = torch.stack([pix_goal_images, cur_images], dim=1).permute(0, 1, 4, 2, 3)
```
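For what it's worth, the tensor shapes can be traced with a small sketch. This is only a shape walkthrough with a dummy tensor in place of the real batch, under the assumption that traj_images is laid out as (B, T, H, W, C):

```python
import torch

# Dummy batch standing in for the real traj_images: (B, T, H, W, C).
B, T, H, W, C = 2, 4, 8, 8, 3
traj_images = torch.randn(B, T, H, W, C)

# Current observations: every frame of every trajectory, flattened to (B*T, H, W, C).
cur_images = traj_images.flatten(0, 1)

# Goal images: frame 0 of each trajectory repeated T times, so each current
# frame is paired with its own trajectory's first frame.
pix_goal_images = traj_images[:, 0:1].repeat(1, T, 1, 1, 1).flatten(0, 1)

# Stack the (goal, current) pair and move channels first: (B*T, 2, C, H, W).
images_dp = torch.stack([pix_goal_images, cur_images], dim=1).permute(0, 1, 4, 2, 3)

print(cur_images.shape)   # torch.Size([8, 8, 8, 3])
print(images_dp.shape)    # torch.Size([8, 2, 3, 8, 8])
```

Note that under this reading, pix_goal_images is not a randomly shifted frame but always frame 0 of the trajectory segment, broadcast to every timestep.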