Dear authors, first, thank you for your pioneering work in SNN transformers!
I am having trouble reproducing the firing-rate statistics in Table 2. Could you give some guidance? I used the built-in Monitor class in spikingjelly, which directly records the multistepLIFNode outputs, and then averaged the recorded tensor to get the result. I ran the 8-768 model on the first ImageNet sample, and the computed firing rates are fairly high.
`Layer patch_embed.proj_lif firing rates: tensor([0.0805, 0.1413, 0.1159, 0.1617], device='cuda:0')
Layer patch_embed.proj_lif1 firing rates: tensor([0.0415, 0.0630, 0.0563, 0.0492], device='cuda:0')
Layer patch_embed.proj_lif2 firing rates: tensor([0.0320, 0.0430, 0.0350, 0.0347], device='cuda:0')
Layer patch_embed.proj_lif3 firing rates: tensor([0.0761, 0.1580, 0.1574, 0.1635], device='cuda:0')
Layer patch_embed.rpe_lif firing rates: tensor([0.2587, 0.3355, 0.3397, 0.3472], device='cuda:0')
Layer block.0.attn.q_lif firing rates: tensor([0.1975, 0.2342, 0.2243, 0.2264], device='cuda:0')
Layer block.0.attn.k_lif firing rates: tensor([0.0201, 0.0218, 0.0225, 0.0231], device='cuda:0')
Layer block.0.attn.v_lif firing rates: tensor([0.0233, 0.0264, 0.0284, 0.0292], device='cuda:0')
Layer block.0.attn.attn_lif firing rates: tensor([0.0709, 0.1024, 0.1151, 0.1132], device='cuda:0')
Layer block.0.attn.talking_heads_lif firing rates: tensor([0.3474, 0.4034, 0.4171, 0.4215], device='cuda:0')
Layer block.0.attn.shortcut_lif firing rates: tensor([0.0443, 0.0526, 0.0514, 0.0605], device='cuda:0')
Layer block.0.mlp.fc1_lif firing rates: tensor([0.3362, 0.3796, 0.4058, 0.4084], device='cuda:0')
Layer block.0.mlp.fc2_lif firing rates: tensor([0.2034, 0.2091, 0.1929, 0.1689], device='cuda:0')
Layer block.1.attn.q_lif firing rates: tensor([0.0170, 0.0207, 0.0232, 0.0292], device='cuda:0')
Layer block.1.attn.k_lif firing rates: tensor([0.0158, 0.0205, 0.0238, 0.0313], device='cuda:0')
Layer block.1.attn.v_lif firing rates: tensor([0.0981, 0.1177, 0.1285, 0.1045], device='cuda:0')
Layer block.1.attn.attn_lif firing rates: tensor([0.3290, 0.3672, 0.3942, 0.3970], device='cuda:0')
Layer block.1.attn.talking_heads_lif firing rates: tensor([0.0310, 0.0406, 0.0401, 0.0448], device='cuda:0')
Layer block.1.attn.shortcut_lif firing rates: tensor([0.3308, 0.3651, 0.3935, 0.3997], device='cuda:0')
Layer block.1.mlp.fc1_lif firing rates: tensor([0.2001, 0.2043, 0.1819, 0.1660], device='cuda:0')
Layer block.1.mlp.fc2_lif firing rates: tensor([0.0138, 0.0195, 0.0215, 0.0234], device='cuda:0')
Layer block.2.attn.q_lif firing rates: tensor([0.0123, 0.0190, 0.0198, 0.0248], device='cuda:0')
Layer block.2.attn.k_lif firing rates: tensor([0.1239, 0.1571, 0.1514, 0.1501], device='cuda:0')
Layer block.2.attn.v_lif firing rates: tensor([0.3254, 0.3546, 0.3812, 0.3901], device='cuda:0')
Layer block.2.attn.attn_lif firing rates: tensor([0.0263, 0.0333, 0.0337, 0.0345], device='cuda:0')
Layer block.2.attn.talking_heads_lif firing rates: tensor([0.3246, 0.3619, 0.3886, 0.3975], device='cuda:0')
Layer block.2.attn.shortcut_lif firing rates: tensor([0.2110, 0.2198, 0.1792, 0.1551], device='cuda:0')
Layer block.2.mlp.fc1_lif firing rates: tensor([0.0077, 0.0118, 0.0134, 0.0136], device='cuda:0')
Layer block.2.mlp.fc2_lif firing rates: tensor([0.0076, 0.0117, 0.0128, 0.0144], device='cuda:0')
Layer block.3.attn.q_lif firing rates: tensor([0.1154, 0.1594, 0.1543, 0.1468], device='cuda:0')
Layer block.3.attn.k_lif firing rates: tensor([0.3268, 0.3625, 0.3887, 0.4003], device='cuda:0')
Layer block.3.attn.v_lif firing rates: tensor([0.0235, 0.0278, 0.0275, 0.0266], device='cuda:0')
Layer block.3.attn.attn_lif firing rates: tensor([0.3246, 0.3604, 0.3885, 0.4033], device='cuda:0')
Layer block.3.attn.talking_heads_lif firing rates: tensor([0.1038, 0.1105, 0.1014, 0.1028], device='cuda:0')
Layer block.3.attn.shortcut_lif firing rates: tensor([0.0052, 0.0083, 0.0099, 0.0116], device='cuda:0')
Layer block.3.mlp.fc1_lif firing rates: tensor([0.0061, 0.0088, 0.0108, 0.0128], device='cuda:0')
Layer block.3.mlp.fc2_lif firing rates: tensor([0.0629, 0.0795, 0.0859, 0.0962], device='cuda:0')
Layer block.4.attn.q_lif firing rates: tensor([0.3294, 0.3663, 0.3919, 0.4042], device='cuda:0')
Layer block.4.attn.k_lif firing rates: tensor([0.0202, 0.0207, 0.0203, 0.0199], device='cuda:0')
Layer block.4.attn.v_lif firing rates: tensor([0.3377, 0.3679, 0.3895, 0.3973], device='cuda:0')
Layer block.4.attn.attn_lif firing rates: tensor([0.0532, 0.0562, 0.0511, 0.0576], device='cuda:0')
Layer block.4.attn.talking_heads_lif firing rates: tensor([0.0015, 0.0025, 0.0033, 0.0044], device='cuda:0')
Layer block.4.attn.shortcut_lif firing rates: tensor([0.0016, 0.0025, 0.0032, 0.0040], device='cuda:0')
Layer block.4.mlp.fc1_lif firing rates: tensor([0.0202, 0.0268, 0.0335, 0.0429], device='cuda:0')
Layer block.4.mlp.fc2_lif firing rates: tensor([0.3397, 0.3696, 0.3905, 0.3980], device='cuda:0')
Layer block.5.attn.q_lif firing rates: tensor([0.0208, 0.0187, 0.0176, 0.0179], device='cuda:0')
Layer block.5.attn.k_lif firing rates: tensor([0.3434, 0.3674, 0.3751, 0.3858], device='cuda:0')
Layer block.5.attn.v_lif firing rates: tensor([0.0322, 0.0275, 0.0267, 0.0337], device='cuda:0')
Layer block.5.attn.attn_lif firing rates: tensor([0.0004, 0.0005, 0.0007, 0.0010], device='cuda:0')
Layer block.5.attn.talking_heads_lif firing rates: tensor([0.0005, 0.0008, 0.0009, 0.0011], device='cuda:0')
Layer block.5.attn.shortcut_lif firing rates: tensor([0.0055, 0.0075, 0.0088, 0.0162], device='cuda:0')
Layer block.5.mlp.fc1_lif firing rates: tensor([0.3344, 0.3649, 0.3666, 0.3791], device='cuda:0')
Layer block.5.mlp.fc2_lif firing rates: tensor([0.0199, 0.0175, 0.0175, 0.0179], device='cuda:0')
Layer block.6.attn.q_lif firing rates: tensor([0.3073, 0.3444, 0.3511, 0.3632], device='cuda:0')
Layer block.6.attn.k_lif firing rates: tensor([0.0411, 0.0435, 0.0420, 0.0393], device='cuda:0')
Layer block.6.attn.v_lif firing rates: tensor([3.9444e-05, 6.3111e-05, 8.0550e-05, 1.0380e-04], device='cuda:0')
Layer block.6.attn.attn_lif firing rates: tensor([1.7023e-05, 4.4012e-05, 7.3491e-05, 9.3006e-05], device='cuda:0')
Layer block.6.attn.talking_heads_lif firing rates: tensor([0.0007, 0.0015, 0.0022, 0.0029], device='cuda:0')
Layer block.6.attn.shortcut_lif firing rates: tensor([0.2965, 0.3350, 0.3361, 0.3459], device='cuda:0')
Layer block.6.mlp.fc1_lif firing rates: tensor([0.0102, 0.0120, 0.0108, 0.0106], device='cuda:0')
Layer block.6.mlp.fc2_lif firing rates: tensor([1.6276e-04, 4.0975e-01, 3.5807e-01, 4.9528e-01], device='cuda:0')
Layer block.7.attn.q_lif firing rates: tensor([0.0805, 0.1413, 0.1159, 0.1617], device='cuda:0')
Layer block.7.attn.k_lif firing rates: tensor([0.0415, 0.0630, 0.0563, 0.0492], device='cuda:0')
Layer block.7.attn.v_lif firing rates: tensor([0.0320, 0.0430, 0.0350, 0.0347], device='cuda:0')
Layer block.7.attn.attn_lif firing rates: tensor([0.0761, 0.1580, 0.1574, 0.1635], device='cuda:0')
Layer block.7.attn.talking_heads_lif firing rates: tensor([0.2587, 0.3355, 0.3397, 0.3472], device='cuda:0')
Layer block.7.attn.shortcut_lif firing rates: tensor([0.1975, 0.2342, 0.2243, 0.2264], device='cuda:0')
Layer block.7.mlp.fc1_lif firing rates: tensor([0.0201, 0.0218, 0.0225, 0.0231], device='cuda:0')
Layer block.7.mlp.fc2_lif firing rates: tensor([0.0233, 0.0264, 0.0284, 0.0292], device='cuda:0')
Layer head_lif firing rates: tensor([0.0709, 0.1024, 0.1151, 0.1132], device='cuda:0')`
Averaging each layer type over the 8 blocks and all T=4 time steps therefore gives:
q_lif: 0.1580
k_lif: 0.1303
v_lif: 0.1209
attn_lif: 0.1334
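For reference, a minimal sketch of the averaging step, using the q_lif rows copied from the log above. The names `q_lif_rates` and `mean_rate` are illustrative only and do not come from the repository or from spikingjelly; the sketch assumes a plain unweighted mean over the 8 blocks and 4 time steps.

```python
from statistics import mean

# Per-time-step q_lif firing rates for block.0 to block.7, copied from the log above.
q_lif_rates = [
    [0.1975, 0.2342, 0.2243, 0.2264],
    [0.0170, 0.0207, 0.0232, 0.0292],
    [0.0123, 0.0190, 0.0198, 0.0248],
    [0.1154, 0.1594, 0.1543, 0.1468],
    [0.3294, 0.3663, 0.3919, 0.4042],
    [0.0208, 0.0187, 0.0176, 0.0179],
    [0.3073, 0.3444, 0.3511, 0.3632],
    [0.0805, 0.1413, 0.1159, 0.1617],
]


def mean_rate(rates):
    """Unweighted mean over every block and every time step."""
    return mean(r for block in rates for r in block)


print(f"q_lif: {mean_rate(q_lif_rates):.4f}")  # prints "q_lif: 0.1580"
```

The same calculation on the k_lif, v_lif, and attn_lif rows gives the other three numbers.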
I am having trouble finding where the bug is. Could you provide some guidance on computing and reproducing the results? Many thanks!