Dear authors,
Thank you very much for your outstanding work and for making the code and resources publicly available.
I am currently working on evaluating the energy consumption of a Spiking Transformer-based model.
I would greatly appreciate it if you could help me verify the correctness of my current approach.
- Pairing of firing rate and FLOPs.
For the MS_SPS module, I assigned the firing rates of the five convolutional layers as follows:
1.0, MS_SPS_lif, MS_SPS_lif1, MS_SPS_lif2, and MS_SPS_lif3.
In the self-attention component of MS_SSA, I calculated the firing rate for Q, K, and V as the average:
(MS_SSA_ConvN_q_lif + MS_SSA_ConvN_k_lif + MS_SSA_ConvN_v_lif) / 3,
and the rate for the attention function f(Q, K, V) as:
(MS_SSA_ConvN_q_lif + MS_SSA_ConvN_k_lif).
For the linear projection following the attention, I used:
MS_SSA_ConvN_x_after_qkv.
Regarding the MLP module, I used:
MS_MLP_ConvN_fc1_lif and MS_MLP_ConvN_fc2_lif
for the first and second layers, respectively.
Could you kindly let me know if there are any mistakes or incorrect assumptions in this methodology?
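To make the pairing concrete, here is a minimal sketch of the calculation I have in mind. It rests on several assumptions that are not taken from the repository: the per-operation energies (4.6 pJ per MAC and 0.9 pJ per AC, the 45 nm values commonly quoted in SNN papers), T = 4 time steps (inferred from the config name), and placeholder firing rates and FLOPs. The helper name layer_energy_pj is hypothetical.

```python
# Assumed per-operation energies at 45 nm (values commonly quoted in SNN
# papers, not taken from this repository).
E_MAC_PJ = 4.6  # one multiply-accumulate, in picojoules
E_AC_PJ = 0.9   # one accumulate, in picojoules


def layer_energy_pj(flops, firing_rate, time_steps, spike_input=True):
    """Energy of one layer in pJ (hypothetical helper).

    Spike-driven layers are charged E_AC per synaptic operation, where
    SOPs = firing_rate * time_steps * flops. A layer with real-valued input
    (the first convolution, rate 1.0) is charged E_MAC per FLOP; not
    multiplying by time_steps in that case is an assumption.
    """
    if spike_input:
        return E_AC_PJ * firing_rate * time_steps * flops
    return E_MAC_PJ * flops


# Placeholder firing rates, keyed by the names printed by firing_num.py.
rates = {
    "MS_SSA_ConvN_q_lif": 0.20,
    "MS_SSA_ConvN_k_lif": 0.10,
    "MS_SSA_ConvN_v_lif": 0.30,
}

# Rate paired with the Q/K/V projections: the mean of the three LIF rates.
qkv_rate = sum(rates.values()) / len(rates)

# Placeholder FLOPs for the three projections combined.
qkv_flops = 3 * 1_000_000
qkv_energy = layer_energy_pj(qkv_flops, qkv_rate, time_steps=4)

# 1 mJ = 1e9 pJ
print(f"Q/K/V projections: {qkv_energy / 1e9:.6f} mJ")
```

The same function would be applied to every other layer with its own FLOPs and paired firing rate, and the results summed to get the total.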
- Firing rate tendencies.
While analyzing the output of firing_num.py, I noticed something unexpected:
In my case, the firing rate of Q was more than 10 times higher than that of K and V.
However, the paper suggests that V usually has the highest firing rate.
In other words, my observations show the opposite trend.
I used the command:
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 --master_port 29501 firing_num.py -c conf/imagenet/8_512_300E_t4.yml --model sdt --spike-mode lif --resume 8_512.pth.tar --no-resume-opt
Would you happen to have any insights or advice regarding this discrepancy?
If you have time to review it, I would be sincerely grateful for your guidance.
Thank you again for your valuable work and your time.