Hello BICLab Team,
I am currently examining the implementation of the SpikingBrain-7B model, specifically focusing on the activation function used in the V1-7B-sft-s3-reasoning variant during inference.
Upon reviewing the code in hf_7B_model/modeling_gla_swa.py, I noticed the following key details:
- The GLU class (lines 31-47) takes hidden_act with a default value of swish and uses ACT2FN[hidden_act] (line 41) to set the activation function, which resolves to the swish activation imported from fla.modules.activations.
- The GLAswaConfig in the same directory (hf_7B_model/configuration_gla_swa.py) also sets hidden_act="swish" as the default configuration (line 45).
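To make the lookup I am describing concrete, here is a minimal sketch of how I read the two points above. It is not the repository code: `GLUSketch`, the local `ACT2FN` dictionary, and the pure-Python `swish` are hypothetical stand-ins that I wrote for illustration, and the real GLU class operates on tensors with learned projections.

```python
import math


def swish(x: float) -> float:
    """Swish (SiLU): x * sigmoid(x), written as x / (1 + exp(-x))."""
    return x / (1.0 + math.exp(-x))


# Hypothetical stand-in for the ACT2FN mapping referenced in modeling_gla_swa.py.
ACT2FN = {"swish": swish}


class GLUSketch:
    """Illustrative only: mirrors the default-argument lookup, not the real GLU."""

    def __init__(self, hidden_act: str = "swish"):
        # With no override from the config, the default string selects swish.
        self.act_fn = ACT2FN[hidden_act]

    def gate(self, gate_val: float, up_val: float) -> float:
        # Gated linear unit pattern: act(gate) * up.
        return self.act_fn(gate_val) * up_val


layer = GLUSketch()
print(layer.act_fn.__name__)
```

If this reading is right, nothing in the default path would select a spiking or adaptive-threshold activation unless hidden_act is overridden elsewhere.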
However, the technical paper associated with SpikingBrain-7B specifies that an adaptive-threshold neuron should be used as the activation function, rather than swish.
Could you confirm whether my understanding is correct: does the V1-7B-sft-s3-reasoning model actually use the swish activation function during inference, rather than the adaptive-threshold neuron described in the paper? If I have misunderstood, please clarify the correct implementation of the activation function.
Thank you very much for your time and support! Looking forward to your response.