I am learning RL with ProtoTwin.
I think the line reward -= self.reward_position_penalty() * reward * 0.5 should be reward -= self.reward_position_penalty() * 0.5, i.e. without the extra reward factor on the right-hand side.
A typical reward function for cartpole looks like:
reward = (position_weight * position_reward + angle_weight * angle_penalty + force_weight * force_penalty)
If that is not the case, can anyone explain why the penalty is multiplied by reward like this: reward -= self.reward_position_penalty() * reward * 0.5?
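
To make the difference concrete, here is a minimal sketch comparing the two forms side by side. The function names and the sample values are hypothetical, and I am assuming both the running reward and the value returned by reward_position_penalty() lie in [0, 1]; I have not confirmed that range for the actual example.

```python
def scaled_penalty(reward: float, penalty: float) -> float:
    """Form from the line in question: the penalty is scaled by the current reward."""
    reward -= penalty * reward * 0.5  # same as reward * (1 - 0.5 * penalty)
    return reward


def additive_penalty(reward: float, penalty: float) -> float:
    """Proposed form: the penalty is subtracted independently of the current reward."""
    reward -= penalty * 0.5
    return reward


if __name__ == "__main__":
    # Hypothetical sample values, assuming reward and penalty are both in [0, 1].
    for reward in (1.0, 0.5, 0.0):
        for penalty in (0.0, 0.5, 1.0):
            print(
                f"reward={reward:.1f} penalty={penalty:.1f} "
                f"scaled={scaled_penalty(reward, penalty):.3f} "
                f"additive={additive_penalty(reward, penalty):.3f}"
            )
```

Under those assumptions, the scaled form only ever shrinks the reward by up to half and never pushes it below zero, while the additive form can go negative when the reward is already small. That may be the reason for the extra factor, but this is only a guess.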