Skip to content

Adding multidiscrete feature#328

Open
Jogima-cyber wants to merge 3 commits intoDeNA:masterfrom
Jogima-cyber:master
Open

Adding multidiscrete feature#328
Jogima-cyber wants to merge 3 commits intoDeNA:masterfrom
Jogima-cyber:master

Conversation

@Jogima-cyber
Copy link
Copy Markdown

Hello there ! I'd like to add the capability to handle multidiscrete action space. Here is how I would do that :

  • Add multidiscrete to env config yaml
  • if multidiscrete is set to True then user must add nvec attribute to his model. nvec format is [nb. of actions for first independent action set, nb. of actions for second independent action set, ect.]
  • If multidiscrete set to True, then output logits must be treated differently, I'll take ppo_multidiscrete as inspiration, softmax must be applied for each independent set of actions

@YuriCat
Copy link
Copy Markdown
Contributor

YuriCat commented Nov 7, 2022

Hi, @Jogima-cyber ! Long time no see, and thanks for your great suggestion!
I also tried the multiple action situation this year, so I'll check if it can be put in the main code.

@Jogima-cyber
Copy link
Copy Markdown
Author

I'm currently working on it ! First thing I'm doing right now is integrating a home made simple multidiscrete environment into HandyRL in order to test my upcoming multidicrete integration. I'll push it as soon as I'm finished (it's gonna include a non-multidiscrete option).
I'm not sur I'll need to add a multidiscrete variable into config, maybe just nvec being defined in model should be enough for the library to know it should handle a multidiscrete case.
I need this feature right now, so I figured I could add it globally instead on just my fork, if you want it though.

@YuriCat
Copy link
Copy Markdown
Contributor

YuriCat commented Nov 8, 2022

@Jogima-cyber
I pushed feature/multi_unit branch into my fork.
master...YuriCat:HandyRL:feature/multi_unit

In this branch, each agent can output env.num_units() actions in each turn.
I've confirmed that it works with TicTacToe!

@Jogima-cyber
Copy link
Copy Markdown
Author

Jogima-cyber commented Nov 8, 2022

Thank you @YuriCat, that's almost exactly what I needed. I'm working on your fork, I just need to add a moment "unit_mask", because some of my independent action sets may be empty depending on the state (meaning no action can be taken among the actions of the action set).

@YuriCat
Copy link
Copy Markdown
Contributor

YuriCat commented Nov 8, 2022

@Jogima-cyber

unit_mask

Indeed! We need unit_mask if there are absent units.

@Jogima-cyber
Copy link
Copy Markdown
Author

Btw. I've output the shapes of some vectors and in line feature/multi_unit/train.py:l204 I've got shapes that I don't think are normal :

  • for total_advantages I got torch.Size([1, 32, 1, 2304, 2304])
  • meaning that for (-log_selected_policies * total_advantages) I got torch.Size([1, 32, 1, 2304, 2304])
    Where 32 is my BS and 2304 is my unit number.
    I'm investigating to understand more parts of your code in order to know if it's normal behaviour or not.

@YuriCat
Copy link
Copy Markdown
Contributor

YuriCat commented Nov 10, 2022

@Jogima-cyber
.unsqueeze(-1) may be necessary for either of the two.

@Jogima-cyber
Copy link
Copy Markdown
Author

Jogima-cyber commented Nov 10, 2022

Yeah I found where to add a .unsqueeze(-1) : feature/multi_unit/train.py:l234 we have log_rhos = log_selected_t_policies.detach() - log_selected_b_policies but log_selected_t_policies is of shape (1, 32, 1, 2304, 1) but log_selected_b_policies is of shape (1, 32, 1, 1, 2304) and therefore torch is applying broadcasting. I've solved the issue by adding .unsqueeze(-1) and it should work for my case but I didn't take into account all the cases HandyRL is covering like RNN, turn based training and so on, so I don't think my fixing apply generally for HandyRL.

@YuriCat
Copy link
Copy Markdown
Contributor

YuriCat commented Nov 18, 2022

@Jogima-cyber Updated my branch feature/numti_unit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants