Adding multidiscrete feature by Jogima-cyber · Pull Request #328 · DeNA/HandyRL

Jogima-cyber · 2022-11-06T14:46:16Z

Hello there! I'd like to add support for multidiscrete action spaces. Here is how I would do it:

Add a multidiscrete option to the env config YAML.
If multidiscrete is set to True, the user must add an nvec attribute to their model. nvec has the format [number of actions in the first independent action set, number of actions in the second independent action set, etc.].
If multidiscrete is set to True, the output logits must be treated differently. I'll take ppo_multidiscrete as inspiration: softmax must be applied separately to each independent set of actions.
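
To make the last step concrete, here is a minimal sketch of a per-set softmax. It assumes the model emits one flat logit vector in which the logits of each action set are concatenated in nvec order; that layout and the helper names (split_logits, multidiscrete_policy) are illustrative assumptions, not code from HandyRL or ppo_multidiscrete. Plain Python is used so the sketch runs without torch.

```python
import math


def split_logits(logits, nvec):
    """Split a flat logit list into one chunk per independent action set."""
    if len(logits) != sum(nvec):
        raise ValueError("len(logits) must equal sum(nvec)")
    chunks, start = [], 0
    for n in nvec:
        chunks.append(logits[start:start + n])
        start += n
    return chunks


def softmax(xs):
    """Numerically stable softmax over a single action set."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multidiscrete_policy(logits, nvec):
    """Return one probability distribution per action set (hypothetical helper)."""
    return [softmax(chunk) for chunk in split_logits(logits, nvec)]


# Two independent action sets: 3 choices in the first, 2 in the second.
nvec = [3, 2]
logits = [0.0, 0.0, 0.0, 1.0, 1.0]
probs = multidiscrete_policy(logits, nvec)
```

Each entry of probs sums to 1 on its own, so one action can be sampled from every set independently.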

YuriCat · 2022-11-07T14:08:09Z

Hi, @Jogima-cyber! Long time no see, and thanks for the great suggestion!
I also experimented with multi-action settings this year, so I'll check whether that work can go into the main code.

Jogima-cyber · 2022-11-07T14:15:56Z

I'm currently working on it! The first thing I'm doing is integrating a simple homemade multidiscrete environment into HandyRL so that I can test my upcoming multidiscrete integration. I'll push it as soon as I'm finished (it will include a non-multidiscrete option).
I'm not sure I'll need to add a multidiscrete variable to the config; defining nvec in the model may be enough for the library to know it should handle the multidiscrete case.
I need this feature right now, so I figured I could add it upstream instead of only in my fork, if you want it.

YuriCat · 2022-11-08T03:59:32Z

@Jogima-cyber
I pushed a feature/multi_unit branch to my fork.
master...YuriCat:HandyRL:feature/multi_unit

In this branch, each agent can output env.num_units() actions in each turn.
I've confirmed that it works with TicTacToe!

Jogima-cyber · 2022-11-08T14:00:38Z

Thank you @YuriCat, that's almost exactly what I needed. I'm working from your fork; I just need to add a "unit_mask" moment, because some of my independent action sets may be empty depending on the state (meaning no action from that set can be taken).
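
As a rough illustration of what such a mask could do, the sketch below sums the log-probabilities of the selected actions over only the units that are present. The convention that 1 means present and 0 means absent, and the helper name masked_joint_log_prob, are assumptions made for this example; this is not the code from the feature/multi_unit branch.

```python
import math


def masked_joint_log_prob(selected_probs, unit_mask):
    """Sum log-probabilities over units whose mask is 1; skip absent units."""
    if len(selected_probs) != len(unit_mask):
        raise ValueError("selected_probs and unit_mask must have the same length")
    total = 0.0
    for p, m in zip(selected_probs, unit_mask):
        if m:  # absent units contribute nothing, whatever placeholder they hold
            total += math.log(p)
    return total


# Three units; the second one is absent in this state and holds a placeholder.
selected_probs = [0.5, 0.0, 0.25]
unit_mask = [1, 0, 1]
joint = masked_joint_log_prob(selected_probs, unit_mask)
```

Because the absent unit is skipped, its placeholder probability of 0.0 never reaches math.log, and the result equals log(0.5) + log(0.25).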

YuriCat · 2022-11-08T17:14:11Z

@Jogima-cyber

unit_mask

Indeed! We need unit_mask if there are absent units.

Jogima-cyber · 2022-11-09T19:24:16Z

By the way, I printed the shapes of some tensors, and at feature/multi_unit/train.py:l204 I get shapes that I don't think are normal:

for total_advantages I got torch.Size([1, 32, 1, 2304, 2304]),
meaning that for (-log_selected_policies * total_advantages) I also got torch.Size([1, 32, 1, 2304, 2304]).
Here 32 is my batch size and 2304 is my number of units.
I'm reading through more of your code to find out whether this is expected behaviour.

YuriCat · 2022-11-10T02:53:36Z

@Jogima-cyber
One of the two may need an .unsqueeze(-1).

Jogima-cyber · 2022-11-10T10:57:31Z

Yeah, I found where to add an .unsqueeze(-1): at feature/multi_unit/train.py:l234 we have log_rhos = log_selected_t_policies.detach() - log_selected_b_policies, but log_selected_t_policies has shape (1, 32, 1, 2304, 1) while log_selected_b_policies has shape (1, 32, 1, 1, 2304), so torch broadcasts the subtraction. I solved the issue by adding .unsqueeze(-1), and it should work for my case, but I didn't take into account all the cases HandyRL covers, such as RNNs and turn-based training, so I don't think my fix applies to HandyRL in general.
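
The shape blow-up follows directly from the broadcasting rule, which can be sketched in plain Python without torch. The broadcast_shape helper below is a hypothetical illustration of that rule (trailing dimensions are compared pairwise, and a dimension of size 1 stretches to match the other); it is not part of HandyRL or PyTorch.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Result shape of an elementwise op under NumPy/PyTorch-style broadcasting."""
    out = []
    # Compare dimensions from the last one backwards, padding with 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        out.append(y if x == 1 else x)
    return tuple(reversed(out))


t_shape = (1, 32, 1, 2304, 1)   # log_selected_t_policies, as reported above
b_shape = (1, 32, 1, 1, 2304)   # log_selected_b_policies, as reported above

mismatched = broadcast_shape(t_shape, b_shape)
aligned = broadcast_shape(t_shape, (1, 32, 1, 2304, 1))
```

With the reported shapes, mismatched is (1, 32, 1, 2304, 2304), the same shape observed for total_advantages, while two operands that both have shape (1, 32, 1, 2304, 1) keep that shape.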

YuriCat · 2022-11-18T07:56:20Z

@Jogima-cyber I updated my feature/multi_unit branch.

Adding multidiscrete config attribute (d88e80f)

Jogima-cyber added 2 commits November 7, 2022 16:28

Adding multidiscrete home made toy env (d4333c9)

Fixed typo (d8d61bb)

Jogima-cyber wants to merge 3 commits into DeNA:master from Jogima-cyber:master
