Imitation learning with DAgger #906

Open

akashvelu wants to merge 390 commits into master from

Contributor

akashvelu commented Apr 13, 2020

Pull request information

Status: ready to merge
Kind of changes: new feature
Related PR or issue: ? (optional)

Description

Adds functionality to do imitation learning (with DAgger), to train a model to imitate an expert.
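For readers unfamiliar with DAgger, the loop can be sketched roughly as follows. This is a minimal toy illustration, not the code in this PR: the 1-D dynamics, `expert_action`, `rollout`, and `fit_linear` are all hypothetical stand-ins, and it uses a simplified variant in which every rollout is driven by the learner rather than by a mixture of expert and learner.

```python
import random


def expert_action(state):
    """Hypothetical expert: push the state toward zero."""
    return -0.5 * state


def rollout(policy, start, horizon=20):
    """Run `policy` in a toy 1-D system x' = x + a; return the visited states."""
    states, x = [], start
    for _ in range(horizon):
        states.append(x)
        x = x + policy(x)
    return states


def fit_linear(states, actions):
    """Least-squares fit of a = w * x (no bias term)."""
    num = sum(s * a for s, a in zip(states, actions))
    den = sum(s * s for s in states)
    return num / den if den > 0 else 0.0


def dagger(num_iters=5, seed=0):
    rng = random.Random(seed)
    data_states, data_actions = [], []
    w = 0.0  # untrained learner: always outputs zero
    for _ in range(num_iters):
        def learner(x, w=w):
            return w * x
        # 1. Collect states by running the learner, not the expert.
        visited = rollout(learner, start=rng.uniform(1.0, 5.0))
        # 2. Ask the expert what it would have done in those states.
        labels = [expert_action(s) for s in visited]
        # 3. Aggregate, then retrain on everything collected so far.
        data_states += visited
        data_actions += labels
        w = fit_linear(data_states, data_actions)
    return w


learned_w = dagger()
```

The key difference from plain behavioral cloning is step 1: the training states come from the learner's own rollouts, so the expert labels cover the states the learner actually reaches.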

akashvelu requested review from AboudyKreidieh, cathywu, eugenevinitsky and kanaadp as code owners

April 13, 2020 01:41

akashvelu assigned eugenevinitsky and akashvelu

eugenevinitsky reviewed

flow/controllers/imitation_learning/.idea/dagger.iml Outdated

Comment on lines 1 to 12

+              <?xml version="1.0" encoding="UTF-8"?>
+              <module type="PYTHON_MODULE" version="4">
+                <component name="NewModuleRootManager">
+                  <content url="file://$MODULE_DIR$" />
+                  <orderEntry type="jdk" jdkName="Python 3.6 (flow)" jdkType="Python SDK" />
+                  <orderEntry type="sourceFolder" forTests="false" />
+                </component>
+                <component name="PyDocumentationSettings">
+                  <option name="format" value="PLAIN" />
+                  <option name="myDocStringFormat" value="Plain" />
+                </component>
+              </module>

                
                    No newline at end of file

Member

eugenevinitsky Apr 13, 2020

Nit: please remove this file.

eugenevinitsky reviewed

flow/controllers/imitation_learning/Untitled.ipynb Outdated

Comment on lines 1 to 5

+              {
+               "cells": [
+                {
+                 "cell_type": "code",
+                 "execution_count": 1,

Member

eugenevinitsky Apr 23, 2020

Did you mean to commit this file?

eugenevinitsky reviewed

flow/controllers/imitation_learning/i210_multiagent.py Outdated

Comment on lines 1 to 8

+              """Multi-agent I-210 example.
+              Trains a non-constant number of agents, all sharing the same policy, on the
+              highway with ramps network.
+              """
+              import os
+              import numpy as np
+              from ray.tune.registry import register_env

Member

eugenevinitsky Apr 23, 2020

This file seems identical to existing code?

eugenevinitsky reviewed

flow/controllers/imitation_learning/imitating_controller.py Outdated

+                  """
+                  # Implementation in Tensorflow
+                  def __init__(self, veh_id, action_network, multiagent, car_following_params=None, time_delay=0.0, noise=0, fail_safe=None):

Member

eugenevinitsky Apr 23, 2020

Please add a docstring so we can know what action_network is.

eugenevinitsky reviewed

flow/controllers/imitation_learning/imitating_network.py Outdated

Comment on lines 27 to 28

		with tf.variable_scope(policy_scope, reuse=tf.AUTO_REUSE):
			self.build_network()

Member

eugenevinitsky Apr 23, 2020

Why do you need an AUTO_REUSE here?

Contributor Author

akashvelu May 4, 2020

I put AUTO_REUSE here so that the same variables are reused when the graph is rerun, so copies of the weights and biases don't get recreated.

eugenevinitsky reviewed

flow/controllers/imitation_learning/imitating_network.py Outdated

+                      self.action_predictions = pred_action
+                      print("TYPE: ", type(self.obs_placeholder))
+                      if self.inject_noise == 1:

Member

eugenevinitsky Apr 23, 2020

Nit: conventionally you don't need to check a bool like this
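A small sketch of the convention being suggested, assuming `inject_noise` is meant as a boolean flag. The function name and the `noise` parameter are hypothetical and only illustrate the pattern.

```python
def add_noise_if_enabled(action, inject_noise, noise=0.5):
    """Return the action, offset by `noise` when the flag is set."""
    # Rely on truthiness rather than comparing with `== 1`.
    if inject_noise:
        return action + noise
    return action
```

Note that the two forms are not strictly equivalent: `inject_noise == 1` is false for a value such as 2, while the truthiness test is true for any non-zero value, so the change is only safe if the attribute is really a flag.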

eugenevinitsky reviewed

flow/controllers/imitation_learning/imitating_network.py Outdated

+                      Defines input, output, and training placeholders for neural net
+                      """
+                      self.obs_placeholder = tf.placeholder(shape=[None, self.obs_dim], name="obs", dtype=tf.float32)
+                      self.action_placeholder = tf.placeholder(shape=[None, self.action_dim], name="action", dtype=tf.float32)

Member

eugenevinitsky Apr 23, 2020

Stochastic algorithms parametrize the policy by the mean and standard deviation of a Gaussian that you sample from. It'd be cool to add this as an option here so we can use PPO.

Member

eugenevinitsky Apr 23, 2020

The current implementation can be used for deterministic algorithms like DDPG and TD3, which is great.
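To make the distinction concrete, here is a minimal pure-Python sketch of a Gaussian policy head next to a deterministic one. The function names are hypothetical, and having the network output a log standard deviation (rather than the standard deviation itself) is an assumption made for illustration, not something taken from this PR.

```python
import math
import random


def deterministic_action(mean, log_std):
    """Deterministic use of the head: ignore the spread, return the mean."""
    return mean


def sample_gaussian_action(mean, log_std, rng):
    """Stochastic use of the head: draw an action from N(mean, exp(log_std)^2)."""
    std = math.exp(log_std)
    return rng.gauss(mean, std)


def gaussian_log_prob(action, mean, log_std):
    """Log-density of `action` under N(mean, exp(log_std)^2)."""
    std = math.exp(log_std)
    z = (action - mean) / std
    return -0.5 * z * z - log_std - 0.5 * math.log(2.0 * math.pi)


rng = random.Random(0)
sampled = sample_gaussian_action(mean=1.0, log_std=-20.0, rng=rng)
```

With a stochastic head, one natural imitation loss would be the negative of `gaussian_log_prob` evaluated at the expert's action, in place of a squared error on the predicted action.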

eugenevinitsky reviewed

flow/controllers/imitation_learning/imitating_network.py Outdated

Comment on lines 80 to 81

		if len(observation.shape)<=1:
			observation = observation[None]

Member

eugenevinitsky Apr 23, 2020

Good check!

eugenevinitsky reviewed

flow/controllers/imitation_learning/imitating_network.py Outdated

+                      # network expects an array of arrays (matrix); if single observation (no batch), convert to array of arrays
+                      if len(observation.shape)<=1:
+                          observation = observation[None]
+                      ret_val = self.sess.run([self.action_predictions], feed_dict={self.obs_placeholder: observation})[0]

Member

eugenevinitsky Apr 23, 2020

You should make it clear here that this is returning 1 accel and will not operate correctly if you pass a batch
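One possible way to make the single-observation contract explicit is sketched below with NumPy. `predict_single_accel` and `policy_fn` are hypothetical: `policy_fn` stands in for the `sess.run` call and is assumed to map an array of shape (1, obs_dim) to an array of shape (1, action_dim).

```python
import numpy as np


def predict_single_accel(policy_fn, observation):
    """Return the action for exactly one observation; reject batches."""
    observation = np.asarray(observation, dtype=np.float32)
    # A single observation has shape (obs_dim,); add the batch axis.
    if observation.ndim <= 1:
        observation = observation[None]  # (obs_dim,) -> (1, obs_dim)
    if observation.shape[0] != 1:
        raise ValueError(
            "expected a single observation, got a batch of %d"
            % observation.shape[0])
    actions = policy_fn(observation)  # assumed shape (1, action_dim)
    return actions[0]


def toy_policy(obs):
    """Stand-in for the network: sum the features of each row."""
    return obs.sum(axis=1, keepdims=True)


single = predict_single_accel(toy_policy, [1.0, 2.0, 3.0])
```

Raising on a batch is only one option; documenting the behavior in the docstring, as the comment asks, would also address the concern.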

brentgryffindor and others added 15 commits

May 25, 2020 16:58


          datapip pipeline implemented

c373e94


          get up to date with i210_dev

a88c209


          remove dupe imports

89f8d1d


          remove blank lines after docstrings

306a01f


          add back ray import

0d5fa6b


          remove whitespace

0ade197


          moved imports under functions in train.py (#903)

1111e9a

* deleting unworking params from SumoChangeLaneParams

* deleted unworking params, sublane working in highway

* moved imports inside functions

* Apply suggestions from code review

* bug fixes

* bug fix

Co-authored-by: Aboudy Kreidieh <akreidieh@gmail.com>


          get not departed vehicles (#922)

a4c7d67

* added function to kernel/vehicle to get number of not departed vehiles

* fixed over indentation of the docstring

* indentation edit

* pep8

Co-authored-by: AboudyKreidieh <akreidieh@gmail.com>


          changed _departed_ids, and _arrived_ids in the update function (#926)

36e8851

* changed _departed_ids, and _arrived_ids in the update function

* fixed bug in get_departed_ids and get_arrived_ids


          Add an on ramp option

ebb2921


          Increased inflows to 10800 to match density in Bennis ring

e4c02bb


          Upgrade the network to not have keepclear value on the junctions

505d646


          Add 1 lane highway network for Benni

7d52445


          multiple runs issue solved, testing added

c3b2a51


          added more support for lambda function

dc881e0

liljonnystyle and others added 30 commits

July 1, 2020 18:58


          fix rectangle positioning for both networks

e22189e


          Merge pull request #989 from flow-project/jl-tsd-mask

06ff2d9

Time-Space Diagram greyed regions


          Reward options in I210-dev

c830f78

Add accel penalty, stop penalty, mpg reward, and ability to compute reward for any vehicles upstream of you (i.e. make you less greedy and more social)


          fix pydocstyle

cb74b8c


          add docstring

eb2416b


          remove excess whitespace

6ed00e3


          only call get_configuration() if to_aws

b80e563


          Energy class for inventorying multiple energy models (#944)

7c9a48a

* New energy class to inventory multiple energy models

Co-authored-by: Joy Carpio <engrjoycarpio@berkeley.edu>


          Time-Space Diagrams automatically to S3 (#993)

5b7e8b2

* Add time-space diagram plotting to experiment.py


          Query Prereq Check (#987)

c4ba7ad

* prereq dict added to query

* prereq checking mechanism implemented, not tested yet

* prereq checking tested

* change to more flexible filter handling

* make safety_rate and safety_max_value floats

* ignore nulls in fact_top_scores

* fix typo

* remove unneeded import

* replace uneccessary use of list to set

* add queries to pre-bin histogram data

* fix the serialization issue with set, convert to list before write as json

* fix query

* fix query

* fixed query bug

Co-authored-by: liljonnystyle <jonny5@berkeley.edu>


          remove extra whitespace

bb1f4f5


          whitespace linting

9f1a834


          Update energy query with new power demand model (#996)

220994e

* update tacoma power demand query, meters/Joules -> mpg conversion


          Power-Demand Model fix (#995)

f1ded54

* fix some implementation errors in energy models

* pull i210_dev and fix flake8


          convert tacoma fc to gallons per hour

f63cc37


          comment on road grade; exception handling on unpickling

c2836e8


          Add learning rate as a parameter, override import_from_h5 method using setattr

29eb5a0


          add --multi_node flag

97333cf


          Merge pull request #998 from flow-project/i210_add_multinode

f7e1d78

Add --multi_node flag


          Ak/i210 master merge (#994)

3ac508a

* implement HighwayNetwork for Time-Space Diagrams (#979)

* fixed h-baselines bug (#982)

* Replicated changes in 867. Done bug (#980)

* Aimsun changes minus reset

* removed crash attribute

* tensorflow 1.15.2

* merge custom output and failsafes to master (#981)

* add write_to_csv() function to master

* include pipeline README.md

* add data pipeline __init__

* add experiment.py changes

* add write_to_csv() function to master

* change warning print to ValueError message

* update to new update_accel methods

* add display_warnings boolean

* add get_next_speed() function to base vehicle class

* revert addition of get_next_speed

* merge custom output and failsafes to master

* add write_to_csv() function to master

* add display_warnings boolean

* add get_next_speed() function to base vehicle class

* revert addition of get_next_speed

* revert change to get_feasible_action call signature

* change print syntax to be python3.5 compliant

* add tests for new failsafe features

* smooth default to True

* rearrange raise exception for test coverage

* moved simulation logging to the simulation kernel (#991)

* add 210 edgestarts for backwards compatibility (#985)

* fastforward PR 989

* fix typo

* Requirements update (#963)

* updated requirements.txt and environment.yml

* Visualizer tests fixes

* remove .func

* move all miles_per_* rewards to instantaneous_mpg

* update reward fns to new get_accel() method

* made tests faster

* some fixes to utils

* change the column order, modify the pipeline to use SUMO emission file

* write metadata to csv

* change apply_acceleration smoothness setting

* make save_csv return the file paths

Co-authored-by: AboudyKreidieh <akreidieh@gmail.com>
Co-authored-by: liljonnystyle <jonny5@berkeley.edu>
Co-authored-by: Kathy Jang <kathyjang@gmail.com>
Co-authored-by: Nathan Lichtlé <nathanlct@icloud.com>
Co-authored-by: akashvelu <akashvelu@berkeley.edu>
Co-authored-by: Brent Zhao <brentgryffindor@outlook.com>


          remove line from testing

bb94c27


          fix toyota temp file removal

d373965


          fix fc <> power unit conversion

ab6732e


          make default highway single penetration rate 0

c0de59b


          use 1609.34 meters per mile

5f6acc2


          fix av routing controller if no on-ramp

7a773e3


          Time-Space Diagram offset axes (#999)

0e8be95

* refactor tsd to allow for axes offsets

* update time-space plotter unit tests


          Move imitation to algorithms folder

6c68800


          Merge i210 into branch

d73612f


          Revert model architecture and # rollouts to previous defaults

6aca7c5

Reviewers

eugenevinitsky left review comments

Awaiting requested review from AboudyKreidieh (code owner)

Awaiting requested review from cathywu

Awaiting requested review from kanaadp

At least 1 approving review is required to merge this pull request.

Labels

None yet