This module implements a low-regret learning algorithm for an agent that seeks to minimize a cost by making a sequence of decisions in response to a stream of measurements.
The decision made by the agent at time t consists of selecting an action a(t) from a set action_set to minimize
a cost J(a(t),w(t)) that depends on the action a(t) as well as on a latent variable w(t) that is unknown to the agent.
To decide on the value of the action a(t), the agent has access to a stream of measurements y(t), each taking values
in a set measurement_set.
The relationships between actions, measurements, and latent variables are unknown and need to be estimated from data.
To this end, we assume that at some time instants t, the agent learns "after-the-fact" the value of the cost J(a,w(t))
associated with every possible action in action_set.
We are especially interested in scenarios where the value of the cost J(a(t),w(t)) may be determined by other agents,
whose interests are unknown to us and that may be reacting to past actions. In this context, the latent variable w(t)
will include the internal states of such agents, which could be correlated across time and also correlated to past actions.
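As a minimal illustration of the full-information, low-regret setting described above, the following sketch runs a multiplicative-weights (Hedge) learner on Rock-Paper-Scissors against a fixed opponent, where the cost of every action is revealed after the fact. This is not the module's actual API; all names and parameters here are illustrative.

```python
import numpy as np

def hedge_policy(cumulative_costs, eta):
    """Exponential-weights distribution over actions from cumulative costs."""
    w = np.exp(-eta * (cumulative_costs - cumulative_costs.min()))
    return w / w.sum()

# Rock-Paper-Scissors cost matrix: J[a, w] = cost of action a against the
# opponent's (latent) action w.  0 = win, 0.5 = tie, 1 = loss.
J = np.array([[0.5, 1.0, 0.0],    # rock
              [0.0, 0.5, 1.0],    # paper
              [1.0, 0.0, 0.5]])   # scissors

rng = np.random.default_rng(0)
T = 2000
eta = np.sqrt(np.log(3) / T)      # step size for an O(sqrt(T)) regret bound
cum = np.zeros(3)                 # cumulative cost of each action so far
total_cost = 0.0
for t in range(T):
    p = hedge_policy(cum, eta)
    a = rng.choice(3, p=p)        # sample our action from the current policy
    w = 0                         # opponent uses a fixed policy: always rock
    total_cost += J[a, w]
    cum += J[:, w]                # "after-the-fact" cost of every action

# the average cost approaches that of the best fixed action (paper, cost 0)
print(total_cost / T)
```

The key feature of the full-information setting is the last line of the loop: the learner updates the cumulative cost of *all* actions, not just the one it played, which is exactly the "after-the-fact" feedback assumed above.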
- src/: python module
- test/: unit testing scripts
- doc/: documentation
- examples/: examples
- Switch to the LearningGames directory so that `setup.py` is in the current directory.
- Either in a virtual environment (e.g. conda or pipenv) (preferred) or using a base python distribution, install the package and its examples, including rock-paper-scissors: run `pip install -e .` for editable mode (i.e. development), or `pip install .` for a normal install.
- [Optional] To run the EMBER example (ember_malware_classification.py), it is necessary to download the dataset first:
  - Go to the EMBER Github repo.
  - Download the 2018 dataset by clicking the associated link.
  - Install their package, either by using Docker or per their README.
  - For usability, it is easier to move the data to pickled objects. Assuming the 2018 data you downloaded is in data/ember2018/, and the target location to store the pickled data is this_project/data/ (the data will total ~10GB):
```python
import ember
import pickle

# load the data using ember (sits on top of Lief)
X_train, y_train, X_test, y_test = ember.read_vectorized_features("/data/ember2018/")

# write to a dict for easy iteration
data = dict(x_train=X_train, y_train=y_train, x_test=X_test, y_test=y_test)

# store each array in its own pkl file (easier to chunk out, as the X_ files are large)
for key, val in data.items():
    with open(f"this_project/data/{key}.pkl", "wb") as f:
        pickle.dump(val, file=f)
```
- [Optional] Run the unit testing scripts from the shell using `python test/testLearningGames.py`.
The following examples can be found in the examples/ folder:
- rps_vs_bad_rng.ipynb: a minimal example; learn an optimal policy for the Rock-Paper-Scissors game against an opponent that uses a randomized policy with a bad generator of random numbers
- RPS_vs_fixed.ipynb: learn an optimal policy for the Rock-Paper-Scissors game against an opponent that uses a fixed policy
- RPS_vs_round_robin.ipynb: learn an optimal policy for the Rock-Paper-Scissors game against an opponent that cycles between actions in a deterministic fashion
- RPS_vs_bad_rng.ipynb: learn an optimal policy for the Rock-Paper-Scissors game against an opponent that uses a randomized policy with a bad generator of random numbers
- RPS_selfplay.ipynb: learn an optimal policy for the Rock-Paper-Scissors game against an opponent that uses the same learning algorithm
- rps_vs_bad_rng_nonstationary.py: considers a setting similar to the former, except that the opponent switches random number generators partway through the game. Compares against relevant alternative strategies, which can take several minutes to run; results are therefore persisted so that visualization can be done separately.
- rps_plot.py: visualize results by running with the appropriate file path of the results.
- malware_classification.py: run the EMBER malware classification example, with the goal of minimizing a weighted sum of classification errors (false positives and false negatives). Compares against relevant benchmarks that are generally slow, so it persists results to pkl files. Note the extra installation required to get the dataset.
- ember_plot.py: visualize results by running with the file path of the results.
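As a sketch of the kind of weighted misclassification cost the EMBER example minimizes, the snippet below computes a weighted sum of false positives and false negatives. The function name and the weights w_fp and w_fn are hypothetical placeholders, not the example's actual values.

```python
import numpy as np

def weighted_error(y_true, y_pred, w_fp=1.0, w_fn=10.0):
    """Weighted sum of false positives and false negatives.
    The weights w_fp / w_fn are illustrative placeholders."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    fp = np.sum((y_pred == 1) & (y_true == 0))  # benign flagged as malware
    fn = np.sum((y_pred == 0) & (y_true == 1))  # malware missed
    return w_fp * fp + w_fn * fn

print(weighted_error([0, 1, 1, 0], [1, 1, 0, 0]))  # one FP and one FN
```

Weighting false negatives more heavily than false positives is a common choice in malware detection, since a missed detection is typically costlier than a spurious alert.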
Joao Hespanha (hespanha@ece.ucsb.edu)
http://www.ece.ucsb.edu/~hespanha
Sean Anderson (seananderson@ucsb.edu)
University of California, Santa Barbara
Copyright (C) 2023 Joao Hespanha, Univ. of California, Santa Barbara