Additional Methods of Using Monitor In Training And Testing (Non-CPO) #1
Open
DBay-ani wants to merge 7 commits into r-pad:master from
a108c08 to 99b1ffe
…rollerInTrainingTesting_andRewardEncorporation
…e training in envs/monitorEncorporated_env.py train/train_monitorEncorporated_straight_planner.py. It is still very much a toy, but it matches the non-toy CPO safety constraints Edward is using. Also made some trivial adjustments in envs/monitorEncorporated_env.py to allow the quantitative monitor subformulas to the action. This leverages 95% of the infrastructure already there, so it is a very trivial change. I think I took it out before intentionally, since I thought it matched the monitor use-cases better. We might be abusing terminology to call this stuff a monitor, maybe.
…hat was noted in the previous commit log for train/train_monitorEncorporated_straight_planner.py, the fallback controller and quantitative monitor used are toy-ish, but they match the non-toy work Edward is performing with CPO.
…s a bit at a time.
Committing code to use the feedback from safety monitors in a variety of
ways (excluding use of CPO). In particular, the monitor is worked into
the environment and used to provide:
* activation of a safety fallback policy
* modification of the reward to incorporate the monitor signal
* additional features in the observations received by the
controller.
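
For reviewers who want a picture of the three uses listed above, here is a minimal sketch of how a monitor could be worked into an environment. It is not the code in envs/monitorEncorporated_env.py: the class `MonitorIncorporatedEnv`, the toy environment, the toy monitor, the toy fallback policy, and the parameters `reward_weight` and `margin_threshold` are all hypothetical names made up for illustration, and the old Gym-style `step` return of `(obs, reward, done, info)` is an assumption.

```python
class ToyLineEnv:
    """Hypothetical 1-D environment: the state is a position, the action a displacement."""

    def reset(self):
        self.pos = 0.0
        return [self.pos]

    def step(self, action):
        self.pos += action
        reward = 1.0 if action > 0 else 0.0  # toy task: reward moving right
        return [self.pos], reward, False, {}


def toy_monitor(obs):
    """Toy quantitative monitor: margin to the unsafe region pos > 2.0 (positive means safe)."""
    return 2.0 - obs[0]


def toy_fallback(obs):
    """Toy fallback policy: always back away from the unsafe region."""
    return -0.5


class MonitorIncorporatedEnv:
    """Wraps an environment so that the monitor signal feeds all three uses."""

    def __init__(self, env, monitor, fallback_policy,
                 reward_weight=0.1, margin_threshold=0.5):
        self.env = env
        self.monitor = monitor
        self.fallback_policy = fallback_policy
        self.reward_weight = reward_weight        # assumed shaping coefficient
        self.margin_threshold = margin_threshold  # assumed fallback trigger level

    def _augment(self, obs):
        # Use 3: append the monitor value as an extra observation feature.
        return list(obs) + [self.monitor(obs)]

    def reset(self):
        self._last_obs = self.env.reset()
        return self._augment(self._last_obs)

    def step(self, action):
        # Use 1: override the controller's action when the margin is too small.
        used_fallback = self.monitor(self._last_obs) < self.margin_threshold
        if used_fallback:
            action = self.fallback_policy(self._last_obs)

        obs, reward, done, info = self.env.step(action)
        self._last_obs = obs
        margin = self.monitor(obs)

        # Use 2: add a weighted monitor term to the task reward.
        shaped_reward = reward + self.reward_weight * margin

        info = dict(info, used_fallback=used_fallback, monitor_margin=margin)
        return self._augment(obs), shaped_reward, done, info


env = MonitorIncorporatedEnv(ToyLineEnv(), toy_monitor, toy_fallback)
first_obs = env.reset()
trace = [env.step(1.0) for _ in range(3)]
for obs, reward, done, info in trace:
    print(obs, reward, info["used_fallback"])
```

In this sketch the fallback decision is made from the observation seen before the step, so the controller's action is replaced rather than corrected after the fact. Whether the shaping term should be the raw margin, a clipped penalty, or something else is a design choice the sketch does not settle.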