Fix single-machine cross-process registrations#69
Fix single-machine cross-process registrations#69alberthli wants to merge 12 commits intorai-opensource:mainfrom
Conversation
…o fix-task-registration
pculbertson
left a comment
There was a problem hiding this comment.
This one needs to cook a bit longer, per slack discussion
| return | ||
| try: | ||
| data = json.loads(_OVERRIDES_PATH.read_text()) | ||
| except Exception: |
There was a problem hiding this comment.
style: per the Google guide, I don't like catch-all except blocks. I also try to print or log the exception somewhere if I'm suppressing it.
There was a problem hiding this comment.
|
|
||
|
|
||
| def get_run_id(env_var: str = "JUDO_TASK_RUN_ID") -> str: | ||
| """Return a per-process run-ID and whether it was freshly created.""" |
There was a problem hiding this comment.
nit: this docstring seems stale
| _registered_tasks[name] = (task_cls, cfg_cls) | ||
|
|
||
|
|
||
| def _persist_ephemeral_registry() -> None: |
There was a problem hiding this comment.
nit: persist doesn't really capture what this function is doing. maybe sync or update?
| def register_task(name: str, task_type: Type[Task], task_config_type: Type[TaskConfig]) -> None: | ||
| """Registers a new task.""" | ||
| """Register a new task with the Judo framework for this run only.""" | ||
| _registered_tasks[name] = (task_type, task_config_type) |
There was a problem hiding this comment.
Should there be a _load_ephemeral_registry call here? Otherwise registry entries only get read once from file whenever __init__ is imported (per thread?) right?
There was a problem hiding this comment.
Alternatively, you could change the logic of _persist_ephemeral_registry() to first read in what's currently in the file, merge it with what's in _registered_tasks in this thread + write it back?
| directory.mkdir(parents=True, exist_ok=True) | ||
| path = directory / f"{prefix}_{run_id}.json" | ||
|
|
||
| # remove stale siblings |
There was a problem hiding this comment.
Maybe you should print a warning that you're doing this? I can't imagine why someone would do this, but if they happened to write an unrelated file {prefix}_{something_else}.json it's going to get deleted here. That'd be a hell of a thing to debug lol
| return | ||
| try: | ||
| data = json.loads(_CUSTOM_REGISTRY_PATH.read_text()) | ||
| except Exception: |
There was a problem hiding this comment.
same nit re: bare exceptions. not sure this should be suppressed if it errors...
|
|
||
| for name, info in data.items(): | ||
| # load Task class | ||
| task_mod = info.get("task_mod") |
There was a problem hiding this comment.
This will always be None, right? I don't see task_mod anywhere else in the codebase - is it vestigial?
This PR addresses broken CLI-based custom registrations on a single machine for the following. This addresses #68.
Previously, these overrides were only loosely tested in a single process, but the propagation to other processes was not investigated (we only investigated the hydra-based solution deeply). This PR does the following to resolve this issue:
This PR currently does NOT solve the following, but we will want to soon, especially for mobile robots:
This simple test case passes for me currently:
To test on other custom code, do the following:
pip uninstall judo-rai # uninstall any previous versions in your virtual env pip install git+https://github.com/alberthli/judo.git@e2ebc4c