feat(bug-finding): warn users if resource config conflicts with machi…#159
feat(bug-finding): warn users if resource config conflicts with machi…#159tkqdldk wants to merge 8 commits into
Conversation
azchin
left a comment
There was a problem hiding this comment.
Thanks for the PR and sorry for the late response! I have a few requests detailed below. We also use a few linters and formatters in our CI workflow, so I recommend running them manually before pushing.
| except(OSError): | ||
| pass | ||
|
|
||
| memeinfo = Path("/proc/meminfo") |
There was a problem hiding this comment.
meminfo typo (caught by ruff)
| machine_cpu_count = os.cpu_count() | ||
| machine_memory = get_host_memory() | ||
|
|
||
| if machine_cpu_count is None or machine_memory == 0: |
There was a problem hiding this comment.
(C1) Let's get machine_memory to return None on failure
| except subprocess.CalledProcessError as e: | ||
| raise RuntimeError(f"Error removing {path} with Docker: {e}") | ||
|
|
||
| def get_host_memory() -> int: |
There was a problem hiding this comment.
Return val as int | None, see (C1)
| for line in meminfo.read_text().splitlines(): | ||
| if line.startswith("MemTotal:"): | ||
| return int(line.split()[1])*1024 | ||
| return 0 |
| except ValueError as e: | ||
| log_warning(f"Failed to parse memory for {name}: {e}") | ||
|
|
||
| cpu_check = max_cpus_required <= machine_cpu_count |
There was a problem hiding this comment.
I think this is an off-by-one, since os.cpu_count() returns number of CPUs but cpuset is indexed by zero, so we'd get range of 0-3 on 4-core machine. max([0, 1, 2, 3]) == 3 < 4
max_cpus_required < machine_cpu_count would be the correct check. But please check/test for this edge case dynamically.
There was a problem hiding this comment.
You're absolutely right ^^'... I will also add dynamic testing.
867fac0 to
c55125d
Compare
|
Hm I'm not seeing the error messages implemented in your PR in my output. The build does indeed abort, but that's because of Docker's error handling and happens after I can take a deeper look at what we're missing after the weekend, if needed. |
Hi, sorry for the late response, so the validation only runs before run currently, as I understood that would be the only need. I'll extend the check to build-target so the warning shows here too. |
|
Hi, sorry for the delayed feedback as we were working on a big internal refactor. I'll take a proper look this weekend, in the meanwhile could you check the CI failure? In |
c55125d to
dca9899
Compare
|
No worries ^^ Resolved the CI failure on my part and added the check in the build-target step. Let me know if I need to add or adjust anything once you take a look. |
azchin
left a comment
There was a problem hiding this comment.
I'd like the check to be placed earlier, and error + terminate.
| resolved_source_path = source_dir | ||
|
|
||
| tasks = [] | ||
| tasks: list[tuple[str, Callable[[MultiTaskProgress], TaskResult]]] = [ |
There was a problem hiding this comment.
Could you call _validate_machine_resources near the start of build(), before the target.build_docker_image() gets run? Don't need to adhere to the tasks loop if calling it directly makes it easier.
| memory_check = total_memory_required <= machine_memory | ||
|
|
||
| if not cpu_check or not memory_check: | ||
| log_warning( |
There was a problem hiding this comment.
Let's produce an error instead of warning, and terminate early. Current behavior is I see the warning message, but the docker compose still tries to run.
|
CI failed b/c of LLM API key not being passed correctly, I'll look into that. |
5aa53b2 to
38db6c4
Compare
|
Hi, sorry for the delay, I was very busy lately. Added the changes according to your comments. |
|
Looks good to me! There's an integration test failure because it looks like we're overallocating memory somehow.
+++ b/oss_crs/tests/integration/test_builder_sidecar_full.py
@@ -33,11 +33,11 @@ def sidecar_full_compose(tmp_dir):
content = {
"run_env": "local",
"docker_registry": "local",
- "oss_crs_infra": {"cpuset": "0-3", "memory": "8G"},
+ "oss_crs_infra": {"cpuset": "0-3", "memory": "7G"},
"builder-sidecar-full": {
"source": {"local_path": str(CRS_FIXTURE)},
"cpuset": "0-3",
- "memory": "8G",
+ "memory": "7G",
},
}
path = tmp_dir / "compose.yaml"
diff --git a/oss_crs/tests/integration/test_builder_sidecar_lite.py b/oss_crs/tests/integration/test_builder_sidecar_lite.py
index 9c71c17..237923b 100644
--- a/oss_crs/tests/integration/test_builder_sidecar_lite.py
+++ b/oss_crs/tests/integration/test_builder_sidecar_lite.py
@@ -36,11 +36,11 @@ def sidecar_lite_compose(tmp_dir):
content = {
"run_env": "local",
"docker_registry": "local",
- "oss_crs_infra": {"cpuset": "0-3", "memory": "8G"},
+ "oss_crs_infra": {"cpuset": "0-3", "memory": "7G"},
"builder-sidecar-lite": {
"source": {"local_path": str(CRS_FIXTURE)},
"cpuset": "0-3",
- "memory": "8G",
+ "memory": "7G",
},
}
path = tmp_dir / "compose.yaml" |
|
Hello @tkqdldk , are you going to continue working on this PR? If you no longer have interest or bandwidth, I'll close the PR and integrate your contributions in another branch + PR so that I can make some modifications. |
Hi, sorry again for the delay. It seemed to be a rounding error from integer division the last time I checked. I can still fix it if necessary or if you choose so. |
|
You can go ahead and fix it. Just wanted to check up on this |
|
Could you check the pyright error? I'll merge as soon as the CI passes, your code looks good. |
|
Hi, I looked through the pyright error outputs and none seem related to the resource check nor the error/termination which follow the failing of the check. |
|
Weird, upstream doesn't have pyright errors. I'll merge main into your branch |
| for name, crs_cfg in self.config.crs_entries.items() | ||
| ] | ||
|
|
||
| builder_count = sum(1 for crs in self.crs_list if crs.config.is_builder) |
There was a problem hiding this comment.
I think pyright is complaining about this line? is_builder
…ne resources Signed-off-by: tkqdldk <samia.20.bouchaal@gmail.com>
… memory output Signed-off-by: tkqdldk <samia.20.bouchaal@gmail.com>
Signed-off-by: tkqdldk <samia.20.bouchaal@gmail.com>
Signed-off-by: tkqdldk <samia.20.bouchaal@gmail.com>
Signed-off-by: tkqdldk <samia.20.bouchaal@gmail.com>
Signed-off-by: tkqdldk <samia.20.bouchaal@gmail.com>
Signed-off-by: tkqdldk <samia.20.bouchaal@gmail.com>
… to integer division Signed-off-by: tkqdldk <samia.20.bouchaal@gmail.com>
76aa1e2 to
44f6826
Compare
Closes #49
Description
Adds resources check and a warning when the compose file resource configurations (max CPU and total memory) exceed machine resources. The check runs in "Validate Configuration for Running" phase, before Docker attempts to start containers.
The function emits a warning but does not stop the execution.
Changes
get_host_total_memory()inutils.py_validate_machine_resources()incrs_compose.py__validate_before_runValidation
All existing tests pass.
P.S. : Can't assign a reviewer directly, cc @azchin for review