diff --git a/.coverage b/.coverage new file mode 100644 index 000000000..50437b8b9 Binary files /dev/null and b/.coverage differ diff --git a/README.md b/README.md index b02699b7c..ef09e1ff6 100644 --- a/README.md +++ b/README.md @@ -76,8 +76,70 @@ python experiment-runner/ The results of the experiment will be stored in the directory `RunnerConfig.results_output_path/RunnerConfig.name` as defined by your config variables. +### Portability Across Users and Machines + +When sharing experiments across different users or machines, hardcoded paths in configuration files can cause issues. Experiment Runner supports **environment variables** to make your experiments portable without code changes: + +#### Available Environment Variables + +- **`EXPERIMENT_RUNNER_OUTPUT_PATH`**: Directory where experiment results are stored + - Default: `/experiments` + - Example: `export EXPERIMENT_RUNNER_OUTPUT_PATH="/path/to/results"` + +- **`ENERGIBRIDGE_PATH`**: Path to the EnergiBridge executable (for energy measurements) + - Default: `/usr/local/bin/energibridge` + - Example: `export ENERGIBRIDGE_PATH="/usr/local/bin/energibridge"` + +- **`EXAMPLES_PATH`**: Directory for generating new config templates + - Default: `/examples` + - Example: `export EXAMPLES_PATH="/home/user/my-experiments"` + +#### Using Environment Variables + +Set environment variables before running your experiment: + +```bash +export EXPERIMENT_RUNNER_OUTPUT_PATH="/data/experiments" +export ENERGIBRIDGE_PATH="/opt/energibridge/bin/energibridge" +python experiment-runner/ MyRunnerConfig.py +``` + +Your configuration files automatically use these variables if set, with sensible defaults when they are not. This allows the same experiment to run on different machines without any code modifications. 
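The fallback behaviour described above can be sketched in a few lines. This is a minimal illustration, not the project's code: `resolve_path` is a hypothetical helper, and the example configs in this change call `os.getenv` directly with a default instead.

```python
import os
from pathlib import Path


def resolve_path(var_name: str, default: Path) -> Path:
    """Return the path stored in an environment variable, or the default.

    Hypothetical helper for illustration only. An unset or empty
    variable falls back to the default.
    """
    value = os.getenv(var_name)
    return Path(value) if value else default


# Falls back to the default when the variable is not set.
os.environ.pop("EXPERIMENT_RUNNER_OUTPUT_PATH", None)
print(resolve_path("EXPERIMENT_RUNNER_OUTPUT_PATH", Path("experiments")))

# Uses the exported value when it is set.
os.environ["EXPERIMENT_RUNNER_OUTPUT_PATH"] = "/data/experiments"
print(resolve_path("EXPERIMENT_RUNNER_OUTPUT_PATH", Path("experiments")))
```

One design difference to be aware of: `os.getenv(name, default)` returns an empty string when the variable is exported but empty, whereas this sketch treats an empty value as unset.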
+ **More information about the profilers and use cases can be found in the [Wiki tab](https://github.com/S2-group/experiment-runner/wiki).** +--- +## Remote distribution + +Experiment Runner supports **distributed execution across multiple machines** using a master–worker architecture. + +### Architecture Overview + +- One machine acts as the **Master (Orchestrator)** + - Owns the experiment `run_table` + - Assigns runs to workers via a REST API + - Tracks progress and persists experiment state + - Triggers lifecycle events (e.g. `AFTER_EXPERIMENT`) when finished + +- Multiple machines act as **Workers** + - Request tasks from the master + - Execute runs locally using the configured experiment + - Submit results back to the master + +- Communication between master and workers is handled via a lightweight **Flask-based HTTP API** + +### How to run it +Start the orchestrator on the master machine: + ```bash +python experiment-runner/ examples// --distribute master +``` +On each worker machine, connect to the master: +```bash +python experiment-runner/ examples// --distribute worker --master +``` +When the experiment finishes, the master shuts down automatically. The workers must be closed manually; otherwise they shut down on their own after 120 s. + + ## How to cite Experiment Runner If Experiment Runner is helping your research, consider to cite it as follows, thank you! diff --git a/Troubleshoating.md b/Troubleshoating.md new file mode 100644 index 000000000..e0cd4bc8b --- /dev/null +++ b/Troubleshoating.md @@ -0,0 +1,139 @@ +# Troubleshooting + +## 1.
Python Package Installation Error + +When installing and setting up `experiment-runner`, one common issue is running: + +```bash +pip3 install -r requirements.txt +``` + +and getting the following error: + +```text +error: externally-managed-environment + +× This environment is externally managed +╰─> To install Python packages system-wide, try apt install + python3-xyz +``` + +Some Linux distributions (especially Ubuntu 24+, Debian, and Fedora) protect the system Python installation to avoid breaking system packages. + +### Solution + +Run: + +```bash +pip3 install -r requirements.txt --break-system-packages +``` + +### Alternative + +Use a Python virtual environment: + +```bash +python3 -m venv venv +source venv/bin/activate +pip install -r requirements.txt +``` + +--- + +## 2. EnergiBridge / JoularCore Permission Error + +When using EnergiBridge or JoularCore on Linux systems (especially AMD CPUs), you may encounter the following error when running the experiment: + +```text +thread 'main' (33575) panicked at src/cpu/amd.rs:20:76: +called `Result::unwrap()` on an `Err` value: Os { code: 13, kind: PermissionDenied, message: "Permission denied" } +note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace +``` + +The Rust profiler is trying to access low-level CPU energy counters (MSR / RAPL interfaces), but Linux blocks access for normal users. + +### Solution + +#### 1. Load the MSR Kernel Module + +Run: + +```bash +sudo modprobe msr +``` + +Then verify the device exists: + +```bash +ls /dev/cpu/0/msr +``` + +Expected output: + +```text +/dev/cpu/0/msr +``` + +If the file does not exist, the kernel module did not load correctly. + +--- + +#### 2. Check MSR Permissions + +Run: + +```bash +ls -l /dev/cpu/0/msr +``` + +If you see something similar to: + +```text +crw------- 1 root root +``` + +then only the root user can access the CPU energy counters. + +--- + +#### 3.
Grant Read Permissions + +Run: + +```bash +sudo chmod o+r /dev/cpu/*/msr +``` + +This temporarily allows non-root users to read the MSR registers. + +--- + +#### If Nothing Works + +Some Linux systems completely block low-level profiling access. + +Run: + +```bash +cat /proc/sys/kernel/perf_event_paranoid +``` + +If the value is: + +```text +2 +3 +4 +``` + +then Linux is blocking low-level performance counters. + +#### Temporary Fix + +Run: + +```bash +echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid +``` + +This temporarily lowers the kernel restrictions and allows profiling tools to access hardware counters. diff --git a/examples/hello-world-fibonacci/README.md b/examples/hello-world-fibonacci/README.md index 33763119c..d83f45ad3 100644 --- a/examples/hello-world-fibonacci/README.md +++ b/examples/hello-world-fibonacci/README.md @@ -18,6 +18,6 @@ python experiment-runner/ examples/hello-world-fibonacci/RunnerConfig.py ## Results The results are generated in the `examples/hello-world-fibonacci/experiments` folder. - +In case there are anomalies such as null, absent, or negative values, a report will be generated in the `examples/hello-world-fibonacci/experiments` folder. **!!! WARNING !!!**: COLUMNS IN THE `energibridge.csv` FILES CAN BE DIFFERENT ACROSS MACHINES. ADJUST THE DATAFRAME COLUMN NAMES ACCORDINGLY. 
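Because the column names differ across machines, one option is to discover the energy columns from the CSV header instead of hardcoding them. The sketch below is an illustration under stated assumptions: `energy_deltas` is a hypothetical helper, the `_ENERGY (J)` suffix is inferred from the column names used in this example, and it uses the standard-library `csv` module rather than the pandas DataFrame the example config uses, so that it runs without extra dependencies.

```python
import csv
import io


def energy_deltas(csv_text: str, suffix: str = "_ENERGY (J)") -> dict:
    """Return last-minus-first values for every column ending in `suffix`.

    Hypothetical helper: the suffix is an assumption based on the
    column names seen in this example and may differ on your machine.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    # Pick up whichever energy columns this machine actually reports.
    columns = [name for name in rows[0] if name.endswith(suffix)]
    return {
        name: round(float(rows[-1][name]) - float(rows[0][name]), 3)
        for name in columns
    }


# Made-up sample data, only to show the shape of the result.
sample = "Time,CPU_ENERGY (J),CORE0_ENERGY (J)\n0,10.0,1.5\n1,12.5,2.0\n"
print(energy_deltas(sample))
```

The keys of the returned dictionary are the raw CSV column names, so they still need to be mapped to the `data_columns` declared in the run table.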
diff --git a/examples/hello-world-fibonacci/RunnerConfig.py b/examples/hello-world-fibonacci/RunnerConfig.py index 9c5dbc3a8..fb65490e3 100644 --- a/examples/hello-world-fibonacci/RunnerConfig.py +++ b/examples/hello-world-fibonacci/RunnerConfig.py @@ -5,139 +5,248 @@ from ConfigValidator.Config.Models.RunnerContext import RunnerContext from ConfigValidator.Config.Models.OperationType import OperationType from ProgressManager.Output.OutputProcedure import OutputProcedure as output +from ProgressManager.Validation.RequirementsValidator import (validate_experiment_requirements) -from typing import Dict, List, Any, Optional +from typing import Dict, Any, Optional, List from pathlib import Path from os.path import dirname, realpath import os -import signal import pandas as pd import time import subprocess import shlex +import sys + class RunnerConfig: ROOT_DIR = Path(dirname(realpath(__file__))) - # ================================ USER SPECIFIC CONFIG ================================ - """The name of the experiment.""" - name: str = "new_runner_experiment" + name: str = "new_runner_experiment" + + default_output = ROOT_DIR / "experiments" + + results_output_path: Path = Path( + os.getenv("EXPERIMENT_RUNNER_OUTPUT_PATH", str(default_output)) + ) - """The path in which Experiment Runner will create a folder with the name `self.name`, in order to store the - results from this experiment. (Path does not need to exist - it will be created if necessary.) - Output path defaults to the config file's path, inside the folder 'experiments'""" - results_output_path: Path = ROOT_DIR / 'experiments' + operation_type: OperationType = OperationType.AUTO - """Experiment operation type. Unless you manually want to initiate each run, use `OperationType.AUTO`.""" - operation_type: OperationType = OperationType.AUTO + time_between_runs_in_ms: int = 1000 - """The time Experiment Runner will wait after a run completes. 
- This can be essential to accommodate for cooldown periods on some systems.""" - time_between_runs_in_ms: int = 1000 + # Resolved from the ENERGIBRIDGE_PATH environment variable, with the default documented in the README + ENERGIBRIDGE_PATH: str = os.getenv("ENERGIBRIDGE_PATH", "/usr/local/bin/energibridge") + + """Path to log file for energy validation report. Relative to experiment output directory.""" + energy_validation_log_file: str = "energy_validation_report.log" - # Dynamic configurations can be one-time satisfied here before the program takes the config as-is - # e.g. Setting some variable based on some criteria def __init__(self): - """Executes immediately after program start, on config load""" EventSubscriptionController.subscribe_to_multiple_events([ + (RunnerEvents.VALIDATE_EXPERIMENT, self.validate_experiment), (RunnerEvents.BEFORE_EXPERIMENT, self.before_experiment), - (RunnerEvents.BEFORE_RUN , self.before_run ), - (RunnerEvents.START_RUN , self.start_run ), + (RunnerEvents.BEFORE_RUN, self.before_run), + (RunnerEvents.START_RUN, self.start_run), (RunnerEvents.START_MEASUREMENT, self.start_measurement), - (RunnerEvents.INTERACT , self.interact ), - (RunnerEvents.STOP_MEASUREMENT , self.stop_measurement ), - (RunnerEvents.STOP_RUN , self.stop_run ), + (RunnerEvents.INTERACT, self.interact), + (RunnerEvents.STOP_MEASUREMENT, self.stop_measurement), + (RunnerEvents.STOP_RUN, self.stop_run), (RunnerEvents.POPULATE_RUN_DATA, self.populate_run_data), - (RunnerEvents.AFTER_EXPERIMENT , self.after_experiment ) + (RunnerEvents.AFTER_EXPERIMENT, self.after_experiment) ]) - self.run_table_model = None # Initialized later + + self.run_table_model = None + self.profiler = None + output.console_log("Custom config loaded") + def validate_experiment(self) -> None: + validate_experiment_requirements(Path(__file__)) + def create_run_table_model(self) -> RunTableModel: - """Create and return the run_table model here.
A run_table is a List (rows) of tuples (columns), - representing each run performed""" + factor1 = FactorModel("fib_type", ['iter', 'mem', 'rec']) factor2 = FactorModel("problem_size", [10, 35, 40, 5000, 10000]) + self.run_table_model = RunTableModel( factors=[factor1, factor2], exclude_combinations=[ - {factor2: [10]}, # all runs having treatment "10" will be excluded + {factor2: [10]}, {factor1: ['rec'], factor2: [5000, 10000]}, - {factor1: ['mem', 'iter'], factor2: [35, 40]}, # all runs having the combination ("iter", 30) will be excluded + {factor1: ['mem', 'iter'], factor2: [35, 40]}, ], - repetitions = 10, - data_columns=["energy", "runtime", "memory"] + repetitions=10, + + # IMPORTANT: these names must match the keys returned by populate_run_data() + data_columns=[ + "cpu_energy", + "core0_energy", + "core1_energy", + "core2_energy", + "core3_energy", + "core4_energy", + "core5_energy", + "core6_energy", + "core7_energy" + ] ) + return self.run_table_model def before_experiment(self) -> None: - """Perform any activity required before starting the experiment here - Invoked only once during the lifetime of the program.""" pass def before_run(self) -> None: - """Perform any activity required before starting a run. - No context is available here as the run is not yet active (BEFORE RUN)""" pass def start_run(self, context: RunnerContext) -> None: - """Perform any activity required for starting the run here. - For example, starting the target system to measure.
- Activities after starting the run should also be performed here.""" pass def start_measurement(self, context: RunnerContext) -> None: - """Perform any activity required for starting measurements.""" + fib_type = context.execute_run["fib_type"] problem_size = context.execute_run["problem_size"] - profiler_cmd = f'sudo energibridge \ - --max-execution 20 \ - --output {context.run_dir / "energibridge.csv"} \ - --summary \ - python examples/hello-world-fibonacci/fibonacci_{fib_type}.py {problem_size}' + output_csv = context.run_dir / "energibridge.csv" + + profiler_cmd = ( + f'{self.ENERGIBRIDGE_PATH} ' + f'--max-execution 20 ' + f'--output {output_csv} ' + f'--summary ' + f'{sys.executable} ' + f'examples/hello-world-fibonacci/fibonacci_{fib_type}.py ' + f'{problem_size}' + ) + + output.console_log(f"Running: {profiler_cmd}") + + energibridge_log = open( + context.run_dir / "energibridge.log", + "w" + ) - energibridge_log = open(f'{context.run_dir}/energibridge.log', 'w') - self.profiler = subprocess.Popen(shlex.split(profiler_cmd), stdout=energibridge_log) + self.profiler = subprocess.Popen( + shlex.split(profiler_cmd), + stdout=energibridge_log, + stderr=energibridge_log, + cwd=str(self.ROOT_DIR.parent.parent) + ) def interact(self, context: RunnerContext) -> None: - """Perform any interaction with the running target system here, or block here until the target finishes.""" - # No interaction. We just run it for XX seconds. - # Another example would be to wait for the target to finish, e.g. via `self.target.wait()` output.console_log("Running program for 20 seconds") + time.sleep(20) def stop_measurement(self, context: RunnerContext) -> None: - """Perform any activity here required for stopping measurements.""" - self.profiler.wait() + + if self.profiler: + self.profiler.wait() def stop_run(self, context: RunnerContext) -> None: - """Perform any activity here required for stopping the run. 
- Activities after stopping the run should also be performed here.""" pass - - def populate_run_data(self, context: RunnerContext) -> Optional[Dict[str, Any]]: - """Parse and process any measurement data here. - You can also store the raw measurement data under `context.run_dir` - Returns a dictionary with keys `self.run_table_model.data_columns` and their values populated""" - # energibridge.csv - Power consumption of the whole system - df = pd.read_csv(context.run_dir / f"energibridge.csv") + def populate_run_data( + self, + context: RunnerContext + ) -> Optional[Dict[str, Any]]: + + csv_path = context.run_dir / "energibridge.csv" + + if not csv_path.exists(): + output.console_log(f"CSV missing: {csv_path}") + return None + + if csv_path.stat().st_size == 0: + output.console_log("CSV empty") + return None + + try: + df = pd.read_csv(csv_path) + + except Exception as e: + output.console_log(f"CSV read error: {e}") + return None + + required_columns = [ + "CPU_ENERGY (J)", + "CORE0_ENERGY (J)", + "CORE1_ENERGY (J)", + "CORE2_ENERGY (J)", + "CORE3_ENERGY (J)", + "CORE4_ENERGY (J)", + "CORE5_ENERGY (J)", + "CORE6_ENERGY (J)", + "CORE7_ENERGY (J)" + ] + + for col in required_columns: + if col not in df.columns: + output.console_log(f"Missing column: {col}") + return None + run_data = { - 'dram_energy': round(df['DRAM_ENERGY (J)'].iloc[-1] - df['DRAM_ENERGY (J)'].iloc[0], 3), - 'package_energy': round(df['PACKAGE_ENERGY (J)'].iloc[-1] - df['PACKAGE_ENERGY (J)'].iloc[0], 3), - 'pp0_energy': round(df['PP0_ENERGY (J)'].iloc[-1] - df['PP0_ENERGY (J)'].iloc[0], 3), - 'pp1_energy': round(df['PP1_ENERGY (J)'].iloc[-1] - df['PP1_ENERGY (J)'].iloc[0], 3), + "cpu_energy": round( + df["CPU_ENERGY (J)"].iloc[-1] + - df["CPU_ENERGY (J)"].iloc[0], + 3 + ), + + "core0_energy": round( + df["CORE0_ENERGY (J)"].iloc[-1] + - df["CORE0_ENERGY (J)"].iloc[0], + 3 + ), + + "core1_energy": round( + df["CORE1_ENERGY (J)"].iloc[-1] + - df["CORE1_ENERGY (J)"].iloc[0], + 3 + ), + + 
"core2_energy": round( + df["CORE2_ENERGY (J)"].iloc[-1] + - df["CORE2_ENERGY (J)"].iloc[0], + 3 + ), + + "core3_energy": round( + df["CORE3_ENERGY (J)"].iloc[-1] + - df["CORE3_ENERGY (J)"].iloc[0], + 3 + ), + + "core4_energy": round( + df["CORE4_ENERGY (J)"].iloc[-1] + - df["CORE4_ENERGY (J)"].iloc[0], + 3 + ), + + "core5_energy": round( + df["CORE5_ENERGY (J)"].iloc[-1] + - df["CORE5_ENERGY (J)"].iloc[0], + 3 + ), + + "core6_energy": round( + df["CORE6_ENERGY (J)"].iloc[-1] + - df["CORE6_ENERGY (J)"].iloc[0], + 3 + ), + + "core7_energy": round( + df["CORE7_ENERGY (J)"].iloc[-1] + - df["CORE7_ENERGY (J)"].iloc[0], + 3 + ) } + + output.console_log(f"Run data: {run_data}") + return run_data + def after_experiment(self) -> None: - """Perform any activity required after stopping the experiment here - Invoked only once during the lifetime of the program.""" pass - # ================================ DO NOT ALTER BELOW THIS LINE ================================ - experiment_path: Path = None + experiment_path: Path = None \ No newline at end of file diff --git a/examples/hello-world/README.md b/examples/hello-world/README.md index a5ff6e013..406f630db 100644 --- a/examples/hello-world/README.md +++ b/examples/hello-world/README.md @@ -14,3 +14,4 @@ python experiment-runner/ examples/hello-world/RunnerConfig.py ## Results The results are generated in the `examples/hello-world/experiments` folder. +In case there are anomalies such as null, absent, or negative values, a report will be generated in the `examples/hello-world/experiments` folder. 
\ No newline at end of file diff --git a/examples/hello-world/RunnerConfig.py b/examples/hello-world/RunnerConfig.py index 61d1411f0..6b9f0ffaa 100644 --- a/examples/hello-world/RunnerConfig.py +++ b/examples/hello-world/RunnerConfig.py @@ -5,10 +5,12 @@ from ConfigValidator.Config.Models.RunnerContext import RunnerContext from ConfigValidator.Config.Models.OperationType import OperationType from ProgressManager.Output.OutputProcedure import OutputProcedure as output +from ProgressManager.Validation.RequirementsValidator import (validate_experiment_requirements) from typing import Dict, List, Any, Optional from pathlib import Path from os.path import dirname, realpath +import os class RunnerConfig: @@ -21,7 +23,8 @@ class RunnerConfig: """The path in which Experiment Runner will create a folder with the name `self.name`, in order to store the results from this experiment. (Path does not need to exist - it will be created if necessary.) Output path defaults to the config file's path, inside the folder 'experiments'""" - results_output_path: Path = ROOT_DIR / 'experiments' + default_output = ROOT_DIR / "experiments" + results_output_path: Path = Path(os.getenv("EXPERIMENT_RUNNER_OUTPUT_PATH", str(default_output))) """Experiment operation type. Unless you manually want to initiate each run, use `OperationType.AUTO`.""" operation_type: OperationType = OperationType.AUTO @@ -30,21 +33,31 @@ class RunnerConfig: This can be essential to accommodate for cooldown periods on some systems.""" time_between_runs_in_ms: int = 1000 + """Path to log file for energy validation report. Relative to experiment output directory.""" + energy_validation_log_file: str = "energy_validation_report.log" + + """List of data column names that contain energy measurements (e.g., ['energy', 'joules', 'watts']).""" + energy_validation_columns = [ + "avg_cpu", + "avg_mem" + ] + # Dynamic configurations can be one-time satisfied here before the program takes the config as-is # e.g. 
Setting some variable based on some criteria def __init__(self): """Executes immediately after program start, on config load""" EventSubscriptionController.subscribe_to_multiple_events([ - (RunnerEvents.BEFORE_EXPERIMENT, self.before_experiment), - (RunnerEvents.BEFORE_RUN , self.before_run ), - (RunnerEvents.START_RUN , self.start_run ), - (RunnerEvents.START_MEASUREMENT, self.start_measurement), - (RunnerEvents.INTERACT , self.interact ), - (RunnerEvents.STOP_MEASUREMENT , self.stop_measurement ), - (RunnerEvents.STOP_RUN , self.stop_run ), - (RunnerEvents.POPULATE_RUN_DATA, self.populate_run_data), - (RunnerEvents.AFTER_EXPERIMENT , self.after_experiment ) + (RunnerEvents.VALIDATE_EXPERIMENT, self.validate_experiment), + (RunnerEvents.BEFORE_EXPERIMENT , self.before_experiment), + (RunnerEvents.BEFORE_RUN , self.before_run ), + (RunnerEvents.START_RUN , self.start_run ), + (RunnerEvents.START_MEASUREMENT , self.start_measurement), + (RunnerEvents.INTERACT , self.interact ), + (RunnerEvents.STOP_MEASUREMENT , self.stop_measurement ), + (RunnerEvents.STOP_RUN , self.stop_run ), + (RunnerEvents.POPULATE_RUN_DATA , self.populate_run_data), + (RunnerEvents.AFTER_EXPERIMENT , self.after_experiment ) ]) self.run_table_model = None # Initialized later @@ -65,6 +78,11 @@ def create_run_table_model(self) -> RunTableModel: data_columns=['avg_cpu', 'avg_mem'] ) return self.run_table_model + + def validate_experiment(self) -> None: + """Perform any experiment validation here. 
If any validation fails, raise an exception with details on the failure.""" + validate_experiment_requirements(Path(__file__)) + output.console_log("Config.validate_experiment() called!") def before_experiment(self) -> None: """Perform any activity required before starting the experiment here @@ -120,4 +138,4 @@ def after_experiment(self) -> None: output.console_log("Config.after_experiment() called!") # ================================ DO NOT ALTER BELOW THIS LINE ================================ - experiment_path: Path = None + experiment_path: Path = None \ No newline at end of file diff --git a/examples/profilers/ADB/README.md b/examples/profilers/ADB/README.md new file mode 100644 index 000000000..5c445df6c --- /dev/null +++ b/examples/profilers/ADB/README.md @@ -0,0 +1,28 @@ +# `Android Debug Bridge` Profiler + +This example shows how to automatically collect battery and energy metrics from +Android devices during experiment execution using ADB. + +## Requirements + - Android SDK Platform Tools installed + - Linux: + ```bash + sudo apt install android-tools-adb android-tools-fastboot + ``` + - macOS: + ```bash + brew install android-platform-tools + ``` + - Android device connected via USB or emulator running + - USB debugging enabled on device + +## Running +From the root directory of the repo, run the following command: + ```bash + python experiment-runner/ examples/profilers/ADB/RunnerConfig.py + ``` + +## Results +The results are generated in the `examples/profilers/ADB/experiments` folder. + +In case there are anomalies such as null, absent, or negative values, a report will be generated in the `examples/profilers/ADB/experiments` folder. 
\ No newline at end of file diff --git a/examples/profilers/ADB/RunnerConfig.py b/examples/profilers/ADB/RunnerConfig.py new file mode 100644 index 000000000..383fec2a7 --- /dev/null +++ b/examples/profilers/ADB/RunnerConfig.py @@ -0,0 +1,167 @@ +from EventManager.Models.RunnerEvents import RunnerEvents +from EventManager.EventSubscriptionController import EventSubscriptionController +from ConfigValidator.Config.Models.RunTableModel import RunTableModel +from ConfigValidator.Config.Models.FactorModel import FactorModel +from ConfigValidator.Config.Models.RunnerContext import RunnerContext +from ConfigValidator.Config.Models.OperationType import OperationType +from ProgressManager.Output.OutputProcedure import OutputProcedure as output +from ProgressManager.Validation.RequirementsValidator import (validate_experiment_requirements) +from Plugins.Profilers.AndroidDebugBridge import AndroidBatteryMonitor + +from typing import Dict, List, Any, Optional +from pathlib import Path +from os.path import dirname, realpath +import time + +class RunnerConfig: + ROOT_DIR = Path(dirname(realpath(__file__))) + + # ================================ USER SPECIFIC CONFIG ================================ + """The name of the experiment.""" + name: str = "android_energy_monitoring_experiment" + + """The path in which Experiment Runner will create a folder with the name `self.name`""" + results_output_path: Path = ROOT_DIR / 'experiments' + + """Experiment operation type""" + operation_type: OperationType = OperationType.AUTO + + """Time between runs (cooldown period)""" + time_between_runs_in_ms: int = 3000 + + """Path to log file for energy validation report. Relative to experiment output directory.""" + energy_validation_log_file: str = "energy_validation_report.log" + + # Dynamic configurations can be one-time satisfied here before the program takes the config as-is + # e.g. 
Setting some variable based on some criteria + def __init__(self): + """Executes immediately after program start, on config load""" + self.profiler = None + + EventSubscriptionController.subscribe_to_multiple_events([ + (RunnerEvents.VALIDATE_EXPERIMENT, self.validate_experiment), + (RunnerEvents.BEFORE_EXPERIMENT , self.before_experiment), + (RunnerEvents.BEFORE_RUN , self.before_run ), + (RunnerEvents.START_RUN , self.start_run ), + (RunnerEvents.START_MEASUREMENT , self.start_measurement), + (RunnerEvents.INTERACT , self.interact ), + (RunnerEvents.STOP_MEASUREMENT , self.stop_measurement ), + (RunnerEvents.STOP_RUN , self.stop_run ), + (RunnerEvents.POPULATE_RUN_DATA , self.populate_run_data), + (RunnerEvents.AFTER_EXPERIMENT , self.after_experiment ) + ]) + self.run_table_model = None + + output.console_log("Android Energy Monitoring config loaded") + + def create_run_table_model(self) -> RunTableModel: + """Define the experimental design with factors and data columns. + """ + # Define experimental factors + workload_factor = FactorModel("workload", ['light', 'medium', 'heavy']) + screen_factor = FactorModel("screen_brightness", ['low', 'high']) + + self.run_table_model = RunTableModel( + factors=[workload_factor, screen_factor], + repetitions=1, + data_columns=['workload_duration_ms', 'task_completion_status'] + ) + return self.run_table_model + + def validate_experiment(self) -> None: + """Perform any experiment validation here. 
If any validation fails, raise an exception with details on the failure.""" + validate_experiment_requirements(Path(__file__)) + output.console_log("Config.validate_experiment() called!") + + def before_experiment(self): + self.profiler = AndroidBatteryMonitor( + device_serial=None, + poll_interval=2 + ) + self.profiler.open_device() + output.console_log("Android profiler initialized") + + def before_run(self) -> None: + """Called before each run.""" + output.console_log(f"Preparing device for run...") + + def start_run(self, context): + if self.profiler is None: + self.profiler = AndroidBatteryMonitor( + device_serial=None, + poll_interval=2 + ) + self.profiler.open_device() + + self.profiler.logfile = (context.run_dir / "android_battery.csv") + + def start_measurement(self, context: RunnerContext) -> None: + """Start measurement.""" + output.console_log("Energy monitoring started") + self.profiler.start() + + def interact(self, context: RunnerContext): + workload = context.execute_run['workload'] + brightness = context.execute_run['screen_brightness'] + + duration_ms = { + 'light': 5000, + 'medium': 10000, + 'heavy': 15000 + }[workload] + + output.console_log( + f"Running {workload} workload " + f"for {duration_ms}ms " + f"(brightness: {brightness})" + ) + + time.sleep(duration_ms / 1000) + + output.console_log("Workload completed") + + def stop_measurement(self, context: RunnerContext) -> None: + """Stop measurement - energy monitoring ends here automatically.""" + output.console_log("Energy monitoring stopped") + self.profiler.stop() + + def stop_run(self, context: RunnerContext) -> None: + """Stop the current run.""" + output.console_log(f"Stopped run: {context.execute_run['__run_id']}") + + def populate_run_data(self, context: RunnerContext): + battery_log = self.profiler.parse_log(self.profiler.logfile) + workload = context.execute_run['workload'] + duration_ms = { + 'light':5000, + 'medium':10000, + 'heavy':15000 + }[workload] + + return { + 
"workload_duration_ms": duration_ms, + "task_completion_status": "success", + "android_battery__battery_percentage": + battery_log.get("android_battery__percentage", 0), + "android_battery__battery_temperature": + battery_log.get("android_battery__temperature", 0), + "android_battery__battery_voltage": + battery_log.get( + "android_battery__voltage",0), + "android_battery__current_now": + battery_log.get( + "android_battery__current_now",0), + "android_battery__charge_counter": + battery_log.get( + "android_battery__charge_counter",0), + "android_battery__power_draw": + battery_log.get("android_battery__power_draw",0) + } + + def after_experiment(self) -> None: + """Called after experiment completes.""" + output.console_log("Android energy monitoring experiment completed!") + output.console_log(f"Results stored in {self.results_output_path}") + + # ================================ DO NOT ALTER BELOW THIS LINE ================================ + experiment_path: Path = None \ No newline at end of file diff --git a/examples/profilers/EnergiBridge/README.md b/examples/profilers/EnergiBridge/README.md index 57eca022e..1b4db1a75 100644 --- a/examples/profilers/EnergiBridge/README.md +++ b/examples/profilers/EnergiBridge/README.md @@ -20,7 +20,7 @@ python3 experiment-runner/ examples/profilers/EnergiBridge/RunnerConfig.py ## Results The results are generated in the `examples/profilers/EnergiBridge/experiments` folder. - +In case there are anomalies such as null, absent, or negative values, a report will be generated in the `examples/profilers/EnergiBridge/experiments` folder. **!!! WARNING !!!**: COLUMNS IN THE `energibridge.csv` FILES CAN BE DIFFERENT ACROSS MACHINES. ADJUST THE DATAFRAME COLUMN NAMES ACCORDINGLY. 
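The anomaly report mentioned in the README text above covers null, absent, and negative values. The actual validator is not shown in this part of the change, so the following is only a rough sketch of what such a check could look like: `find_anomalies` and its message format are hypothetical and are not the project's implementation.

```python
import math
from typing import Any, Dict, List


def find_anomalies(rows: List[Dict[str, Any]], columns: List[str]) -> List[str]:
    """Describe null, absent, or negative values in the given columns.

    Hypothetical check for illustration only; each row is a dict that
    maps a data column name to the value recorded for one run.
    """
    issues = []
    for index, row in enumerate(rows):
        for column in columns:
            if column not in row:
                issues.append(f"run {index}: '{column}' is absent")
                continue
            value = row[column]
            if value is None or (isinstance(value, float) and math.isnan(value)):
                issues.append(f"run {index}: '{column}' is null")
            elif isinstance(value, (int, float)) and value < 0:
                issues.append(f"run {index}: '{column}' is negative ({value})")
    return issues


# Made-up rows: one valid, one null, one absent, one negative.
rows = [{"energy": 1.2}, {"energy": None}, {}, {"energy": -0.5}]
for issue in find_anomalies(rows, ["energy"]):
    print(issue)
```

Collecting the findings as plain strings keeps the sketch easy to write to a log file such as the `energy_validation_report.log` named in the configs.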
diff --git a/examples/profilers/EnergiBridge/RunnerConfig.py b/examples/profilers/EnergiBridge/RunnerConfig.py index 28df866b0..461996a39 100644 --- a/examples/profilers/EnergiBridge/RunnerConfig.py +++ b/examples/profilers/EnergiBridge/RunnerConfig.py @@ -5,11 +5,13 @@ from ConfigValidator.Config.Models.RunnerContext import RunnerContext from ConfigValidator.Config.Models.OperationType import OperationType from ProgressManager.Output.OutputProcedure import OutputProcedure as output +from ProgressManager.Validation.RequirementsValidator import (validate_experiment_requirements) from Plugins.Profilers.EnergiBridge import EnergiBridge from typing import Dict, List, Any, Optional from pathlib import Path from os.path import dirname, realpath +import os class RunnerConfig: @@ -22,7 +24,8 @@ class RunnerConfig: """The path in which Experiment Runner will create a folder with the name `self.name`, in order to store the results from this experiment. (Path does not need to exist - it will be created if necessary.) Output path defaults to the config file's path, inside the folder 'experiments'""" - results_output_path: Path = ROOT_DIR / 'experiments' + default_output = ROOT_DIR / "experiments" + results_output_path: Path = Path(os.getenv("EXPERIMENT_RUNNER_OUTPUT_PATH", str(default_output))) """Experiment operation type. Unless you manually want to initiate each run, use `OperationType.AUTO`.""" operation_type: OperationType = OperationType.AUTO @@ -30,6 +33,9 @@ class RunnerConfig: """The time Experiment Runner will wait after a run completes. This can be essential to accommodate for cooldown periods on some systems.""" time_between_runs_in_ms: int = 1000 + + """Path to log file for energy validation report. Relative to experiment output directory.""" + energy_validation_log_file: str = "energy_validation_report.log" # Dynamic configurations can be one-time satisfied here before the program takes the config as-is # e.g. 
Setting some variable based on some criteria @@ -37,15 +43,16 @@ def __init__(self): """Executes immediately after program start, on config load""" EventSubscriptionController.subscribe_to_multiple_events([ - (RunnerEvents.BEFORE_EXPERIMENT, self.before_experiment), - (RunnerEvents.BEFORE_RUN , self.before_run ), - (RunnerEvents.START_RUN , self.start_run ), - (RunnerEvents.START_MEASUREMENT, self.start_measurement), - (RunnerEvents.INTERACT , self.interact ), - (RunnerEvents.STOP_MEASUREMENT , self.stop_measurement ), - (RunnerEvents.STOP_RUN , self.stop_run ), - (RunnerEvents.POPULATE_RUN_DATA, self.populate_run_data), - (RunnerEvents.AFTER_EXPERIMENT , self.after_experiment ) + (RunnerEvents.VALIDATE_EXPERIMENT, self.validate_experiment), + (RunnerEvents.BEFORE_EXPERIMENT , self.before_experiment), + (RunnerEvents.BEFORE_RUN , self.before_run ), + (RunnerEvents.START_RUN , self.start_run ), + (RunnerEvents.START_MEASUREMENT , self.start_measurement), + (RunnerEvents.INTERACT , self.interact ), + (RunnerEvents.STOP_MEASUREMENT , self.stop_measurement ), + (RunnerEvents.STOP_RUN , self.stop_run ), + (RunnerEvents.POPULATE_RUN_DATA , self.populate_run_data), + (RunnerEvents.AFTER_EXPERIMENT , self.after_experiment ) ]) self.run_table_model = None # Initialized later @@ -61,6 +68,11 @@ def create_run_table_model(self) -> RunTableModel: ) return self.run_table_model + + def validate_experiment(self) -> None: + """Perform any experiment validation here. 
If any validation fails, raise an exception with details on the failure.""" + validate_experiment_requirements(Path(__file__)) + output.console_log("Config.validate_experiment() called!") def before_experiment(self) -> None: """Perform any activity required before starting the experiment here diff --git a/examples/profilers/JoularCore/README.md b/examples/profilers/JoularCore/README.md index b92775876..0931b82cc 100644 --- a/examples/profilers/JoularCore/README.md +++ b/examples/profilers/JoularCore/README.md @@ -22,3 +22,4 @@ sudo python3 experiment-runner/ examples/joularcore-profiling/RunnerConfig.py The results are generated in the `examples/joularcore-profiling/experiments` folder. +In case there are anomalies such as null, absent, or negative values, a report will be generated in the `examples/joularcore-profiling/experiments` folder. diff --git a/examples/profilers/JoularCore/RunnerConfig.py b/examples/profilers/JoularCore/RunnerConfig.py index 6e1975a8b..adb0b13ef 100644 --- a/examples/profilers/JoularCore/RunnerConfig.py +++ b/examples/profilers/JoularCore/RunnerConfig.py @@ -5,6 +5,7 @@ from ConfigValidator.Config.Models.RunnerContext import RunnerContext from ConfigValidator.Config.Models.OperationType import OperationType from ProgressManager.Output.OutputProcedure import OutputProcedure as output +from ProgressManager.Validation.RequirementsValidator import (validate_experiment_requirements) import shlex from typing import Dict, Any, Optional @@ -22,21 +23,30 @@ class RunnerConfig: ROOT_DIR = Path(dirname(realpath(__file__))) name: str = "joularcore_example" - results_output_path: Path = ROOT_DIR / "experiments" + default_output = ROOT_DIR / "experiments" + results_output_path: Path = Path(os.getenv("EXPERIMENT_RUNNER_OUTPUT_PATH", str(default_output))) operation_type: OperationType = OperationType.AUTO time_between_runs_in_ms: int = 1000 + + """Path to log file for energy validation report. 
Relative to experiment output directory.""" + energy_validation_log_file: str = "energy_validation_report.log" + # Dynamic configurations can be one-time satisfied here before the program takes the config as-is + # e.g. Setting some variable based on some criteria def __init__(self): + """Executes immediately after program start, on config load""" + EventSubscriptionController.subscribe_to_multiple_events([ - (RunnerEvents.BEFORE_EXPERIMENT, self.before_experiment), - (RunnerEvents.BEFORE_RUN , self.before_run), - (RunnerEvents.START_RUN , self.start_run), - (RunnerEvents.START_MEASUREMENT, self.start_measurement), - (RunnerEvents.INTERACT , self.interact), - (RunnerEvents.STOP_MEASUREMENT , self.stop_measurement), - (RunnerEvents.STOP_RUN , self.stop_run), - (RunnerEvents.POPULATE_RUN_DATA, self.populate_run_data), - (RunnerEvents.AFTER_EXPERIMENT , self.after_experiment) + (RunnerEvents.VALIDATE_EXPERIMENT, self.validate_experiment), + (RunnerEvents.BEFORE_EXPERIMENT , self.before_experiment), + (RunnerEvents.BEFORE_RUN , self.before_run ), + (RunnerEvents.START_RUN , self.start_run ), + (RunnerEvents.START_MEASUREMENT , self.start_measurement), + (RunnerEvents.INTERACT , self.interact ), + (RunnerEvents.STOP_MEASUREMENT , self.stop_measurement ), + (RunnerEvents.STOP_RUN , self.stop_run ), + (RunnerEvents.POPULATE_RUN_DATA , self.populate_run_data), + (RunnerEvents.AFTER_EXPERIMENT , self.after_experiment ) ]) self.run_table_model = None @@ -56,6 +66,8 @@ def create_run_table_model(self) -> RunTableModel: data_columns=["avg_process_power", "avg_cpu_usage", "avg_cpu_power"] ) return self.run_table_model + def validate_experiment(self) -> None: + output.console_log("Config.validate_experiment() called!") def before_experiment(self) -> None: output.console_log("Config.before_experiment() called!") diff --git a/examples/profilers/NvidiaML/README.md b/examples/profilers/NvidiaML/README.md index b7acab138..c6854aa05 100644 --- a/examples/profilers/NvidiaML/README.md 
+++ b/examples/profilers/NvidiaML/README.md @@ -30,3 +30,5 @@ python experiment-runner/ examples/nvml-profiling/RunnerConfig.py ## Results The results are generated in the `examples/nvml-profiling/experiments` folder, in json format. + +In case there are anomalies such as null, absent, or negative values, a report will be generated in the `examples/nvml-profiling/experiments` folder. diff --git a/examples/profilers/NvidiaML/RunnerConfig.py b/examples/profilers/NvidiaML/RunnerConfig.py index fd6242dba..f11f996f8 100644 --- a/examples/profilers/NvidiaML/RunnerConfig.py +++ b/examples/profilers/NvidiaML/RunnerConfig.py @@ -5,10 +5,12 @@ from ConfigValidator.Config.Models.RunnerContext import RunnerContext from ConfigValidator.Config.Models.OperationType import OperationType from ProgressManager.Output.OutputProcedure import OutputProcedure as output +from ProgressManager.Validation.RequirementsValidator import (validate_experiment_requirements) from Plugins.Profilers.NvidiaML import NvidiaML, NVML_Sample, NVML_Field, NVML_GPU_Operation_Mode, NVML_IDs, NVML_Dynamic_Query from typing import Dict, List, Any, Optional from pathlib import Path +import os import numpy as np import time from os.path import dirname, realpath @@ -24,7 +26,8 @@ class RunnerConfig: """The path in which Experiment Runner will create a folder with the name `self.name`, in order to store the results from this experiment. (Path does not need to exist - it will be created if necessary.) Output path defaults to the config file's path, inside the folder 'experiments'""" - results_output_path: Path = ROOT_DIR / 'experiments' + default_output = ROOT_DIR / "experiments" + results_output_path: Path = Path(os.getenv("EXPERIMENT_RUNNER_OUTPUT_PATH", str(default_output))) """Experiment operation type. 
Unless you manually want to initiate each run, use `OperationType.AUTO`.""" operation_type: OperationType = OperationType.AUTO @@ -33,21 +36,22 @@ class RunnerConfig: This can be essential to accommodate for cooldown periods on some systems.""" time_between_runs_in_ms: int = 1000 - # Dynamic configurations can be one-time satisfied here before the program takes the config as-is - # e.g. Setting some variable based on some criteria + """Path to log file for energy validation report. Relative to experiment output directory.""" + energy_validation_log_file: str = "energy_validation_report.log" + def __init__(self): - """Executes immediately after program start, on config load""" EventSubscriptionController.subscribe_to_multiple_events([ + (RunnerEvents.VALIDATE_EXPERIMENT, self.validate_experiment), (RunnerEvents.BEFORE_EXPERIMENT, self.before_experiment), - (RunnerEvents.BEFORE_RUN , self.before_run ), - (RunnerEvents.START_RUN , self.start_run ), + (RunnerEvents.BEFORE_RUN, self.before_run), + (RunnerEvents.START_RUN, self.start_run), (RunnerEvents.START_MEASUREMENT, self.start_measurement), - (RunnerEvents.INTERACT , self.interact ), - (RunnerEvents.STOP_MEASUREMENT , self.stop_measurement ), - (RunnerEvents.STOP_RUN , self.stop_run ), + (RunnerEvents.INTERACT, self.interact), + (RunnerEvents.STOP_MEASUREMENT, self.stop_measurement), + (RunnerEvents.STOP_RUN, self.stop_run), (RunnerEvents.POPULATE_RUN_DATA, self.populate_run_data), - (RunnerEvents.AFTER_EXPERIMENT , self.after_experiment ) + (RunnerEvents.AFTER_EXPERIMENT, self.after_experiment) ]) self.run_table_model = None # Initialized later @@ -63,6 +67,9 @@ def create_run_table_model(self) -> RunTableModel: data_columns=["avg_enc", "avg_dec", "avg_pstate"]) return self.run_table_model + + def validate_experiment(self) -> None: + validate_experiment_requirements(Path(__file__)) def before_experiment(self) -> None: """Perform any activity required before starting the experiment here diff --git 
a/examples/profilers/PicoCM3/README.md b/examples/profilers/PicoCM3/README.md index f9bfec0b2..b95c8cd83 100644 --- a/examples/profilers/PicoCM3/README.md +++ b/examples/profilers/PicoCM3/README.md @@ -36,3 +36,5 @@ python experiment-runner/ examples/picocm3-profiling/RunnerConfig.py The results are generated in the `examples/picocm3-profiling/experiments` folder. There should be a unique log file for each variation in the experiment, as well as a run_table.csv file summarizing these log files. + +In case there are anomalies such as null, absent, or negative values, a report will be generated in the `examples/picocm3-profiling/experiments` folder. \ No newline at end of file diff --git a/examples/profilers/PicoCM3/RunnerConfig.py b/examples/profilers/PicoCM3/RunnerConfig.py index 5772b5cc3..79ec44dc5 100644 --- a/examples/profilers/PicoCM3/RunnerConfig.py +++ b/examples/profilers/PicoCM3/RunnerConfig.py @@ -5,10 +5,12 @@ from ConfigValidator.Config.Models.RunnerContext import RunnerContext from ConfigValidator.Config.Models.OperationType import OperationType from ProgressManager.Output.OutputProcedure import OutputProcedure as output +from ProgressManager.Validation.RequirementsValidator import (validate_experiment_requirements) from typing import Dict, Any, Optional from pathlib import Path from os.path import dirname, realpath +import os import subprocess import shlex @@ -26,7 +28,8 @@ class RunnerConfig: """The path in which Experiment Runner will create a folder with the name `self.name`, in order to store the results from this experiment. (Path does not need to exist - it will be created if necessary.) Output path defaults to the config file's path, inside the folder 'experiments'""" - results_output_path: Path = ROOT_DIR / 'experiments' + default_output = ROOT_DIR / "experiments" + results_output_path: Path = Path(os.getenv("EXPERIMENT_RUNNER_OUTPUT_PATH", str(default_output))) """Experiment operation type. 
Unless you manually want to initiate each run, use `OperationType.AUTO`.""" operation_type: OperationType = OperationType.AUTO @@ -50,6 +53,28 @@ def __init__(self): (RunnerEvents.STOP_RUN , self.stop_run ), (RunnerEvents.POPULATE_RUN_DATA, self.populate_run_data), (RunnerEvents.AFTER_EXPERIMENT , self.after_experiment ) + ])"""Path to log file for energy validation report. Relative to experiment output directory.""" + energy_validation_log_file: str = "energy_validation_report.log" + + """List of data column names that contain energy measurements (e.g., ['energy', 'joules', 'watts']).""" + energy_validation_columns: List[str] = [] + + # Dynamic configurations can be one-time satisfied here before the program takes the config as-is + # e.g. Setting some variable based on some criteria + def __init__(self): + """Executes immediately after program start, on config load""" + + EventSubscriptionController.subscribe_to_multiple_events([ + (RunnerEvents.VALIDATE_EXPERIMENT, self.validate_experiment), + (RunnerEvents.BEFORE_EXPERIMENT , self.before_experiment), + (RunnerEvents.BEFORE_RUN , self.before_run ), + (RunnerEvents.START_RUN , self.start_run ), + (RunnerEvents.START_MEASUREMENT , self.start_measurement), + (RunnerEvents.INTERACT , self.interact ), + (RunnerEvents.STOP_MEASUREMENT , self.stop_measurement ), + (RunnerEvents.STOP_RUN , self.stop_run ), + (RunnerEvents.POPULATE_RUN_DATA , self.populate_run_data), + (RunnerEvents.AFTER_EXPERIMENT , self.after_experiment ) ]) self.latest_log = None @@ -67,6 +92,9 @@ def create_run_table_model(self) -> RunTableModel: data_columns=['timestamp', 'channel_1(avg)', 'channel_2(off)', 'channel_3(off)']) # Channel 1 is in Amps return self.run_table_model + + def validate_experiment(self) -> None: + validate_experiment_requirements(Path(__file__)) def before_experiment(self) -> None: """Perform any activity required before starting the experiment here diff --git a/examples/profilers/PowerJoular/README.md 
b/examples/profilers/PowerJoular/README.md index aa1c17961..dfbe75bf3 100644 --- a/examples/profilers/PowerJoular/README.md +++ b/examples/profilers/PowerJoular/README.md @@ -31,3 +31,4 @@ python experiment-runner/ examples/PowerJoular/RunnerConfig.py The results are generated in the `examples/linux-powerjoular-profiling/experiments` folder. +In case there are anomalies such as null, absent, or negative values, a report will be generated in the `examples/linux-powerjoular-profiling/experiments` folder. \ No newline at end of file diff --git a/examples/profilers/PowerJoular/RunnerConfig.py b/examples/profilers/PowerJoular/RunnerConfig.py index 93219d57b..cce80c56d 100644 --- a/examples/profilers/PowerJoular/RunnerConfig.py +++ b/examples/profilers/PowerJoular/RunnerConfig.py @@ -5,12 +5,14 @@ from ConfigValidator.Config.Models.RunnerContext import RunnerContext from ConfigValidator.Config.Models.OperationType import OperationType from ProgressManager.Output.OutputProcedure import OutputProcedure as output +from ProgressManager.Validation.RequirementsValidator import (validate_experiment_requirements) from Plugins.Profilers.PowerJoular import PowerJoular from typing import Dict, List, Any, Optional from pathlib import Path from os.path import dirname, realpath +import os import time import subprocess @@ -26,7 +28,8 @@ class RunnerConfig: """The path in which Experiment Runner will create a folder with the name `self.name`, in order to store the results from this experiment. (Path does not need to exist - it will be created if necessary.) Output path defaults to the config file's path, inside the folder 'experiments'""" - results_output_path: Path = ROOT_DIR / 'experiments' + default_output = ROOT_DIR / "experiments" + results_output_path: Path = Path(os.getenv("EXPERIMENT_RUNNER_OUTPUT_PATH", str(default_output))) """Experiment operation type. 
Unless you manually want to initiate each run, use `OperationType.AUTO`.""" operation_type: OperationType = OperationType.AUTO @@ -35,21 +38,25 @@ class RunnerConfig: This can be essential to accommodate for cooldown periods on some systems.""" time_between_runs_in_ms: int = 1000 + """Path to log file for energy validation report. Relative to experiment output directory.""" + energy_validation_log_file: str = "energy_validation_report.log" + # Dynamic configurations can be one-time satisfied here before the program takes the config as-is # e.g. Setting some variable based on some criteria def __init__(self): """Executes immediately after program start, on config load""" EventSubscriptionController.subscribe_to_multiple_events([ - (RunnerEvents.BEFORE_EXPERIMENT, self.before_experiment), - (RunnerEvents.BEFORE_RUN , self.before_run ), - (RunnerEvents.START_RUN , self.start_run ), - (RunnerEvents.START_MEASUREMENT, self.start_measurement), - (RunnerEvents.INTERACT , self.interact ), - (RunnerEvents.STOP_MEASUREMENT , self.stop_measurement ), - (RunnerEvents.STOP_RUN , self.stop_run ), - (RunnerEvents.POPULATE_RUN_DATA, self.populate_run_data), - (RunnerEvents.AFTER_EXPERIMENT , self.after_experiment ) + (RunnerEvents.VALIDATE_EXPERIMENT, self.validate_experiment), + (RunnerEvents.BEFORE_EXPERIMENT , self.before_experiment), + (RunnerEvents.BEFORE_RUN , self.before_run ), + (RunnerEvents.START_RUN , self.start_run ), + (RunnerEvents.START_MEASUREMENT , self.start_measurement), + (RunnerEvents.INTERACT , self.interact ), + (RunnerEvents.STOP_MEASUREMENT , self.stop_measurement ), + (RunnerEvents.STOP_RUN , self.stop_run ), + (RunnerEvents.POPULATE_RUN_DATA , self.populate_run_data), + (RunnerEvents.AFTER_EXPERIMENT , self.after_experiment ) ]) self.run_table_model = None # Initialized later output.console_log("Custom config loaded") @@ -63,6 +70,9 @@ def create_run_table_model(self) -> RunTableModel: data_columns=['avg_cpu', 'total_energy'] ) return 
self.run_table_model + + def validate_experiment(self) -> None: + validate_experiment_requirements(Path(__file__)) def before_experiment(self) -> None: """Perform any activity required before starting the experiment here diff --git a/examples/profilers/PowerLetrics/README.md b/examples/profilers/PowerLetrics/README.md index 0b357870b..8cef83c13 100644 --- a/examples/profilers/PowerLetrics/README.md +++ b/examples/profilers/PowerLetrics/README.md @@ -19,3 +19,5 @@ python experiment-runner/ examples/powerletrics-profiling/RunnerConfig.py ## Results The results are generated in the `examples/powerletrics-profiling/experiments` folder. + +In case there are anomalies such as null, absent, or negative values, a report will be generated in the `examples/powerletrics-profiling/experiments` folder. \ No newline at end of file diff --git a/examples/profilers/PowerLetrics/RunnerConfig.py b/examples/profilers/PowerLetrics/RunnerConfig.py index 026b33549..27b5257b9 100644 --- a/examples/profilers/PowerLetrics/RunnerConfig.py +++ b/examples/profilers/PowerLetrics/RunnerConfig.py @@ -5,6 +5,7 @@ from ConfigValidator.Config.Models.RunnerContext import RunnerContext from ConfigValidator.Config.Models.OperationType import OperationType from ProgressManager.Output.OutputProcedure import OutputProcedure as output +from ProgressManager.Validation.RequirementsValidator import (validate_experiment_requirements) from Plugins.Profilers.PowerLetrics import PowerLetrics from typing import Dict, List, Any, Optional @@ -12,6 +13,7 @@ import numpy as np from pathlib import Path from os.path import dirname, realpath +import os class RunnerConfig: @@ -24,7 +26,8 @@ class RunnerConfig: """The path in which Experiment Runner will create a folder with the name `self.name`, in order to store the results from this experiment. (Path does not need to exist - it will be created if necessary.) 
Output path defaults to the config file's path, inside the folder 'experiments'""" - results_output_path: Path = ROOT_DIR / 'experiments' + default_output = ROOT_DIR / "experiments" + results_output_path: Path = Path(os.getenv("EXPERIMENT_RUNNER_OUTPUT_PATH", str(default_output))) """Experiment operation type. Unless you manually want to initiate each run, use `OperationType.AUTO`.""" operation_type: OperationType = OperationType.AUTO @@ -33,21 +36,25 @@ class RunnerConfig: This can be essential to accommodate for cooldown periods on some systems.""" time_between_runs_in_ms: int = 1000 + """Path to log file for energy validation report. Relative to experiment output directory.""" + energy_validation_log_file: str = "energy_validation_report.log" + # Dynamic configurations can be one-time satisfied here before the program takes the config as-is # e.g. Setting some variable based on some criteria def __init__(self): """Executes immediately after program start, on config load""" EventSubscriptionController.subscribe_to_multiple_events([ - (RunnerEvents.BEFORE_EXPERIMENT, self.before_experiment), - (RunnerEvents.BEFORE_RUN , self.before_run ), - (RunnerEvents.START_RUN , self.start_run ), - (RunnerEvents.START_MEASUREMENT, self.start_measurement), - (RunnerEvents.INTERACT , self.interact ), - (RunnerEvents.STOP_MEASUREMENT , self.stop_measurement ), - (RunnerEvents.STOP_RUN , self.stop_run ), - (RunnerEvents.POPULATE_RUN_DATA, self.populate_run_data), - (RunnerEvents.AFTER_EXPERIMENT , self.after_experiment ) + (RunnerEvents.VALIDATE_EXPERIMENT, self.validate_experiment), + (RunnerEvents.BEFORE_EXPERIMENT , self.before_experiment), + (RunnerEvents.BEFORE_RUN , self.before_run ), + (RunnerEvents.START_RUN , self.start_run ), + (RunnerEvents.START_MEASUREMENT , self.start_measurement), + (RunnerEvents.INTERACT , self.interact ), + (RunnerEvents.STOP_MEASUREMENT , self.stop_measurement ), + (RunnerEvents.STOP_RUN , self.stop_run ), + (RunnerEvents.POPULATE_RUN_DATA , 
self.populate_run_data), + (RunnerEvents.AFTER_EXPERIMENT , self.after_experiment ) ]) self.run_table_model = None # Initialized later @@ -63,6 +70,9 @@ def create_run_table_model(self) -> RunTableModel: data_columns=["energy_footprint", "cpu_utilization", "process_name"]) return self.run_table_model + + def validate_experiment(self) -> None: + validate_experiment_requirements(Path(__file__)) def before_experiment(self) -> None: """Perform any activity required before starting the experiment here diff --git a/examples/profilers/PowerMetrics/README.md b/examples/profilers/PowerMetrics/README.md index c63c0ab5f..df62e5143 100644 --- a/examples/profilers/PowerMetrics/README.md +++ b/examples/profilers/PowerMetrics/README.md @@ -23,3 +23,5 @@ sudo python experiment-runner/ examples/powermetrics-profiling/RunnerConfig.py ## Results The results are generated in the `examples/powermetrics-profiling/experiments` folder. + +In case there are anomalies such as null, absent, or negative values, a report will be generated in the `examples/powermetrics-profiling/experiments` folder. 
\ No newline at end of file diff --git a/examples/profilers/PowerMetrics/RunnerConfig.py b/examples/profilers/PowerMetrics/RunnerConfig.py index 417aeccbd..f7256fa99 100644 --- a/examples/profilers/PowerMetrics/RunnerConfig.py +++ b/examples/profilers/PowerMetrics/RunnerConfig.py @@ -5,11 +5,13 @@ from ConfigValidator.Config.Models.RunnerContext import RunnerContext from ConfigValidator.Config.Models.OperationType import OperationType from ProgressManager.Output.OutputProcedure import OutputProcedure as output +from ProgressManager.Validation.RequirementsValidator import (validate_experiment_requirements) from Plugins.Profilers.PowerMetrics import PowerMetrics from typing import Dict, List, Any, Optional from pathlib import Path from os.path import dirname, realpath +import os import time import numpy as np @@ -23,7 +25,8 @@ class RunnerConfig: """The path in which Experiment Runner will create a folder with the name `self.name`, in order to store the results from this experiment. (Path does not need to exist - it will be created if necessary.) Output path defaults to the config file's path, inside the folder 'experiments'""" - results_output_path: Path = ROOT_DIR / 'experiments' + default_output = ROOT_DIR / "experiments" + results_output_path: Path = Path(os.getenv("EXPERIMENT_RUNNER_OUTPUT_PATH", str(default_output))) """Experiment operation type. Unless you manually want to initiate each run, use `OperationType.AUTO`.""" operation_type: OperationType = OperationType.AUTO @@ -32,21 +35,25 @@ class RunnerConfig: This can be essential to accommodate for cooldown periods on some systems.""" time_between_runs_in_ms: int = 1000 + """Path to log file for energy validation report. Relative to experiment output directory.""" + energy_validation_log_file: str = "energy_validation_report.log" + # Dynamic configurations can be one-time satisfied here before the program takes the config as-is # e.g. 
Setting some variable based on some criteria def __init__(self): """Executes immediately after program start, on config load""" EventSubscriptionController.subscribe_to_multiple_events([ - (RunnerEvents.BEFORE_EXPERIMENT, self.before_experiment), - (RunnerEvents.BEFORE_RUN , self.before_run ), - (RunnerEvents.START_RUN , self.start_run ), - (RunnerEvents.START_MEASUREMENT, self.start_measurement), - (RunnerEvents.INTERACT , self.interact ), - (RunnerEvents.STOP_MEASUREMENT , self.stop_measurement ), - (RunnerEvents.STOP_RUN , self.stop_run ), - (RunnerEvents.POPULATE_RUN_DATA, self.populate_run_data), - (RunnerEvents.AFTER_EXPERIMENT , self.after_experiment ) + (RunnerEvents.VALIDATE_EXPERIMENT, self.validate_experiment), + (RunnerEvents.BEFORE_EXPERIMENT , self.before_experiment), + (RunnerEvents.BEFORE_RUN , self.before_run ), + (RunnerEvents.START_RUN , self.start_run ), + (RunnerEvents.START_MEASUREMENT , self.start_measurement), + (RunnerEvents.INTERACT , self.interact ), + (RunnerEvents.STOP_MEASUREMENT , self.stop_measurement ), + (RunnerEvents.STOP_RUN , self.stop_run ), + (RunnerEvents.POPULATE_RUN_DATA , self.populate_run_data), + (RunnerEvents.AFTER_EXPERIMENT , self.after_experiment ) ]) self.run_table_model = None # Initialized later output.console_log("Custom config loaded") @@ -62,6 +69,10 @@ def create_run_table_model(self) -> RunTableModel: data_columns=["joules", "avg_cpu", "avg_gpu"]) return self.run_table_model + + def validate_experiment(self) -> None: + """Perform any experiment validation here. 
If any validation fails, raise an exception with details on the failure.""" + validate_experiment_requirements(Path(__file__)) def before_experiment(self) -> None: """Perform any activity required before starting the experiment here diff --git a/examples/profilers/linux-ps-profiling/README.md b/examples/profilers/linux-ps-profiling/README.md index 17c30877a..3d593f2dc 100644 --- a/examples/profilers/linux-ps-profiling/README.md +++ b/examples/profilers/linux-ps-profiling/README.md @@ -26,3 +26,4 @@ python experiment-runner/ examples/linux-ps-profiling/RunnerConfig.py The results are generated in the `examples/linux-ps-profiling/experiments` folder. +In case there are anomalies such as null, absent, or negative values, a report will be generated in the `examples/linux-ps-profiling/experiments` folder. diff --git a/examples/profilers/linux-ps-profiling/RunnerConfig.py b/examples/profilers/linux-ps-profiling/RunnerConfig.py index 96e4d8be8..f7e291466 100644 --- a/examples/profilers/linux-ps-profiling/RunnerConfig.py +++ b/examples/profilers/linux-ps-profiling/RunnerConfig.py @@ -5,11 +5,13 @@ from ConfigValidator.Config.Models.RunnerContext import RunnerContext from ConfigValidator.Config.Models.OperationType import OperationType from ProgressManager.Output.OutputProcedure import OutputProcedure as output +from ProgressManager.Validation.RequirementsValidator import (validate_experiment_requirements) from Plugins.Profilers.Ps import Ps from typing import Dict, List, Any, Optional from pathlib import Path from os.path import dirname, realpath +import os import numpy as np import time @@ -27,7 +29,8 @@ class RunnerConfig: """The path in which Experiment Runner will create a folder with the name `self.name`, in order to store the results from this experiment. (Path does not need to exist - it will be created if necessary.) 
Output path defaults to the config file's path, inside the folder 'experiments'""" - results_output_path: Path = ROOT_DIR / 'experiments' + default_output = ROOT_DIR / "experiments" + results_output_path: Path = Path(os.getenv("EXPERIMENT_RUNNER_OUTPUT_PATH", str(default_output))) """Experiment operation type. Unless you manually want to initiate each run, use `OperationType.AUTO`.""" operation_type: OperationType = OperationType.AUTO @@ -35,6 +38,9 @@ class RunnerConfig: """The time Experiment Runner will wait after a run completes. This can be essential to accommodate for cooldown periods on some systems.""" time_between_runs_in_ms: int = 1000 + + """Path to log file for energy validation report. Relative to experiment output directory.""" + energy_validation_log_file: str = "energy_validation_report.log" # Dynamic configurations can be one-time satisfied here before the program takes the config as-is # e.g. Setting some variable based on some criteria @@ -42,15 +48,16 @@ def __init__(self): """Executes immediately after program start, on config load""" EventSubscriptionController.subscribe_to_multiple_events([ - (RunnerEvents.BEFORE_EXPERIMENT, self.before_experiment), - (RunnerEvents.BEFORE_RUN , self.before_run ), - (RunnerEvents.START_RUN , self.start_run ), - (RunnerEvents.START_MEASUREMENT, self.start_measurement), - (RunnerEvents.INTERACT , self.interact ), - (RunnerEvents.STOP_MEASUREMENT , self.stop_measurement ), - (RunnerEvents.STOP_RUN , self.stop_run ), - (RunnerEvents.POPULATE_RUN_DATA, self.populate_run_data), - (RunnerEvents.AFTER_EXPERIMENT , self.after_experiment ) + (RunnerEvents.VALIDATE_EXPERIMENT, self.validate_experiment), + (RunnerEvents.BEFORE_EXPERIMENT , self.before_experiment), + (RunnerEvents.BEFORE_RUN , self.before_run ), + (RunnerEvents.START_RUN , self.start_run ), + (RunnerEvents.START_MEASUREMENT , self.start_measurement), + (RunnerEvents.INTERACT , self.interact ), + (RunnerEvents.STOP_MEASUREMENT , self.stop_measurement ), + 
(RunnerEvents.STOP_RUN , self.stop_run ), + (RunnerEvents.POPULATE_RUN_DATA , self.populate_run_data), + (RunnerEvents.AFTER_EXPERIMENT , self.after_experiment ) ]) self.run_table_model = None # Initialized later output.console_log("Custom config loaded") @@ -68,6 +75,9 @@ def create_run_table_model(self) -> RunTableModel: data_columns=["avg_cpu", "avg_mem"] ) return self.run_table_model + + def validate_experiment(self) -> None: + validate_experiment_requirements(Path(__file__)) def before_experiment(self) -> None: """Perform any activity required before starting the experiment here diff --git a/examples/profilers/linux-ps-profiling/primer b/examples/profilers/linux-ps-profiling/primer new file mode 100755 index 000000000..901841a7c Binary files /dev/null and b/examples/profilers/linux-ps-profiling/primer differ diff --git a/examples/profilers/measure-self-profiling/README.md b/examples/profilers/measure-self-profiling/README.md index 34c1cf2d7..3c851ae0e 100644 --- a/examples/profilers/measure-self-profiling/README.md +++ b/examples/profilers/measure-self-profiling/README.md @@ -31,6 +31,8 @@ python experiment-runner/ examples/measure-self-profiling/RunnerConfig.py The results are generated in the `examples/measure-self-profiling/experiments` folder, and are added to your run table model. A log file can be specified to additionally save the full energibridge logs to a separate file. +In case there are anomalies such as null, absent, or negative values, a report will be generated in the `examples/measure-self-profiling/experiments` folder. + **!!! WARNING !!!**: COLUMNS IN THE `energibridge.log` FILES CAN BE DIFFERENT ACROSS MACHINES. ADJUST YOUR ANALYSIS OF THE RESULTS ACCORDINGLY. 
diff --git a/examples/profilers/measure-self-profiling/RunnerConfig.py b/examples/profilers/measure-self-profiling/RunnerConfig.py index a15754836..7cb6c9837 100644 --- a/examples/profilers/measure-self-profiling/RunnerConfig.py +++ b/examples/profilers/measure-self-profiling/RunnerConfig.py @@ -4,10 +4,12 @@ from ConfigValidator.Config.Models.RunnerContext import RunnerContext from ConfigValidator.Config.Models.OperationType import OperationType from ProgressManager.Output.OutputProcedure import OutputProcedure as output +from ProgressManager.Validation.RequirementsValidator import (validate_experiment_requirements) from typing import Optional, Dict, Any from pathlib import Path from os.path import dirname, realpath +import os import time @@ -21,7 +23,8 @@ class RunnerConfig: """The path in which Experiment Runner will create a folder with the name `self.name`, in order to store the results from this experiment. (Path does not need to exist - it will be created if necessary.) Output path defaults to the config file's path, inside the folder 'experiments'""" - results_output_path: Path = ROOT_DIR / 'experiments' + default_output = ROOT_DIR / "experiments" + results_output_path: Path = Path(os.getenv("EXPERIMENT_RUNNER_OUTPUT_PATH", str(default_output))) """Experiment operation type. Unless you manually want to initiate each run, use `OperationType.AUTO`.""" operation_type: OperationType = OperationType.AUTO @@ -44,7 +47,8 @@ class RunnerConfig: This parameter is optional and defaults to /usr/local/bin/energibridge """ - self_measure_bin: Path = "/usr/local/bin/energibridge" + default_energibridge = "/usr/local/bin/energibridge" + self_measure_bin: Path = Path(os.getenv("ENERGIBRIDGE_PATH", default_energibridge)) """ Where to save the full log files for energibridge. If specified, log files are saved to context.run_dir/. 
@@ -54,21 +58,22 @@ class RunnerConfig: """ self_measure_logfile: Path = "energibridge.log" - # Dynamic configurations can be one-time satisfied here before the program takes the config as-is - # e.g. Setting some variable based on some criteria + """Path to log file for energy validation report. Relative to experiment output directory.""" + energy_validation_log_file: str = "energy_validation_report.log" +\ def __init__(self): - """Executes immediately after program start, on config load""" EventSubscriptionController.subscribe_to_multiple_events([ + (RunnerEvents.VALIDATE_EXPERIMENT, self.validate_experiment), (RunnerEvents.BEFORE_EXPERIMENT, self.before_experiment), - (RunnerEvents.BEFORE_RUN , self.before_run ), - (RunnerEvents.START_RUN , self.start_run ), + (RunnerEvents.BEFORE_RUN, self.before_run), + (RunnerEvents.START_RUN, self.start_run), (RunnerEvents.START_MEASUREMENT, self.start_measurement), - (RunnerEvents.INTERACT , self.interact ), - (RunnerEvents.STOP_MEASUREMENT , self.stop_measurement ), - (RunnerEvents.STOP_RUN , self.stop_run ), + (RunnerEvents.INTERACT, self.interact), + (RunnerEvents.STOP_MEASUREMENT, self.stop_measurement), + (RunnerEvents.STOP_RUN, self.stop_run), (RunnerEvents.POPULATE_RUN_DATA, self.populate_run_data), - (RunnerEvents.AFTER_EXPERIMENT , self.after_experiment ) + (RunnerEvents.AFTER_EXPERIMENT, self.after_experiment) ]) self.run_table_model = None # Initialized later output.console_log("Custom config loaded") @@ -82,6 +87,9 @@ def create_run_table_model(self) -> RunTableModel: ) return self.run_table_model + def validate_experiment(self) -> None: + validate_experiment_requirements(Path(__file__)) + def before_experiment(self) -> None: """Perform any activity required before starting the experiment here Invoked only once during the lifetime of the program.""" diff --git a/experiment-runner/ConfigValidator/CLIRegister/CLIRegister.py b/experiment-runner/ConfigValidator/CLIRegister/CLIRegister.py index 3b7026f6f..c995efd70 
100644 --- a/experiment-runner/ConfigValidator/CLIRegister/CLIRegister.py +++ b/experiment-runner/ConfigValidator/CLIRegister/CLIRegister.py @@ -31,7 +31,8 @@ def execute(args=None) -> None: if args is None: filepath = __file__.split('/') filepath.pop() - filepath = '/'.join(filepath) + "/../../../examples/" + #filepath = '/'.join(filepath) + "/../../../examples/" + filepath = os.getenv("EXAMPLES_PATH", '/'.join(filepath) + "/../../../examples/") destination = os.path.abspath(filepath) else: if len(args) == 3: diff --git a/experiment-runner/ConfigValidator/Config/RunnerConfig.py b/experiment-runner/ConfigValidator/Config/RunnerConfig.py index a79e0eb4e..4fbef26a9 100644 --- a/experiment-runner/ConfigValidator/Config/RunnerConfig.py +++ b/experiment-runner/ConfigValidator/Config/RunnerConfig.py @@ -1,3 +1,5 @@ +import os + from EventManager.Models.RunnerEvents import RunnerEvents from EventManager.EventSubscriptionController import EventSubscriptionController from ConfigValidator.Config.Models.RunTableModel import RunTableModel @@ -22,7 +24,9 @@ class RunnerConfig: """The path in which Experiment Runner will create a folder with the name `self.name`, in order to store the results from this experiment. (Path does not need to exist - it will be created if necessary.) Output path defaults to the config file's path, inside the folder 'experiments'""" - results_output_path: Path = ROOT_DIR / 'experiments' + #results_output_path: Path = ROOT_DIR / 'experiments' + default_output = ROOT_DIR / "experiments" + results_output_path: Path = Path(os.getenv("EXPERIMENT_RUNNER_OUTPUT_PATH", str(default_output))) """Experiment operation type. 
Unless you manually want to initiate each run, use `OperationType.AUTO`.""" operation_type: OperationType = OperationType.AUTO diff --git a/experiment-runner/ConfigValidator/Config/Validation/ConfigValidator.py b/experiment-runner/ConfigValidator/Config/Validation/ConfigValidator.py index dcd9ff205..81d16bde6 100644 --- a/experiment-runner/ConfigValidator/Config/Validation/ConfigValidator.py +++ b/experiment-runner/ConfigValidator/Config/Validation/ConfigValidator.py @@ -81,8 +81,9 @@ def validate_config(config: RunnerConfig): if config.self_measure: if not hasattr(config, "self_measure_bin"): - config.self_measure_bin = "/usr/local/bin/energibridge" # This is spesific to linux, might work for osx as well - + #config.self_measure_bin = "/usr/local/bin/energibridge" # This is spesific to linux, might work for osx as well + config.self_measure_bin = os.getenv("ENERGIBRIDGE_PATH", "/usr/local/bin/energibridge") + if not hasattr(config, "self_measure_logfile"): config.self_measure_logfile = None diff --git a/experiment-runner/DistributedExecution/DistributedOrchestrator.py b/experiment-runner/DistributedExecution/DistributedOrchestrator.py new file mode 100644 index 000000000..270cd7616 --- /dev/null +++ b/experiment-runner/DistributedExecution/DistributedOrchestrator.py @@ -0,0 +1,378 @@ +from ProgressManager.RunTable.Models.RunProgress import RunProgress +from ConfigValidator.Config.Models.Metadata import Metadata +from ProgressManager.Output.CSVOutputManager import CSVOutputManager +from ConfigValidator.Config.Models.OperationType import OperationType +from EventManager.Models.RunnerEvents import RunnerEvents +from EventManager.EventSubscriptionController import EventSubscriptionController +from ProgressManager.Validation.AnomaliesChecker import ResultsValidator, AnomalyReport + +from flask import Flask, request, jsonify +import threading +import time +from pathlib import Path +import pandas as pd +import os +from waitress import serve + +### 
========================================================= +### | | +### | TaskManager | +### | - Assign available runs to connected workers | +### | - Update and persist run_table.csv state | +### | - Trigger AFTER_EXPERIMENT lifecycle event | +### | - Detect experiment completion | +### | | +### | *Any state modification to runs should happen | +### | through this class to avoid race conditions | +### | | +### ========================================================= +class TaskManager: + + def __init__(self,config, run_table, experiment_path: Path): + self.config = config + self.run_table = run_table + self.experiment_path = experiment_path + self.assigned_runs = {} + self.total_runs = len(run_table) + self.lock = threading.Lock() + self.csv_manager = CSVOutputManager(experiment_path) + self.completed = False + self.shutdown = False + self.validation_results = {} + + def get_next_task(self, agent_id): + with self.lock: + + # If experiment already completed + if self.completed: + return None + + for idx, run in enumerate(self.run_table): + if run['__done'] == RunProgress.TODO: + run_id = run["__run_id"] + + run_dir = self.experiment_path / str(run_id) + run_dir.mkdir(parents=True, exist_ok=True) + + run['__done'] = RunProgress.RUNNING + run['agent_id'] = agent_id + + run['__current_run'] = idx + 1 + run['__total_runs'] = self.total_runs + run["run_dir"] = str(run_dir) + + self.assigned_runs[run_id] = agent_id + self.csv_manager.write_run_table(self.run_table) + + task = run.copy() + task['__done'] = task['__done'].name + + print(f"[MASTER] Assigned {run_id} -> {agent_id}") + return task + return None + + def complete_task(self, run_id, data): + with self.lock: + for run in self.run_table: + if run["__run_id"] == run_id: + # Merge returned data + if data: + for k, v in data.items(): + run[k] = v + run["__done"] = RunProgress.DONE + + self.assigned_runs.pop(run_id, None) + self.csv_manager.write_run_table(self.run_table) + print(f"[MASTER] Completed run {run_id}") + 
break + + # Check if all runs are done + all_done = all( + run['__done'] == RunProgress.DONE + for run in self.run_table + ) + if all_done and not self.completed: + self.completed = True + self.shutdown = True + print("\n[MASTER] ALL RUNS COMPLETED\n") + + if self.config.operation_type is OperationType.SEMI: + EventSubscriptionController.raise_event(RunnerEvents.CONTINUE) + + # AFTER_EXPERIMENT hook + print("[MASTER] Calling AFTER_EXPERIMENT hook") + EventSubscriptionController.raise_event( + RunnerEvents.AFTER_EXPERIMENT + ) + + def restore_crashed_runs(self): + """ + If server restarts and finds RUNNING runs, + restore them to TODO. + """ + changed = False + + for run in self.run_table: + if run['__done'] == RunProgress.RUNNING: + run['__done'] = RunProgress.TODO + run['agent_id'] = None + changed = True + if changed: + print("[MASTER] Restored RUNNING -> TODO after restart") + self.csv_manager.write_run_table(self.run_table) + + def experiment_already_completed(self): + return all( + run['__done'] == RunProgress.DONE + for run in self.run_table + ) + +### ========================================================= +### | | +### | APIServer | +### | - Handles the communication between workers | +### | and the orchestrator | +### | - Handle task distribution requests | +### | - Receive completed experiment results | +### | - Handle worker heartbeat updates | +### | - Receive worker heartbeat updates | +### | - Provide experiment | +### | monitoring/status endpoint | +### | - Trigger orchestrator shutdown | +### | | +### | | +### ========================================================= +class APIServer: + + def __init__(self, task_manager, worker_monitor): + self.app = Flask(__name__) + self.task_manager = task_manager + self.monitor = worker_monitor + + @self.app.route('/task', methods=['GET']) + def get_task(): + agent_id = request.args.get('agent_id') + self.monitor.heartbeat(agent_id) + #task = self.task_manager.get_next_task(agent_id) + + if 
self.task_manager.shutdown: + return jsonify({ + "shutdown": True, + "run": None + }) + + task = self.task_manager.get_next_task(agent_id) + + return jsonify({ + "shutdown": False, + "run": task if task else None + }) + + @self.app.route('/result', methods=['POST']) + def submit_result(): + payload = request.get_json() + run_id = payload.get('run_id') + run_data = payload.get('data', {}) + status = payload.get('status') + anomalies = request.json.get("anomalies", []) + + if status == "FAILED": + print(f"[MASTER] Run failed: {run_id}") + print(payload.get('error')) + + # Return run to TODO + for run in self.task_manager.run_table: + if run['__run_id'] == run_id: + run['__done'] = RunProgress.TODO + run['agent_id'] = None + self.task_manager.csv_manager.write_run_table( + self.task_manager.run_table + ) + else: + self.task_manager.complete_task(run_id, run_data) + if anomalies: + report = AnomalyReport() + report.anomalies.extend(anomalies) + log_file_path = (self.task_manager.experiment_path/ self.task_manager.config.energy_validation_log_file) + ResultsValidator.update_report(report, log_file_path) + return jsonify({"status": "ok"}) + + @self.app.route('/heartbeat', methods=['POST']) + def heartbeat(): + data = request.get_json() + agent_id = data.get('agent_id') + self.monitor.heartbeat(agent_id) + + return jsonify({"status": "ok"}) + + @self.app.route('/status', methods=['GET']) + def status(): + total_runs = len(self.task_manager.run_table) + todo_count = sum( + 1 for r in self.task_manager.run_table + if r['__done'] == RunProgress.TODO + ) + running_count = sum( + 1 for r in self.task_manager.run_table + if r['__done'] == RunProgress.RUNNING + ) + done_count = sum( + 1 for r in self.task_manager.run_table + if r['__done'] == RunProgress.DONE + ) + return jsonify({ + "status": "ok", + "total_runs": total_runs, + "runs": { + "todo": todo_count, + "running": running_count, + "done": done_count + }, + "active_agents": len(self.monitor.heartbeats) + }) + + 
@self.app.route('/shutdown', methods=['POST']) + def shutdown(): + shutdown_server() + return jsonify({"status": "shutting down"}) + +### ========================================================= +### | | +### | WorkerMonitor | +### | - Keeps track of connected workers | +### | - If a worker fails to send a heartbeat | +### | within the timeout period, it is considered | +### | dead | +### | - Return the assignment back to TODO | +### | | +### | | +### ========================================================= +class WorkerMonitor: + + def __init__(self, task_manager): + self.heartbeats = {} + self.task_manager = task_manager + self.timeout = 60 + + def heartbeat(self, agent_id): + self.heartbeats[agent_id] = time.time() + + def monitor(self): + while not self.task_manager.completed: + time.sleep(10) + now = time.time() + dead = [ + agent for agent, t in self.heartbeats.items() + if now - t > self.timeout + ] + for agent in dead: + print(f"[MASTER] Worker {agent} dead") + + for run in self.task_manager.run_table: + if ( + run.get("agent_id") == agent + and run["__done"] != RunProgress.DONE + ): + print(f"[MASTER] Returning run " + f"{run['__run_id']} -> TODO") + + run["__done"] = RunProgress.TODO + run["agent_id"] = None + self.task_manager.csv_manager.write_run_table( + self.task_manager.run_table + ) + del self.heartbeats[agent] + +### ========================================================= +### | | +### | DistributedOrchestrator | +### | - Initialize experiment infrastructure | +### | - Load or create run_table.csv | +### | - Restore interrupted experiments | +### | - Start monitoring threads | +### | - Start the API server | +### | - If anomalies are present, combine them | +### | into a report | +### | | +### | | +### ========================================================= + +class DistributedOrchestrator: + + def __init__(self, config, metadata, host="0.0.0.0", port=5000): + self.config = config + self.metadata = metadata + self.host = host + self.port = 
port + + self.experiment_path = (config.results_output_path / config.name) + self.experiment_path.mkdir(parents=True, exist_ok=True) + self.run_table_path = (self.experiment_path / "run_table.csv") + + EventSubscriptionController.raise_event( + RunnerEvents.VALIDATE_EXPERIMENT + ) + if self.run_table_path.exists(): + print("[MASTER] Existing experiment detected") + + csv_manager = CSVOutputManager(self.experiment_path) + run_table = csv_manager.read_run_table() + else: + print("[MASTER] Creating new experiment") + + run_table = (config.create_run_table_model().generate_experiment_run_table()) + pd.DataFrame(run_table).to_csv(self.run_table_path, index=False) + + self.task_manager = TaskManager(self.config, run_table, self.experiment_path) + self.task_manager.restore_crashed_runs() + + if self.task_manager.experiment_already_completed(): + print("[MASTER] Experiment already completed") + + self.finished_before_start = True + else: + self.finished_before_start = False + self.monitor = WorkerMonitor(self.task_manager) + + self.api = APIServer(self.task_manager, self.monitor) + + def start(self): + if self.finished_before_start: + return + + EventSubscriptionController.raise_event( + RunnerEvents.BEFORE_EXPERIMENT + ) + + threading.Thread( + target=self.monitor.monitor, + daemon=True + ).start() + + print(f"[MASTER] Starting server " + f"on {self.host}:{self.port}") + + server_thread = threading.Thread( + target=lambda: serve( + self.api.app, + host=self.host, + port=self.port + ), + daemon=True + ) + + server_thread.start() + + while not self.task_manager.shutdown: + time.sleep(1) + + print("[MASTER] Waiting for workers to shutdown...") + time.sleep(10) + print("[MASTER] Shutting down") + os._exit(0) + +def shutdown_server(): + func = request.environ.get('werkzeug.server.shutdown') + if func is None: + os._exit(0) + func() \ No newline at end of file diff --git a/experiment-runner/DistributedExecution/Worker.py b/experiment-runner/DistributedExecution/Worker.py new 
file mode 100644 index 000000000..e060a9ecf --- /dev/null +++ b/experiment-runner/DistributedExecution/Worker.py @@ -0,0 +1,184 @@ +from ExperimentOrchestrator.Experiment.Run.RunController import RunController +from EventManager.EventSubscriptionController import EventSubscriptionController +from EventManager.Models.RunnerEvents import RunnerEvents +from ProgressManager.Validation.AnomaliesChecker import ResultsValidator + +import threading +import time +import requests +import numpy as np +from enum import Enum + +### ========================================================= +### | | +### | WorkerRuntime | +### | | +### | - Connect to the master orchestrator | +### | - Request experiment runs/tasks | +### | - Execute runs locally + anomalies check | +### | - Send results back to the master | +### | - Send periodic heartbeat updates | +### | - Gracefully shutdown on master request | +### | | +### ========================================================= +class WorkerRuntime: + @staticmethod + def make_json_safe(obj): + if isinstance(obj, dict): + return {k: WorkerRuntime.make_json_safe(v) for k, v in obj.items()} + if isinstance(obj, list): + return [WorkerRuntime.make_json_safe(v) for v in obj] + if isinstance(obj, np.generic): + return obj.item() + if isinstance(obj, Enum): + return obj.value + return obj + + def __init__(self, master_url, heartbeat_interval=40, idle_timeout=120): + self.master_url = master_url + self.heartbeat_interval = heartbeat_interval + self.idle_timeout = idle_timeout + + self._stop = False + self.current_run = None + self.agent_id = None + self.last_task_time = None + + def run_loop(self, agent_id, config): + self.agent_id = agent_id + self.last_task_time = time.time() + + print(f"[WORKER] Starting with agent_id: {self.agent_id}") + print(f"[WORKER] Master URL: {self.master_url}") + + print("[WORKER] Validating experiment setup") + EventSubscriptionController.raise_event( + RunnerEvents.VALIDATE_EXPERIMENT + ) + + 
threading.Thread(target=self._heartbeat_loop, daemon=True).start() + + print("[WORKER] Heartbeat thread started") + print(f"[WORKER] Waiting for tasks (idle timeout {self.idle_timeout}s)") + + while not self._stop: + task = self._get_task() + if task == "SHUTDOWN": + print("[WORKER] Master shutdown acknowledged") + break + + if not task: + if time.time() - self.last_task_time > self.idle_timeout: + print("[WORKER] Idle timeout reached - exiting") + break + self.current_run = None + time.sleep(3) + continue + + self.last_task_time = time.time() + self.current_run = task + + run_id = task["__run_id"] + + try: + run_data, anomaly_report = self._execute(task, config) + self._send_result(run_id, run_data, anomaly_report) + except Exception as e: + self._send_failure(run_id, str(e)) + finally: + self.current_run = None + print(f"[WORKER] Worker {self.agent_id} exiting") + + def _get_task(self): + try: + r = requests.get(self.master_url + "/task", params={"agent_id": self.agent_id}, timeout=5) + response = r.json() + if response.get("shutdown"): + print("[WORKER] Received shutdown signal from master") + self._stop = True + return "SHUTDOWN" + task = response.get("run") + + if task: + print(f"[WORKER] Got task: {task.get('__run_id')}") + + return task + + except Exception as e: + print(f"[WORKER] Error getting task: {e}") + return None + + def _execute(self, run, config): + print(f"[WORKER] Executing task {run.get('__run_id')}") + + current_run = run.get('__current_run', 0) + total_runs = run.get('__total_runs', 1) + + controller = RunController(run, config, current_run, total_runs, distributed_mode=True) + run_data = controller.do_run() + run_id = run["__run_id"] + + # Check for anomalies in the run raw result + treatment_levels = { + k: v + for k, v in run.items() + if not k.startswith("__") + } + run_dir = config.experiment_path / run_id + anomaly_report = ResultsValidator.validate_output_log( + run_dir, + run_id, + treatment_levels + ) + + print(f"[WORKER] Task 
{run.get('__run_id')} completed") + return run_data, anomaly_report + + def _send_result(self, run_id, data, anomaly_report = None): + try: + safe_data = WorkerRuntime.make_json_safe(data) + + payload = {"run_id": run_id, "data": safe_data, "status": "DONE", "anomalies": ( + anomaly_report.anomalies + if anomaly_report and anomaly_report.has_anomalies() + else [] + )} + + response = requests.post(self.master_url + "/result", json=payload, timeout=10) + response.raise_for_status() + + print(f"[WORKER] Result sent for task {run_id}") + + except requests.exceptions.RequestException as e: + print(f"[WORKER] Network error sending result: {e}") + except Exception as e: + print(f"[WORKER] Unexpected error: {e}") + + def _send_failure(self, run_id, error): + try: + requests.post( + self.master_url + "/result", + json={"run_id": run_id, "status": "FAILED", "error": error}, + timeout=10 + ) + print(f"[WORKER] Failure sent for {run_id}") + + except Exception as e: + print(f"[WORKER] Error sending failure: {e}") + + def _heartbeat_loop(self): + while not self._stop: + try: + requests.post( + self.master_url + "/heartbeat", + json={ + "agent_id": self.agent_id, + "status": "RUNNING" if self.current_run else "IDLE", + "run_id": self.current_run["__run_id"] if self.current_run else None, + "timestamp": time.time() + }, + timeout=5 + ) + except Exception as e: + print(f"[WORKER] Heartbeat error: {e}") + time.sleep(self.heartbeat_interval) \ No newline at end of file diff --git a/experiment-runner/DistributedExecution/__init__.py b/experiment-runner/DistributedExecution/__init__.py new file mode 100644 index 000000000..613f7dc93 --- /dev/null +++ b/experiment-runner/DistributedExecution/__init__.py @@ -0,0 +1,15 @@ +""" +Distributed Execution Module + +Simple framework for running experiments across multiple machines. 
+""" +from .DistributedOrchestrator import DistributedOrchestrator, APIServer, TaskManager, WorkerMonitor +from .Worker import WorkerRuntime + +__all__ = [ + 'WorkerRuntime', + 'APIServer', + 'TaskManager', + 'WorkerMonitor', + 'DistributedOrchestrator', +] diff --git a/experiment-runner/EventManager/EventSubscriptionController.py b/experiment-runner/EventManager/EventSubscriptionController.py index 113fe1457..285ea6215 100644 --- a/experiment-runner/EventManager/EventSubscriptionController.py +++ b/experiment-runner/EventManager/EventSubscriptionController.py @@ -1,5 +1,6 @@ from typing import Callable, List, Tuple from EventManager.Models.RunnerEvents import RunnerEvents +from ConfigValidator.CustomErrors.BaseError import BaseError class EventSubscriptionController: __call_back_register: dict = dict() @@ -20,11 +21,13 @@ def raise_event(event: RunnerEvents, runner_context=None): event_callback = EventSubscriptionController.__call_back_register[event] except KeyError: return None - - if runner_context: - return event_callback(runner_context) - else: - return event_callback() + try: + if runner_context: + return event_callback(runner_context) + else: + return event_callback() + except Exception as e: + raise BaseError(f"Error in event handler for {event.name}: {str(e)}") @staticmethod def get_event_callback(event: RunnerEvents): diff --git a/experiment-runner/EventManager/Models/RunnerEvents.py b/experiment-runner/EventManager/Models/RunnerEvents.py index 9ae200bc5..f6dd36699 100644 --- a/experiment-runner/EventManager/Models/RunnerEvents.py +++ b/experiment-runner/EventManager/Models/RunnerEvents.py @@ -1,13 +1,14 @@ from enum import Enum, auto class RunnerEvents(Enum): - BEFORE_EXPERIMENT = auto() - BEFORE_RUN = auto() - START_RUN = auto() - START_MEASUREMENT = auto() - INTERACT = auto() - CONTINUE = auto() - STOP_MEASUREMENT = auto() - STOP_RUN = auto() - POPULATE_RUN_DATA = auto() - AFTER_EXPERIMENT = auto() + VALIDATE_EXPERIMENT = auto() + BEFORE_EXPERIMENT = 
auto() + BEFORE_RUN = auto() + START_RUN = auto() + START_MEASUREMENT = auto() + INTERACT = auto() + CONTINUE = auto() + STOP_MEASUREMENT = auto() + STOP_RUN = auto() + POPULATE_RUN_DATA = auto() + AFTER_EXPERIMENT = auto() diff --git a/experiment-runner/ExperimentOrchestrator/Experiment/ExperimentController.py b/experiment-runner/ExperimentOrchestrator/Experiment/ExperimentController.py index a851747e5..124cc9002 100644 --- a/experiment-runner/ExperimentOrchestrator/Experiment/ExperimentController.py +++ b/experiment-runner/ExperimentOrchestrator/Experiment/ExperimentController.py @@ -13,6 +13,11 @@ from ProgressManager.Output.OutputProcedure import OutputProcedure as output from EventManager.EventSubscriptionController import EventSubscriptionController from ConfigValidator.CustomErrors.ProgressErrors import AllRunsCompletedOnRestartError +from ProgressManager.Validation.AnomaliesChecker import ( + ResultsValidator, + AnomalyReport +) +from pathlib import Path ### ========================================================= @@ -34,8 +39,20 @@ def __init__(self, config: RunnerConfig, metadata: Metadata): self.config = config self.metadata = metadata + self.validation_state = 0 + self.validation_log_file_path = (self.config.experiment_path / self.config.energy_validation_log_file) + self.csv_data_manager = CSVOutputManager(self.config.experiment_path) self.json_data_manager = JSONOutputManager(self.config.experiment_path) + # -- Validate experiment setup + # TODO: From the user's perspective, it would be nice to know if there are any possible issues with the experiment before starting the experiment runs. 
For example, if the config hooks are not properly defined, or if there are any issues with the config file itself + + output.console_log_WARNING("Calling validate_experiment config hook") + try: + EventSubscriptionController.raise_event(RunnerEvents.VALIDATE_EXPERIMENT) + except BaseError as e: + output.console_log_FAIL(f"Experiment validation failed: {e}") + raise run_tbl = self.config.create_run_table_model() # Add in the proper data column for energibridge @@ -55,6 +72,12 @@ def __init__(self, config: RunnerConfig, metadata: Metadata): output.console_log_WARNING(f"Reusing already existing experiment path: {self.config.experiment_path}") existing_run_table = self.csv_data_manager.read_run_table() + for run in existing_run_table: + if run['__done'] == RunProgress.RUNNING: + run['__done'] = RunProgress.TODO + self.csv_data_manager.write_run_table(existing_run_table) + print("[MASTER] Restored RUNNING -> TODO after restart") + # First sanity check. If there is no "TODO" in the __done column, simply abort. 
todo_run_found = any([current_run['__done'] != RunProgress.DONE for current_run in existing_run_table]) if not todo_run_found: @@ -139,7 +162,27 @@ def do_experiment(self): ) perform_run.start() perform_run.join() - + + # -- Checks for anomalies in the run raw result + run_id = current_run["__run_id"] + treatment_levels = { + k: v + for k, v in current_run.items() + if not k.startswith("__") + } + + run_dir = self.config.experiment_path / run_id + run_report = ResultsValidator.validate_output_log( + run_dir, + run_id, + treatment_levels, + ) + if run_report.has_anomalies(): + ResultsValidator.update_report( + run_report, + self.validation_log_file_path + ) + time_btwn_runs = self.config.time_between_runs_in_ms if time_btwn_runs > 0: output.console_log_bold(f"Run fully ended, waiting for: {time_btwn_runs}ms == {time_btwn_runs / 1000}s") @@ -152,4 +195,4 @@ def do_experiment(self): # -- After experiment output.console_log_WARNING("Calling after_experiment config hook") - EventSubscriptionController.raise_event(RunnerEvents.AFTER_EXPERIMENT) + EventSubscriptionController.raise_event(RunnerEvents.AFTER_EXPERIMENT) \ No newline at end of file diff --git a/experiment-runner/ExperimentOrchestrator/Experiment/Run/IRunController.py b/experiment-runner/ExperimentOrchestrator/Experiment/Run/IRunController.py index d90cd8de3..144ecbd5e 100644 --- a/experiment-runner/ExperimentOrchestrator/Experiment/Run/IRunController.py +++ b/experiment-runner/ExperimentOrchestrator/Experiment/Run/IRunController.py @@ -16,7 +16,7 @@ class IRunController(ABC): run_context: RunnerContext = None data_manager: CSVOutputManager = None - def __init__(self, variation: Dict, config: RunnerConfig, current_run: int, total_runs: int): + def __init__(self, variation: Dict, config: RunnerConfig, current_run: int, total_runs: int, distributed_mode: bool = False): self.run_dir = config.experiment_path / variation['__run_id'] self.run_dir.mkdir(parents=True, exist_ok=True) @@ -25,6 +25,7 @@ def 
__init__(self, variation: Dict, config: RunnerConfig, current_run: int, tota self.current_run = current_run self.run_context = RunnerContext(self.variation, self.current_run, self.run_dir) self.data_manager = CSVOutputManager(self.config.experiment_path) + self.distributed_mode = distributed_mode self.run_completed_event = Event() diff --git a/experiment-runner/ExperimentOrchestrator/Experiment/Run/RunController.py b/experiment-runner/ExperimentOrchestrator/Experiment/Run/RunController.py index 4e0a8e041..ce7b004a3 100644 --- a/experiment-runner/ExperimentOrchestrator/Experiment/Run/RunController.py +++ b/experiment-runner/ExperimentOrchestrator/Experiment/Run/RunController.py @@ -91,4 +91,6 @@ def do_run(self): updated_run_data = self.run_context.execute_run updated_run_data['__done'] = RunProgress.DONE - self.data_manager.update_row_data(updated_run_data) + if not self.distributed_mode: + self.data_manager.update_row_data(updated_run_data) + return updated_run_data \ No newline at end of file diff --git a/experiment-runner/Plugins/Profilers/AndroidDebugBridge.py b/experiment-runner/Plugins/Profilers/AndroidDebugBridge.py new file mode 100644 index 000000000..0157d2eac --- /dev/null +++ b/experiment-runner/Plugins/Profilers/AndroidDebugBridge.py @@ -0,0 +1,182 @@ +from pathlib import Path +from typing import Optional, Dict, Any +from enum import Enum, auto +import re +import subprocess +import csv +from datetime import datetime +import pandas as pd +import time +import threading + +from Plugins.Profilers.DataSource import DeviceSource +from ConfigValidator.Config.Models.RunnerContext import RunnerContext + + +class DataColumns(Enum): + BATTERY_PERCENTAGE = auto() + BATTERY_TEMPERATURE = auto() + BATTERY_VOLTAGE = auto() + CURRENT_NOW = auto() + CHARGE_COUNTER = auto() + BATTERY_HEALTH = auto() + CHARGING_STATUS = auto() + POWER_DRAW = auto() + + _PATTERN = re.compile(r'(android_battery__)(.+)') + + @property + def column_name(self) -> str: + return 
f'android_battery__{self.name.lower()}' + + +class AndroidBatteryMonitor(DeviceSource): + source_name = "adb" + supported_platforms = ["Linux", "Darwin"] + + def __init__(self, device_serial=None, poll_interval=2, out_file=Path("android_battery.csv")): + super().__init__() + + self.device_serial = device_serial + self.poll_interval = poll_interval + self.logfile = Path(out_file) + + self._validate_adb_available() + + self._thread = None + self._stop_event = threading.Event() + + def _validate_adb_available(self): + result = subprocess.run(['adb', 'version'], capture_output=True, timeout=5) + if result.returncode != 0: + raise RuntimeError("ADB version check failed.") + + def open_device(self): + if self.device_serial: + return + + result = subprocess.run(["adb", "devices"], capture_output=True, text=True, timeout=5) + devices = [ + line.split()[0] + for line in result.stdout.splitlines() + if "\tdevice" in line + ] + if not devices: + raise RuntimeError("No devices found") + + self.device_serial = devices[0] + + def close_device(self): + self.device_serial = None + + def list_devices(self): + result = subprocess.run(["adb", "devices"], capture_output=True, text=True, timeout=5) + return [ + line.split()[0] + for line in result.stdout.splitlines() + if "\tdevice" in line + ] + + def set_mode(self, settings=None): + return + + def read_sample(self): + result = subprocess.run(["adb", "-s", self.device_serial, "shell", "dumpsys battery"], capture_output=True, text=True, timeout=10) + return self._parse(result.stdout) + + def _parse(self, text): + patterns = { + "percentage": r"^\s*level:\s*(\d+)", + "temperature": r"^\s*temperature:\s*(\d+)", + "voltage": r"^\s*voltage:\s*(\d+)", + "current_now": r"^\s*current now:\s*(-?\d+)", + "charge_counter": r"^\s*charge counter:\s*(\d+)", + } + data = {} + for key, pattern in patterns.items(): + match = re.search(pattern, text, re.IGNORECASE | re.MULTILINE) + if match: + data[key] = match.group(1) + + voltage_raw = 
data.get("voltage") + if voltage_raw is None: + fallback = re.search(r"voltage:\s*(\d+)", text) + voltage_raw = fallback.group(1) if fallback else None + try: + voltage_v = (float(voltage_raw) / 1000.0) if voltage_raw else None + except ValueError: + voltage_v = None + + try: + current_raw = float(data.get("current_now", 0)) + current_ma = abs(current_raw) / 1000.0 + except ValueError: + current_ma = 0.0 + if voltage_v is not None: + data["voltage"] = float(voltage_raw) + data["power_draw"] = voltage_v * current_ma + else: + data["voltage"] = 0.0 + data["power_draw"] = 0.0 + + return data + + def _run(self): + self.open_device() + with open(self.logfile, "w", newline="") as f: + writer = csv.DictWriter( + f, + fieldnames=[ + "timestamp", + "percentage", + "temperature", + "voltage", + "current_now", + "charge_counter", + "power_draw" + ] + ) + writer.writeheader() + + while not self._stop_event.is_set(): + data = self.read_sample() + data["timestamp"] = datetime.now().isoformat() + writer.writerow(data) + f.flush() + time.sleep(self.poll_interval) + self.close_device() + + def log(self): + self._run() + return 0 + + def start(self): + if self._thread and self._thread.is_alive(): + raise RuntimeError("Battery monitor already running") + self._stop_event.clear() + self._thread = threading.Thread(target=self.log, name="DeviceWorker", daemon=True) + self._thread.start() + + def stop(self): + self._stop_event.set() + + if self._thread: + self._thread.join(timeout=5) + self._thread = None + + @staticmethod + def parse_log(logfile): + df = pd.read_csv(logfile) + if df.empty: + return {} + + result = {} + for col in df.columns: + if col == "timestamp": + continue + values = pd.to_numeric(df[col], errors="coerce").dropna() + + if len(values): + result[f"android_battery__{col}"] = float(values.mean()) + + return result \ No newline at end of file diff --git a/experiment-runner/Plugins/Profilers/DataSource.py b/experiment-runner/Plugins/Profilers/DataSource.py index 
4d36d4991..92c0fc650 100644 --- a/experiment-runner/Plugins/Profilers/DataSource.py +++ b/experiment-runner/Plugins/Profilers/DataSource.py @@ -203,7 +203,7 @@ def _format_cmd(self): elif isinstance(v, ValueRef): cmd += f" {p} {v.value}" elif isinstance(v, Iterable) and not (isinstance(v, StrEnum) or isinstance(v, str)): - cmd += f" {p} {",".join(map(str, v))}" + cmd += f' {p} {",".join(map(str, v))}' else: cmd += f" {p} {v}" diff --git a/experiment-runner/Plugins/Profilers/PowerJoular.py b/experiment-runner/Plugins/Profilers/PowerJoular.py index 56c5aca30..a11763b99 100644 --- a/experiment-runner/Plugins/Profilers/PowerJoular.py +++ b/experiment-runner/Plugins/Profilers/PowerJoular.py @@ -42,8 +42,8 @@ def __init__(self, @property def target_logfile(self): if "-p" in self.args.keys(): - return f"{self.logfile}-{self.args["-p"]}.csv" - + return f"{self.logfile}-{self.args['-p']}.csv" + return None @staticmethod diff --git a/experiment-runner/Plugins/Profilers/WattsUpPro.py b/experiment-runner/Plugins/Profilers/WattsUpPro.py index 6cac77f7b..333b9670c 100644 --- a/experiment-runner/Plugins/Profilers/WattsUpPro.py +++ b/experiment-runner/Plugins/Profilers/WattsUpPro.py @@ -15,9 +15,11 @@ def __init__(self, port: str = None, interval=1.0): if port is None: system = uname()[0] if system == 'Darwin': # OS X - port = '/dev/tty.usbserial-A1000wT3' + #port = '/dev/tty.usbserial-A1000wT3' + port = os.getenv("WATTS_UP_PRO_PORT_MACOS", '/dev/tty.usbserial-A1000wT3') elif system == 'Linux': - port = '/dev/ttyUSB0' + #port = '/dev/ttyUSB0' + port = os.getenv("WATTS_UP_PRO_PORT_LINUX", '/dev/ttyUSB0') if not os.path.exists(port): print( '') diff --git a/experiment-runner/ProgressManager/Output/CSVOutputManager.py b/experiment-runner/ProgressManager/Output/CSVOutputManager.py index 5e3233616..20a85f79a 100644 --- a/experiment-runner/ProgressManager/Output/CSVOutputManager.py +++ b/experiment-runner/ProgressManager/Output/CSVOutputManager.py @@ -38,10 +38,14 @@ def 
write_run_table(self, run_table: List[Dict]): with open(self._experiment_path / 'run_table.csv', 'w', newline='') as myfile: writer = csv.DictWriter(myfile, fieldnames=list(run_table[0].keys())) writer.writeheader() + for data in run_table: - data['__done'] = data['__done'].name - writer.writerow(data) - except: + row = data.copy() + + if isinstance(row['__done'], RunProgress): + row['__done'] = row['__done'].name + writer.writerow(row) + except Exception as e: raise ExperimentOutputFileDoesNotExistError # TODO: Nice To have diff --git a/experiment-runner/ProgressManager/RunTable/Models/RunProgress.py b/experiment-runner/ProgressManager/RunTable/Models/RunProgress.py index 08231b1a9..9e0fcd5b0 100644 --- a/experiment-runner/ProgressManager/RunTable/Models/RunProgress.py +++ b/experiment-runner/ProgressManager/RunTable/Models/RunProgress.py @@ -2,4 +2,5 @@ class RunProgress(Enum): TODO = 1 - DONE = 2 \ No newline at end of file + RUNNING = 2 + DONE = 3 \ No newline at end of file diff --git a/experiment-runner/ProgressManager/Validation/AnomaliesChecker.py b/experiment-runner/ProgressManager/Validation/AnomaliesChecker.py new file mode 100644 index 000000000..ab356bdea --- /dev/null +++ b/experiment-runner/ProgressManager/Validation/AnomaliesChecker.py @@ -0,0 +1,164 @@ +from typing import Dict, List, Any, Set +import pandas as pd +from pathlib import Path +from ProgressManager.Output.OutputProcedure import OutputProcedure as output + +META_COLUMNS = { + "Delta", + "Time", + "timestamp", + "run_id" +} + +class AnomalyReport: + def __init__(self): + self.anomalies: List[Dict[str, Any]] = [] + + """ + Each detected anomaly has the following structure: + "run_id": the run in which the anomaly was found + "treatment_levels": the treatment levels (factor values) of that run + "file_path": the path of the file (or run directory) that was checked + "row_number": the row where the anomaly is located (-1 when the file is missing) + "column_name": the column where the anomaly is located + "value": the anomalous value + "anomaly_type": one of "NaN", "negative", "zero", or "missing_file" + """ + def add_anomaly(self,
run_id: str, treatment_levels: Dict[str, Any], file_path: str, row_number: int, column_name: str, value: Any, anomaly_type: str): + self.anomalies.append({ + "run_id": run_id, + "treatment_levels": treatment_levels, + "file_path": file_path, + "row_number": row_number, + "column_name": column_name, + "value": value, + "anomaly_type": anomaly_type + }) + + def has_anomalies(self) -> bool: + return len(self.anomalies) > 0 + + +class ResultsValidator: + """ + Validates experiment output logs and detects: + - NaN values + - negative values + - zero values + - missing files + """ + @staticmethod + def _detect_numeric_columns(df: pd.DataFrame) -> List[str]: + """ + Automatically detect columns that contain numeric signals. + """ + numeric_cols = [] + + for col in df.columns: + if col in META_COLUMNS: + continue + + series = pd.to_numeric(df[col], errors="coerce") + + # keep column if it has at least some numeric values + if series.notna().any(): + numeric_cols.append(col) + return numeric_cols + + @staticmethod + def generate_report_text(report: AnomalyReport, include_header: bool = True) -> str: + lines = [] + + if include_header: + lines.append("=" * 80) + lines.append("GENERIC MEASUREMENT VALIDATION REPORT") + lines.append("=" * 80) + lines.append("") + + runs: Dict[str, List[Dict[str, Any]]] = {} + + for a in report.anomalies: + runs.setdefault(a["run_id"], []).append(a) + + for run_id, anomalies in runs.items(): + treatment = anomalies[0]["treatment_levels"] + + lines.append("-" * 80) + lines.append(f"RUN: {run_id}") + lines.append(f"TREATMENT: {treatment}") + lines.append("-" * 80) + + for a in anomalies: + lines.append( + f"[{a['anomaly_type']}] " + f"{a['column_name']} = {a['value']} " + f"(row {a['row_number']})" + ) + lines.append("") + return "\n".join(lines) + + @staticmethod + def validate_output_log(run_dir: Path,run_id: str,treatment_levels: Dict[str, Any],) -> AnomalyReport: + report = AnomalyReport() + + csv_files = list(run_dir.glob("*.csv")) + if not 
csv_files: + report.add_anomaly( + run_id, + treatment_levels, + str(run_dir), + -1, + "FILE_MISSING", + None, + "missing_file" + ) + return report + + csv_file = csv_files[0] + df = pd.read_csv(csv_file) + columns_to_check = ResultsValidator._detect_numeric_columns(df) + + for column in columns_to_check: + values = pd.to_numeric(df[column], errors="coerce") + for row_number, value in values.items(): + if pd.isna(value): + report.add_anomaly(run_id, treatment_levels, str(csv_file), row_number, column, value, "NaN") + elif value < 0: + report.add_anomaly(run_id, treatment_levels, str(csv_file), row_number, column, value, "negative") + elif value == 0: + report.add_anomaly(run_id, treatment_levels, str(csv_file), row_number, column, value, "zero") + return report + + @staticmethod + def update_report(report: AnomalyReport, log_file: Path): + if not report.has_anomalies(): + return + + first_report = not log_file.exists() + report_text = ResultsValidator.generate_report_text(report, include_header=first_report) + + try: + log_file.parent.mkdir(parents=True, exist_ok=True) + mode = "a" if log_file.exists() else "w" + + with open(log_file, mode) as f: + if mode == "a": + f.write("\n\n") + f.write(report_text) + output.console_log_OK(f"Results validation report updated: {log_file}") + except Exception as e: + output.console_log_FAIL(f"Failed to update results validation report: {e}") + + @staticmethod + def save_report_to_file(report: AnomalyReport, log_file: Path) -> None: + """Save validation report to a file.""" + report_text = ResultsValidator.generate_report_text(report) + + try: + log_file.parent.mkdir(parents=True, exist_ok=True) + with open(log_file, 'w') as f: + f.write(report_text) + output.console_log_OK(f"Results validation report saved to: {log_file}") + except Exception as e: + output.console_log_FAIL(f"Failed to write results validation report: {e}") + \ No newline at end of file diff --git
a/experiment-runner/ProgressManager/Validation/RequirementsValidator.py b/experiment-runner/ProgressManager/Validation/RequirementsValidator.py new file mode 100644 index 000000000..c3023b183 --- /dev/null +++ b/experiment-runner/ProgressManager/Validation/RequirementsValidator.py @@ -0,0 +1,317 @@ +import sys +import ast +import os +import shutil +import importlib +import importlib.util +from pathlib import Path +from typing import List, Dict, Tuple, Optional,Set +from ConfigValidator.CustomErrors.BaseError import BaseError +from ProgressManager.Output.OutputProcedure import OutputProcedure as output + +class RequirementCheckResult: + def __init__(self, name: str, requirement_type: str): + self.name = name + self.requirement_type = requirement_type + self.installed = False + self.error_message = "" + self.version = None + + def mark_failure(self, error: str): + self.installed = False + self.error_message = error + +### ========================================================= +### | | +### | RequirementsValidator: | +### | | +### | - Checks the following requirements: | +### | - Framework requirements (python versions | +### | and packages from requirements.txt) | +### | - External tools availability in Path | +### | - Experiment-specific requirements | +### | | +### | *Validates all requirements for an | +### | experiment before execution. 
| +### | | +### ========================================================= +PROFILER_DEPS = { + "AndroidDebugBridge":{ + "tools": ["adb"], + "python_modules": [], + }, + "JoularCore": { + "tools": ["java"], + "python_modules": ["jpype"], + }, + "PowerJoular": { + "tools": ["java"], + "python_modules": [], + }, + "EnergiBridge": { + "tools": ["energibridge"], + "python_modules": [], + }, + "NvidiaML": { + "tools": ["nvidia-smi"], + "python_modules": ["pynvml"], + }, + "PowerMetrics": { + "tools": ["powermetrics"], + "python_modules": [], + }, + "PowerLetrics": { + "tools": ["powermetrics"], + "python_modules": [], + }, + "Ps": { + "tools": ["ps"], + "python_modules": [], + }, + "PicoCM3": { + "tools": [], + "python_modules": ["picosdk"], + }, + "CodecarbonWrapper": { + "tools": [], + "python_modules": ["codecarbon"], + }, + "WattsUpPro": { + "tools": [], + "python_modules": ["serial"], + }, +} + +class RequirementsValidator: + + def __init__(self, config_file_path: Path): + self.config_file_path = config_file_path + self.config_dir = config_file_path.parent + self.framework_root = self._find_framework_root() + self.results: List[RequirementCheckResult] = [] + self.failed_checks: List[RequirementCheckResult] = [] + + @staticmethod + def _find_framework_root() -> Path: + """Find the root of the experiment-runner framework""" + cwd = Path.cwd() + + if (cwd / 'experiment-runner').exists(): + return cwd + if (cwd / 'requirements.txt').exists(): + return cwd + + for parent in cwd.parents: + if (parent / 'experiment-runner').exists(): + return parent + if (parent / 'requirements.txt').exists(): + return parent + + return cwd + + def validate_all(self) -> bool: + """ + Run all validation checks. Returns True if all pass, False otherwise. + Raises BaseError with details if any critical checks fail. 
+ """ + try: + # Check Python version + self._validate_python_version() + # Check requirements.txt + self._validate_framework_requirements() + # Check experiment-specific requirements + self._validate_plugin_requirements_file() + self._check_profiler_external_deps() + # Check MSR module and permissions + self._validate_msr_module() + self._validate_msr_permissions() + self._validate_perf_permissions() + + # Results + return self._report_results() + + except BaseError: + raise + except Exception as e: + raise BaseError(f"Validation error: {str(e)}") + + def _validate_perf_permissions(self): + """Check if the user has permission to access performance counters""" + + result = RequirementCheckResult("perf_event_paranoid", "system") + perf_file = Path("/proc/sys/kernel/perf_event_paranoid") + + if not perf_file.exists(): + return + + value = int(perf_file.read_text().strip()) + + if value >= 2: + result.mark_failure( + "Check Troubleshooting.md: perf_event_paranoid is too restrictive.\n" + f"perf_event_paranoid={value}\n" + "Hardware performance counters are restricted.\n" + ) + self.failed_checks.append(result) + self.results.append(result) + + def _validate_msr_module(self): + """Check if the MSR kernel module is loaded""" + + msr_path = Path("/dev/cpu/0/msr") + result = RequirementCheckResult("MSR module", "system") + if not msr_path.exists(): + result.mark_failure( + "Check Troubleshooting.md: MSR kernel module not loaded.\n" + "MSR kernel module not loaded.\n" + ) + self.failed_checks.append(result) + self.results.append(result) + + def _validate_msr_permissions(self): + """Check if the user has permission to read MSR registers""" + + result = RequirementCheckResult("MSR permissions","system") + msr_path = "/dev/cpu/0/msr" + + if not os.access(msr_path, os.R_OK): + result.mark_failure( + "Check Troubleshooting.md: No permission to read MSR registers.\n" + "No permission to read MSR registers.\n" + ) + self.failed_checks.append(result) + self.results.append(result) 
+ + def _validate_python_version(self): + """Check Python version compatibility""" + + python_version = sys.version_info + result = RequirementCheckResult(f"Python {python_version.major}.{python_version.minor}", "system") + + # Framework requires Python 3.8+ + if python_version.major < 3 or (python_version.major == 3 and python_version.minor < 8): + result.mark_failure( + f"Python 3.8+ required. Current: {python_version.major}.{python_version.minor}" + ) + self.results.append(result) + + def _validate_framework_requirements(self): + """Check framework dependencies from requirements.txt""" + requirements_file = self.framework_root / "requirements.txt" + result = RequirementCheckResult("Framework requirements", "python_module") + + if not requirements_file.exists(): + output.console_log_WARNING(" requirements.txt not found") + return + with open(requirements_file) as f: + for line in f: + line = line.strip() + if not line or line.startswith('#'): + continue + # Parse requirement: package_name or package_name==version + package_spec = line.split('==')[0].split('>=')[0].split('<=')[0].split('>')[0].split('<')[0] + package_name = package_spec.strip() + + result = RequirementCheckResult(package_name, "python_module") + try: + # Try to import the module + module = importlib.import_module(package_name) + version = getattr(module, '__version__', 'unknown') + except ImportError as e: + result.mark_failure(f"Cannot import: {str(e)}") + self.failed_checks.append(result) + self.results.append(result) + + def _check_profiler_external_deps(self): + """Check experiment-specific plugins required""" + used_profilers = self.extract_used_profilers(self.config_file_path) + + for profiler in used_profilers: + if profiler not in PROFILER_DEPS: + output.console_log_WARNING(f"Unknown profiler '{profiler}'") + continue + + dependencies = PROFILER_DEPS[profiler] + + for tool in dependencies["tools"]: + result = RequirementCheckResult(f"{profiler}:{tool}", "system_tool") + + if not 
shutil.which(tool): + result.mark_failure(f"Missing tool '{tool}'") + self.failed_checks.append(result) + self.results.append(result) + + for module in dependencies["python_modules"]: + result = RequirementCheckResult(f"{profiler}:{module}", "python_module") + try: + importlib.import_module(module) + except ImportError: + result.mark_failure(f"Missing Python module '{module}'") + self.failed_checks.append(result) + self.results.append(result) + + @staticmethod + def extract_used_profilers(config_file: Path) -> list[str]: + """Extract the names of profilers used in the config file by parsing import statements.""" + profilers = [] + + with open(config_file, "r") as f: + for line in f: + line = line.strip() + + if line.startswith("from Plugins.Profilers."): + profiler = ( + line.split("from Plugins.Profilers.")[1] + .split(" import ")[0] + .strip() + ) + profilers.append(profiler) + + return profilers + + def _validate_plugin_requirements_file(self): + """Check experiment-specific dependencies from experiment's requirements.txt""" + for requirements_file in self.framework_root.rglob("requirements.txt"): + if requirements_file == self.framework_root / "requirements.txt": + continue + with open(requirements_file) as f: + for line in f: + line = line.strip() + if not line or line.startswith("#"): + continue + package_name = ( + line.split("==")[0] + .split(">=")[0] + .split("<=")[0] + .split(">")[0] + .split("<")[0] + .strip() + ) + result = RequirementCheckResult(package_name,"plugin_requirement") + + try: + importlib.import_module(package_name) + except ImportError: + result.mark_failure(f"'{package_name}' required by "f"{requirements_file} is not installed") + self.failed_checks.append(result) + self.results.append(result) + + def _report_results(self): + if self.failed_checks: + message = [] + message.append("=" * 50) + message.append("EXPERIMENT VALIDATION FAILED") + message.append("=" * 50) + + for idx, check in enumerate(self.failed_checks,start=1): + 
message.append("") + message.append(f"[{idx}] {check.name}") + message.append(check.error_message) + raise BaseError( + "\n".join(message) + ) + return True + +def validate_experiment_requirements(config_file_path: Path) -> bool: + validator = RequirementsValidator(config_file_path) + return validator.validate_all() \ No newline at end of file diff --git a/experiment-runner/ProgressManager/Validation/__init__.py b/experiment-runner/ProgressManager/Validation/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/experiment-runner/__main__.py b/experiment-runner/__main__.py index 3a9c97909..8078a71be 100644 --- a/experiment-runner/__main__.py +++ b/experiment-runner/__main__.py @@ -1,4 +1,5 @@ import sys +import os import traceback import dill as pickle import hashlib @@ -14,6 +15,12 @@ from ConfigValidator.CustomErrors.ConfigErrors import ConfigInvalidClassNameError from ExperimentOrchestrator.Experiment.ExperimentController import ExperimentController +from DistributedExecution.DistributedOrchestrator import DistributedOrchestrator +from DistributedExecution.Worker import WorkerRuntime + + + + def is_no_argument_given(args: List[str]): return (len(args) == 1) def is_config_file_given(args: List[str]): return (args[1][-3:] == '.py') def load_and_get_config_file_as_module(args: List[str]): @@ -24,6 +31,13 @@ def load_and_get_config_file_as_module(args: List[str]): spec.loader.exec_module(config_file) return config_file +def get_flag_value(flag: str): + if flag in sys.argv: + idx = sys.argv.index(flag) + if idx + 1 < len(sys.argv): + return sys.argv[idx + 1] + return None + def calc_ast_md5sum(src, name): tree = compile(src, name, 'exec', flags=ast.PyCF_ONLY_AST, optimize=0) @@ -41,16 +55,16 @@ def calc_ast_md5sum(src, name): # Ignore docstring if isinstance(node, (ast.AsyncFunctionDef, ast.FunctionDef, ast.ClassDef, ast.Module)) and ast.get_docstring(node) is not None: docstring_node = node.body[0].value - if isinstance(docstring_node, ast.Str): - 
docstring_node.s = '' - elif isinstance(docstring_node, ast.Constant) and isinstance(docstring_node.value, str): + if isinstance(docstring_node, ast.Constant) and isinstance(docstring_node.value, str): docstring_node.value = '' - return hashlib.md5(pickle.dumps(tree)).digest() if __name__ == "__main__": try: + has_distribute_flag = '--distribute' in sys.argv + has_master_url_flag = '--master-url' in sys.argv + if is_no_argument_given(sys.argv): sys.argv.append('help') CLIRegister.parse_command(sys.argv) @@ -66,7 +80,35 @@ def calc_ast_md5sum(src, name): ) ConfigValidator.validate_config(config) # Validate config as a valid RunnerConfig - ExperimentController(config, metadata).do_experiment() # Instantiate controller with config and start experiment + + if '--distribute' in sys.argv: + mode = get_flag_value('--distribute') + + if mode == "master": + master_host = get_flag_value('--host') or "0.0.0.0" + master_port = int(get_flag_value('--port') or 5000) + + orchestrator = DistributedOrchestrator( + config=config, + metadata=metadata, + host=master_host, + port=master_port + ) + orchestrator.start() + + elif mode == "worker": + master_url = get_flag_value('--master') + if not master_url: + raise BaseError("--master URL required for worker") + + agent_id = f"worker_{os.getpid()}" + + worker = WorkerRuntime(master_url) + worker.run_loop(agent_id=agent_id, config=config) + else: + raise BaseError("Invalid --distribute mode (use 'master' or 'worker')") + else: + ExperimentController(config, metadata).do_experiment() # Instantiate controller with config and start experiment else: raise ConfigInvalidClassNameError else: # Else, a utility command is entered diff --git a/requirements.txt b/requirements.txt index 32607d5be..d608c6a84 100644 --- a/requirements.txt +++ b/requirements.txt @@ -3,3 +3,6 @@ psutil tabulate dill jsonpickle +flask +requests +waitress \ No newline at end of file diff --git a/test-standalone/core/arbitrary-objects/RunnerConfig.py 
b/test-standalone/core/arbitrary-objects/RunnerConfig.py index 79ff0b8d9..7dcd316d9 100644 --- a/test-standalone/core/arbitrary-objects/RunnerConfig.py +++ b/test-standalone/core/arbitrary-objects/RunnerConfig.py @@ -10,6 +10,7 @@ from typing import Dict, List, Any, Optional from pathlib import Path from os.path import dirname, realpath +import os ''' Test Description: @@ -37,7 +38,8 @@ class RunnerConfig: # ================================ USER SPECIFIC CONFIG ================================ name: str = "new_runner_experiment" - results_output_path: Path = ROOT_DIR / 'experiments' + default_output = ROOT_DIR / "experiments" + results_output_path: Path = Path(os.getenv("EXPERIMENT_RUNNER_OUTPUT_PATH", str(default_output))) operation_type: OperationType = OperationType.AUTO time_between_runs_in_ms: int = 100 diff --git a/test-standalone/core/shuffling/RunnerConfig.py b/test-standalone/core/shuffling/RunnerConfig.py index 16095eb17..7e0116904 100644 --- a/test-standalone/core/shuffling/RunnerConfig.py +++ b/test-standalone/core/shuffling/RunnerConfig.py @@ -23,7 +23,8 @@ class RunnerConfig: # ================================ USER SPECIFIC CONFIG ================================ name: str = "new_runner_experiment" - results_output_path: Path = ROOT_DIR / 'experiments' + default_output = ROOT_DIR / "experiments" + results_output_path: Path = Path(os.getenv("EXPERIMENT_RUNNER_OUTPUT_PATH", str(default_output))) operation_type: OperationType = OperationType.AUTO time_between_runs_in_ms: int = 100 diff --git a/test-standalone/plugins/CodecarbonWrapper/combined/RunnerConfig.py b/test-standalone/plugins/CodecarbonWrapper/combined/RunnerConfig.py index e626929a2..d3d52db71 100644 --- a/test-standalone/plugins/CodecarbonWrapper/combined/RunnerConfig.py +++ b/test-standalone/plugins/CodecarbonWrapper/combined/RunnerConfig.py @@ -29,7 +29,8 @@ class RunnerConfig: # ================================ USER SPECIFIC CONFIG ================================ name: str = 
"new_runner_experiment" - results_output_path: Path = ROOT_DIR / 'experiments' + default_output = ROOT_DIR / "experiments" + results_output_path: Path = Path(os.getenv("EXPERIMENT_RUNNER_OUTPUT_PATH", str(default_output))) operation_type: OperationType = OperationType.AUTO time_between_runs_in_ms: int = 100 diff --git a/test-standalone/plugins/CodecarbonWrapper/individual/RunnerConfig.py b/test-standalone/plugins/CodecarbonWrapper/individual/RunnerConfig.py index 0d00ff5f8..c00723944 100644 --- a/test-standalone/plugins/CodecarbonWrapper/individual/RunnerConfig.py +++ b/test-standalone/plugins/CodecarbonWrapper/individual/RunnerConfig.py @@ -25,7 +25,8 @@ class RunnerConfig: # ================================ USER SPECIFIC CONFIG ================================ name: str = "new_runner_experiment" - results_output_path: Path = ROOT_DIR / 'experiments' + default_output = ROOT_DIR / "experiments" + results_output_path: Path = Path(os.getenv("EXPERIMENT_RUNNER_OUTPUT_PATH", str(default_output))) operation_type: OperationType = OperationType.AUTO time_between_runs_in_ms: int = 100 diff --git a/test-standalone/plugins/PicoCM3/RunnerConfig.py b/test-standalone/plugins/PicoCM3/RunnerConfig.py index 9b3d580cf..30606f702 100644 --- a/test-standalone/plugins/PicoCM3/RunnerConfig.py +++ b/test-standalone/plugins/PicoCM3/RunnerConfig.py @@ -20,7 +20,8 @@ class RunnerConfig: # ================================ USER SPECIFIC CONFIG ================================ name: str = "new_runner_experiment" - results_output_path: Path = ROOT_DIR / 'experiments' + default_output = ROOT_DIR / "experiments" + results_output_path: Path = Path(os.getenv("EXPERIMENT_RUNNER_OUTPUT_PATH", str(default_output))) operation_type: OperationType = OperationType.AUTO time_between_runs_in_ms: int = 1000 diff --git a/test/ExperimentOrchestrator/__init__.py b/test/ExperimentOrchestrator/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git 
a/test/ExperimentOrchestrator/test_RemoteDistribution.py b/test/ExperimentOrchestrator/test_RemoteDistribution.py new file mode 100644 index 000000000..7bfb93d18 --- /dev/null +++ b/test/ExperimentOrchestrator/test_RemoteDistribution.py @@ -0,0 +1,493 @@ +import unittest +import tempfile +import shutil +import sys +from pathlib import Path +from typing import AnyStr, List, Dict, Any + +sys.path.insert(0, "experiment-runner") + +from ConfigValidator.Config.Models.RunnerContext import RunnerContext +from ConfigValidator.Config.Models.FactorModel import FactorModel +from ConfigValidator.Config.Models.RunTableModel import RunTableModel +from ConfigValidator.Config.RunnerConfig import RunnerConfig +from ProgressManager.Output.OutputProcedure import OutputProcedure as output + + +class RemoteAgent: + """Mock remote agent for testing distributed execution""" + def __init__(self, agent_id: str, host: str, port: int): + self.agent_id = agent_id + self.host = host + self.port = port + self.is_connected = False + self.assigned_runs: List[Dict] = [] + self.completed_runs: List[Dict] = [] + self.failed_runs: List[str] = [] + + def connect(self) -> bool: + """Simulate connection to remote agent""" + if not self.host or self.port <= 0: + return False + self.is_connected = True + return True + + def disconnect(self) -> bool: + """Disconnect from remote agent""" + self.is_connected = False + return True + + def send_run(self, run_data: Dict) -> bool: + """Send a run to the remote agent for execution""" + if not self.is_connected: + return False + self.assigned_runs.append(run_data) + return True + + def retrieve_results(self) -> List[Dict]: + """Retrieve completed run results from remote agent""" + return self.completed_runs.copy() + + def mark_run_complete(self, run_id: str, result_data: Dict) -> bool: + """Mark a run as completed on the remote agent""" + result_data['__run_id'] = run_id + self.completed_runs.append(result_data) + self.assigned_runs = [r for r in 
self.assigned_runs if r.get('__run_id') != run_id] + return True + + def mark_run_failed(self, run_id: str, error_message: str) -> bool: + """Mark a run as failed""" + self.failed_runs.append(run_id) + self.assigned_runs = [r for r in self.assigned_runs if r.get('__run_id') != run_id] + return True + + +class RemoteDistributionManager: + """Manages distribution of experiments across remote agents""" + def __init__(self): + self.agents: Dict[str, RemoteAgent] = {} + self.pending_runs: List[Dict] = [] + self.completed_runs: List[Dict] = [] + self.failed_runs: Dict[str, str] = {} + + def register_agent(self, agent: RemoteAgent) -> bool: + """Register a new remote agent""" + if not isinstance(agent, RemoteAgent): + return False + self.agents[agent.agent_id] = agent + return True + + def connect_all_agents(self) -> Dict[str, bool]: + """Connect to all registered agents""" + results = {} + for agent_id, agent in self.agents.items(): + results[agent_id] = agent.connect() + return results + + def disconnect_all_agents(self) -> Dict[str, bool]: + """Disconnect from all agents""" + results = {} + for agent_id, agent in self.agents.items(): + results[agent_id] = agent.disconnect() + return results + + def distribute_runs(self, runs: List[Dict]) -> Dict[str, int]: + """Distribute runs across available agents using round-robin""" + self.pending_runs = runs.copy() + agent_ids = list(self.agents.keys()) + + if not agent_ids: + self.failed_runs.update({r.get('__run_id'): 'No agents available' for r in runs}) + return {'distributed': 0, 'failed': len(runs)} + + distributed = 0 + failed = 0 + + for idx, run in enumerate(runs): + agent_id = agent_ids[idx % len(agent_ids)] + agent = self.agents[agent_id] + + if agent.send_run(run): + distributed += 1 + else: + self.failed_runs[run.get('__run_id')] = f'Failed to send to agent {agent_id}' + failed += 1 + + return {'distributed': distributed, 'failed': failed} + + def collect_results(self) -> Dict[str, Any]: + """Collect results from all 
agents""" + for agent in self.agents.values(): + self.completed_runs.extend(agent.retrieve_results()) + + return { + 'total_completed': len(self.completed_runs), + 'total_failed': len(self.failed_runs), + 'results': self.completed_runs + } + + def get_agent_status(self) -> Dict[str, Dict]: + """Get status of all agents""" + status = {} + for agent_id, agent in self.agents.items(): + status[agent_id] = { + 'connected': agent.is_connected, + 'assigned_runs': len(agent.assigned_runs), + 'completed_runs': len(agent.completed_runs), + 'failed_runs': len(agent.failed_runs), + 'host': agent.host, + 'port': agent.port + } + return status + + +class TestRemoteAgentBasic(unittest.TestCase): + """Test basic remote agent functionality""" + + def setUp(self): + self.agent = RemoteAgent( + agent_id="test_agent_1", + host="localhost", + port=8000 + ) + + def test_agent_initialization(self): + """Test that agent is properly initialized""" + self.assertEqual(self.agent.agent_id, "test_agent_1") + self.assertEqual(self.agent.host, "localhost") + self.assertEqual(self.agent.port, 8000) + self.assertFalse(self.agent.is_connected) + + def test_agent_connection(self): + """Test connecting to remote agent""" + self.assertFalse(self.agent.is_connected) + connected = self.agent.connect() + self.assertTrue(connected) + self.assertTrue(self.agent.is_connected) + + def test_agent_disconnection(self): + """Test disconnecting from remote agent""" + self.agent.connect() + self.assertTrue(self.agent.is_connected) + disconnected = self.agent.disconnect() + self.assertTrue(disconnected) + self.assertFalse(self.agent.is_connected) + + def test_invalid_agent_connection(self): + """Test connection failure with invalid parameters""" + invalid_agent = RemoteAgent("invalid", "", -1) + connected = invalid_agent.connect() + self.assertFalse(connected) + self.assertFalse(invalid_agent.is_connected) + + +class TestRemoteAgentRunManagement(unittest.TestCase): + """Test run management on remote agents""" + + 
def setUp(self): + self.agent = RemoteAgent("test_agent", "localhost", 8000) + self.agent.connect() + self.test_run = { + '__run_id': 'run_1', + 'factor1': 'treatment1', + 'factor2': 'value1' + } + + def tearDown(self): + self.agent.disconnect() + + def test_send_run_when_connected(self): + """Test sending a run to connected agent""" + result = self.agent.send_run(self.test_run) + self.assertTrue(result) + self.assertEqual(len(self.agent.assigned_runs), 1) + self.assertEqual(self.agent.assigned_runs[0]['__run_id'], 'run_1') + + def test_send_run_when_disconnected(self): + """Test that runs cannot be sent to disconnected agent""" + self.agent.disconnect() + result = self.agent.send_run(self.test_run) + self.assertFalse(result) + self.assertEqual(len(self.agent.assigned_runs), 0) + + def test_mark_run_complete(self): + """Test marking a run as completed""" + self.agent.send_run(self.test_run) + result_data = {'result_col': 42.5} + + success = self.agent.mark_run_complete('run_1', result_data) + self.assertTrue(success) + self.assertEqual(len(self.agent.completed_runs), 1) + self.assertEqual(self.agent.completed_runs[0]['result_col'], 42.5) + self.assertEqual(self.agent.completed_runs[0]['__run_id'], 'run_1') + self.assertEqual(len(self.agent.assigned_runs), 0) + + def test_mark_run_failed(self): + """Test marking a run as failed""" + self.agent.send_run(self.test_run) + + success = self.agent.mark_run_failed('run_1', 'Timeout error') + self.assertTrue(success) + self.assertEqual(len(self.agent.failed_runs), 1) + self.assertIn('run_1', self.agent.failed_runs) + self.assertEqual(len(self.agent.assigned_runs), 0) + + def test_retrieve_results(self): + """Test retrieving results from agent""" + self.agent.send_run(self.test_run) + self.agent.mark_run_complete('run_1', {'data': 100}) + + results = self.agent.retrieve_results() + self.assertEqual(len(results), 1) + self.assertEqual(results[0]['data'], 100) + + +class TestRemoteDistributionManagerBasic(unittest.TestCase): + 
"""Test basic distribution manager functionality""" + + def setUp(self): + self.manager = RemoteDistributionManager() + self.agent1 = RemoteAgent("agent_1", "host1.local", 8000) + self.agent2 = RemoteAgent("agent_2", "host2.local", 8001) + + def test_manager_initialization(self): + """Test distribution manager initialization""" + self.assertEqual(len(self.manager.agents), 0) + self.assertEqual(len(self.manager.pending_runs), 0) + + def test_register_agents(self): + """Test registering remote agents""" + success1 = self.manager.register_agent(self.agent1) + success2 = self.manager.register_agent(self.agent2) + + self.assertTrue(success1) + self.assertTrue(success2) + self.assertEqual(len(self.manager.agents), 2) + + def test_register_invalid_agent(self): + """Test that invalid objects cannot be registered""" + result = self.manager.register_agent("not_an_agent") + self.assertFalse(result) + + def test_connect_all_agents(self): + """Test connecting all registered agents""" + self.manager.register_agent(self.agent1) + self.manager.register_agent(self.agent2) + + results = self.manager.connect_all_agents() + + self.assertEqual(results['agent_1'], True) + self.assertEqual(results['agent_2'], True) + self.assertTrue(self.agent1.is_connected) + self.assertTrue(self.agent2.is_connected) + + def test_disconnect_all_agents(self): + """Test disconnecting all agents""" + self.manager.register_agent(self.agent1) + self.manager.register_agent(self.agent2) + self.manager.connect_all_agents() + + results = self.manager.disconnect_all_agents() + + self.assertEqual(results['agent_1'], True) + self.assertEqual(results['agent_2'], True) + self.assertFalse(self.agent1.is_connected) + self.assertFalse(self.agent2.is_connected) + + +class TestDistributionAlgorithms(unittest.TestCase): + """Test distribution algorithms""" + + def setUp(self): + self.manager = RemoteDistributionManager() + self.agent1 = RemoteAgent("agent_1", "host1.local", 8000) + self.agent2 = RemoteAgent("agent_2", 
"host2.local", 8001) + + self.manager.register_agent(self.agent1) + self.manager.register_agent(self.agent2) + self.manager.connect_all_agents() + + def test_round_robin_distribution(self): + """Test round-robin distribution across agents""" + runs = [ + {'__run_id': f'run_{i}', 'factor': i} + for i in range(6) + ] + + distribution = self.manager.distribute_runs(runs) + + self.assertEqual(distribution['distributed'], 6) + self.assertEqual(distribution['failed'], 0) + self.assertEqual(len(self.agent1.assigned_runs), 3) + self.assertEqual(len(self.agent2.assigned_runs), 3) + + def test_distribution_to_single_agent(self): + """Test distribution when only one agent is available""" + single_manager = RemoteDistributionManager() + single_manager.register_agent(self.agent1) + single_manager.connect_all_agents() + + runs = [ + {'__run_id': f'run_{i}', 'factor': i} + for i in range(4) + ] + + distribution = single_manager.distribute_runs(runs) + + self.assertEqual(distribution['distributed'], 4) + self.assertEqual(len(self.agent1.assigned_runs), 4) + + def test_distribution_with_no_agents(self): + """Test distribution fails gracefully with no agents""" + empty_manager = RemoteDistributionManager() + + runs = [{'__run_id': 'run_1', 'factor': 1}] + distribution = empty_manager.distribute_runs(runs) + + self.assertEqual(distribution['distributed'], 0) + self.assertEqual(distribution['failed'], 1) + + +class TestResultAggregation(unittest.TestCase): + """Test result aggregation from multiple agents""" + + def setUp(self): + self.manager = RemoteDistributionManager() + self.agent1 = RemoteAgent("agent_1", "host1.local", 8000) + self.agent2 = RemoteAgent("agent_2", "host2.local", 8001) + + self.manager.register_agent(self.agent1) + self.manager.register_agent(self.agent2) + self.manager.connect_all_agents() + + def test_collect_results_from_multiple_agents(self): + """Test collecting results from all agents""" + runs = [ + {'__run_id': f'run_{i}', 'factor': i} + for i in range(4) 
+ ] + + self.manager.distribute_runs(runs) + + self.agent1.mark_run_complete('run_0', {'result': 100}) + self.agent1.mark_run_complete('run_2', {'result': 150}) + self.agent2.mark_run_complete('run_1', {'result': 120}) + self.agent2.mark_run_complete('run_3', {'result': 180}) + + aggregation = self.manager.collect_results() + + self.assertEqual(aggregation['total_completed'], 4) + self.assertEqual(aggregation['total_failed'], 0) + self.assertEqual(len(aggregation['results']), 4) + + def test_agent_status_reporting(self): + """Test getting status of all agents""" + runs = [ + {'__run_id': f'run_{i}', 'factor': i} + for i in range(4) + ] + + self.manager.distribute_runs(runs) + self.agent1.mark_run_complete('run_0', {'result': 100}) + + status = self.manager.get_agent_status() + + self.assertEqual(status['agent_1']['assigned_runs'], 1) + self.assertEqual(status['agent_1']['completed_runs'], 1) + self.assertEqual(status['agent_2']['assigned_runs'], 2) + self.assertEqual(status['agent_2']['completed_runs'], 0) + + +class RemoteDistributionTestConfig(RunnerConfig): + """Test configuration for remote distribution experiments""" + + tmpdir: AnyStr = tempfile.mkdtemp() + + def clear(self): + if Path(self.__class__.tmpdir).exists(): + shutil.rmtree(self.__class__.tmpdir) + + def create_run_table_model(self): + return RunTableModel( + factors=[ + FactorModel("algorithm", ["quicksort", "mergesort", "heapsort"]), + FactorModel("data_size", [100, 1000, 10000]), + ], + data_columns=['execution_time', 'memory_used'] + ) + + def start_measurement(self, context: RunnerContext): + output.console_log("RemoteDistribution: Starting measurement") + pass + + def interact(self, context: RunnerContext): + output.console_log("RemoteDistribution: Executing on remote agent") + pass + + def stop_measurement(self, context: RunnerContext): + output.console_log("RemoteDistribution: Stopping measurement") + pass + + def populate_run_data(self, context: RunnerContext): + 
output.console_log("RemoteDistribution: Populating run data") + return { + 'execution_time': 1.5, + 'memory_used': 512 + } + + +class TestRemoteDistributionIntegration(unittest.TestCase): + """Integration tests for remote distribution with RunnerConfig""" + + def setUp(self): + self.config = RemoteDistributionTestConfig() + self.run_table = self.config.create_run_table_model().generate_experiment_run_table() + + def tearDown(self): + self.config.clear() + + def test_config_with_remote_distribution(self): + """Test that config works with remote distribution""" + self.config.start_measurement(None) + self.config.interact(None) + self.config.stop_measurement(None) + run_data = self.config.populate_run_data(None) + + self.assertIsNotNone(run_data) + self.assertEqual(run_data['execution_time'], 1.5) + self.assertEqual(run_data['memory_used'], 512) + + def test_run_table_generation_for_distribution(self): + """Test that run table can be properly distributed""" + self.assertGreater(len(self.run_table), 0) + + for run in self.run_table: + self.assertIn('__run_id', run) + self.assertIn('algorithm', run) + self.assertIn('data_size', run) + self.assertIn('execution_time', run) + self.assertIn('memory_used', run) + + def test_distributed_execution_simulation(self): + """Test simulating distributed execution of experiment""" + manager = RemoteDistributionManager() + agent1 = RemoteAgent("agent_1", "localhost", 8000) + agent2 = RemoteAgent("agent_2", "localhost", 8001) + + manager.register_agent(agent1) + manager.register_agent(agent2) + manager.connect_all_agents() + + distribution = manager.distribute_runs(self.run_table) + self.assertEqual(distribution['distributed'], len(self.run_table)) + + for run in self.run_table: + run_id = run['__run_id'] + manager.agents['agent_1'].mark_run_complete(run_id, self.config.populate_run_data(None)) + + aggregation = manager.collect_results() + self.assertGreater(aggregation['total_completed'], 0) + + +if __name__ == '__main__': + 
unittest.main() diff --git a/test/Plugins/Profilers/test_AndroidDebugBridge.py b/test/Plugins/Profilers/test_AndroidDebugBridge.py new file mode 100644 index 000000000..dcd9d7ac2 --- /dev/null +++ b/test/Plugins/Profilers/test_AndroidDebugBridge.py @@ -0,0 +1,148 @@ +import unittest +import tempfile +import shutil +import sys +import time +import pandas as pd +from pathlib import Path +from unittest.mock import patch, MagicMock + +sys.path.append("experiment-runner") + +from Plugins.Profilers.AndroidDebugBridge import AndroidBatteryMonitor + +class TestADBMonitorLoop(unittest.TestCase): + def setUp(self): + self.tmpdir = tempfile.mkdtemp() + def tearDown(self): + shutil.rmtree(self.tmpdir) + def fake_subprocess(self, *args, **kwargs): + cmd = args[0] + mock = MagicMock() + mock.returncode = 0 + # CASE 1: adb devices + if "devices" in cmd: + mock.stdout = "emulator-5554\tdevice\n" + return mock + # CASE 2: dumpsys battery + mock.stdout = """ + level: 75 + temperature: 315 + voltage: 4100 + current now: -900000 + charge counter: 3810000 + """ + return mock + + def test_start_stop_monitor(self): + with patch("subprocess.run", side_effect=self.fake_subprocess): + monitor = AndroidBatteryMonitor( + out_file=Path(self.tmpdir) / "android_battery.csv", + poll_interval=1 + ) + + monitor.start() + time.sleep(3) + monitor.stop() + + csv_path = Path(self.tmpdir) / "android_battery.csv" + self.assertTrue(csv_path.exists()) + + df = pd.read_csv(csv_path) + + self.assertFalse(df.empty) + self.assertIn("voltage", df.columns) + self.assertIn("power_draw", df.columns) + print(df.head()) + +class FakeBatteryDevice: + """ + Deterministic battery simulator. + Mimics Android dumpsys battery output. + """ + def __init__(self): + self.level = 80 + self.voltage = 4200 + self.temperature = 310 + self.current_now = -900000 # µA + self.charge_counter = 3810000 + self.tick = 0 + + def step(self): + """ + Simulate time passing. 
+ """ + self.tick += 1 + + # battery slowly drains + if self.tick % 2 == 0: + self.level = max(0, self.level - 1) + # voltage drops slightly with battery level + self.voltage = 4200 - (80 - self.level) * 2 + # current fluctuates slightly + self.current_now = -900000 - (self.tick * 1000) + + def dumpsys(self): + self.step() + return f""" + level: {self.level} + temperature: {self.temperature} + voltage: {self.voltage} + current now: {self.current_now} + charge counter: {self.charge_counter} + """ + +class FakeADB: + def __init__(self, device: FakeBatteryDevice): + self.device = device + + def run(self, cmd, *args, **kwargs): + mock = MagicMock() + mock.returncode = 0 + cmd_str = " ".join(cmd) + # adb devices + if "devices" in cmd_str: + mock.stdout = "emulator-5554\tdevice\n" + return mock + + # dumpsys battery + if "dumpsys battery" in cmd_str: + mock.stdout = self.device.dumpsys() + return mock + + mock.stdout = "" + return mock + +class TestDeterministicBattery(unittest.TestCase): + + def setUp(self): + self.tmpdir = tempfile.mkdtemp() + + def tearDown(self): + shutil.rmtree(self.tmpdir) + + def test_monitor(self): + device = FakeBatteryDevice() + fake_adb = FakeADB(device) + + def patched_run(cmd, *args, **kwargs): + return fake_adb.run(cmd, *args, **kwargs) + + with patch("subprocess.run", side_effect=patched_run): + monitor = AndroidBatteryMonitor( + out_file=Path(self.tmpdir) / "battery.csv", + poll_interval=1 + ) + monitor.start() + + time.sleep(4) + monitor.stop() + + df = pd.read_csv(Path(self.tmpdir) / "battery.csv") + + # deterministic checks + self.assertGreater(len(df), 2) + self.assertIn("voltage", df.columns) + self.assertIn("power_draw", df.columns) + self.assertTrue(df["voltage"].iloc[-1] <= df["voltage"].iloc[0]) + print(df) \ No newline at end of file diff --git a/test/ProgressManager/__init__.py b/test/ProgressManager/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/test/ProgressManager/test_AnomaliesChecker.py 
b/test/ProgressManager/test_AnomaliesChecker.py new file mode 100644 index 000000000..f97a795cf --- /dev/null +++ b/test/ProgressManager/test_AnomaliesChecker.py @@ -0,0 +1,138 @@ +import unittest +import tempfile +from pathlib import Path +import pandas as pd +import sys + +sys.path.append("experiment-runner") + +from ProgressManager.Validation.AnomaliesChecker import ResultsValidator, AnomalyReport + +class TestAnomaliesChecker(unittest.TestCase): + def create_run_folder(self, df): + tmpdir = tempfile.TemporaryDirectory() + run_dir = Path(tmpdir.name) + csv_file = run_dir / "energibridge.csv" + df.to_csv(csv_file, index=False) + return tmpdir, run_dir + + def test_positive_values(self): + df = pd.DataFrame({ + "CPU_ENERGY (J)": [10, 12, 15], + "CORE0_ENERGY (J)": [1.5, 1.7, 2.0] + }) + tmpdir, run_dir = self.create_run_folder(df) + report = ResultsValidator.validate_output_log( + run_dir, + "run_1", + {"workload": "light"} + ) + self.assertFalse(report.has_anomalies()) + tmpdir.cleanup() + + def test_zero_value(self): + df = pd.DataFrame({ + "CPU_ENERGY (J)": [10, 0, 15] + }) + tmpdir, run_dir = self.create_run_folder(df) + report = ResultsValidator.validate_output_log( + run_dir, + "run_1", + {"workload": "light"} + ) + self.assertTrue(report.has_anomalies()) + self.assertEqual(report.anomalies[0]["anomaly_type"], "zero") + tmpdir.cleanup() + + def test_negative_value(self): + df = pd.DataFrame({ + "CPU_ENERGY (J)": [10, -5, 15] + }) + tmpdir, run_dir = self.create_run_folder(df) + report = ResultsValidator.validate_output_log( + run_dir, + "run_1", + {"workload": "medium"} + ) + self.assertTrue(report.has_anomalies()) + self.assertEqual(report.anomalies[0]["anomaly_type"], "negative") + tmpdir.cleanup() + + def test_nan_value(self): + df = pd.DataFrame({ + "CPU_ENERGY (J)": [10, None, 15] + }) + tmpdir, run_dir = self.create_run_folder(df) + report = ResultsValidator.validate_output_log( + run_dir, + "run_1", + {"workload": "heavy"} + ) + 
self.assertTrue(report.has_anomalies()) + self.assertEqual(report.anomalies[0]["anomaly_type"], "NaN") + + tmpdir.cleanup() + + def test_missing_file(self): + tmpdir = tempfile.TemporaryDirectory() + run_dir = Path(tmpdir.name) + report = ResultsValidator.validate_output_log( + run_dir, + "run_1", + {"workload": "light"} + ) + self.assertTrue(report.has_anomalies()) + self.assertEqual(report.anomalies[0]["anomaly_type"], "missing_file") + tmpdir.cleanup() + + def test_generate_report(self): + tmpdir = tempfile.TemporaryDirectory() + experiment_path = Path(tmpdir.name) + + run0 = experiment_path / "run_0" + run0.mkdir() + + pd.DataFrame({ + "CPU_ENERGY (J)": [10, 0, 15] + }).to_csv(run0 / "energibridge.csv", index=False) + + run1 = experiment_path / "run_1" + run1.mkdir() + + pd.DataFrame({ + "CPU_ENERGY (J)": [10, -5, 15] + }).to_csv(run1 / "energibridge.csv", index=False) + + run_table = [ + {"__run_id": "run_0", "workload": "light", "brightness": "low"}, + {"__run_id": "run_1", "workload": "heavy", "brightness": "high"} + ] + final_report = AnomalyReport() + + for run in run_table: + run_id = run["__run_id"] + treatment_levels = { + k: v for k, v in run.items() + if not k.startswith("__") + } + run_dir = experiment_path / run_id + run_report = ResultsValidator.validate_output_log( + run_dir, + run_id, + treatment_levels + ) + final_report.anomalies.extend(run_report.anomalies) + + self.assertTrue(final_report.has_anomalies()) + log_file = experiment_path / "energibridge.log" + ResultsValidator.save_report_to_file( + final_report, + log_file + ) + self.assertTrue(log_file.exists()) + print(log_file.read_text()) + tmpdir.cleanup() + + +if __name__ == "__main__": + unittest.main() \ No newline at end of file diff --git a/test/conftest.py b/test/conftest.py new file mode 100644 index 000000000..1e789bc16 --- /dev/null +++ b/test/conftest.py @@ -0,0 +1,168 @@ +""" +Global pytest configuration and shared fixtures for ALL tests + +This file is automatically discovered 
by pytest and runs before any tests. +It contains: +1. Pytest plugins and configuration +2. Shared fixtures (reusable setup/teardown) +3. Hooks for test execution + +WHY THIS MATTERS: +- Fixtures replace traditional setUp()/tearDown() methods +- Fixtures are more flexible: can be scoped (function, class, module, session) +- Shared fixtures prevent code duplication across test files +- conftest.py is the standard pytest way to organize test utilities +""" + +import pytest +import tempfile +import shutil +from pathlib import Path +import sys +import os + +# Add experiment-runner to Python path so tests can import it +PROJECT_ROOT = Path(__file__).parent.parent +sys.path.insert(0, str(PROJECT_ROOT)) +sys.path.insert(0, str(PROJECT_ROOT / "experiment-runner")) + + +# ============================================================================ +# FIXTURES - Reusable setup/teardown for tests +# ============================================================================ +# A fixture is like a setUp() method that runs before each test +# Think of it as: "Here's the environment my test needs" + +@pytest.fixture +def temp_dir(): + """ + Fixture: Create a temporary directory for test files + + SCOPE: "function" means a new temp directory for EACH test function + + WHY: Tests need isolated environments so they don't interfere with each other + One test shouldn't modify another test's files + + USAGE in tests: + def test_something(temp_dir): + # temp_dir is a Path object pointing to a fresh temporary directory + config_file = temp_dir / "RunnerConfig.py" + config_file.write_text("...") + + CLEANUP: Automatically deleted after test completes (yield statement does this) + """ + tmpdir = Path(tempfile.mkdtemp()) + yield tmpdir # "yield" = pause here, run test, resume after + # After test completes, cleanup happens below + if tmpdir.exists(): + shutil.rmtree(tmpdir) + + +@pytest.fixture +def experiment_output_dir(temp_dir): + """ + Fixture: Create directory structure expected by 
Experiment Runner + + This prepares a directory that Experiment Runner can write results to. + + STRUCTURE: + temp_dir/ + └── experiments/ <- Where results go + └── my_experiment/ <- One folder per experiment + ├── run_table.csv + ├── metadata.json + └── run_0_repetition_0/ + """ + results_dir = temp_dir / "experiments" + results_dir.mkdir(parents=True, exist_ok=True) + return results_dir + + +@pytest.fixture +def env_vars_clean(): + """ + Fixture: Clean environment variables before/after test + + WHY: Some tests rely on environment variables (like EXPERIMENT_RUNNER_OUTPUT_PATH) + We want a clean state so tests don't affect each other + + This saves original values, provides clean environment, then restores + """ + # Save original environment + original_env = os.environ.copy() + + yield # Run test with clean environment + + # Restore original environment after test + os.environ.clear() + os.environ.update(original_env) + + +# ============================================================================ +# PYTEST CONFIGURATION +# ============================================================================ + +def pytest_configure(config): + """ + Hook: Runs once when pytest starts + + We use this to register custom markers (test categories) + """ + config.addinivalue_line( + "markers", + "system: System-level tests (real experiment execution)" + ) + config.addinivalue_line( + "markers", + "integration: Integration tests (multiple components)" + ) + config.addinivalue_line( + "markers", + "unit: Unit tests (single component with mocks)" + ) + config.addinivalue_line( + "markers", + "slow: Tests that take a while to run" + ) + + +def pytest_collection_modifyitems(config, items): + """ + Hook: Runs after tests are discovered, before they run + + This automatically assigns markers based on test location + """ + for item in items: + # If test is in system/, mark it as @pytest.mark.system + if "system" in str(item.fspath): + item.add_marker(pytest.mark.system) + elif 
"integration" in str(item.fspath): + item.add_marker(pytest.mark.integration) + elif "unit" in str(item.fspath): + item.add_marker(pytest.mark.unit) + + +# ============================================================================ +# PYTEST COMMAND LINE OPTIONS +# ============================================================================ +# These let users run specific test types: +# pytest -m system (only system tests) +# pytest -m unit (only unit tests) +# pytest -k shuffling (only tests with "shuffling" in name) + +def pytest_addoption(parser): + """ + Hook: Allows passing custom command line options to pytest + """ + parser.addoption( + "--real-profilers", + action="store_true", + default=False, + help="Run tests that require real profiler installations" + ) + parser.addoption( + "--skip-slow", + action="store_true", + default=False, + help="Skip slow system tests" + ) diff --git a/test/integration/__init__.py b/test/integration/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/test/system/__init__.py b/test/system/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/test/system/base_system_test.py b/test/system/base_system_test.py new file mode 100644 index 000000000..09a47651d --- /dev/null +++ b/test/system/base_system_test.py @@ -0,0 +1,345 @@ +""" +Base class for System-Level Tests + +What is a "System Test"? +- Runs the ACTUAL experiment (not mocked) +- Tests real profilers, real data collection +- Validates end-to-end workflow +- Catches integration issues + +This base class provides reusable methods for all system tests so we don't +repeat code in every test file. 
+ +INHERITANCE EXAMPLE: + class TestBasicExperiment(SystemExperimentTest): + def test_hello_world_runs(self, temp_dir): + # Use inherited methods like self.run_experiment() + result = self.run_experiment("examples/hello-world", temp_dir) + assert result.success +""" + +import subprocess +import sys +from pathlib import Path +from typing import Dict, Optional, List +import pytest + + +class ExperimentResult: + """ + Container for experiment execution results + + WHY: Instead of returning a tuple, we return an object with named fields + This is clearer: result.success vs result[0] + """ + def __init__(self, success: bool, stdout: str, stderr: str, + results_dir: Path, config_path: Path): + self.success = success # Did experiment complete? + self.stdout = stdout # Console output + self.stderr = stderr # Error output + self.results_dir = results_dir # Where results were written + self.config_path = config_path # Which config was used + + +class SystemExperimentTest: + """ + Base class for ALL system-level tests + + Think of this as a "helper class" that all system tests inherit from. + It provides common methods so we don't repeat code. 
+ + EXAMPLE USAGE: + class TestProfilers(SystemExperimentTest): + def test_picoCM3_experiment(self, temp_dir): + # Call inherited method from this class + result = self.run_experiment( + config_path="test-standalone/plugins/PicoCM3", + results_dir=temp_dir + ) + assert result.success + self.validate_csv_output(result.results_dir) + """ + + # ======================================================================== + # SETUP METHODS + # ======================================================================== + + def run_experiment( + self, + config_path: str, + results_dir: Path, + timeout: int = 300 + ) -> ExperimentResult: + """ + Execute an actual experiment using Experiment Runner + + This runs: python experiment-runner/ + + PARAMETERS: + config_path: Experiment directory relative to the project root (RunnerConfig.py is appended), or an absolute path to a config file + results_dir: Where to store results + timeout: Maximum seconds to wait (default 5 min) + + RETURNS: + ExperimentResult object with success, stdout, stderr, etc. + + WHY NOT JUST CALL subprocess DIRECTLY? 
+ - Encapsulation: If how we run experiments changes, update here once + - Reusability: All tests use same execution method + - Error handling: Consistent error reporting + + EXAMPLE: + result = self.run_experiment("examples/hello-world", temp_dir) + if not result.success: + print(result.stderr) # Show what went wrong + """ + project_root = Path(__file__).parent.parent.parent + config_file = Path(config_path) + + if not config_file.is_absolute(): + config_file = project_root / config_path / "RunnerConfig.py" + + # Build the command: python experiment-runner/ + cmd = [ + sys.executable, + str(project_root / "experiment-runner" / "__main__.py"), + str(config_file) + ] + + try: + # Run the command and capture output + result = subprocess.run( + cmd, + capture_output=True, # Capture stdout/stderr + text=True, # Return as strings, not bytes + timeout=timeout, + cwd=str(project_root) + ) + + # Experiment was successful if return code is 0 + success = result.returncode == 0 + + return ExperimentResult( + success=success, + stdout=result.stdout, + stderr=result.stderr, + results_dir=results_dir, + config_path=config_file + ) + + except subprocess.TimeoutExpired: + # Experiment took too long + return ExperimentResult( + success=False, + stdout="", + stderr=f"Experiment timed out after {timeout} seconds", + results_dir=results_dir, + config_path=config_file + ) + except Exception as e: + # Something went wrong executing the command + return ExperimentResult( + success=False, + stdout="", + stderr=f"Failed to run experiment: {str(e)}", + results_dir=results_dir, + config_path=config_file + ) + + + # ======================================================================== + # VALIDATION METHODS + # ======================================================================== + + def validate_csv_output(self, experiment_dir: Path) -> bool: + """ + Validate that CSV output exists and is readable + + WHAT IT CHECKS: + - run_table.csv exists + - CSV is readable (valid format) + - At 
least one row of data + + RETURNS: + True if valid, raises AssertionError if not + + WHY: CSV is the main output format, so this is critical + """ + csv_file = experiment_dir / "run_table.csv" + + # Check file exists + assert csv_file.exists(), f"run_table.csv not found in {experiment_dir}" + + # Check file is not empty + content = csv_file.read_text() + assert len(content) > 0, "run_table.csv is empty" + + # Check it has a non-blank header row (split() always returns at least one element) + lines = content.strip().split('\n') + assert lines[0].strip(), "run_table.csv has no header" + + return True + + + def validate_experiment_structure(self, experiment_dir: Path) -> bool: + """ + Validate expected directory structure exists + + EXPECTED STRUCTURE: + experiment_dir/ + ├── run_table.csv (main results) + ├── metadata.json (experiment metadata) + └── run_0_repetition_0/ (per-run data) + ├── profiler_output + └── raw_data + + RETURNS: + True if structure is valid + """ + # Check required files + required_files = [ + "run_table.csv", + "metadata.json" + ] + + for filename in required_files: + filepath = experiment_dir / filename + assert filepath.exists(), \ + f"Missing required file: {filename} in {experiment_dir}" + + # Check at least one run directory exists (the "run_*" pattern also matches run_table.csv, so keep directories only) + run_dirs = [p for p in experiment_dir.glob("run_*") if p.is_dir()] + assert len(run_dirs) > 0, \ + f"No run directories found in {experiment_dir}" + + return True + + + def validate_no_errors_in_output(self, result: ExperimentResult) -> bool: + """ + Check that stderr doesn't contain error keywords + + WHAT IT CHECKS: + - stderr is empty OR doesn't contain "[FAIL]", "[ERROR]", "Exception", "Traceback" + + WHY: The experiment might complete but still have warnings/errors + + RETURNS: + True if no critical errors detected + """ + error_keywords = ["[FAIL]", "[ERROR]", "Exception", "Traceback"] + + for keyword in error_keywords: + assert keyword not in result.stderr, \ + f"Found error keyword '{keyword}' in stderr:\n{result.stderr}" + + return True + + + # 
======================================================================== + # SIMULATION METHODS (for testing failure cases) + # ======================================================================== + + def simulate_run_crash( + self, + experiment_dir: Path, + run_id: int + ) -> None: + """ + Simulate a crash mid-experiment by modifying run_table.csv + + This marks a run as incomplete so when we re-run, the framework + will think it crashed and try to restart it. + + USAGE: + # Run experiment partially + result1 = self.run_experiment(config, temp_dir) + + # Simulate crash on run 1 + self.simulate_run_crash(temp_dir, run_id=1) + + # Re-run and verify it handles the restart correctly + result2 = self.run_experiment(config, temp_dir) + assert result2.success + + WHAT IT DOES: + - Reads run_table.csv + - Finds the data row at position run_id (0-based, header excluded) + - Sets __done to "TODO" (marks as incomplete) + - Writes it back + + WHY: Tests that restart/recovery logic works correctly + """ + csv_file = experiment_dir / "run_table.csv" + + # Read CSV content + content = csv_file.read_text() + lines = content.strip().split('\n') + + if len(lines) < 2: + raise ValueError("CSV has no data rows to modify") + + # Find and modify the row for this run_id + header = lines[0] + rows = lines[1:] + + modified_rows = [] + for row_idx, row in enumerate(rows): + if row_idx == run_id: + # Set __done to TODO (incomplete) + # Locate __done by header name instead of assuming its column position + cols = row.split(',') + cols[header.split(',').index('__done')] = 'TODO' + modified_rows.append(','.join(cols)) + else: + modified_rows.append(row) + + # Write back to CSV + new_content = header + '\n' + '\n'.join(modified_rows) + csv_file.write_text(new_content) + + + # ======================================================================== + # HELPER METHODS + # ======================================================================== + + def get_experiment_dir( + self, + results_dir: Path, + experiment_name: str + ) -> Path: + """ + Get the full path to an experiment's 
results directory + + STRUCTURE: + results_dir/ + └── / <- returned path + └── run_table.csv + + PARAMETERS: + results_dir: Parent results directory + experiment_name: Name of experiment (from RunnerConfig.name) + + RETURNS: + Path to experiment directory + """ + return results_dir / experiment_name + + + def read_csv_as_dicts(self, csv_path: Path) -> List[Dict]: + """ + Read CSV file and return as list of dictionaries + + WHY: Easier to work with dictionaries than raw CSV strings + Can access columns by name: row['avg_cpu'] + + EXAMPLE: + rows = self.read_csv_as_dicts(Path("run_table.csv")) + for row in rows: + print(row['run_id'], row['avg_cpu']) + """ + import csv + + with open(csv_path, 'r') as f: + reader = csv.DictReader(f) + return list(reader) diff --git a/test/system/fixtures/__init__.py b/test/system/fixtures/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/test/system/test_basic_run.py b/test/system/test_basic_run.py new file mode 100644 index 000000000..db1607e7c --- /dev/null +++ b/test/system/test_basic_run.py @@ -0,0 +1,292 @@ +""" +EXAMPLE: Basic System Tests + +This file demonstrates how to write system-level tests using the new framework. + +KEY CONCEPTS: +1. Tests inherit from SystemExperimentTest (base class with helper methods) +2. Tests use fixtures from conftest.py (temp_dir, env_vars_clean) +3. Tests run REAL experiments, not mocked versions +4. Each test is independent (isolated temp directories) + +RUN THESE TESTS: + pytest test/system/test_basic_run.py # Run all tests in this file + pytest test/system/test_basic_run.py::TestBasicRuns::test_hello_world + pytest test/system/test_basic_run.py -v # Verbose output + pytest test/system/test_basic_run.py -s # Show print statements +""" + +import pytest +from pathlib import Path +from test.system.base_system_test import SystemExperimentTest + + +class TestBasicRuns(SystemExperimentTest): + """ + Test suite: Basic experiment execution + + Each method is a test. 
Pytest runs them and reports: + - PASSED: test completed successfully + - FAILED: assertion failed + - ERROR: exception raised + """ + + @pytest.mark.system + def test_hello_world_experiment_runs(self, temp_dir): + """ + TEST 1: Can we run the hello-world example? + + SETUP: + - temp_dir: pytest fixture (see conftest.py) provides fresh directory + + WHAT IT DOES: + 1. Run the hello-world experiment + 2. Check that it completes successfully + 3. Verify output directory exists + + HOW PYTEST WORKS: + - Calls fixture: temp_dir is created + - Runs test function + - If any 'assert' fails, test FAILS + - Cleanup: temp_dir is deleted + + EXAMPLE OUTPUT: + PASSED test_hello_world_experiment_runs + + If it fails: + FAILED test_hello_world_experiment_runs + AssertionError: assert False == True + ...stderr output... + """ + # Step 1: Run the actual experiment + result = self.run_experiment( + config_path="examples/hello-world", + results_dir=temp_dir + ) + + # Step 2: Assert it was successful + # If this fails, test fails with clear error + assert result.success, f"Experiment failed!\nStderr: {result.stderr}" + + # Step 3: Verify no errors in output + self.validate_no_errors_in_output(result) + + # Step 4: Verify results directory exists + assert (temp_dir / "experiments").exists(), \ + "Results directory was not created" + + + @pytest.mark.system + def test_hello_world_output_structure(self, temp_dir): + """ + TEST 2: Does hello-world create the expected output structure? 
+ + WHAT IT CHECKS: + - run_table.csv exists and has content + - Directory structure is correct + - All required files are present + """ + # Run experiment + result = self.run_experiment( + config_path="examples/hello-world", + results_dir=temp_dir + ) + + # Get the experiment directory + # (assumes experiment is named "new_runner_experiment" by default) + exp_dir = temp_dir / "experiments" / "new_runner_experiment" + + # Validate structure + self.validate_experiment_structure(exp_dir) + self.validate_csv_output(exp_dir) + + + @pytest.mark.system + def test_fibonacci_experiment_runs(self, temp_dir): + """ + TEST 3: Test a different example (fibonacci) + + WHY: Tests should be specific, not generic + Each example might have different requirements + """ + result = self.run_experiment( + config_path="examples/hello-world-fibonacci", + results_dir=temp_dir + ) + + assert result.success, f"Fibonacci experiment failed:\n{result.stderr}" + self.validate_no_errors_in_output(result) + + + @pytest.mark.system + @pytest.mark.slow + def test_multiple_sequential_runs(self, temp_dir): + """ + TEST 4: Can we run multiple experiments in sequence? + + @pytest.mark.slow decorator means: + - pytest -m slow (run ONLY slow tests) + - pytest --skip-slow (skip slow tests) + + WHY: Some tests are slow. During development, you might skip them. + Use for comprehensive testing before submitting. 
+ """ + # Run first experiment + result1 = self.run_experiment( + config_path="examples/hello-world", + results_dir=temp_dir + ) + assert result1.success, f"First run failed: {result1.stderr}" + + # Run second experiment (different name to avoid conflicts) + # This tests that the framework can handle multiple experiments + result2 = self.run_experiment( + config_path="examples/hello-world-fibonacci", + results_dir=temp_dir + ) + assert result2.success, f"Second run failed: {result2.stderr}" + + +class TestRestartRecovery(SystemExperimentTest): + """ + Test suite: Experiment restart/recovery on crash + + These tests verify that if an experiment crashes mid-way, + we can resume it and it completes correctly. + """ + + @pytest.mark.system + @pytest.mark.slow + def test_restart_after_simulated_crash(self, temp_dir): + """ + TEST 5: Can the framework recover from a crash? + + SCENARIO: + 1. Run the experiment + 2. Simulate a crash (mark a run as incomplete) + 3. Re-run and verify it continues from where it left off + + WHY: Real-world experiments can crash. The framework should handle this gracefully. 
+ """ + # Step 1: Run initial experiment + result1 = self.run_experiment( + config_path="test-standalone/core/shuffling", + results_dir=temp_dir + ) + assert result1.success, f"Initial run failed: {result1.stderr}" + + # Step 2: Simulate crash by marking run 1 as incomplete + exp_dir = temp_dir / "experiments" / "new_runner_experiment" + self.simulate_run_crash(exp_dir, run_id=1) + + # Verify the crash was simulated + csv_rows = self.read_csv_as_dicts(exp_dir / "run_table.csv") + assert csv_rows[1]['__done'] == 'TODO', \ + "Crash simulation didn't mark run as incomplete" + + # Step 3: Re-run experiment (should continue from run 1) + result2 = self.run_experiment( + config_path="test-standalone/core/shuffling", + results_dir=temp_dir + ) + assert result2.success, f"Recovery run failed: {result2.stderr}" + + # Step 4: Verify all runs are now complete + csv_rows = self.read_csv_as_dicts(exp_dir / "run_table.csv") + for row in csv_rows: + assert row['__done'] == 'DONE', \ + f"Run not completed: {row}" + + +# ============================================================================ +# DEMONSTRATION: How fixtures work +# ============================================================================ + +class TestFixtureDemonstration: + """ + This class shows HOW FIXTURES WORK in pytest + + Fixtures are like setUp() but more powerful. + They can: + - Provide test data + - Create temporary resources + - Handle cleanup automatically + """ + + def test_temp_dir_fixture(self, temp_dir): + """ + This test receives 'temp_dir' fixture automatically. + + Pytest: + 1. Creates temp directory + 2. Passes it to this function as 'temp_dir' parameter + 3. Runs this test + 4. Cleans up temp directory + 5. Test done! 
+ """ + # temp_dir is a Path object pointing to fresh directory + assert temp_dir.is_dir() + assert len(list(temp_dir.iterdir())) == 0 # Empty + + # Create a file + test_file = temp_dir / "test.txt" + test_file.write_text("Hello!") + + # Verify it exists + assert test_file.exists() + + # After this test ends, temp_dir is automatically deleted + + + def test_experiment_output_dir_fixture(self, experiment_output_dir): + """ + This test receives 'experiment_output_dir' fixture. + + This fixture creates the directory structure that + Experiment Runner expects: + experiments/ + └── my_experiment/ + """ + # The fixture creates experiments/ directory + assert experiment_output_dir.exists() + assert experiment_output_dir.parent.name == "experiments" + + +# ============================================================================ +# ADVANCED: Parameterized tests +# ============================================================================ + +class TestParameterized(SystemExperimentTest): + """ + Parameterized tests run the same test with different inputs. + + WHY: Avoid writing the same test multiple times with different configs. + One test function runs multiple times with different parameters. + """ + + @pytest.mark.parametrize("example_name", [ + "hello-world", + "hello-world-fibonacci", + ]) + @pytest.mark.system + def test_all_examples_run(self, example_name, temp_dir): + """ + This test runs TWICE: + - Once with example_name="hello-world" + - Once with example_name="hello-world-fibonacci" + + PYTEST PARAMETRIZE SYNTAX: + @pytest.mark.parametrize("param_name", [list of values]) + def test_something(param_name, other_fixtures): + ... 
+ + BENEFIT: + - DRY (Don't Repeat Yourself) + - Easier to add new test cases + - Clear pass/fail for each variant + """ + result = self.run_experiment( + config_path=f"examples/{example_name}", + results_dir=temp_dir + ) + assert result.success, f"{example_name} failed: {result.stderr}" diff --git a/test/system/validators/__init__.py b/test/system/validators/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/test/unit/__init__.py b/test/unit/__init__.py new file mode 100644 index 000000000..e69de29bb