Support: report wall-clock orch and sched time in benchmark_rounds by hw-native-sys-bot · Pull Request #324 · hw-native-sys/simpler

hw-native-sys-bot · 2026-03-18T06:12:56Z

Summary

Parse BENCHMARK: thread=N end=<cycles> and Scheduler summary: total_time=Xus from device logs to compute per-round wall-clock orchestration and scheduling time
For concurrent threads, report max(end) - min(start) instead of the sum of per-thread durations, so overlapping work is not double-counted
Degrade gracefully to elapsed-only output when profiling data is unavailable
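
To illustrate the difference between a wall-clock span and a sum of per-thread durations, here is a minimal sketch. The input format (one "thread_id start end" triple per line, in cycles) and the function name wall_clock_span are assumptions made for this example; they are not the device log format or the script's actual code.

```shell
# Hypothetical input: one "thread_id start end" triple per line, in cycles.
# This layout is an assumption for illustration, not the device log format.
wall_clock_span() {
    awk '
        NR == 1 { min_start = $2 + 0; max_end = $3 + 0 }
        {
            if ($2 + 0 < min_start) min_start = $2 + 0   # earliest start seen
            if ($3 + 0 > max_end)   max_end = $3 + 0     # latest end seen
            summed += $3 - $2                            # naive per-thread sum
        }
        END {
            if (NR > 0)
                printf "span=%d summed=%d\n", max_end - min_start, summed
        }
    '
}

# Three overlapping threads: the span counts the overlap once.
printf '%s\n' '0 100 400' '1 150 450' '2 120 300' | wall_clock_span
```

With these three overlapping threads the span is 350 cycles, while the summed durations come to 780 cycles, which is the double-counting the PR description wants to avoid.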

Test plan

Run ./tools/benchmark_rounds.sh on hardware with PTO2_PROFILING enabled and verify that the Orch/Sched columns appear
Run with PTO2_PROFILING disabled and verify that the output degrades to elapsed-only (no Orch/Sched columns)
Verify multi-round averages and trimmed averages are computed correctly
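
When checking the averages by hand, a small reference calculation can help. The sketch below mirrors the average and trimmed-average formulas that appear in the diff (sum / count, and (sum - min - max) / (count - 2) when there are more than two rounds). The function name round_stats and the one-value-per-line input are assumptions for this example.

```shell
# Reads one per-round time (in us) per line; hypothetical helper for manual checks.
round_stats() {
    awk '
        NR == 1 { min = $1 + 0; max = $1 + 0 }
        {
            sum += $1
            if ($1 + 0 < min) min = $1 + 0
            if ($1 + 0 > max) max = $1 + 0
        }
        END {
            if (NR == 0) exit
            printf "avg=%.1f", sum / NR
            # The trimmed average drops one min and one max, so it needs > 2 rounds.
            if (NR > 2) printf " trimmed=%.1f", (sum - min - max) / (NR - 2)
            printf "\n"
        }
    '
}

printf '%s\n' 100 110 120 130 500 | round_stats
```

For the five sample rounds, the single 500 us outlier pulls the plain average to 192.0 us, while the trimmed average is 120.0 us.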

gemini-code-assist · 2026-03-18T06:13:15Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request extends the benchmark_rounds.sh script to extract wall-clock orchestration and scheduling times from device logs and report them per benchmark round, alongside the total elapsed time. Concurrent threads are measured as a single wall-clock span rather than a sum of per-thread durations, and the script falls back to elapsed-only reporting when detailed profiling data is absent.

Highlights

Enhanced Metric Reporting: The benchmark_rounds.sh script now parses additional log data to report wall-clock orchestration and scheduling times per benchmark round.
Accurate Concurrent Timing: For concurrent threads, orchestration and scheduling times are calculated as max(end) - min(start) to prevent double-counting overlapping work, providing a more accurate wall-clock measurement.
Graceful Degradation: The script gracefully falls back to displaying only elapsed time if the detailed profiling data (orchestration and scheduling) is not available in the device logs.



¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Parse additional device log markers (BENCHMARK end cycle, Scheduler summary total_time) to compute per-round wall-clock orchestration and scheduling time. For concurrent threads, reports max(end)-min(start) instead of summing per-thread durations, so overlapping work is not double-counted.
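
As a rough sketch of the parsing step, the following awk filter pulls the thread id, end cycle, and scheduler total_time out of log lines. Only the marker shapes "BENCHMARK: thread=N end=<cycles>" and "Scheduler summary: total_time=Xus" come from the PR description; the "[dev0]" line prefix, the function name extract_markers, and the output layout are assumptions, and the real script may parse the logs differently.

```shell
# Prints "end <thread> <cycles>" and "sched_us <value>" for matching log lines.
# The surrounding log line layout is assumed; only the two markers are from the PR.
extract_markers() {
    awk '
        /BENCHMARK: thread=[0-9]+ end=[0-9]+/ {
            match($0, /thread=[0-9]+/)
            thread = substr($0, RSTART + 7, RLENGTH - 7)      # skip "thread="
            match($0, /end=[0-9]+/)
            end_cycles = substr($0, RSTART + 4, RLENGTH - 4)  # skip "end="
            print "end", thread, end_cycles
        }
        /Scheduler summary: total_time=[0-9.]+us/ {
            match($0, /total_time=[0-9.]+/)
            print "sched_us", substr($0, RSTART + 11, RLENGTH - 11)  # skip "total_time="
        }
    '
}

printf '%s\n' \
    '[dev0] BENCHMARK: thread=0 end=98765' \
    '[dev0] unrelated line' \
    '[dev0] Scheduler summary: total_time=42.5us' | extract_markers
```

Lines that contain neither marker produce no output, so a log without profiling data yields nothing, which a caller could use to decide on elapsed-only reporting.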

gemini-code-assist

Code Review

This pull request extends the benchmark_rounds.sh script to report wall-clock orchestration and scheduling times by parsing additional profiling information from device logs. The changes correctly handle cases where this detailed information is not available. My review identified a significant issue in how statistics are calculated for the new metrics when data is sparse across rounds, which could lead to incorrect benchmark results. I have provided a detailed comment with a suggested fix to ensure the accuracy of the reported metrics.

gemini-code-assist · 2026-03-18T06:15:41Z

tools/benchmark_rounds.sh

+        sum_e = 0; min_e = r_elapsed[0]; max_e = r_elapsed[0]
+        sum_o = 0; min_o = r_orch[0]; max_o = r_orch[0]
+        sum_s = 0; min_s = r_sched[0]; max_s = r_sched[0]
        for (i = 0; i < count; i++) {
-            printf "  %-8d  %12.1f\n", i, results[i]
-            sum_v += results[i]
-            if (results[i] < min_v) min_v = results[i]
-            if (results[i] > max_v) max_v = results[i]
+            if (has_detail)
+                printf "  %-8d  %12.1f  %12.1f  %12.1f\n", i, r_elapsed[i], r_orch[i], r_sched[i]
+            else
+                printf "  %-8d  %12.1f\n", i, r_elapsed[i]
+            sum_e += r_elapsed[i]
+            if (r_elapsed[i] < min_e) min_e = r_elapsed[i]
+            if (r_elapsed[i] > max_e) max_e = r_elapsed[i]
+            sum_o += r_orch[i]
+            if (r_orch[i] < min_o) min_o = r_orch[i]
+            if (r_orch[i] > max_o) max_o = r_orch[i]
+            sum_s += r_sched[i]
+            if (r_sched[i] < min_s) min_s = r_sched[i]
+            if (r_sched[i] > max_s) max_s = r_sched[i]
        }
-        printf "\n  Avg: %.1f us  (%d rounds)\n", sum_v / count, count
+        printf "\n  Avg: %.1f us  (%d rounds)\n", sum_e / count, count
        if (count > 2) {
-            trimmed = (sum_v - min_v - max_v) / (count - 2)
-            printf "  Trimmed Avg: %.1f us  (excluding min=%.1f, max=%.1f)\n", trimmed, min_v, max_v
+            trimmed = (sum_e - min_e - max_e) / (count - 2)
+            printf "  Trimmed Avg: %.1f us  (excluding min=%.1f, max=%.1f)\n", trimmed, min_e, max_e
+        }
+        if (has_detail) {
+            printf "\n  Orch  Avg: %.1f us", sum_o / count
+            if (count > 2) {
+                trimmed_o = (sum_o - min_o - max_o) / (count - 2)
+                printf "  Trimmed: %.1f us", trimmed_o
+            }
+            printf "  Total: %.1f us\n", sum_o
+
+            printf "  Sched Avg: %.1f us", sum_s / count
+            if (count > 2) {
+                trimmed_s = (sum_s - min_s - max_s) / (count - 2)
+                printf "  Trimmed: %.1f us", trimmed_s
+            }
+            printf "  Total: %.1f us\n", sum_s
        }


The calculation of statistics (average, min, max, trimmed average) for orchestration and scheduling times is incorrect when some rounds are missing this data. The current implementation initializes min_o and min_s with r_orch[0] and r_sched[0], which can be 0 if data for the first round is missing. This leads to an incorrect minimum value of 0. Additionally, the average is calculated by dividing by the total number of rounds (count) instead of the number of rounds that actually have orchestration or scheduling data, skewing the average. The trimmed average is also affected by these issues.

The logic should be updated to only consider rounds with valid data (> 0) for these statistics. This involves:

Initializing min_o and min_s to a value indicating they are not set yet.

Keeping separate counters (count_o, count_s) for rounds with orchestration and scheduling data.

Updating sums, min, max, and counts only for rounds with data.

Using these separate counters for calculating averages and trimmed averages.

        sum_e = 0; min_e = r_elapsed[0]; max_e = r_elapsed[0]
        sum_o = 0; min_o = -1; max_o = 0; count_o = 0
        sum_s = 0; min_s = -1; max_s = 0; count_s = 0
        for (i = 0; i < count; i++) {
            if (has_detail)
                printf "  %-8d  %12.1f  %12.1f  %12.1f\n", i, r_elapsed[i], r_orch[i], r_sched[i]
            else
                printf "  %-8d  %12.1f\n", i, r_elapsed[i]
            sum_e += r_elapsed[i]
            if (r_elapsed[i] < min_e) min_e = r_elapsed[i]
            if (r_elapsed[i] > max_e) max_e = r_elapsed[i]
            if (r_orch[i] > 0) {
                sum_o += r_orch[i]
                if (min_o < 0 || r_orch[i] < min_o) min_o = r_orch[i]
                if (r_orch[i] > max_o) max_o = r_orch[i]
                count_o++
            }
            if (r_sched[i] > 0) {
                sum_s += r_sched[i]
                if (min_s < 0 || r_sched[i] < min_s) min_s = r_sched[i]
                if (r_sched[i] > max_s) max_s = r_sched[i]
                count_s++
            }
        }
        printf "\n  Avg: %.1f us  (%d rounds)\n", sum_e / count, count
        if (count > 2) {
            trimmed = (sum_e - min_e - max_e) / (count - 2)
            printf "  Trimmed Avg: %.1f us  (excluding min=%.1f, max=%.1f)\n", trimmed, min_e, max_e
        }
        if (has_detail) {
            if (count_o > 0) {
                printf "\n  Orch  Avg: %.1f us", sum_o / count_o
                if (count_o > 2) {
                    trimmed_o = (sum_o - min_o - max_o) / (count_o - 2)
                    printf "  Trimmed: %.1f us", trimmed_o
                }
                printf "  Total: %.1f us\n", sum_o
            }
            if (count_s > 0) {
                printf "  Sched Avg: %.1f us", sum_s / count_s
                if (count_s > 2) {
                    trimmed_s = (sum_s - min_s - max_s) / (count_s - 2)
                    printf "  Trimmed: %.1f us", trimmed_s
                }
                printf "  Total: %.1f us\n", sum_s
            }
        }
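
The skew described in this comment can be seen on a small hypothetical series in which the first round has no orchestration data and is recorded as 0. The sketch below is an illustration only; the function name compare_averages and the sample values are assumptions, not taken from the PR.

```shell
# Reads one per-round orchestration time per line; 0 stands for "no data".
compare_averages() {
    awk '
        {
            sum += $1; count++
            # Count only rounds that actually carry data, as the review suggests.
            if ($1 > 0) { valid_sum += $1; valid_count++ }
        }
        END {
            if (count == 0) exit
            printf "all_rounds=%.1f", sum / count
            if (valid_count > 0) printf " valid_only=%.1f", valid_sum / valid_count
            printf "\n"
        }
    '
}

printf '%s\n' 0 120 130 110 | compare_averages
```

Dividing by all four rounds gives 90.0 us, whereas dividing by the three rounds that have data gives 120.0 us.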

hw-native-sys-bot force-pushed the benchmark-wall-clock-orch-sched branch from d51bcfc to bc10aa1 Compare March 18, 2026 06:15

gemini-code-assist bot reviewed Mar 18, 2026


hw-native-sys-bot closed this Mar 18, 2026

hw-native-sys-bot wants to merge 1 commit into hw-native-sys:main from hw-native-sys-bot:benchmark-wall-clock-orch-sched
