
Support: report wall-clock orch and sched time in benchmark_rounds#324

Closed
hw-native-sys-bot wants to merge 1 commit into hw-native-sys:main from hw-native-sys-bot:benchmark-wall-clock-orch-sched


hw-native-sys-bot (Collaborator) commented Mar 18, 2026

Summary

  • Parse the "BENCHMARK: thread=N end=<cycles>" and "Scheduler summary: total_time=Xus" lines from device logs to compute per-round wall-clock orchestration and scheduling time
  • For concurrent threads, reports max(end) - min(start) instead of summing per-thread durations, so overlapping work is not double-counted
  • Gracefully degrades to elapsed-only output when profiling data is unavailable
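The wall-clock rule above can be sketched in awk, the language the script's report code already uses. This is a minimal sketch under stated assumptions, not the PR's code: the function name wall_clock_span is hypothetical, and it assumes each BENCHMARK line carries a start= field next to end= (the summary only names end=<cycles>), so the real log format may differ. It stays in raw cycles; converting to microseconds would need the device clock frequency, which is not assumed here.

```bash
#!/usr/bin/env bash
# Sketch: wall-clock span of concurrent threads vs. naive per-thread sum.
# ASSUMPTION (hypothetical log format): each marker line carries both
# start= and end= cycle counts, e.g. "BENCHMARK: thread=0 start=100 end=600".
# Values stay in cycles; no cycles-to-us conversion is attempted.
wall_clock_span() {
  awk '
    /BENCHMARK:/ {
      s = ""; e = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^start=/) s = substr($i, 7) + 0   # text after "start="
        if ($i ~ /^end=/)   e = substr($i, 5) + 0   # text after "end="
      }
      if (s == "" || e == "") next                 # skip incomplete markers
      if (n == 0 || s < min_s) min_s = s           # earliest start
      if (n == 0 || e > max_e) max_e = e           # latest end
      sum += e - s                                 # naive sum, for comparison
      n++
    }
    END {
      if (n == 0) { print "no data"; exit }        # graceful fallback
      printf "wall=%d sum=%d threads=%d\n", max_e - min_s, sum, n
    }'
}
```

Fed three overlapping threads (start/end of 100/600, 200/700 and 150/650), it prints wall=600 sum=1500 threads=3: summing per-thread durations would report 2.5x the actual span, which is the double-counting the PR avoids.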

Test plan

  • Run ./tools/benchmark_rounds.sh on hardware with PTO2_PROFILING enabled, verify Orch/Sched columns appear
  • Run with PTO2_PROFILING disabled, verify output degrades to elapsed-only (no Orch/Sched columns)
  • Verify multi-round averages and trimmed averages are computed correctly

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the benchmark_rounds.sh script by integrating more granular performance metrics. It introduces the capability to extract and display wall-clock orchestration and scheduling times directly from device logs, offering a deeper insight into the timing of benchmark rounds beyond just total elapsed time. The changes ensure accurate measurement for concurrent operations and maintain usability by providing a fallback to basic elapsed time reporting when detailed profiling is absent.

Highlights

  • Enhanced Metric Reporting: The benchmark_rounds.sh script now parses additional log data to report wall-clock orchestration and scheduling times per benchmark round.
  • Accurate Concurrent Timing: For concurrent threads, orchestration and scheduling times are calculated as max(end) - min(start) to prevent double-counting overlapping work, providing a more accurate wall-clock measurement.
  • Graceful Degradation: The script gracefully falls back to displaying only elapsed time if the detailed profiling data (orchestration and scheduling) is not available in the device logs.
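The fallback can be illustrated with a small awk sketch that mirrors the has_detail branch in the reviewed diff: if any round supplies orchestration and scheduling values, every row gets Orch/Sched columns; otherwise only elapsed time is printed. The input format (one line per round, "<elapsed_us> [<orch_us> <sched_us>]") and the function name print_rounds are hypothetical, chosen only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical input: one line per round, "<elapsed_us> [<orch_us> <sched_us>]".
print_rounds() {
  awk '
    {
      el[NR] = $1 + 0; o[NR] = $2 + 0; s[NR] = $3 + 0
      if (NF >= 3) has_detail = 1      # any round with detail enables the columns
    }
    END {
      for (i = 1; i <= NR; i++) {
        if (has_detail)
          printf "  %-8d  %12.1f  %12.1f  %12.1f\n", i - 1, el[i], o[i], s[i]
        else
          printf "  %-8d  %12.1f\n", i - 1, el[i]   # elapsed-only fallback
      }
    }'
}
```

Note that a round without detail still prints 0.0 in the Orch/Sched columns once has_detail is set, because unset awk values read as 0; that is the same effect that makes per-round statistics over sparse data misleading.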



Parse additional device log markers (BENCHMARK end cycle, Scheduler
summary total_time) to compute per-round wall-clock orchestration and
scheduling time. For concurrent threads, reports max(end)-min(start)
instead of summing per-thread durations, so overlapping work is not
double-counted.
hw-native-sys-bot force-pushed the benchmark-wall-clock-orch-sched branch from d51bcfc to bc10aa1 on March 18, 2026 06:15

gemini-code-assist bot left a comment


Code Review

This pull request extends the benchmark_rounds.sh script to report wall-clock orchestration and scheduling times by parsing additional profiling information from device logs. The changes correctly handle cases where this detailed information is not available. My review identified a significant issue in how statistics are calculated for the new metrics when data is sparse across rounds, which could lead to incorrect benchmark results. I have provided a detailed comment with a suggested fix to ensure the accuracy of the reported metrics.

Comment on lines +207 to 244
+        sum_e = 0; min_e = r_elapsed[0]; max_e = r_elapsed[0]
+        sum_o = 0; min_o = r_orch[0]; max_o = r_orch[0]
+        sum_s = 0; min_s = r_sched[0]; max_s = r_sched[0]
         for (i = 0; i < count; i++) {
-            printf "  %-8d  %12.1f\n", i, results[i]
-            sum_v += results[i]
-            if (results[i] < min_v) min_v = results[i]
-            if (results[i] > max_v) max_v = results[i]
+            if (has_detail)
+                printf "  %-8d  %12.1f  %12.1f  %12.1f\n", i, r_elapsed[i], r_orch[i], r_sched[i]
+            else
+                printf "  %-8d  %12.1f\n", i, r_elapsed[i]
+            sum_e += r_elapsed[i]
+            if (r_elapsed[i] < min_e) min_e = r_elapsed[i]
+            if (r_elapsed[i] > max_e) max_e = r_elapsed[i]
+            sum_o += r_orch[i]
+            if (r_orch[i] < min_o) min_o = r_orch[i]
+            if (r_orch[i] > max_o) max_o = r_orch[i]
+            sum_s += r_sched[i]
+            if (r_sched[i] < min_s) min_s = r_sched[i]
+            if (r_sched[i] > max_s) max_s = r_sched[i]
         }
-        printf "\n  Avg: %.1f us  (%d rounds)\n", sum_v / count, count
+        printf "\n  Avg: %.1f us  (%d rounds)\n", sum_e / count, count
         if (count > 2) {
-            trimmed = (sum_v - min_v - max_v) / (count - 2)
-            printf "  Trimmed Avg: %.1f us  (excluding min=%.1f, max=%.1f)\n", trimmed, min_v, max_v
+            trimmed = (sum_e - min_e - max_e) / (count - 2)
+            printf "  Trimmed Avg: %.1f us  (excluding min=%.1f, max=%.1f)\n", trimmed, min_e, max_e
         }
+        if (has_detail) {
+            printf "\n  Orch  Avg: %.1f us", sum_o / count
+            if (count > 2) {
+                trimmed_o = (sum_o - min_o - max_o) / (count - 2)
+                printf "  Trimmed: %.1f us", trimmed_o
+            }
+            printf "  Total: %.1f us\n", sum_o
+
+            printf "  Sched Avg: %.1f us", sum_s / count
+            if (count > 2) {
+                trimmed_s = (sum_s - min_s - max_s) / (count - 2)
+                printf "  Trimmed: %.1f us", trimmed_s
+            }
+            printf "  Total: %.1f us\n", sum_s
+        }


high

The calculation of statistics (average, min, max, trimmed average) for orchestration and scheduling times is incorrect when some rounds are missing this data. The current implementation initializes min_o and min_s with r_orch[0] and r_sched[0], which evaluate to 0 when the first round has no data (an unset awk array element reads as 0 in numeric context). This yields an incorrect minimum of 0. In addition, the averages divide by the total number of rounds (count) rather than by the number of rounds that actually have orchestration or scheduling data, which skews them downward. The trimmed averages inherit both problems.

The logic should be updated to consider only rounds with valid data (> 0) for these statistics. This involves:

  1. Initializing min_o and min_s to a sentinel value (-1) that marks them as not yet set.
  2. Keeping separate counters (count_o, count_s) for rounds with orchestration and scheduling data.
  3. Updating sums, min, max, and counts only for rounds with data.
  4. Using these separate counters for calculating averages and trimmed averages.
        sum_e = 0; min_e = r_elapsed[0]; max_e = r_elapsed[0]
        sum_o = 0; min_o = -1; max_o = 0; count_o = 0
        sum_s = 0; min_s = -1; max_s = 0; count_s = 0
        for (i = 0; i < count; i++) {
            if (has_detail)
                printf "  %-8d  %12.1f  %12.1f  %12.1f\n", i, r_elapsed[i], r_orch[i], r_sched[i]
            else
                printf "  %-8d  %12.1f\n", i, r_elapsed[i]

            sum_e += r_elapsed[i]
            if (r_elapsed[i] < min_e) min_e = r_elapsed[i]
            if (r_elapsed[i] > max_e) max_e = r_elapsed[i]

            if (r_orch[i] > 0) {
                sum_o += r_orch[i]
                if (min_o < 0 || r_orch[i] < min_o) min_o = r_orch[i]
                if (r_orch[i] > max_o) max_o = r_orch[i]
                count_o++
            }
            if (r_sched[i] > 0) {
                sum_s += r_sched[i]
                if (min_s < 0 || r_sched[i] < min_s) min_s = r_sched[i]
                if (r_sched[i] > max_s) max_s = r_sched[i]
                count_s++
            }
        }
        printf "\n  Avg: %.1f us  (%d rounds)\n", sum_e / count, count
        if (count > 2) {
            trimmed = (sum_e - min_e - max_e) / (count - 2)
            printf "  Trimmed Avg: %.1f us  (excluding min=%.1f, max=%.1f)\n", trimmed, min_e, max_e
        }
        if (has_detail) {
            if (count_o > 0) {
                printf "\n  Orch  Avg: %.1f us", sum_o / count_o
                if (count_o > 2) {
                    trimmed_o = (sum_o - min_o - max_o) / (count_o - 2)
                    printf "  Trimmed: %.1f us", trimmed_o
                }
                printf "  Total: %.1f us\n", sum_o
            }
            if (count_s > 0) {
                printf "  Sched Avg: %.1f us", sum_s / count_s
                if (count_s > 2) {
                    trimmed_s = (sum_s - min_s - max_s) / (count_s - 2)
                    printf "  Trimmed: %.1f us", trimmed_s
                }
                printf "  Total: %.1f us\n", sum_s
            }
        }
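To see why the suggested change matters, the sketch below contrasts the two approaches on one sparse series. naive_stats mimics the reviewed diff (min and max seeded from the first round, averages divided by all rounds); sparse_stats follows the suggested fix (only rounds with a value > 0 contribute). Both function names and the input format (one value per line, with 0 or an empty line meaning "no data for this round") are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical input: one per-round value per line; 0 or empty = no data.

# Mimics the reviewed diff: min/max seeded from round 0, divide by all rounds.
naive_stats() {
  awk '
    { r[NR - 1] = $1 + 0; sum += $1 }
    END {
      if (NR == 0) { print "no data"; exit }
      mn = r[0]; mx = r[0]
      for (i = 0; i < NR; i++) {
        if (r[i] < mn) mn = r[i]
        if (r[i] > mx) mx = r[i]
      }
      printf "avg=%.1f n=%d", sum / NR, NR
      if (NR > 2) printf " trimmed=%.1f", (sum - mn - mx) / (NR - 2)
      printf "\n"
    }'
}

# Follows the suggested fix: only rounds with a value > 0 contribute.
sparse_stats() {
  awk '
    {
      v = $1 + 0
      if (v > 0) {
        sum += v; cnt++
        if (cnt == 1 || v < mn) mn = v   # first valid value seeds the minimum
        if (v > mx) mx = v
      }
    }
    END {
      if (cnt == 0) { print "no data"; exit }
      printf "avg=%.1f n=%d", sum / cnt, cnt
      if (cnt > 2) printf " trimmed=%.1f", (sum - mn - mx) / (cnt - 2)
      printf "\n"
    }'
}
```

For the series 0, 10, 20, 60, 0, naive_stats reports avg=18.0 n=5 trimmed=10.0, because the two missing rounds inflate the denominator and the 0 from round 0 is trimmed as the "minimum"; sparse_stats reports avg=30.0 n=3 trimmed=20.0.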


Labels

None yet

Projects

None yet


2 participants