Skip to content

kmh smt support#792

Open
tastynoob wants to merge 6 commits intoxs-devfrom
kmh-smt
Open

kmh smt support#792
tastynoob wants to merge 6 commits intoxs-devfrom
kmh-smt

Conversation

@tastynoob
Copy link
Copy Markdown
Collaborator

@tastynoob tastynoob commented Mar 16, 2026

Summary by CodeRabbit

  • New Features

    • SMT-aware decode and issue scheduling with configurable policies and per-thread instruction counters
    • New SMT-based selector set as the default to prioritize threads under SMT
    • Exposed scheduler hooks for SMT initialization and per-thread thread selection
    • Broader SMT/multi-context support across CPU, difftest, and branch prediction subsystems
  • Bug Fixes

    • Thread-aware squashing, commit, wakeup, and queue handling to prevent cross-thread interference
    • Per-thread tracking for squash/version and queue emptiness/fullness checks

@tastynoob tastynoob requested a review from jensen-yan March 16, 2026 02:39
@tastynoob tastynoob added the perf label Mar 16, 2026
@github-actions
Copy link
Copy Markdown

🚀 Performance test triggered: gcc12-spec06-0.8c

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 16, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds SMT-aware scheduling and per-thread state across O3: new SMTScheduler types and SMTBasedSelector, per-thread InstsCounter usage, SquashInfo and per-thread squash/version tracking, LSQ/IEW/IssueQue/Scheduler API changes, and SMT decode selection integrated into Fetch.

Changes

Cohort / File(s) Summary
SMT scheduler & bindings
src/cpu/o3/smt_sched.hh, src/cpu/o3/FuncScheduler.py, src/cpu/o3/SConscript
Add SMTScheduler base, IndependentIQICountScheduler, per-thread inc/dec on InstsCounter; add Python binding SMTBasedSelector and register it in SConscript; default IssueQue selector set to SMTBasedSelector().
Fetch stage
src/cpu/o3/fetch.hh, src/cpu/o3/fetch.cc
Introduce decodeScheduler and per-thread InstsCounter members; add initDecodeScheduler() and selectUnstalledThread(); use SMT-aware thread selection for decode; convert localSquashVer to per-thread array and propagate per-thread ftq state.
Issue Queue & Scheduler
src/cpu/o3/issue_queue.hh, src/cpu/o3/issue_queue.cc
Implement SMTBasedSelector, per-IQ InstsCounter and IndependentIQICountScheduler; update Scheduler API for thread-aware doCommit/doSquash; add per-thread IQ counters and thread-scoped wakeup/cancel logic.
InstructionQueue
src/cpu/o3/inst_queue.hh, src/cpu/o3/inst_queue.cc
Add private SquashInfo member; initialize IQ SMT hooks; pass SquashInfo (tid+seq) to scheduler doSquash; restrict non-spec squashing/removal to the squashed thread.
Comm / IEW counters
src/cpu/o3/comm.hh, src/cpu/o3/iew.cc
Add SquashInfo struct and per-thread counters (iqCount, ldstqCount, robCount); update IEW to populate per-thread counts and use per-thread localSquashVer.
Per-thread squash/version propagation
src/cpu/o3/decode.hh, src/cpu/o3/decode.cc, src/cpu/o3/rename.hh, src/cpu/o3/rename.cc, src/cpu/o3/commit.hh, src/cpu/o3/iew.hh
Convert single localSquashVer into localSquashVer[MaxThreads] across pipeline stages; update checks and updates to index by thread.
LSQ per-thread APIs
src/cpu/o3/lsq.hh, src/cpu/o3/lsq.cc
Add per-thread LSQ queries (lqEmpty(tid), sqEmpty(tid) and declare lqFull/sqFull(tid)) for scheduler/fetch use.
Branch predictors & FTQ scheduling
src/cpu/pred/btb/... (many files)
Make predictors thread-aware: per-thread history/meta, add ThreadID parameters to many APIs (getPredictionMeta, doUpdateHist, lookupHelper, generateSinglePrediction, etc.), and introduce predict-scheduling (scheduleThread/nextPredictTid) logic.
CPU/difftest & system multi-context
src/cpu/base.*, src/cpu/difftest.*, src/sim/system.*, configs/common/*, src/cpu/o3/cpu.cc
Introduce per-thread diffAllStates, per-thread difftest hart id helper, System::multiContextDifftest(), option placement updates and Full System thread initialization when SMT enabled; update readGem5Regs to accept ThreadID.
Misc & defaults
.gitignore, various small files (pred/ras/ubtb stubs, simple/base.*, etc.)
Add ignore patterns; small signature changes to accept ThreadID (suppress unused tid where appropriate); update many predictor and utility signatures to thread-aware variants.

Sequence Diagram

sequenceDiagram
    participant Fetch
    participant IEW
    participant DecodeSched as SMTScheduler
    participant IQ as IssueQue
    participant Scheduler

    rect rgba(200,230,255,0.5)
    Fetch->>IEW: request per-thread IewComm counters (iqCount, ldstqCount, robCount)
    IEW-->>Fetch: return counters (per-thread)
    Fetch->>DecodeSched: update counters / InstsCounter(tid)
    DecodeSched->>DecodeSched: getThread() -> chosen tid
    DecodeSched-->>Fetch: selected ThreadID
    Fetch->>Fetch: sendInstructionsToDecode(selected tid)
    end

    rect rgba(230,255,200,0.5)
    Scheduler->>IQ: doSquash(SquashInfo{tid, seqNum})
    IQ->>IQ: remove matching insts (filter by tid+seq)
    IQ->>IQ: decInIQInstsCounter(tid) for removed insts
    Scheduler->>Scheduler: initIQICountSmtScheduler(numThreads)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • jensen-yan
  • Yakkhini

Poem

🐰
I nudged the counters, soft and fleet,
Threads now argue who gets to eat,
SquashInfo whispers which tail to prune,
DecodeScheduler hums a careful tune,
Hopping bytes — SMT skies are sweet.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.18% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'kmh smt support' is vague and generic, using non-descriptive terminology that doesn't clearly convey what SMT support entails or which components are affected. Consider revising to a more descriptive title such as 'Add SMT scheduling support to fetch and issue queue stages' to better communicate the scope and nature of the changes.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch kmh-smt

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

🚀 Coremark Smoke Test Results

Branch IPC Change
Base (xs-dev) 2.2665 -
This PR 2.2665 ➡️ 0.0000 (0.00%)

✅ Difftest smoke test passed!

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (1)
src/cpu/o3/fetch.cc (1)

377-410: Prefer RAII for decode-scheduler ownership.

initDecodeScheduler() allocates decodeScheduler and counters with raw new. Converting these members to std::unique_ptr avoids manual lifetime management and teardown leaks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/fetch.cc` around lines 377 - 410, Make decodeScheduler,
lsqCounter, iqCounter, and robCounter owned by std::unique_ptrs instead of raw
pointers and update initDecodeScheduler() to allocate them with
std::make_unique; replace bare new InstsCounter() and new ...Scheduler(...) with
make_unique calls and when constructing schedulers that require raw pointers
pass counter.get() (or change those ctor signatures to accept pointer/reference
smartly). Ensure class members are changed to std::unique_ptr<InstsCounter> and
std::unique_ptr<...SchedulerType> and remove any manual delete/cleanup paths so
RAII manages lifetime; keep references to the symbols initDecodeScheduler,
lsqCounter, iqCounter, robCounter, decodeScheduler, InstsCounter,
ICountScheduler, DelayedICountScheduler, and MultiPrioritySched to locate the
changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cpu/o3/fetch.cc`:
- Around line 1325-1349: The selector Fetch::selectUnstalledThread currently
only checks stallSig->blockFetch but may return a thread whose fetchQueue is
empty; update Fetch::selectUnstalledThread to exclude threads with empty
fetchQueue (e.g., fetchQueue[tid].empty() or equivalent API) when computing
candidates and when calling decodeScheduler->getThread so decode handoff never
selects an unstalled-but-empty thread; ensure the
lsqCounter/iqCounter/robCounter updates remain correct for excluded threads
(still set to UINT64_MAX) and update any logic that relies on selected thread to
loop to another ready thread if decodeScheduler returns an empty-thread id.

In `@src/cpu/o3/fetch.hh`:
- Around line 237-247: The member delayedSchedulerDelay is used uninitialized
when passed into DelayedICountScheduler in initDecodeScheduler(); initialize it
to a safe default either by adding an in-class initializer (e.g., int
delayedSchedulerDelay = 0;) or by setting it in the class constructor for the
fetch unit so delayedSchedulerDelay has a well-defined value before
initDecodeScheduler() runs; update the declaration in fetch.hh or the fetch
class constructor accordingly and ensure any configuration parsing assigns it
prior to creating DelayedICountScheduler.

In `@src/cpu/o3/FuncScheduler.py`:
- Line 93: Update the parameter description for the selector parameter so it
reflects the actual default: change the help text passed to Param.BaseSelector
for the variable sel (which is currently constructed with SMTBasedSelector()) to
mention "default: SMT-based" (or similar accurate wording) instead of "default:
age first"; ensure the description in Param.BaseSelector(SMTBasedSelector(),
"Selector for this IQ (...)") references SMTBasedSelector as the default.

In `@src/cpu/o3/iew.cc`:
- Around line 1533-1538: The code only updates IEW-to-Fetch counters for a
single thread by using ThreadID tid = *activeThreads->begin(); instead of
iterating all active threads, causing stale values in toFetch->iewInfo[]. Fix by
iterating over every ThreadID in activeThreads and for each thread update
toFetch->iewInfo[tid].resolvedCFIs.clear() and set
toFetch->iewInfo[tid].ldstqCount = ldstQueue.getCount(tid),
toFetch->iewInfo[tid].robCount = rob->getThreadEntries(tid), and
toFetch->iewInfo[tid].iqCount = scheduler->getIQInsts(tid) so all active threads
publish fresh counters.

In `@src/cpu/o3/issue_queue.cc`:
- Around line 883-890: The DPRINTF call in
IssueQue::initIndependentIQICountScheduler uses "%s" but passes iqname (a
std::string), causing undefined behavior; change the log to pass iqname.c_str()
so the format string receives a const char* (update the DPRINTF invocation that
currently prints "[%s] IndependentIQICountScheduler created.\n" to use
iqname.c_str()).

In `@src/cpu/o3/smt_sched.hh`:
- Around line 31-32: decCounter can underflow the unsigned counter[tid]; change
decCounter (and related uses if any) to safely subtract by checking if
counter[tid] >= value then subtract, otherwise set counter[tid] = 0 (or clamp to
zero) to prevent wrapping; update the decCounter(ThreadID tid, uint64_t value =
1) implementation and any tests to reflect the non-wrapping behavior while
keeping incCounter unchanged.

---

Nitpick comments:
In `@src/cpu/o3/fetch.cc`:
- Around line 377-410: Make decodeScheduler, lsqCounter, iqCounter, and
robCounter owned by std::unique_ptrs instead of raw pointers and update
initDecodeScheduler() to allocate them with std::make_unique; replace bare new
InstsCounter() and new ...Scheduler(...) with make_unique calls and when
constructing schedulers that require raw pointers pass counter.get() (or change
those ctor signatures to accept pointer/reference smartly). Ensure class members
are changed to std::unique_ptr<InstsCounter> and
std::unique_ptr<...SchedulerType> and remove any manual delete/cleanup paths so
RAII manages lifetime; keep references to the symbols initDecodeScheduler,
lsqCounter, iqCounter, robCounter, decodeScheduler, InstsCounter,
ICountScheduler, DelayedICountScheduler, and MultiPrioritySched to locate the
changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 701aed96-9239-43f9-a876-13495e1bb2a6

📥 Commits

Reviewing files that changed from the base of the PR and between f815a13 and bb57418.

📒 Files selected for processing (11)
  • src/cpu/o3/FuncScheduler.py
  • src/cpu/o3/SConscript
  • src/cpu/o3/comm.hh
  • src/cpu/o3/fetch.cc
  • src/cpu/o3/fetch.hh
  • src/cpu/o3/iew.cc
  • src/cpu/o3/inst_queue.cc
  • src/cpu/o3/inst_queue.hh
  • src/cpu/o3/issue_queue.cc
  • src/cpu/o3/issue_queue.hh
  • src/cpu/o3/smt_sched.hh

Comment on lines +1325 to +1349
ThreadID
Fetch::selectUnstalledThread()
{

// if (numThreads == 1) {
// return 0;
// }
for (ThreadID tid = 0; tid < numThreads; ++tid) {
if (!stallSig->blockFetch[tid]) {
lsqCounter->setCounter(tid, fromIEW->iewInfo[tid].ldstqCount);
iqCounter->setCounter(tid, fromIEW->iewInfo[tid].iqCount);
robCounter->setCounter(tid, fromIEW->iewInfo[tid].robCount);

} else {
lsqCounter->setCounter(tid, UINT64_MAX);
iqCounter->setCounter(tid, UINT64_MAX);
robCounter->setCounter(tid, UINT64_MAX);

}
DPRINTF(Fetch, "lsqCounter->setCounter: %d iqCounter->setCounter: %d robCounter->setCounter: %d\n",fromIEW->iewInfo[tid].ldstqCount,fromIEW->iewInfo[tid].iqCount,fromIEW->iewInfo[tid].robCount);
}

ThreadID selected = decodeScheduler->getThread();
return selected;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid selecting threads with empty fetch queues for decode handoff.

The selector currently considers only stallSig->blockFetch. If the chosen thread has an empty fetchQueue, Lines 1397-1408 send nothing even when another unstalled thread has ready instructions.

Proposed fix
 ThreadID
 Fetch::selectUnstalledThread()
 {
+    bool hasCandidate = false;
     for (ThreadID tid = 0; tid < numThreads; ++tid) {
-        if (!stallSig->blockFetch[tid]) {
+        if (!stallSig->blockFetch[tid] && !fetchQueue[tid].empty()) {
+            hasCandidate = true;
             lsqCounter->setCounter(tid, fromIEW->iewInfo[tid].ldstqCount);
             iqCounter->setCounter(tid, fromIEW->iewInfo[tid].iqCount);
             robCounter->setCounter(tid, fromIEW->iewInfo[tid].robCount);
         } else {
             lsqCounter->setCounter(tid, UINT64_MAX);
             iqCounter->setCounter(tid, UINT64_MAX);
             robCounter->setCounter(tid, UINT64_MAX);
         }
     }
+    if (!hasCandidate) {
+        return InvalidThreadID;
+    }
 
     ThreadID selected = decodeScheduler->getThread();
     return selected;
 }
-    ThreadID tid =selectUnstalledThread();
+    ThreadID tid = selectUnstalledThread();
+    if (tid == InvalidThreadID) {
+        toDecode->fetchStallReason = stallReason;
+        return;
+    }

Also applies to: 1387-1408

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/fetch.cc` around lines 1325 - 1349, The selector
Fetch::selectUnstalledThread currently only checks stallSig->blockFetch but may
return a thread whose fetchQueue is empty; update Fetch::selectUnstalledThread
to exclude threads with empty fetchQueue (e.g., fetchQueue[tid].empty() or
equivalent API) when computing candidates and when calling
decodeScheduler->getThread so decode handoff never selects an
unstalled-but-empty thread; ensure the lsqCounter/iqCounter/robCounter updates
remain correct for excluded threads (still set to UINT64_MAX) and update any
logic that relies on selected thread to loop to another ready thread if
decodeScheduler returns an empty-thread id.

Comment on lines +237 to +247
// SMT Decode Scheduler
SMTScheduler* decodeScheduler;

// Counters from backend structures (to be passed in)
InstsCounter* lsqCounter;
InstsCounter* iqCounter;
InstsCounter* robCounter;

// Configuration parameters
std::string smtDecodePolicy ="multi_priority";
int delayedSchedulerDelay;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n -C4 'Fetch::Fetch\s*\(|Fetch::~Fetch\s*\(' src/cpu/o3/fetch.cc

Repository: OpenXiangShan/GEM5

Length of output: 562


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Get the full constructor from line 89 until the closing brace
sed -n '89,170p' src/cpu/o3/fetch.cc

Repository: OpenXiangShan/GEM5

Length of output: 2988


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find the initDecodeScheduler implementation
rg -n 'initDecodeScheduler' src/cpu/o3/fetch.cc -A 30

Repository: OpenXiangShan/GEM5

Length of output: 2212


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search for where delayedSchedulerDelay is set/initialized
rg -n 'delayedSchedulerDelay\s*=' src/cpu/o3/fetch.cc

Repository: OpenXiangShan/GEM5

Length of output: 44


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check the exact line in the header where delayedSchedulerDelay is declared
rg -n 'delayedSchedulerDelay' src/cpu/o3/fetch.hh -B 1 -A 1

Repository: OpenXiangShan/GEM5

Length of output: 156


delayedSchedulerDelay is uninitialized when used in initDecodeScheduler().

Line 399 of fetch.cc passes delayedSchedulerDelay to DelayedICountScheduler constructor, but this int is never initialized—not in the constructor, not in the class default, and not before use. The pointers (decodeScheduler, lsqCounter, iqCounter, robCounter) are initialized within initDecodeScheduler() on all code paths, but this member remains indeterminate. Add an in-class default or initialize it in the constructor:

Suggested fix
-    int delayedSchedulerDelay;
+    int delayedSchedulerDelay = 0;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// SMT Decode Scheduler
SMTScheduler* decodeScheduler;
// Counters from backend structures (to be passed in)
InstsCounter* lsqCounter;
InstsCounter* iqCounter;
InstsCounter* robCounter;
// Configuration parameters
std::string smtDecodePolicy ="multi_priority";
int delayedSchedulerDelay;
// SMT Decode Scheduler
SMTScheduler* decodeScheduler;
// Counters from backend structures (to be passed in)
InstsCounter* lsqCounter;
InstsCounter* iqCounter;
InstsCounter* robCounter;
// Configuration parameters
std::string smtDecodePolicy ="multi_priority";
int delayedSchedulerDelay = 0;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/fetch.hh` around lines 237 - 247, The member delayedSchedulerDelay
is used uninitialized when passed into DelayedICountScheduler in
initDecodeScheduler(); initialize it to a safe default either by adding an
in-class initializer (e.g., int delayedSchedulerDelay = 0;) or by setting it in
the class constructor for the fetch unit so delayedSchedulerDelay has a
well-defined value before initDecodeScheduler() runs; update the declaration in
fetch.hh or the fetch class constructor accordingly and ensure any configuration
parsing assigns it prior to creating DelayedICountScheduler.

scheduleToExecDelay = Param.Cycles(2, "")
oports = VectorParam.IssuePort("")
sel = Param.BaseSelector(BaseSelector(), "Selector for this IQ (default: age first)")
sel = Param.BaseSelector(SMTBasedSelector(), "Selector for this IQ (default: age first)")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Update the parameter description to match the new default selector.

At Line 93, the default is SMTBasedSelector() but the text still says “default: age first”.

Proposed fix
-    sel = Param.BaseSelector(SMTBasedSelector(), "Selector for this IQ (default: age first)")
+    sel = Param.BaseSelector(SMTBasedSelector(), "Selector for this IQ (default: SMT-based)")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
sel = Param.BaseSelector(SMTBasedSelector(), "Selector for this IQ (default: age first)")
sel = Param.BaseSelector(SMTBasedSelector(), "Selector for this IQ (default: SMT-based)")
🧰 Tools
🪛 Ruff (0.15.5)

[error] 93-93: Param may be undefined, or defined from star imports

(F405)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/FuncScheduler.py` at line 93, Update the parameter description for
the selector parameter so it reflects the actual default: change the help text
passed to Param.BaseSelector for the variable sel (which is currently
constructed with SMTBasedSelector()) to mention "default: SMT-based" (or similar
accurate wording) instead of "default: age first"; ensure the description in
Param.BaseSelector(SMTBasedSelector(), "Selector for this IQ (...)") references
SMTBasedSelector as the default.

Comment on lines 1533 to +1538
ThreadID tid = *activeThreads->begin();
toFetch->iewInfo[tid].resolvedCFIs.clear();

toFetch->iewInfo[tid].ldstqCount=ldstQueue.getCount(tid);
toFetch->iewInfo[tid].robCount= rob->getThreadEntries(tid);
toFetch->iewInfo[tid].iqCount= scheduler->getIQInsts(tid);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Publish IEW-to-Fetch counters for all active threads, not only the first one.

At Line 1533, only *activeThreads->begin() is selected, then only that thread’s resolvedCFIs/counts are updated. Fetch-side SMT selection reads fromIEW->iewInfo[tid] for each thread, so other threads can see stale values and be mis-prioritized.

Proposed fix
-    ThreadID tid = *activeThreads->begin();
-    toFetch->iewInfo[tid].resolvedCFIs.clear();
-
-    toFetch->iewInfo[tid].ldstqCount=ldstQueue.getCount(tid);
-    toFetch->iewInfo[tid].robCount= rob->getThreadEntries(tid);
-    toFetch->iewInfo[tid].iqCount= scheduler->getIQInsts(tid);
+    for (ThreadID tid : *activeThreads) {
+        auto &iewInfo = toFetch->iewInfo[tid];
+        iewInfo.resolvedCFIs.clear();
+        iewInfo.ldstqCount = ldstQueue.getCount(tid);
+        iewInfo.robCount   = rob->getThreadEntries(tid);
+        iewInfo.iqCount    = scheduler->getIQInsts(tid);
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ThreadID tid = *activeThreads->begin();
toFetch->iewInfo[tid].resolvedCFIs.clear();
toFetch->iewInfo[tid].ldstqCount=ldstQueue.getCount(tid);
toFetch->iewInfo[tid].robCount= rob->getThreadEntries(tid);
toFetch->iewInfo[tid].iqCount= scheduler->getIQInsts(tid);
for (ThreadID tid : *activeThreads) {
auto &iewInfo = toFetch->iewInfo[tid];
iewInfo.resolvedCFIs.clear();
iewInfo.ldstqCount = ldstQueue.getCount(tid);
iewInfo.robCount = rob->getThreadEntries(tid);
iewInfo.iqCount = scheduler->getIQInsts(tid);
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/iew.cc` around lines 1533 - 1538, The code only updates
IEW-to-Fetch counters for a single thread by using ThreadID tid =
*activeThreads->begin(); instead of iterating all active threads, causing stale
values in toFetch->iewInfo[]. Fix by iterating over every ThreadID in
activeThreads and for each thread update
toFetch->iewInfo[tid].resolvedCFIs.clear() and set
toFetch->iewInfo[tid].ldstqCount = ldstQueue.getCount(tid),
toFetch->iewInfo[tid].robCount = rob->getThreadEntries(tid), and
toFetch->iewInfo[tid].iqCount = scheduler->getIQInsts(tid) so all active threads
publish fresh counters.

Comment on lines +883 to +890
IssueQue::initIndependentIQICountScheduler(int numThreads)
{
assert(instsCounter != nullptr && "InstsCounter must be set first");

independentIQICountScheduler = new IndependentIQICountScheduler(
numThreads, instsCounter);

DPRINTF(Schedule, "[%s] IndependentIQICountScheduler created.\n",iqname);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In C++, is it valid to pass a std::stringobject directly to a printf-style variadic API using%s, or should it always be converted with .c_str() first?

💡 Result:

No. For a printf-style variadic function, the %s conversion requires a const char* argument (pointer to a null-terminated byte string). Passing a std::string object itself through ... does not perform the needed conversion, so the call has undefined behavior (and may crash or print garbage). Use myString.c_str() (or .data() in C++17+ if you also ensure null-termination) to pass a const char*. [1][2]

Also, note that passing class types through ... is generally not type-safe and can be conditionally-supported/implementation-defined for some class types; you should not rely on it for std::string. [2][3]

Correct:

std::string s = "hi";
std::printf("%s\n", s.c_str());

Wrong:

std::printf("%s\n", s); // undefined behavior

Sources: [1] (cppreference std::printf), [2] (cppreference “Variadic arguments” / default promotions & class-type notes), [3] (CWG issue on passing class/scoped-enum types to ellipsis).


🏁 Script executed:

cat -n src/cpu/o3/issue_queue.cc | sed -n '870,900p'

Repository: OpenXiangShan/GEM5

Length of output: 1130


🏁 Script executed:

rg -A 5 -B 5 "initIndependentIQICountScheduler" src/cpu/o3/issue_queue.cc

Repository: OpenXiangShan/GEM5

Length of output: 616


🏁 Script executed:

rg "std::string.*iqname" src/cpu/o3/issue_queue.cc

Repository: OpenXiangShan/GEM5

Length of output: 44


🏁 Script executed:

rg "iqname" src/cpu/o3/issue_queue.h | head -20

Repository: OpenXiangShan/GEM5

Length of output: 126


🏁 Script executed:

rg "class IssueQue" src/cpu/o3/issue_queue.h -A 100 | grep -E "(iqname|std::string)" | head -20

Repository: OpenXiangShan/GEM5

Length of output: 126


🏁 Script executed:

fd "issue_queue.h" src/cpu/o3/ --exec cat -n {} \; | grep -E "iqname|string" | head -30

Repository: OpenXiangShan/GEM5

Length of output: 446


🏁 Script executed:

rg "define DPRINTF" src/cpu/o3/issue_queue.cc -B 5 -A 5

Repository: OpenXiangShan/GEM5

Length of output: 44


🏁 Script executed:

rg "define DPRINTF" --type cpp | head -5

Repository: OpenXiangShan/GEM5

Length of output: 388


🏁 Script executed:

rg "DPRINTF.*Schedule" src/cpu/o3/issue_queue.cc -B 1 -A 1 | head -30

Repository: OpenXiangShan/GEM5

Length of output: 1178


🏁 Script executed:

rg "#define DPRINTF" --type cpp --type h | grep -v test | head -10

Repository: OpenXiangShan/GEM5

Length of output: 774


🏁 Script executed:

find . -name "*.h" -o -name "*.hh" | xargs rg "define DPRINTF" | grep -v test | head -10

Repository: OpenXiangShan/GEM5

Length of output: 865


🏁 Script executed:

rg "DPRINTF" src/base/ | grep define | head -5

Repository: OpenXiangShan/GEM5

Length of output: 433


🏁 Script executed:

cat src/base/trace.hh | grep -A 20 "#define DPRINTF"

Repository: OpenXiangShan/GEM5

Length of output: 2540


Use iqname.c_str() in this debug log.

iqname is a std::string, but %s expects const char*. Passing the object directly to the variadic DPRINTF macro results in undefined behavior.

Suggested fix
-        DPRINTF(Schedule, "[%s] IndependentIQICountScheduler created.\n",iqname);
+        DPRINTF(Schedule, "[%s] IndependentIQICountScheduler created.\n",
+                iqname.c_str());
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
IssueQue::initIndependentIQICountScheduler(int numThreads)
{
assert(instsCounter != nullptr && "InstsCounter must be set first");
independentIQICountScheduler = new IndependentIQICountScheduler(
numThreads, instsCounter);
DPRINTF(Schedule, "[%s] IndependentIQICountScheduler created.\n",iqname);
IssueQue::initIndependentIQICountScheduler(int numThreads)
{
assert(instsCounter != nullptr && "InstsCounter must be set first");
independentIQICountScheduler = new IndependentIQICountScheduler(
numThreads, instsCounter);
DPRINTF(Schedule, "[%s] IndependentIQICountScheduler created.\n",
iqname.c_str());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/issue_queue.cc` around lines 883 - 890, The DPRINTF call in
IssueQue::initIndependentIQICountScheduler uses "%s" but passes iqname (a
std::string), causing undefined behavior; change the log to pass iqname.c_str()
so the format string receives a const char* (update the DPRINTF invocation that
currently prints "[%s] IndependentIQICountScheduler created.\n" to use
iqname.c_str()).

Comment on lines +31 to +32
void incCounter(ThreadID tid, uint64_t value = 1) { counter[tid] += value; }
void decCounter(ThreadID tid, uint64_t value = 1) { counter[tid] -= value; }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Prevent unsigned underflow in decCounter.

At Line 32, counter[tid] -= value can wrap on underflow and produce very large counts, which can break SMT arbitration.

Proposed fix
     void incCounter(ThreadID tid, uint64_t value = 1) { counter[tid] += value; }
-    void decCounter(ThreadID tid, uint64_t value = 1) { counter[tid] -= value; }
+    void decCounter(ThreadID tid, uint64_t value = 1) {
+        assert(tid < MaxThreads);
+        assert(counter[tid] >= value);
+        counter[tid] -= value;
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
void incCounter(ThreadID tid, uint64_t value = 1) { counter[tid] += value; }
void decCounter(ThreadID tid, uint64_t value = 1) { counter[tid] -= value; }
void incCounter(ThreadID tid, uint64_t value = 1) { counter[tid] += value; }
void decCounter(ThreadID tid, uint64_t value = 1) {
assert(tid < MaxThreads);
assert(counter[tid] >= value);
counter[tid] -= value;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/smt_sched.hh` around lines 31 - 32, decCounter can underflow the
unsigned counter[tid]; change decCounter (and related uses if any) to safely
subtract by checking if counter[tid] >= value then subtract, otherwise set
counter[tid] = 0 (or clamp to zero) to prevent wrapping; update the
decCounter(ThreadID tid, uint64_t value = 1) implementation and any tests to
reflect the non-wrapping behavior while keeping incCounter unchanged.

Co-authored-by: mohaonan <mo.haonan1@sanechips.com.cn>
@github-actions
Copy link
Copy Markdown

🚀 Coremark Smoke Test Results

Branch IPC Change
Base (xs-dev) 2.2665 -
This PR 2.2665 ➡️ 0.0000 (0.00%)

✅ Difftest smoke test passed!

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (5)
src/cpu/o3/issue_queue.cc (1)

890-890: ⚠️ Potential issue | 🟠 Major

Pass iqname.c_str() to %s in DPRINTF.

Line 890 passes a std::string object to a printf-style %s, which expects const char*.

Proposed fix
-        DPRINTF(Schedule, "[%s] IndependentIQICountScheduler created.\n",iqname);    
+        DPRINTF(Schedule, "[%s] IndependentIQICountScheduler created.\n",
+                iqname.c_str());
#!/bin/bash
# Verify the problematic call site still passes std::string directly.
rg -nP 'DPRINTF\(\s*Schedule\s*,\s*"\[%s\] IndependentIQICountScheduler created\\n"\s*,\s*iqname\s*\)' src/cpu/o3/issue_queue.cc -C 2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/issue_queue.cc` at line 890, The DPRINTF call in
IndependentIQICountScheduler is passing a std::string (iqname) to a printf-style
"%s"; change the call to pass a C string by using iqname.c_str() so
DPRINTF(Schedule, "[%s] IndependentIQICountScheduler created.\n",
iqname.c_str()) to avoid type mismatch and ensure correct formatting.
src/cpu/o3/FuncScheduler.py (1)

93-93: ⚠️ Potential issue | 🟡 Minor

Update selector help text to match the new default.

Line 93 now defaults to SMTBasedSelector() but still says “default: age first”.

Proposed fix
-    sel = Param.BaseSelector(SMTBasedSelector(), "Selector for this IQ (default: age first)")
+    sel = Param.BaseSelector(SMTBasedSelector(), "Selector for this IQ (default: SMT-based)")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/FuncScheduler.py` at line 93, The help text for the selector
Param.BaseSelector currently says "Selector for this IQ (default: age first)"
but the default has changed to SMTBasedSelector(); update the help string for
Param.BaseSelector to reflect the new default (e.g., "Selector for this IQ
(default: SMT based)") so the description matches the actual default selector
SMTBasedSelector().
src/cpu/o3/smt_sched.hh (1)

31-32: ⚠️ Potential issue | 🟠 Major

Prevent unsigned underflow in decCounter.

Line 32 can wrap counter[tid] on subtract, producing huge values and skewing SMT arbitration.

Proposed fix
-    void decCounter(ThreadID tid, uint64_t value = 1) { counter[tid] -= value; }
+    void decCounter(ThreadID tid, uint64_t value = 1) {
+        counter[tid] = (counter[tid] >= value) ? (counter[tid] - value) : 0;
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/smt_sched.hh` around lines 31 - 32, decCounter currently subtracts
from the unsigned array element counter[tid] and can underflow; modify
decCounter(ThreadID tid, uint64_t value = 1) to guard against underflow by
checking counter[tid] >= value before subtracting (or set counter[tid] = 0 if
value is larger) so the counter never wraps to a huge value; update any callers
or tests if they rely on wraparound behavior and keep the symbol names
decCounter, counter and ThreadID when locating the change.
src/cpu/o3/iew.cc (1)

1536-1538: ⚠️ Potential issue | 🟠 Major

Publish IEW counters for all active threads, not just one.

Line 1533 selects only *activeThreads->begin(), so Lines 1536-1538 refresh counters for one thread while others can keep stale values used by SMT fetch selection.

Proposed fix
-    ThreadID tid = *activeThreads->begin();
-    toFetch->iewInfo[tid].resolvedCFIs.clear();
-
-    toFetch->iewInfo[tid].ldstqCount=ldstQueue.getCount(tid);
-    toFetch->iewInfo[tid].robCount= rob->getThreadEntries(tid);
-    toFetch->iewInfo[tid].iqCount= scheduler->getIQInsts(tid);
+    for (ThreadID tid : *activeThreads) {
+        auto &iewInfo = toFetch->iewInfo[tid];
+        iewInfo.resolvedCFIs.clear();
+        iewInfo.ldstqCount = ldstQueue.getCount(tid);
+        iewInfo.robCount   = rob->getThreadEntries(tid);
+        iewInfo.iqCount    = scheduler->getIQInsts(tid);
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/iew.cc` around lines 1536 - 1538, The code only updates IEW
counters for a single thread by using *activeThreads->begin(); change this to
iterate over all active threads and update toFetch->iewInfo[tid].ldstqCount =
ldstQueue.getCount(tid), toFetch->iewInfo[tid].robCount =
rob->getThreadEntries(tid), and toFetch->iewInfo[tid].iqCount =
scheduler->getIQInsts(tid) for each tid in activeThreads; locate the use of
activeThreads, the variable tid, and the assignments to toFetch->iewInfo to
implement the loop so all active SMT threads get refreshed counters.
src/cpu/o3/fetch.cc (1)

1325-1349: ⚠️ Potential issue | 🟠 Major

Exclude empty fetch queues from decode-thread selection.

Line 1333 only checks stallSig->blockFetch; selected thread can still have an empty fetchQueue, causing missed decode opportunities even when other threads are ready.

Proposed fix
 ThreadID
 Fetch::selectUnstalledThread()
 {
+    bool hasCandidate = false;
     for (ThreadID tid = 0; tid < numThreads; ++tid) {
-        if (!stallSig->blockFetch[tid]) {
+        if (!stallSig->blockFetch[tid] && !fetchQueue[tid].empty()) {
+            hasCandidate = true;
             lsqCounter->setCounter(tid, fromIEW->iewInfo[tid].ldstqCount);
             iqCounter->setCounter(tid, fromIEW->iewInfo[tid].iqCount);
             robCounter->setCounter(tid, fromIEW->iewInfo[tid].robCount);
         } else {
             lsqCounter->setCounter(tid, UINT64_MAX);
             iqCounter->setCounter(tid, UINT64_MAX);
             robCounter->setCounter(tid, UINT64_MAX);
         }
     }
+    if (!hasCandidate) {
+        return InvalidThreadID;
+    }
 
     ThreadID selected = decodeScheduler->getThread();
     return selected;
 }
-    ThreadID tid =selectUnstalledThread();
+    ThreadID tid = selectUnstalledThread();
+    if (tid == InvalidThreadID) {
+        toDecode->fetchStallReason = stallReason;
+        return;
+    }

Also applies to: 1387-1408

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/fetch.cc` around lines 1325 - 1349, The selected thread can be
empty because selectUnstalledThread only checks stallSig->blockFetch; update
Fetch::selectUnstalledThread to treat threads with an empty fetchQueue as
stalled too: inside the loop that currently inspects stallSig->blockFetch[tid],
also test fetchQueue[tid].empty() and if true set
lsqCounter/iqCounter/robCounter to UINT64_MAX (same as the blocked case) and
ensure that such threads are not considered ready by the decode scheduler before
calling decodeScheduler->getThread(); reference symbols:
Fetch::selectUnstalledThread, stallSig->blockFetch, fetchQueue,
lsqCounter->setCounter, iqCounter->setCounter, robCounter->setCounter, and
decodeScheduler->getThread.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cpu/o3/issue_queue.hh`:
- Around line 104-113: The SMTBasedSelector::setparent() method is never invoked
so its smtScheduler remains nullptr; update the Scheduler initialization to call
selector->setparent(this, issueQues[i]) after assigning issueQues[i]->scheduler
= this (so SMTBasedSelector::setparent can store the
IndependentIQICountScheduler pointer), ensuring SMTBasedSelector::select() can
use smtScheduler, or alternatively remove the smtScheduler/SMT path from
SMTBasedSelector if you intend no SMT support; locate the code around the
Scheduler constructor where issueQues are created and call
selector->setparent(this, iq) for non-null selectors (reference symbols:
Scheduler constructor, issueQues, IssueQue::selector,
SMTBasedSelector::setparent, smtScheduler).

---

Duplicate comments:
In `@src/cpu/o3/fetch.cc`:
- Around line 1325-1349: The selected thread can be empty because
selectUnstalledThread only checks stallSig->blockFetch; update
Fetch::selectUnstalledThread to treat threads with an empty fetchQueue as
stalled too: inside the loop that currently inspects stallSig->blockFetch[tid],
also test fetchQueue[tid].empty() and if true set
lsqCounter/iqCounter/robCounter to UINT64_MAX (same as the blocked case) and
ensure that such threads are not considered ready by the decode scheduler before
calling decodeScheduler->getThread(); reference symbols:
Fetch::selectUnstalledThread, stallSig->blockFetch, fetchQueue,
lsqCounter->setCounter, iqCounter->setCounter, robCounter->setCounter, and
decodeScheduler->getThread.

In `@src/cpu/o3/FuncScheduler.py`:
- Line 93: The help text for the selector Param.BaseSelector currently says
"Selector for this IQ (default: age first)" but the default has changed to
SMTBasedSelector(); update the help string for Param.BaseSelector to reflect the
new default (e.g., "Selector for this IQ (default: SMT based)") so the
description matches the actual default selector SMTBasedSelector().

In `@src/cpu/o3/iew.cc`:
- Around line 1536-1538: The code only updates IEW counters for a single thread
by using *activeThreads->begin(); change this to iterate over all active threads
and update toFetch->iewInfo[tid].ldstqCount = ldstQueue.getCount(tid),
toFetch->iewInfo[tid].robCount = rob->getThreadEntries(tid), and
toFetch->iewInfo[tid].iqCount = scheduler->getIQInsts(tid) for each tid in
activeThreads; locate the use of activeThreads, the variable tid, and the
assignments to toFetch->iewInfo to implement the loop so all active SMT threads
get refreshed counters.

In `@src/cpu/o3/issue_queue.cc`:
- Line 890: The DPRINTF call in IndependentIQICountScheduler is passing a
std::string (iqname) to a printf-style "%s"; change the call to pass a C string
by using iqname.c_str() so DPRINTF(Schedule, "[%s] IndependentIQICountScheduler
created.\n", iqname.c_str()) to avoid type mismatch and ensure correct
formatting.

In `@src/cpu/o3/smt_sched.hh`:
- Around line 31-32: decCounter currently subtracts from the unsigned array
element counter[tid] and can underflow; modify decCounter(ThreadID tid, uint64_t
value = 1) to guard against underflow by checking counter[tid] >= value before
subtracting (or set counter[tid] = 0 if value is larger) so the counter never
wraps to a huge value; update any callers or tests if they rely on wraparound
behavior and keep the symbol names decCounter, counter and ThreadID when
locating the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fab9ae1e-e5bf-4056-aebc-e93b051e0d75

📥 Commits

Reviewing files that changed from the base of the PR and between bb57418 and 51fa60b.

📒 Files selected for processing (11)
  • src/cpu/o3/FuncScheduler.py
  • src/cpu/o3/SConscript
  • src/cpu/o3/comm.hh
  • src/cpu/o3/fetch.cc
  • src/cpu/o3/fetch.hh
  • src/cpu/o3/iew.cc
  • src/cpu/o3/inst_queue.cc
  • src/cpu/o3/inst_queue.hh
  • src/cpu/o3/issue_queue.cc
  • src/cpu/o3/issue_queue.hh
  • src/cpu/o3/smt_sched.hh
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/cpu/o3/inst_queue.cc
  • src/cpu/o3/fetch.hh

Comment on lines +104 to +113
class SMTBasedSelector : public BaseSelector
{
private:
IndependentIQICountScheduler* smtScheduler = nullptr;
public:
SMTBasedSelector(const SMTBasedSelectorParams& params) : BaseSelector(params) {}
void setparent(Scheduler* scheduler, IssueQue* iq) override;
void allocate(const DynInstPtr& inst) override { BaseSelector::allocate(inst);}
void deallocate(const DynInstPtr& inst) override { BaseSelector::deallocate(inst);}
ReadyQue::iterator select(ReadyQue::iterator begin, int portid) override;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show where the selector snapshots the pointer and where the queue mutates it.
rg -n -C6 'SMTBasedSelector::setparent\s*\(|IssueQue::setIndependentIQICountScheduler\s*\(|IssueQue::initIndependentIQICountScheduler\s*\(' src/cpu/o3

# Show the relevant wiring / init paths.
rg -n -C6 '->setparent\s*\(|setIndependentIQICountScheduler\s*\(|initIQICountSmtScheduler\s*\(' src/cpu/o3

Repository: OpenXiangShan/GEM5

Length of output: 1826


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find the getter and setter implementations
echo "=== Looking for getIndependentIQICountScheduler implementation ==="
rg -n 'getIndependentIQICountScheduler\s*\(' src/cpu/o3 -A 3

echo -e "\n=== Looking for setIndependentIQICountScheduler implementation ==="
rg -n 'setIndependentIQICountScheduler\s*\(' src/cpu/o3 -A 3

echo -e "\n=== Looking for initIndependentIQICountScheduler implementation ==="
rg -n 'initIndependentIQICountScheduler\s*\(' src/cpu/o3 -A 5

Repository: OpenXiangShan/GEM5

Length of output: 2372


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find all calls to setparent to understand initialization order
echo "=== Calls to setparent ==="
rg -n 'setparent' src/cpu/o3 -B 2 -A 2

# Find all places where queue scheduler pointer is accessed
echo -e "\n=== Direct access to independentIQICountScheduler ==="
rg -n 'independentIQICountScheduler' src/cpu/o3 -B 1 -A 1

Repository: OpenXiangShan/GEM5

Length of output: 3164


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check the header file to see the public interface
echo "=== IssueQue class definition regarding scheduler methods ==="
rg -n 'class IssueQue|getIndependentIQICountScheduler|setIndependentIQICountScheduler|initIndependentIQICountScheduler' src/cpu/o3/issue_queue.hh -A 2

Repository: OpenXiangShan/GEM5

Length of output: 754


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find where setparent is actually called and in what order relative to initialization
echo "=== Where setparent is called ==="
rg -n '\.setparent\(' src/cpu/o3 -B 3 -A 3

echo -e "\n=== Where initIndependentIQICountScheduler is called ==="
rg -n 'initIndependentIQICountScheduler\(' src/cpu/o3 -B 3 -A 3

echo -e "\n=== Where setIndependentIQICountScheduler is called (besides header) ==="
rg -n 'setIndependentIQICountScheduler\(' src/cpu/o3 -B 3 -A 3

Repository: OpenXiangShan/GEM5

Length of output: 95


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check if smtScheduler is actually used anywhere
echo "=== Usage of smtScheduler in SMTBasedSelector ==="
rg -n 'smtScheduler' src/cpu/o3 -B 1 -A 1

# Check the select method to see how smtScheduler is used
echo -e "\n=== SMTBasedSelector::select implementation ==="
rg -n 'SMTBasedSelector::select' src/cpu/o3 -A 20

Repository: OpenXiangShan/GEM5

Length of output: 1858


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Let's search more broadly for where selectors are instantiated and initialized
echo "=== Looking for selector initialization (across all extensions) ==="
rg -n 'SMTBasedSelector\|PAgeSelector\|BaseSelector' src/cpu/o3 --type=cc --type=hh -B 2 -A 2 | head -100

echo -e "\n=== Looking for where the selector is created ==="
rg -n 'new.*Selector' src/cpu/o3 -B 2 -A 2

echo -e "\n=== Looking for Scheduler class to understand wiring ==="
rg -n 'class Scheduler' src/cpu/o3 -A 10 | head -40

Repository: OpenXiangShan/GEM5

Length of output: 173


🏁 Script executed:

#!/bin/bash
set -euo parametertrue

# Let's examine the complete initialization sequence in the constructor or setup
echo "=== IssueQue constructor and initialization ==="
rg -n 'IssueQue::IssueQue' src/cpu/o3 -A 50 | head -80

echo -e "\n=== Scheduler constructor and selector setup ==="
rg -n 'Scheduler::Scheduler' src/cpu/o3 -A 50 | head -80

Repository: OpenXiangShan/GEM5

Length of output: 120


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Let's search for where selectors are instantiated and initialized
echo "=== Looking for selector creation ==="
rg -n 'new.*Selector|selector.*=' src/cpu/o3 -B 2 -A 2 | head -120

echo -e "\n=== Looking for Scheduler class ==="
rg -n 'class Scheduler' src/cpu/o3 -A 15 | head -50

Repository: OpenXiangShan/GEM5

Length of output: 3831


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Let's examine the complete initialization sequence
echo "=== IssueQue constructor ==="
rg -n 'IssueQue::IssueQue' src/cpu/o3 -A 60 | head -100

echo -e "\n=== Scheduler constructor and selector setup ==="
rg -n 'Scheduler::Scheduler' src/cpu/o3 -A 60 | head -100

Repository: OpenXiangShan/GEM5

Length of output: 12581


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find ALL calls to setIndependentIQICountScheduler to understand initialization order
echo "=== ALL calls to setIndependentIQICountScheduler ==="
rg -n 'setIndependentIQICountScheduler' src/cpu/o3 -B 3 -A 3

echo -e "\n=== Where is selector->setparent called? ==="
rg -n 'selector.*->.*setparent\|->setparent.*selector' src/cpu/o3 -B 3 -A 3

echo -e "\n=== Look for any initialization code that wires selector and queue together ==="
rg -n 'selector' src/cpu/o3/issue_queue.cc | head -40

Repository: OpenXiangShan/GEM5

Length of output: 720


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Look at the header to see selector's declaration and initialization more carefully
echo "=== Selector declaration in IssueQue ==="
sed -n '180,250p' src/cpu/o3/issue_queue.hh

echo -e "\n=== Constructor parameter passing ==="
rg -n 'IssueQue.*params.sel|selector.*=' src/cpu/o3 -B 2 -A 2

Repository: OpenXiangShan/GEM5

Length of output: 3338


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search for where setparent is actually called - broader search
echo "=== Search for setparent invocations in entire codebase ==="
rg -n 'setparent' src/cpu/o3 --type-list | grep -i cpp
rg -n 'setparent' src/cpu

# Look for the point where selector gets wired up
echo -e "\n=== Look for selector initialization/setup ==="
rg -n '\.selector\s*=' src/cpu/o3 -B 3 -A 3

# Find where selector is first used
echo -e "\n=== Where selector is first used after construction ==="
rg -n 'this->selector\|selector->' src/cpu/o3/issue_queue.cc | head -20

Repository: OpenXiangShan/GEM5

Length of output: 528


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check if selector->setparent is ever called - try different patterns
echo "=== Try to find if setparent is ever called in any way ==="
rg -n 'setparent' src/cpu -A 5 -B 5

# Look for where the selector would be properly initialized
echo -e "\n=== Search in all files for selector initialization patterns ==="
git ls-files src/cpu | xargs grep -l 'setparent' 2>/dev/null | head -10

Repository: OpenXiangShan/GEM5

Length of output: 4042


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# More direct search - where is setparent actually invoked?
echo "=== Invocations of setparent (not declarations) ==="
rg -n '->setparent\s*\(' src/cpu

echo -e "\n=== Also search for setparent in entire repo ==="
git ls-files | xargs grep '->setparent' 2>/dev/null | grep -v '\.o$' | head -20

echo -e "\n=== Look for CPU or other initialization code that might wire things up ==="
rg -n 'IssueQue\|Scheduler' src/cpu/o3/cpu_impl.hh -A 5 -B 5 | head -80

Repository: OpenXiangShan/GEM5

Length of output: 535


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search for where selector is initialized and where setparent gets called
echo "=== Full search for selector initialization in Scheduler constructor ==="
sed -n '936,1100p' src/cpu/o3/issue_queue.cc

echo -e "\n=== Search for any method that calls setparent ==="
rg -n 'selector.*setparent\|\.setparent' src/cpu -B 5 -A 3

Repository: OpenXiangShan/GEM5

Length of output: 6023


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search without arrow to find setparent calls
echo "=== Find all setparent calls using different pattern ==="
rg 'setparent' src/cpu --context=5 | grep -A 5 -B 5 'setparent'

# Look for where Scheduler might call selector's methods
echo -e "\n=== Search for where selector is actually used/initialized ==="
rg 'selector' src/cpu/o3/issue_queue.cc | head -50

Repository: OpenXiangShan/GEM5

Length of output: 4031


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Let's look at the complete IssueQue constructor to see where selector might be wired
echo "=== Complete IssueQue constructor (lines 240-400) ==="
sed -n '240,400p' src/cpu/o3/issue_queue.cc

echo -e "\n=== Look for setCPU or similar initialization methods in IssueQue ==="
rg -n 'void IssueQue::setCPU\|void IssueQue::reset\|void IssueQue::init' src/cpu/o3/issue_queue.cc -A 20 | head -60

Repository: OpenXiangShan/GEM5

Length of output: 5848


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Critical: search for where selector->setparent is actually invoked
echo "=== Exhaustive search for 'setparent' invocations in CPU class files ==="
find src/cpu -name '*.cc' -o -name '*.hh' | xargs grep 'setparent' 2>/dev/null | grep -v '^[^:]*:[0-9]*:\s*//'

echo -e "\n=== Specifically search in CPU impl files ==="
find src/cpu -name '*_impl.hh' -o -name 'cpu.cc' -o -name 'cpu.hh' | xargs grep 'setparent' 2>/dev/null

echo -e "\n=== Look for where the selector gets wired to its queue ==="
find src/cpu -name '*.cc' | xargs grep 'selector' 2>/dev/null | grep -i 'setparent\|wire\|init' | head -20

Repository: OpenXiangShan/GEM5

Length of output: 818


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check if smtScheduler pointer is ever dereferenced or accessed
echo "=== All references to smtScheduler in the codebase ==="
rg 'smtScheduler' src/cpu -B 2 -A 2

echo -e "\n=== Check if getIndependentIQICountScheduler() returns nullptr initially ==="
rg -n 'IndependentIQICountScheduler.*nullptr\|=.*nullptr' src/cpu/o3/issue_queue.hh

Repository: OpenXiangShan/GEM5

Length of output: 1040


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search for any code that actually CALLS setparent (with different patterns)
echo "=== Search for invocation of setparent method ==="
git grep 'selector.*setparent\|setparent.*selector' | head -20

# Try a broader search
echo -e "\n=== Search across all files for method calls to setparent ==="
git grep '\.setparent' 2>/dev/null | head -20

# Check if setparent might be called through base class or virtual
echo -e "\n=== Look for virtual method dispatch on selector ==="
git grep 'selector->' src/cpu | grep -v 'DPRINTF\|select\|allocate\|deallocate\|begin' | head -20

Repository: OpenXiangShan/GEM5

Length of output: 111


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Confirm: is setparent ever actually called anywhere?
echo "=== Final check: actual invocation of setparent anywhere in repo ==="
git log --all --oneline --grep='setparent' 2>/dev/null | head -5 || echo "No commits about setparent"

echo -e "\n=== Check git diff or git show to see how the code changed ==="
git log --all -p --grep='SMTBasedSelector\|independentIQICountScheduler' -- src/cpu/o3/issue_queue.cc src/cpu/o3/issue_queue.hh 2>/dev/null | head -100

echo -e "\n=== If setparent is never called, smtScheduler would always be nullptr ==="
echo "This means the conditional check 'if (smtScheduler)' at line 160 would never be true"

Repository: OpenXiangShan/GEM5

Length of output: 355


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check if initIndependentIQICountScheduler is ever called
echo "=== Where is initIndependentIQICountScheduler called? ==="
rg -n 'initIndependentIQICountScheduler' src/cpu

echo -e "\n=== Check the initialization path of independentIQICountScheduler ==="
rg -n 'independentIQICountScheduler\s*=' src/cpu -B 3 -A 3

Repository: OpenXiangShan/GEM5

Length of output: 1936


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Verify the complete picture - check what gets initialized in IssueQue constructor
echo "=== IssueQue constructor - all member initialization ==="
sed -n '240,360p' src/cpu/o3/issue_queue.cc | grep -E 'selector|independentIQ|instsCounter|=.*new'

echo -e "\n=== Check if instsCounter is actually used (related to independentIQICountScheduler) ==="
rg -n 'instsCounter' src/cpu/o3 | head -30

Repository: OpenXiangShan/GEM5

Length of output: 1209


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check line 1664 - where initIndependentIQICountScheduler is called
echo "=== Context around line 1664 where initIndependentIQICountScheduler is called ==="
sed -n '1650,1680p' src/cpu/o3/issue_queue.cc

echo -e "\n=== Look for where selector gets initialized/connected ==="
sed -n '936,1050p' src/cpu/o3/issue_queue.cc | grep -A 20 -B 5 'selector'

echo -e "\n=== Search for O3CPU constructor where these get wired ==="
find src/cpu -name '*_impl.hh' | xargs grep -l 'Scheduler\|IssueQue' | head -3

Repository: OpenXiangShan/GEM5

Length of output: 665


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check O3CPU implementation
echo "=== Looking for CPU-side initialization/wiring of selector and queue ==="
rg -n 'cpu.*selector\|selector.*cpu' src/cpu/o3 -B 3 -A 3 | head -60

echo -e "\n=== Does the code ever call a method that would trigger setparent implicitly? ==="
rg -n 'setIQID\|setCPU\|resetDepGraph' src/cpu/o3/issue_queue.cc -A 5 | head -40

Repository: OpenXiangShan/GEM5

Length of output: 134


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Key finding: check if setparent is ever called on the selector
echo "=== Is there ANY call to selector->setparent or equivalent? ==="
git ls-files | xargs grep -n 'selector.*setparent\|\.setparent' 2>/dev/null

echo -e "\n=== Check if there's a setup method that initializes selector with the queue ==="
rg -n 'selector' src/cpu/o3/issue_queue.cc | grep -E 'setparent|init|setup|wire'

echo -e "\n=== Verify: is setparent supposed to be called from Scheduler constructor? ==="
sed -n '936,1050p' src/cpu/o3/issue_queue.cc

Repository: OpenXiangShan/GEM5

Length of output: 127


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Conclusive test: does smtScheduler ever get set?
echo "=== All assignments to smtScheduler ==="
rg -n 'smtScheduler\s*=' src/cpu/o3

echo -e "\n=== This means: if setparent is never called, smtScheduler stays nullptr ==="
echo "=== And the entire if(smtScheduler) block at line 160 is dead code ==="

# Confirm the selector parameter is passed but never used
echo -e "\n=== Verify selector is just stored, never initialized with queue reference ==="
sed -n '240,360p' src/cpu/o3/issue_queue.cc | grep -A 2 'selector'

Repository: OpenXiangShan/GEM5

Length of output: 578


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Final confirmation: trace the complete initialization order
echo "=== Complete initialization trace ==="
echo "1. IssueQue constructor stores selector from params (line 248)"
sed -n '248,248p' src/cpu/o3/issue_queue.cc

echo -e "\n2. Scheduler::Scheduler iterates IssueQueues but does NOT call setparent on selectors"
sed -n '949,952p' src/cpu/o3/issue_queue.cc

echo -e "\n3. initIndependentIQICountScheduler is called AFTER selector would be initialized"
sed -n '1659,1668p' src/cpu/o3/issue_queue.cc

echo -e "\n4. But setparent is never called, so selector never sees the scheduler"
echo "Proof: search returns empty"
git grep 'selector->setparent\|->setparent.*selector' 2>/dev/null || echo "[No results]"

Repository: OpenXiangShan/GEM5

Length of output: 881


SMTBasedSelector::setparent() is never called, leaving smtScheduler permanently null and the selection logic non-functional.

The selector is stored in IssueQue but its setparent() method is never invoked during initialization. As a result, smtScheduler remains nullptr, and the entire if (smtScheduler) block in SMTBasedSelector::select() (line 160) is dead code. The Scheduler constructor sets issueQues[i]->scheduler = this but does not call selector->setparent() on any selector. This leaves the SMT-based selector unable to access the IndependentIQICountScheduler it was designed to use for thread prioritization.

Either complete the selector initialization by calling selector->setparent(this, iq) during Scheduler construction, or remove the non-functional SMT scheduling path from SMTBasedSelector.

Also applies to: 190–193, 239–242

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/issue_queue.hh` around lines 104 - 113, The
SMTBasedSelector::setparent() method is never invoked so its smtScheduler
remains nullptr; update the Scheduler initialization to call
selector->setparent(this, issueQues[i]) after assigning issueQues[i]->scheduler
= this (so SMTBasedSelector::setparent can store the
IndependentIQICountScheduler pointer), ensuring SMTBasedSelector::select() can
use smtScheduler, or alternatively remove the smtScheduler/SMT path from
SMTBasedSelector if you intend no SMT support; locate the code around the
Scheduler constructor where issueQues are created and call
selector->setparent(this, iq) for non-null selectors (reference symbols:
Scheduler constructor, issueQues, IssueQue::selector,
SMTBasedSelector::setparent, smtScheduler).

@tastynoob tastynoob added perf and removed perf labels Mar 17, 2026
@github-actions
Copy link
Copy Markdown

🚀 Performance test triggered: gcc12-spec06-0.8c

@XiangShanRobot
Copy link
Copy Markdown

[Generated by GEM5 Performance Robot]
commit: 51fa60b
workflow: On-Demand SPEC Test (Tier 1.5)

Ideal BTB Performance

Overall Score

PR Master Diff(%)
Score 20.69 20.69 -0.00

@github-actions
Copy link
Copy Markdown

🚀 Coremark Smoke Test Results

Branch IPC Change
Base (xs-dev) 2.2665 -
This PR 2.2665 ➡️ 0.0000 (0.00%)

✅ Difftest smoke test passed!

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cpu/pred/btb/decoupled_bpred.cc (1)

662-667: ⚠️ Potential issue | 🟡 Minor

Incomplete SMT support: blockPredictionOnce() only affects thread 0.

The // smtTODO comment indicates this was intentionally left incomplete. Currently, when resolve dequeue failures exceed the threshold, only thread 0's prediction is blocked, leaving other threads unaffected. This breaks the intended throttling mechanism for SMT configurations.

Consider passing the failing thread's ID to this function:

🔧 Suggested fix
 void
-DecoupledBPUWithBTB::blockPredictionOnce()
+DecoupledBPUWithBTB::blockPredictionOnce(ThreadID tid)
 {
-    // smtTODO
-    threads[0].blockPredictionPending = true;
+    threads[tid].blockPredictionPending = true;
 }

This would also require updating notifyResolveFailure() to track which thread failed and pass that information through.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/btb/decoupled_bpred.cc` around lines 662 - 667, The current
blockPredictionOnce() only sets threads[0].blockPredictionPending, which breaks
SMT throttling; change blockPredictionOnce() to accept a thread id (e.g.,
unsigned thread_id or ThreadID) and set
threads[thread_id].blockPredictionPending = true instead of hardcoding 0, then
update notifyResolveFailure() to determine which thread experienced the failure
and call the new blockPredictionOnce(thread_id) variant; update all call sites
to pass the appropriate thread id and adjust any declarations/signatures
accordingly so the blocking affects the failing hardware thread rather than
thread 0.
♻️ Duplicate comments (2)
src/cpu/o3/fetch.hh (1)

245-248: ⚠️ Potential issue | 🔴 Critical

Initialize delayedSchedulerDelay before use.

Line 247 declares delayedSchedulerDelay without initialization, but Fetch::initDecodeScheduler() uses it to construct DelayedICountScheduler. This can introduce undefined behavior.

🔧 Suggested fix
-    int delayedSchedulerDelay;
+    int delayedSchedulerDelay = 0;
#!/bin/bash
set -euo pipefail

# Verify initialization/assignment sites for delayedSchedulerDelay.
rg -nP -C3 'delayedSchedulerDelay' src/cpu/o3/fetch.hh src/cpu/o3/fetch.cc
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/fetch.hh` around lines 245 - 248, The field delayedSchedulerDelay
is declared uninitialized and then read by Fetch::initDecodeScheduler() when
constructing a DelayedICountScheduler; initialize delayedSchedulerDelay (e.g.,
give it a sensible default in its declaration or set it in the Fetch
constructor/initialization path) or ensure it is assigned from config before
Fetch::initDecodeScheduler() is called so that DelayedICountScheduler always
receives a defined value; locate the symbol delayedSchedulerDelay and the call
site in Fetch::initDecodeScheduler()/DelayedICountScheduler to apply the fix.
src/cpu/o3/iew.cc (1)

1535-1540: ⚠️ Potential issue | 🟠 Major

Publish IEW-to-Fetch state for every active thread.

This block still clears resolvedCFIs and refreshes ldstqCount / robCount / iqCount for only the first active thread. In SMT mode the other threads keep stale IEW state, which can skew frontend decisions and leak old resolved-CFI entries.

Proposed fix
-    ThreadID tid = *activeThreads->begin();
-    toFetch->iewInfo[tid].resolvedCFIs.clear();
-
-    toFetch->iewInfo[tid].ldstqCount=ldstQueue.getCount(tid);
-    toFetch->iewInfo[tid].robCount= rob->getThreadEntries(tid);
-    toFetch->iewInfo[tid].iqCount= scheduler->getIQInsts(tid);
+    for (ThreadID tid : *activeThreads) {
+        auto &iewInfo = toFetch->iewInfo[tid];
+        iewInfo.resolvedCFIs.clear();
+        iewInfo.ldstqCount = ldstQueue.getCount(tid);
+        iewInfo.robCount = rob->getThreadEntries(tid);
+        iewInfo.iqCount = scheduler->getIQInsts(tid);
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/iew.cc` around lines 1535 - 1540, The code only updates
toFetch->iewInfo for the first ThreadID from activeThreads, leaving other
threads' iewInfo stale; change the logic to iterate over all entries in
activeThreads and for each ThreadID tid clear toFetch->iewInfo[tid].resolvedCFIs
and refresh toFetch->iewInfo[tid].ldstqCount (use ldstQueue.getCount(tid)),
robCount (use rob->getThreadEntries(tid)) and iqCount (use
scheduler->getIQInsts(tid)) so IEW-to-Fetch state is published for every active
thread.
🧹 Nitpick comments (4)
configs/common/xiangshan.py (1)

446-451: Consider making the thread count configurable.

The SMT configuration looks correct for enabling 2-way SMT. However, hardcoding numThreads = 2 limits flexibility. Consider allowing users to specify the thread count via a command-line argument (e.g., --smt-threads).

Additionally, there's no validation that SMT mode is compatible with other modes like trace mode or difftest. If SMT has known incompatibilities, consider adding guard checks.

💡 Optional: Make thread count configurable
     if args.smt:
         test_sys.multi_thread = True
+        smt_threads = getattr(args, 'smt_threads', 2)

     for cpu in test_sys.cpu:
         if args.smt:
-            cpu.numThreads = 2
+            cpu.numThreads = smt_threads

This would require adding a corresponding --smt-threads argument in Options.py.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@configs/common/xiangshan.py` around lines 446 - 451, The SMT thread count is
hardcoded to 2; update the configuration to accept a new command-line argument
(e.g., add --smt-threads in Options.py and parse it into args.smt_threads) and
use that value instead of the literal 2 when setting test_sys.multi_thread and
cpu.numThreads; also add validation logic near the SMT handling (where args.smt
is checked) to ensure SMT mode is compatible with other modes (e.g., trace mode
or difftest) and throw a clear error or disable SMT if incompatible.
src/cpu/o3/comm.hh (1)

189-206: Add a boundary regression for the wraparound case.

largerThan() now treats the half-range distance as invalid and panics. A focused test around 7 / 8 / 9 version deltas would make this much safer to maintain.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/comm.hh` around lines 189 - 206, The largerThan(uint8_t other)
method currently panics on the wraparound half-range case; change the logic to
explicitly handle the exact half-range (distance == versionLimit/2) as an
invalid/neutral case instead of hitting the panic: return false (or otherwise
treat as not larger) when distance == 0 or distance == versionLimit/2, keep the
existing checks against maxInflightSquash for forward and backward ranges, and
remove the panic path (or convert it to an assert or error return). Update tests
to add a boundary regression exercising version deltas around the wrap point
(e.g., distances 7, 8, 9 for your versionLimit) to ensure behavior is stable
across wraparound. Ensure references to symbols version, versionLimit,
maxInflightSquash, largerThan, and the panic("SquashVersion: %d, other: %d\n",
version, other) are updated accordingly.
src/cpu/o3/fetch.cc (1)

1387-1387: Minor formatting: missing space after tid.

-    ThreadID tid =selectUnstalledThread();
+    ThreadID tid = selectUnstalledThread();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/fetch.cc` at line 1387, The statement declaring ThreadID tid is
missing a space after the '='; update the use of selectUnstalledThread() so the
declaration reads with consistent spacing (e.g., change "ThreadID tid
=selectUnstalledThread();" to "ThreadID tid = selectUnstalledThread();") —
adjust the spacing in the selectUnstalledThread call within the same function to
match surrounding formatting conventions.
src/cpu/o3/issue_queue.cc (1)

143-193: SMTBasedSelector implementation looks correct but has redundant fallback logic.

The fallback loop at lines 180-189 will never find a different instruction than begin because:

  1. If no instruction matches priorityThread, the first loop exhausts all instructions
  2. The second loop then searches for != priorityThread, but since the first loop found nothing, the second will just return the first non-priority instruction

The fallback return begin; at line 192 is also unreachable if there are any instructions.

Consider simplifying:

♻️ Suggested simplification
     for (auto it = begin; it != end; it++) {
         auto& inst = *it;
         
         if (inst->threadNumber == priorityThread) {
             DPRINTF(Schedule, 
                 "[sn:%llu] selected by SMT policy (tid=%d)\n",
                 inst->seqNum, priorityThread);
             return it;
         }
     }
     
-    
-    for (auto it = begin; it != end; it++) {
-        auto& inst = *it;
-        
-        if (inst->threadNumber != priorityThread) {
-            DPRINTF(Schedule, 
-                "[sn:%llu] selected by default (tid=%d, priority=%d)\n",
-                inst->seqNum, inst->threadNumber, priorityThread);
-            return it;
-        }
-    }
-    
-    DPRINTF(Schedule, "SMTBasedSelector: no available instruction\n");
+    // No instruction from priority thread, fall back to oldest
+    DPRINTF(Schedule, 
+        "[sn:%llu] selected by default (tid=%d, priority=%d)\n",
+        (*begin)->seqNum, (*begin)->threadNumber, priorityThread);
     return begin;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/issue_queue.cc` around lines 143 - 193, SMTBasedSelector::select
currently does two full loops which is redundant; change it to a single pass
that records the first non-priority instruction while searching for a
priorityThread match and returns the priority match if found, otherwise returns
the recorded first non-priority candidate (or begin if none). Update the logic
inside SMTBasedSelector::select to eliminate the second loop and the unreachable
DPRINTF/return, using one loop over ReadyQue from begin to end, checking
inst->threadNumber against priorityThread and storing the first non-priority
iterator to return if no priority match is found.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@configs/common/Options.py`:
- Around line 347-349: The help text for the "--smt" argument contains a typo
"multitThread-supported"; update the help string in the parser.add_argument call
for "--smt" (the Option defined with action="store_true") to use the correct
phrasing such as "multi-thread-supported" or "multiThread-supported" (e.g.,
replace "multitThread-supported" with "multi-thread-supported") so the help
message reads correctly.

In `@src/cpu/o3/cpu.cc`:
- Around line 201-205: The FS-SMT change sets active_threads = numThreads but
left interrupt handling tied to thread 0, so fix CPU::getInterrupts,
CPU::getInterruptsNO, CPU::processInterrupts and any trap(..., 0, ...) calls to
route interrupts per hardware thread: use the core/thread index (or the
architectural thread id exposed by the CPU context) instead of hardcoded 0,
iterate over all active_threads when polling/dispatching interrupts, and ensure
per-thread interrupt queues/flags are accessed (not a shared thread-0 structure)
so interrupts meant for thread N are delivered to that thread's state and trap
handlers.

In `@src/cpu/o3/fetch.cc`:
- Around line 377-410: The init allocates lsqCounter, iqCounter, robCounter and
decodeScheduler with new (in Fetch::initDecodeScheduler) but they are never
freed; update Fetch to manage lifetime by either converting these members to
smart pointers (e.g., std::unique_ptr for lsqCounter, iqCounter, robCounter, and
decodeScheduler) or by deleting them in the Fetch destructor (implement ~Fetch()
to delete lsqCounter, iqCounter, robCounter and decodeScheduler and null them).
Ensure constructors/assignments match the chosen approach (replace raw new with
make_unique when using unique_ptr or keep new if adding deletes) and update any
code that assumes raw pointers accordingly.

In `@src/cpu/o3/issue_queue.cc`:
- Around line 356-358: instsCounter is heap-allocated with new InstsCounter()
but never freed, causing a memory leak; change the member to a smart pointer or
free it in the destructor: either replace the raw pointer with
std::unique_ptr<InstsCounter> instsCounter and initialize it (e.g. instsCounter
= std::make_unique<InstsCounter>() in the IssueQueue constructor) and include
<memory>, or add proper cleanup by deleting instsCounter in the IssueQueue
destructor (implement ~IssueQueue() { delete instsCounter; }). Ensure all
references use the chosen owning type (instsCounter/InstsCounter) consistently.
- Line 1646: There's a stray double semicolon on the statement in IssueQueue
where total is incremented: the expression total +=
iq->getInstsCounter()->getCounter(tid);; should use a single semicolon; update
the statement in the method/class containing getInstsCounter()/getCounter (the
code around total += iq->getInstsCounter()->getCounter(tid)) to remove the extra
semicolon and recompile/format.

---

Outside diff comments:
In `@src/cpu/pred/btb/decoupled_bpred.cc`:
- Around line 662-667: The current blockPredictionOnce() only sets
threads[0].blockPredictionPending, which breaks SMT throttling; change
blockPredictionOnce() to accept a thread id (e.g., unsigned thread_id or
ThreadID) and set threads[thread_id].blockPredictionPending = true instead of
hardcoding 0, then update notifyResolveFailure() to determine which thread
experienced the failure and call the new blockPredictionOnce(thread_id) variant;
update all call sites to pass the appropriate thread id and adjust any
declarations/signatures accordingly so the blocking affects the failing hardware
thread rather than thread 0.

---

Duplicate comments:
In `@src/cpu/o3/fetch.hh`:
- Around line 245-248: The field delayedSchedulerDelay is declared uninitialized
and then read by Fetch::initDecodeScheduler() when constructing a
DelayedICountScheduler; initialize delayedSchedulerDelay (e.g., give it a
sensible default in its declaration or set it in the Fetch
constructor/initialization path) or ensure it is assigned from config before
Fetch::initDecodeScheduler() is called so that DelayedICountScheduler always
receives a defined value; locate the symbol delayedSchedulerDelay and the call
site in Fetch::initDecodeScheduler()/DelayedICountScheduler to apply the fix.

In `@src/cpu/o3/iew.cc`:
- Around line 1535-1540: The code only updates toFetch->iewInfo for the first
ThreadID from activeThreads, leaving other threads' iewInfo stale; change the
logic to iterate over all entries in activeThreads and for each ThreadID tid
clear toFetch->iewInfo[tid].resolvedCFIs and refresh
toFetch->iewInfo[tid].ldstqCount (use ldstQueue.getCount(tid)), robCount (use
rob->getThreadEntries(tid)) and iqCount (use scheduler->getIQInsts(tid)) so
IEW-to-Fetch state is published for every active thread.

---

Nitpick comments:
In `@configs/common/xiangshan.py`:
- Around line 446-451: The SMT thread count is hardcoded to 2; update the
configuration to accept a new command-line argument (e.g., add --smt-threads in
Options.py and parse it into args.smt_threads) and use that value instead of the
literal 2 when setting test_sys.multi_thread and cpu.numThreads; also add
validation logic near the SMT handling (where args.smt is checked) to ensure SMT
mode is compatible with other modes (e.g., trace mode or difftest) and throw a
clear error or disable SMT if incompatible.

In `@src/cpu/o3/comm.hh`:
- Around line 189-206: The largerThan(uint8_t other) method currently panics on
the wraparound half-range case; change the logic to explicitly handle the exact
half-range (distance == versionLimit/2) as an invalid/neutral case instead of
hitting the panic: return false (or otherwise treat as not larger) when distance
== 0 or distance == versionLimit/2, keep the existing checks against
maxInflightSquash for forward and backward ranges, and remove the panic path (or
convert it to an assert or error return). Update tests to add a boundary
regression exercising version deltas around the wrap point (e.g., distances 7,
8, 9 for your versionLimit) to ensure behavior is stable across wraparound.
Ensure references to symbols version, versionLimit, maxInflightSquash,
largerThan, and the panic("SquashVersion: %d, other: %d\n", version, other) are
updated accordingly.

In `@src/cpu/o3/fetch.cc`:
- Line 1387: The statement declaring ThreadID tid is missing a space after the
'='; update the use of selectUnstalledThread() so the declaration reads with
consistent spacing (e.g., change "ThreadID tid =selectUnstalledThread();" to
"ThreadID tid = selectUnstalledThread();") — adjust the spacing in the
selectUnstalledThread call within the same function to match surrounding
formatting conventions.

In `@src/cpu/o3/issue_queue.cc`:
- Around line 143-193: SMTBasedSelector::select currently does two full loops
which is redundant; change it to a single pass that records the first
non-priority instruction while searching for a priorityThread match and returns
the priority match if found, otherwise returns the recorded first non-priority
candidate (or begin if none). Update the logic inside SMTBasedSelector::select
to eliminate the second loop and the unreachable DPRINTF/return, using one loop
over ReadyQue from begin to end, checking inst->threadNumber against
priorityThread and storing the first non-priority iterator to return if no
priority match is found.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 90866f4e-afac-42f0-afe5-ddcfc83a9ebc

📥 Commits

Reviewing files that changed from the base of the PR and between 51fa60b and d461056.

📒 Files selected for processing (21)
  • configs/common/Options.py
  • configs/common/xiangshan.py
  • src/cpu/o3/comm.hh
  • src/cpu/o3/commit.cc
  • src/cpu/o3/commit.hh
  • src/cpu/o3/cpu.cc
  • src/cpu/o3/decode.cc
  • src/cpu/o3/decode.hh
  • src/cpu/o3/fetch.cc
  • src/cpu/o3/fetch.hh
  • src/cpu/o3/iew.cc
  • src/cpu/o3/iew.hh
  • src/cpu/o3/inst_queue.cc
  • src/cpu/o3/issue_queue.cc
  • src/cpu/o3/issue_queue.hh
  • src/cpu/o3/lsq.cc
  • src/cpu/o3/lsq.hh
  • src/cpu/o3/rename.cc
  • src/cpu/o3/rename.hh
  • src/cpu/pred/btb/decoupled_bpred.cc
  • src/cpu/pred/btb/decoupled_bpred.hh

Comment on lines +347 to +349
parser.add_argument("--smt", action="store_true", default=False,
help=""" RISCV SMT support, which requires multitThread-supported gcpt restore and diff-ref-so""")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix typo in help text.

There's a typo in the help text: "multitThread" should be "multiThread" or "multi-thread".

✏️ Proposed fix
     parser.add_argument("--smt", action="store_true", default=False,
-                        help=""" RISCV SMT support, which requires multitThread-supported gcpt restore and diff-ref-so""")
+                        help="""RISCV SMT support, which requires multi-thread-supported gcpt restore and diff-ref-so""")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
parser.add_argument("--smt", action="store_true", default=False,
help=""" RISCV SMT support, which requires multitThread-supported gcpt restore and diff-ref-so""")
parser.add_argument("--smt", action="store_true", default=False,
help="""RISCV SMT support, which requires multi-thread-supported gcpt restore and diff-ref-so""")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@configs/common/Options.py` around lines 347 - 349, The help text for the
"--smt" argument contains a typo "multitThread-supported"; update the help
string in the parser.add_argument call for "--smt" (the Option defined with
action="store_true") to use the correct phrasing such as
"multi-thread-supported" or "multiThread-supported" (e.g., replace
"multitThread-supported" with "multi-thread-supported") so the help message
reads correctly.

Comment on lines +201 to 205
// FS-SMT still uses one shared workload/system image, but the O3 core
// must provision per-thread architectural state for every hardware
// thread context exposed by the CPU.
active_threads = numThreads;
} else {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

FS SMT activation is incomplete without per-thread interrupt routing.

Line 204 enables all hardware threads in FS, but interrupt fetch/dispatch in this file is still hardwired to thread 0 (CPU::getInterrupts, CPU::getInterruptsNO, CPU::processInterrupts/trap(..., 0, ...)). With SMT active, interrupts targeting non-zero threads can be misdelivered or dropped.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/cpu.cc` around lines 201 - 205, The FS-SMT change sets
active_threads = numThreads but left interrupt handling tied to thread 0, so fix
CPU::getInterrupts, CPU::getInterruptsNO, CPU::processInterrupts and any
trap(..., 0, ...) calls to route interrupts per hardware thread: use the
core/thread index (or the architectural thread id exposed by the CPU context)
instead of hardcoded 0, iterate over all active_threads when polling/dispatching
interrupts, and ensure per-thread interrupt queues/flags are accessed (not a
shared thread-0 structure) so interrupts meant for thread N are delivered to
that thread's state and trap handlers.

Comment on lines +377 to +410
void
Fetch::initDecodeScheduler()
{
// Initialize counters (same as before)
lsqCounter = new InstsCounter();
iqCounter = new InstsCounter();
robCounter = new InstsCounter();
DPRINTF(Fetch, "Initialized SMT Decode Scheduler: 0\n");

for (ThreadID tid = 0; tid < numThreads; tid++)
{
lsqCounter->setCounter(tid, 0);
iqCounter->setCounter(tid, 0);
robCounter->setCounter(tid, 0);
}
DPRINTF(Fetch, "Initialized SMT Decode Scheduler: 1\n");

if (smtDecodePolicy == "icount") {
// Use ROB as default counter for icount
decodeScheduler = new ICountScheduler(numThreads, robCounter);
}
else if (smtDecodePolicy == "delayed") {
decodeScheduler = new DelayedICountScheduler(numThreads, robCounter, delayedSchedulerDelay);
}
else if (smtDecodePolicy == "multi_priority") {
decodeScheduler = new MultiPrioritySched(numThreads, {lsqCounter, iqCounter, robCounter});
}
else {
// Default: round-robin like (use delayed with thread cycling)
decodeScheduler = new DelayedICountScheduler(numThreads, robCounter, numThreads);
}

DPRINTF(Fetch, "Initialized SMT Decode Scheduler: %s\n", smtDecodePolicy.c_str());
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Memory leak: dynamically allocated objects are never freed.

The counters and scheduler are allocated with new but there's no corresponding destructor or cleanup to delete them. This will leak memory when the Fetch object is destroyed.

🛠️ Suggested fix: store as unique_ptr or delete in destructor

Consider using smart pointers for automatic memory management:

+// In fetch.hh, change member declarations:
+std::unique_ptr<InstsCounter> lsqCounter;
+std::unique_ptr<InstsCounter> iqCounter;
+std::unique_ptr<InstsCounter> robCounter;
+std::unique_ptr<SMTScheduler> decodeScheduler;

// In initDecodeScheduler():
-    lsqCounter = new InstsCounter();
-    iqCounter  = new InstsCounter();
-    robCounter = new InstsCounter();
+    lsqCounter = std::make_unique<InstsCounter>();
+    iqCounter  = std::make_unique<InstsCounter>();
+    robCounter = std::make_unique<InstsCounter>();

Or alternatively, add cleanup in the destructor.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/fetch.cc` around lines 377 - 410, The init allocates lsqCounter,
iqCounter, robCounter and decodeScheduler with new (in
Fetch::initDecodeScheduler) but they are never freed; update Fetch to manage
lifetime by either converting these members to smart pointers (e.g.,
std::unique_ptr for lsqCounter, iqCounter, robCounter, and decodeScheduler) or
by deleting them in the Fetch destructor (implement ~Fetch() to delete
lsqCounter, iqCounter, robCounter and decodeScheduler and null them). Ensure
constructors/assignments match the chosen approach (replace raw new with
make_unique when using unique_ptr or keep new if adding deletes) and update any
code that assumes raw pointers accordingly.

Comment on lines +356 to +358

//Init InstsCounter
instsCounter = new InstsCounter();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Memory leak: InstsCounter allocated but never deleted.

instsCounter is allocated with new but there's no destructor or cleanup code to free it. Consider using std::unique_ptr or adding cleanup in the destructor.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/issue_queue.cc` around lines 356 - 358, instsCounter is
heap-allocated with new InstsCounter() but never freed, causing a memory leak;
change the member to a smart pointer or free it in the destructor: either
replace the raw pointer with std::unique_ptr<InstsCounter> instsCounter and
initialize it (e.g. instsCounter = std::make_unique<InstsCounter>() in the
IssueQueue constructor) and include <memory>, or add proper cleanup by deleting
instsCounter in the IssueQueue destructor (implement ~IssueQueue() { delete
instsCounter; }). Ensure all references use the chosen owning type
(instsCounter/InstsCounter) consistently.

{
uint32_t total = 0;
for (auto iq : issueQues) {
total += iq->getInstsCounter()->getCounter(tid);;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Typo: double semicolon.

-        total += iq->getInstsCounter()->getCounter(tid);;   
+        total += iq->getInstsCounter()->getCounter(tid);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
total += iq->getInstsCounter()->getCounter(tid);;
total += iq->getInstsCounter()->getCounter(tid);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/issue_queue.cc` at line 1646, There's a stray double semicolon on
the statement in IssueQueue where total is incremented: the expression total +=
iq->getInstsCounter()->getCounter(tid);; should use a single semicolon; update
the statement in the method/class containing getInstsCounter()/getCounter (the
code around total += iq->getInstsCounter()->getCounter(tid)) to remove the extra
semicolon and recompile/format.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/cpu/o3/commit.cc (1)

1964-1977: ⚠️ Potential issue | 🔴 Critical

Only squash in-flight rename entries for the matching thread.

localSquashVer is per-thread now, but this loop still squashes every entry in fromRename. If thread A squashes while rename is sending thread B's bundle, B's instructions are dropped before they ever reach the ROB.

Minimal fix
     for (unsigned i_idx = 0; i_idx < fromRename->size; i_idx++) {
         const DynInstPtr &inst = fromRename->insts[i_idx];
+        if (inst->threadNumber != tid) {
+            continue;
+        }
         DPRINTF(Commit, "[tid:%i] [sn:%llu] Squashing in-flight "
                 "instruction PC %s\n",
                 inst->threadNumber, inst->seqNum, inst->pcState());
         inst->setSquashed();
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/commit.cc` around lines 1964 - 1977, The loop currently squashes
every entry in fromRename regardless of thread, dropping instructions from other
threads; update the loop that iterates fromRename->insts[i_idx] so you only call
setSquashed() (and DPRINTF) for entries where inst->threadNumber == tid, leaving
other entries intact; keep the surrounding logic (fixedbuffer[tid].clear(),
localSquashVer[tid].update(...), toIEW->commitInfo[tid].squashVersion assignment
and DPRINTF) unchanged but ensure the squash/version update runs only after
handling the matching-thread entries.
src/cpu/pred/btb/btb_ittage.cc (1)

547-559: ⚠️ Potential issue | 🔴 Critical

tageHit() is still hardwired to thread 0.

After the per-thread split, putPCHistory() only populates threadMeta[tid] for the current thread. Calling getPredictionMeta(0) here means thread 1+ can read stale thread-0 state, and on the first indirect prediction it can dereference a null meta.

🩹 Minimal fix
-bool
-BTBITTAGE::tageHit()
+bool
+BTBITTAGE::tageHit(ThreadID tid)
 {
-    auto meta = getPredictionMeta(0);
-    auto preds = std::static_pointer_cast<TageMeta>(meta)->preds;
+    auto meta = std::static_pointer_cast<TageMeta>(getPredictionMeta(tid));
+    if (!meta) {
+        return false;
+    }
+    auto &preds = meta->preds;
     bool hit = false;
     for (auto & [pc, pred] : preds) {
         if (pred.mainInfo.found) {
             hit = true;
             break;
         }
     }
     return hit;
 }
// src/cpu/pred/btb/decoupled_bpred.cc
if (pred_taken_entry.isIndirect && ittage->tageHit(tid)) {
    finalPred.s3Source = ittage->getComponentIdx();
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/btb/btb_ittage.cc` around lines 547 - 559, The function
BTBITTAGE::tageHit() is using getPredictionMeta(0) and thus reads thread-0
state; change the API and implementation to accept a thread id parameter (e.g.,
int tid) and call getPredictionMeta(tid) so it checks the correct threadMeta
populated by putPCHistory(); then update all callers (for example the call site
in decoupled_bpred.cc that currently calls ittage->tageHit()) to pass the
current tid (e.g., ittage->tageHit(tid)) and ensure any nullptr or missing-meta
handling remains correct.
♻️ Duplicate comments (1)
src/cpu/o3/cpu.cc (1)

200-205: ⚠️ Potential issue | 🔴 Critical

FS-SMT is still wired to thread 0 interrupts.

Making all FS hardware threads active here exposes tid > 0, but interrupt polling and dispatch below still go through interrupts[0] and trap(..., 0, ...). That can misdeliver or drop interrupts for non-zero threads.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/cpu.cc` around lines 200 - 205, The FS-SMT change enables multiple
active_threads but interrupt handling still uses interrupts[0] and trap(..., 0,
...), which will misdeliver interrupts for tid > 0; update the interrupt
polling/dispatch logic to use the per-thread interrupt entry for each hardware
thread (e.g., use interrupts[tid] rather than interrupts[0]) and call trap with
the actual thread id (trap(..., tid, ...)), and ensure any loops or checks that
assumed thread-0 are adjusted to iterate per active thread (references:
active_threads, interrupts[], trap(..., 0, ...)).
🧹 Nitpick comments (6)
.gitignore (1)

74-78: Consider scoping these extension ignores to output directories.

At Lines 74-78, global patterns like *.db / *.gz / *.zstd may accidentally mask legitimate repo files (for tests or fixtures). Prefer directory-scoped ignores (e.g., m5out/**, microbench/output/**) unless repo policy is to ban these extensions everywhere.

Example scoped alternative
-*.bin
-*.db
-*.log
-*.gz
-*.zstd
+m5out/**/*.bin
+m5out/**/*.db
+m5out/**/*.log
+microbench/output/**/*.gz
+microbench/output/**/*.zstd
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore around lines 74 - 78, The gitignore currently uses global
file-extension patterns (*.bin, *.db, *.log, *.gz, *.zstd) which can hide
legitimate repo files; change these to directory-scoped patterns instead (for
example replace or supplement the global patterns with entries like m5out/**,
microbench/output/**, or other known output directories) or remove the global
entries if the project policy doesn't allow blanket extension ignores; update
the .gitignore to target only build/output dirs and keep any necessary
exceptions for test/fixture files so legitimate repo assets are not accidentally
ignored.
configs/common/xiangshan.py (1)

446-451: Consider making numThreads configurable.

The hardcoded value cpu.numThreads = 2 limits SMT to exactly 2 threads. Consider making this configurable via a command-line argument for flexibility.

Suggested enhancement for configurable thread count
 if args.smt:
     test_sys.multi_thread = True
+    smt_threads = getattr(args, 'smt_threads', 2)
 
 for cpu in test_sys.cpu:
     if args.smt:
-        cpu.numThreads = 2
+        cpu.numThreads = smt_threads

This would require adding an --smt-threads argument to the parser in Options.py.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@configs/common/xiangshan.py` around lines 446 - 451, The code currently
forces SMT to two threads by setting cpu.numThreads = 2 when args.smt is true;
make the thread count configurable by adding a new CLI option (e.g.,
--smt-threads) in Options.py, parse it onto args (e.g., args.smt_threads),
default to 2 for backward compatibility, set test_sys.multi_thread when args.smt
is true as before, and replace the hardcoded cpu.numThreads = 2 assignment with
cpu.numThreads = args.smt_threads so each cpu uses the configured SMT thread
count.
src/cpu/pred/btb/ras.cc (1)

119-122: Make the shared_ptr erasure explicit for readability/tooling stability.

Line 122 currently relies on implicit conversion from std::shared_ptr<RASMeta> to std::shared_ptr<void>. An explicit cast makes intent clearer and can avoid noisy static-analysis reports here.

Proposed cleanup
 std::shared_ptr<void>
 BTBRAS::getPredictionMeta(ThreadID tid)
 {
     (void)tid;
-    return meta;
+    return std::static_pointer_cast<void>(meta);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/btb/ras.cc` around lines 119 - 122, The return in
BTBRAS::getPredictionMeta currently relies on implicit conversion from
std::shared_ptr<RASMeta> (meta) to std::shared_ptr<void>; make this conversion
explicit by returning a casted shared_ptr (e.g., use
std::static_pointer_cast<void>(meta)) so the intent is clear and static-analysis
tools won't complain.
src/cpu/pred/btb/mbtb.cc (1)

316-320: Initialize meta on first access to avoid null metadata paths.

This getter can still return null before the first putPCHistory. Given downstream dereferences of metadata in update flow, lazy init here would harden startup/edge timing behavior.

Proposed hardening
 std::shared_ptr<void>
 MBTB::getPredictionMeta(ThreadID tid)
 {
     (void)tid;
+    if (!meta) {
+        meta = std::make_shared<BTBMeta>();
+    }
     return meta;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/btb/mbtb.cc` around lines 316 - 320, MBTB::getPredictionMeta can
return null before the first putPCHistory; change it to lazily initialize the
meta field on first access: inside MBTB::getPredictionMeta(ThreadID tid) check
if meta is null and, if so, allocate and initialize the appropriate metadata
object (same type created by putPCHistory) and assign it to meta before
returning it; reference MBTB::getPredictionMeta and putPCHistory so you create
the same metadata structure/initial values putPCHistory would produce to avoid
downstream null dereferences.
src/cpu/pred/btb/microtage.cc (1)

828-831: Make the fallback index helper thread-aware.

This overload still hashes with historyState(0). Any remaining SMT caller that hits it will read the wrong folded history and produce the wrong set index. Prefer threading ThreadID through here as well, or removing the overload.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/btb/microtage.cc` around lines 828 - 831, The fallback
MicroTAGE::getTageIndex(Addr pc, int t) uses
historyState(0).indexFoldedHist[t].get() which is not SMT/thread-safe; update
this overload to accept and thread a ThreadID (or remove the overload and force
callers to call the ThreadID-aware variant) so the folded history is read from
historyState(thread).indexFoldedHist[t].get() (or equivalent per-thread
accessor). Modify callers that currently use the two-arg getTageIndex to pass
the correct ThreadID, and ensure all uses reference the per-thread folded
history rather than historyState(0).
src/cpu/pred/btb/btb_tage.cc (1)

879-883: Make the fallback hash helpers thread-aware.

The rest of this predictor is now per-thread, but these overloads still hash against historyState(0). Any remaining SMT caller that reaches them will compute the wrong tag/index instead of failing loudly. Either thread ThreadID through here too, or remove these overloads so call sites must be explicit.

Also applies to: 900-902

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/btb/btb_tage.cc` around lines 879 - 883, The fallback overloads
of BTBTAGE::getTageTag currently call historyState(0) which is not
SMT/thread-aware; update these overloads to accept and propagate a ThreadID (or
thread index) into the call so they use historyState(threadId) when selecting
state.tagFoldedHist and state.altTagFoldedHist, or remove these convenience
overloads entirely so all call sites must call the explicit getTageTag(pc, t,
tagFoldedHist, altTagFoldedHist, position) variant; ensure every caller is
updated to pass the correct ThreadID or the explicit folded-hist pointers to
avoid incorrect cross-thread hashing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cpu/base.cc`:
- Around line 220-231: Use the resolved CPU id (_cpuId) and a system-wide stride
when deriving difftest identities instead of params().cpu_id and numThreads:
compute a hart id like base = _cpuId * system->maxThreadsPerCPU() and set
diff_state->diff.cpu_id = base + tid (or update difftestHartId to use _cpuId and
maxThreadsPerCPU). Also pass _cpuId (not params().cpu_id) into the proxy
constructors (SpikeProxy/NemuProxy). Apply the same change to the other
occurrence mentioned (lines ~882-885) so all difftest IDs and proxy
initializations use the resolved _cpuId and the system->maxThreadsPerCPU()
stride.
- Around line 418-425: The crash happens because diffAllStates entries may have
a null proxy when constructed via the !enableDifftest path; in the
system->multiContextDifftest() loop you must not dereference
diffAllStates[tid]->proxy unconditionally. Fix by guarding the init call: either
wrap the entire per-thread init in a check that proxy is non-null (e.g., if
(diffAllStates[tid]->proxy) call proxy->initState(...)), or only run this
multiContextDifftest block when enableDifftest is true; update the code around
system->multiContextDifftest(), diffAllStates, proxy, initState and
difftestHartId to ensure proxy is validated before use.

In `@src/cpu/o3/fetch.cc`:
- Line 150: Fetch::Fetch() calls initDecodeScheduler() before reading
smtDecodePolicy and its delay from BaseO3CPUParams, so the scheduler always uses
the hardcoded "multi_priority" branch and never reaches the icount/delayed
paths; fix by initializing smtDecodePolicy and the decode-delay field from the
CPU params (BaseO3CPUParams) in the Fetch constructor (and any other place that
calls initDecodeScheduler()) before calling initDecodeScheduler(), ensuring the
policy string and delay value from params are assigned to the Fetch instance so
initDecodeScheduler() can pick the correct branch (icount/delayed/etc.).

In `@src/cpu/o3/rename.cc`:
- Around line 422-430: The loop computing release_pending compares
releaseSeq[tid] and finalCommitSeq[tid] for all tids including exited threads,
which can leave release_pending true indefinitely; update the logic in rename
(where release_pending is set) to only consider currently active threads (use
the activeThreads mask/collection or ask releasePhysRegs() for active thread
IDs) or ensure clearStates() clears finalCommitSeq/releaseSeq for removed
threads so exited tids don’t trigger release_pending; reference release_pending,
releaseSeq[], finalCommitSeq[], releasePhysRegs(), and clearStates() when making
the change.

In `@src/cpu/pred/btb/btb_mgsc.hh`:
- Line 162: Remove the default argument from the getPredictionMeta declaration
in btb_mgsc.hh so its signature becomes std::shared_ptr<void>
getPredictionMeta(ThreadID tid) override; then update all callers (tests and
code that currently rely on the implicit tid=0) to pass an explicit ThreadID
value (e.g., 0 or the appropriate thread variable). Search for usages of
getPredictionMeta() with no arguments (examples in btb_mgsc.test.cc,
btb_tage.test.cc, ras_test.cc) and change them to getPredictionMeta(0) or
getPredictionMeta(tid) as appropriate to preserve correct SMT behavior.

In `@src/cpu/pred/btb/decoupled_bpred.cc`:
- Around line 459-467: When ftq.hasTarget(target_id, tid) is false, also reset
the per-thread speculative history instead of only setting threads[tid].s0PC;
call the per-thread history reset/recovery on historyManagers[tid] (e.g.
historyManagers[tid]->reset() or resetToPC(redirect_pc)) and then
refresh/recompute any folded histories derived from it so they no longer reflect
the wrong path before returning; keep the existing clearPreds(tid) and
threads[tid].validprediction = false lines but ensure historyManagers[tid] and
the folded-history state are explicitly restored to match the redirect PC.

In `@src/cpu/pred/btb/decoupled_bpred.hh`:
- Around line 705-706: The overloaded resetPC(Addr) is ambiguous (it resets all
threads) and conflicts with resetPC(ThreadID, Addr) (per-thread); remove the
single-arg overload or rename it to resetPCAllThreads(Addr) to make semantics
explicit, then update all call sites that intended per-thread behavior to call
resetPC(ThreadID, Addr) and only keep/allow the single-arg form for true
global-reset/init paths; search for and verify every use of resetPC(Addr) and
change them to either resetPCAllThreads(new_pc) or resetPC(tid, new_pc) as
appropriate.

In `@src/cpu/pred/btb/microtage.hh`:
- Line 127: Remove the implicit ThreadID default from the method signatures so
callers must pass an explicit thread id: update the declaration of
getPredictionMeta(ThreadID tid = 0) to getPredictionMeta(ThreadID tid) and do
the same for generateSinglePrediction(...) (remove any = 0 on ThreadID
parameters). Then search for all call sites and tests that relied on the default
and update them to supply the correct ThreadID explicitly (including unit tests
and documentation examples) to avoid accidental use of thread-0 state; ensure
the updated signatures are reflected in any overridden implementations and
declarations (matching function names getPredictionMeta and
generateSinglePrediction).

In `@src/cpu/pred/btb/uras.cc`:
- Around line 88-90: BTBuRAS::getPredictionMeta currently ignores the ThreadID
tid which causes shared uRAS state (specStack, nonSpecStack, meta) to
leak/poison across SMT threads; update getPredictionMeta (and related state in
BTBuRAS) to index or select the per-thread RAS state using tid (e.g.,
maintain/lookup per-thread specStack/nonSpecStack/meta containers or make those
structures thread-local) and return prediction metadata from the tid-specific
stacks/meta instead of the global ones so each thread's return history and
prediction metadata remain isolated.

In `@src/cpu/simple/base.cc`:
- Around line 522-529: BaseSimpleCPU::readGem5Regs currently indexes
diffAllStates[tid] and threadContexts[tid] directly which can crash if tid is
out of range or the per-thread pointers are null; add defensive checks at the
start of readGem5Regs to verify tid is within diffAllStates and threadContexts
bounds and that diffAllStates[tid] and threadContexts[tid] are non-null, and if
any check fails return early (or log/raise a descriptive error) instead of
dereferencing; update references to diffAllStates and threadContexts in the
function body to assume these validated pointers are safe.

---

Outside diff comments:
In `@src/cpu/o3/commit.cc`:
- Around line 1964-1977: The loop currently squashes every entry in fromRename
regardless of thread, dropping instructions from other threads; update the loop
that iterates fromRename->insts[i_idx] so you only call setSquashed() (and
DPRINTF) for entries where inst->threadNumber == tid, leaving other entries
intact; keep the surrounding logic (fixedbuffer[tid].clear(),
localSquashVer[tid].update(...), toIEW->commitInfo[tid].squashVersion assignment
and DPRINTF) unchanged but ensure the squash/version update runs only after
handling the matching-thread entries.

In `@src/cpu/pred/btb/btb_ittage.cc`:
- Around line 547-559: The function BTBITTAGE::tageHit() is using
getPredictionMeta(0) and thus reads thread-0 state; change the API and
implementation to accept a thread id parameter (e.g., int tid) and call
getPredictionMeta(tid) so it checks the correct threadMeta populated by
putPCHistory(); then update all callers (for example the call site in
decoupled_bpred.cc that currently calls ittage->tageHit()) to pass the current
tid (e.g., ittage->tageHit(tid)) and ensure any nullptr or missing-meta handling
remains correct.

---

Duplicate comments:
In `@src/cpu/o3/cpu.cc`:
- Around line 200-205: The FS-SMT change enables multiple active_threads but
interrupt handling still uses interrupts[0] and trap(..., 0, ...), which will
misdeliver interrupts for tid > 0; update the interrupt polling/dispatch logic
to use the per-thread interrupt entry for each hardware thread (e.g., use
interrupts[tid] rather than interrupts[0]) and call trap with the actual thread
id (trap(..., tid, ...)), and ensure any loops or checks that assumed thread-0
are adjusted to iterate per active thread (references: active_threads,
interrupts[], trap(..., 0, ...)).

---

Nitpick comments:
In @.gitignore:
- Around line 74-78: The gitignore currently uses global file-extension patterns
(*.bin, *.db, *.log, *.gz, *.zstd) which can hide legitimate repo files; change
these to directory-scoped patterns instead (for example replace or supplement
the global patterns with entries like m5out/**, microbench/output/**, or other
known output directories) or remove the global entries if the project policy
doesn't allow blanket extension ignores; update the .gitignore to target only
build/output dirs and keep any necessary exceptions for test/fixture files so
legitimate repo assets are not accidentally ignored.

In `@configs/common/xiangshan.py`:
- Around line 446-451: The code currently forces SMT to two threads by setting
cpu.numThreads = 2 when args.smt is true; make the thread count configurable by
adding a new CLI option (e.g., --smt-threads) in Options.py, parse it onto args
(e.g., args.smt_threads), default to 2 for backward compatibility, set
test_sys.multi_thread when args.smt is true as before, and replace the hardcoded
cpu.numThreads = 2 assignment with cpu.numThreads = args.smt_threads so each cpu
uses the configured SMT thread count.

In `@src/cpu/pred/btb/btb_tage.cc`:
- Around line 879-883: The fallback overloads of BTBTAGE::getTageTag currently
call historyState(0) which is not SMT/thread-aware; update these overloads to
accept and propagate a ThreadID (or thread index) into the call so they use
historyState(threadId) when selecting state.tagFoldedHist and
state.altTagFoldedHist, or remove these convenience overloads entirely so all
call sites must call the explicit getTageTag(pc, t, tagFoldedHist,
altTagFoldedHist, position) variant; ensure every caller is updated to pass the
correct ThreadID or the explicit folded-hist pointers to avoid incorrect
cross-thread hashing.

In `@src/cpu/pred/btb/mbtb.cc`:
- Around line 316-320: MBTB::getPredictionMeta can return null before the first
putPCHistory; change it to lazily initialize the meta field on first access:
inside MBTB::getPredictionMeta(ThreadID tid) check if meta is null and, if so,
allocate and initialize the appropriate metadata object (same type created by
putPCHistory) and assign it to meta before returning it; reference
MBTB::getPredictionMeta and putPCHistory so you create the same metadata
structure/initial values putPCHistory would produce to avoid downstream null
dereferences.

In `@src/cpu/pred/btb/microtage.cc`:
- Around line 828-831: The fallback MicroTAGE::getTageIndex(Addr pc, int t) uses
historyState(0).indexFoldedHist[t].get() which is not SMT/thread-safe; update
this overload to accept and thread a ThreadID (or remove the overload and force
callers to call the ThreadID-aware variant) so the folded history is read from
historyState(thread).indexFoldedHist[t].get() (or equivalent per-thread
accessor). Modify callers that currently use the two-arg getTageIndex to pass
the correct ThreadID, and ensure all uses reference the per-thread folded
history rather than historyState(0).

In `@src/cpu/pred/btb/ras.cc`:
- Around line 119-122: The return in BTBRAS::getPredictionMeta currently relies
on implicit conversion from std::shared_ptr<RASMeta> (meta) to
std::shared_ptr<void>; make this conversion explicit by returning a casted
shared_ptr (e.g., use std::static_pointer_cast<void>(meta)) so the intent is
clear and static-analysis tools won't complain.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1dee718d-2c4f-467c-b9c6-3ccd77651642

📥 Commits

Reviewing files that changed from the base of the PR and between d461056 and 589b793.

📒 Files selected for processing (44)
  • .gitignore
  • configs/common/xiangshan.py
  • src/cpu/base.cc
  • src/cpu/base.hh
  • src/cpu/difftest.cc
  • src/cpu/difftest.hh
  • src/cpu/o3/commit.cc
  • src/cpu/o3/cpu.cc
  • src/cpu/o3/cpu.hh
  • src/cpu/o3/decode.cc
  • src/cpu/o3/decode.hh
  • src/cpu/o3/fetch.cc
  • src/cpu/o3/fetch.hh
  • src/cpu/o3/inst_queue.cc
  • src/cpu/o3/lsq.hh
  • src/cpu/o3/lsq_unit.cc
  • src/cpu/o3/rename.cc
  • src/cpu/o3/rename.hh
  • src/cpu/pred/btb/abtb.cc
  • src/cpu/pred/btb/abtb.hh
  • src/cpu/pred/btb/btb_ittage.cc
  • src/cpu/pred/btb/btb_ittage.hh
  • src/cpu/pred/btb/btb_mgsc.cc
  • src/cpu/pred/btb/btb_mgsc.hh
  • src/cpu/pred/btb/btb_tage.cc
  • src/cpu/pred/btb/btb_tage.hh
  • src/cpu/pred/btb/btb_tage_ub.cc
  • src/cpu/pred/btb/btb_tage_ub.hh
  • src/cpu/pred/btb/btb_ubtb.hh
  • src/cpu/pred/btb/decoupled_bpred.cc
  • src/cpu/pred/btb/decoupled_bpred.hh
  • src/cpu/pred/btb/mbtb.cc
  • src/cpu/pred/btb/mbtb.hh
  • src/cpu/pred/btb/microtage.cc
  • src/cpu/pred/btb/microtage.hh
  • src/cpu/pred/btb/ras.cc
  • src/cpu/pred/btb/ras.hh
  • src/cpu/pred/btb/timed_base_pred.hh
  • src/cpu/pred/btb/uras.cc
  • src/cpu/pred/btb/uras.hh
  • src/cpu/simple/base.cc
  • src/cpu/simple/base.hh
  • src/sim/system.cc
  • src/sim/system.hh
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/cpu/o3/decode.cc

Comment on lines +220 to +231
diff_state->diff.cpu_id = difftestHartId(tid);
warn("difftest hart id set to %d for tid %d\n",
diff_state->diff.cpu_id, tid);

if (params().difftest_ref_so.find("spike") != std::string::npos) {
assert(!system->multiContextDifftest());
diff_state->proxy = new SpikeProxy(
params().cpu_id, params().difftest_ref_so.c_str(),
params().nemuSDimg.size() && params().nemuSDCptBin.size());
} else {
diff_state->proxy =
new NemuProxy(params().cpu_id, params().difftest_ref_so.c_str(),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Derive difftest identity from the resolved CPU id, not the raw param.

BaseCPU can auto-assign _cpuId when cpu_id == -1, but this new code still seeds diff.cpu_id and the proxy constructors from params().cpu_id. That leaves auto-assigned CPUs with stale ids, and params().cpu_id * numThreads + tid also stops being globally unique once CPUs have different SMT widths. Please key this off _cpuId and use a system-wide stride such as maxThreadsPerCPU.

Also applies to: 882-885

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/base.cc` around lines 220 - 231, Use the resolved CPU id (_cpuId) and
a system-wide stride when deriving difftest identities instead of
params().cpu_id and numThreads: compute a hart id like base = _cpuId *
system->maxThreadsPerCPU() and set diff_state->diff.cpu_id = base + tid (or
update difftestHartId to use _cpuId and maxThreadsPerCPU). Also pass _cpuId (not
params().cpu_id) into the proxy constructors (SpikeProxy/NemuProxy). Apply the
same change to the other occurrence mentioned (lines ~882-885) so all difftest
IDs and proxy initializations use the resolved _cpuId and the
system->maxThreadsPerCPU() stride.

threads[tid].data = new uint8_t[fetchBufferSize];
}

initDecodeScheduler();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

The decode scheduler policy is still effectively hardcoded.

Fetch::Fetch() calls initDecodeScheduler() immediately, but this class never assigns smtDecodePolicy from BaseO3CPUParams before that happens. As written, every instance takes the hardcoded "multi_priority" branch, so the icount/delayed paths are unreachable until the policy and delay are wired from params.

Also applies to: 393-406

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/fetch.cc` at line 150, Fetch::Fetch() calls initDecodeScheduler()
before reading smtDecodePolicy and its delay from BaseO3CPUParams, so the
scheduler always uses the hardcoded "multi_priority" branch and never reaches
the icount/delayed paths; fix by initializing smtDecodePolicy and the
decode-delay field from the CPU params (BaseO3CPUParams) in the Fetch
constructor (and any other place that calls initDecodeScheduler()) before
calling initDecodeScheduler(), ensuring the policy string and delay value from
params are assigned to the Fetch instance so initDecodeScheduler() can pick the
correct branch (icount/delayed/etc.).

Comment on lines +422 to +430
bool release_pending = false;
for (ThreadID tid = 0; tid < numThreads; ++tid) {
if (releaseSeq[tid] < finalCommitSeq[tid]) {
release_pending = true;
break;
}
}

if (wroteToTimeBuffer || release_pending) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

release_pending can stick forever on an exited thread.

This scans every tid, but releasePhysRegs() only advances activeThreads. The new finalCommitSeq/releaseSeq state is not cleared in clearStates(), so a removed thread can leave this true indefinitely and keep rename permanently active.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/o3/rename.cc` around lines 422 - 430, The loop computing
release_pending compares releaseSeq[tid] and finalCommitSeq[tid] for all tids
including exited threads, which can leave release_pending true indefinitely;
update the logic in rename (where release_pending is set) to only consider
currently active threads (use the activeThreads mask/collection or ask
releasePhysRegs() for active thread IDs) or ensure clearStates() clears
finalCommitSeq/releaseSeq for removed threads so exited tids don’t trigger
release_pending; reference release_pending, releaseSeq[], finalCommitSeq[],
releasePhysRegs(), and clearStates() when making the change.

std::vector<FullBTBPrediction> &stagePreds) override;

std::shared_ptr<void> getPredictionMeta() override;
std::shared_ptr<void> getPredictionMeta(ThreadID tid = 0) override;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n src/cpu/pred/btb/btb_mgsc.hh | sed -n '155,170p'

Repository: OpenXiangShan/GEM5

Length of output: 1091


🏁 Script executed:

rg "getPredictionMeta" --type cc --type hh -n

Repository: OpenXiangShan/GEM5

Length of output: 88


🏁 Script executed:

rg "getPredictionMeta\s*\(\s*\)" --type cc --type hh -n

Repository: OpenXiangShan/GEM5

Length of output: 88


🏁 Script executed:

rg "getPredictionMeta" -n

Repository: OpenXiangShan/GEM5

Length of output: 6530


🏁 Script executed:

rg "getPredictionMeta\s*\(\s*\)" -n

Repository: OpenXiangShan/GEM5

Length of output: 4244


Remove the default tid = 0 argument from getPredictionMeta.

Multiple test files currently call getPredictionMeta() without arguments, relying on the default tid = 0. Removing this default forces callers to explicitly specify the thread ID, which is critical for correctness in SMT systems where implicitly defaulting to thread 0 could mask bugs. This change will require updates to test files (btb_mgsc.test.cc, btb_tage.test.cc, ras_test.cc, etc.) to pass an explicit thread ID.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/btb/btb_mgsc.hh` at line 162, Remove the default argument from
the getPredictionMeta declaration in btb_mgsc.hh so its signature becomes
std::shared_ptr<void> getPredictionMeta(ThreadID tid) override; then update all
callers (tests and code that currently rely on the implicit tid=0) to pass an
explicit ThreadID value (e.g., 0 or the appropriate thread variable). Search for
usages of getPredictionMeta() with no arguments (examples in btb_mgsc.test.cc,
btb_tage.test.cc, ras_test.cc) and change them to getPredictionMeta(0) or
getPredictionMeta(tid) as appropriate to preserve correct SMT behavior.

Comment on lines 459 to 467
if (!ftq.hasTarget(target_id, tid)) {
assert(!ftq.empty(tid));
DPRINTF(DecoupleBP, "The squashing target is insane, ignore squash on it");
DPRINTF(DecoupleBP,
"Ignore squash for tid %u on missing FTQ target %u; "
"recovering predictor state from redirect PC %#lx\n",
tid, target_id, redirect_pc);
clearPreds(tid);
threads[tid].validprediction = false;
threads[tid].s0PC = redirect_pc;
return;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Recover predictor history too when the FTQ target is missing.

This fallback only redirects s0PC. The per-thread speculative history (historyManagers[tid] and the folded histories derived from it) is left on the wrong path, so the next prediction after a late squash can run with stale history and trip the debug consistency checks or bias future predictions. Please fall back to a full per-thread history reset/recovery here instead of returning after only updating the PC.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/btb/decoupled_bpred.cc` around lines 459 - 467, When
ftq.hasTarget(target_id, tid) is false, also reset the per-thread speculative
history instead of only setting threads[tid].s0PC; call the per-thread history
reset/recovery on historyManagers[tid] (e.g. historyManagers[tid]->reset() or
resetToPC(redirect_pc)) and then refresh/recompute any folded histories derived
from it so they no longer reflect the wrong path before returning; keep the
existing clearPreds(tid) and threads[tid].validprediction = false lines but
ensure historyManagers[tid] and the folded-history state are explicitly restored
to match the redirect PC.

Comment on lines 705 to +706
void resetPC(Addr new_pc);
void resetPC(ThreadID tid, Addr new_pc);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check the actual content of the file at lines 705-706
sed -n '700,710p' src/cpu/pred/btb/decoupled_bpred.hh

Repository: OpenXiangShan/GEM5

Length of output: 529


🏁 Script executed:

# Search for all resetPC declarations and definitions
rg 'void resetPC\(' src/cpu/pred/btb/

Repository: OpenXiangShan/GEM5

Length of output: 209


🏁 Script executed:

# Search for all resetPC call sites across the entire codebase
rg 'resetPC\(' --type cpp --type h -B 2 -A 2

Repository: OpenXiangShan/GEM5

Length of output: 4014


🏁 Script executed:

# Look at the context around the resetPC call in TraceFetch.cc
sed -n '1,100p' src/cpu/o3/trace/TraceFetch.cc | head -50
cat -n src/cpu/o3/trace/TraceFetch.cc | grep -B 10 -A 10 'resetPC'

Repository: OpenXiangShan/GEM5

Length of output: 2799


🏁 Script executed:

# Check if there are any other single-arg resetPC calls we might have missed
rg 'resetPC\([^,)]*\)' --type cpp --type h -B 3 -A 1

Repository: OpenXiangShan/GEM5

Length of output: 3502


Remove the ambiguous single-thread resetPC(Addr) overload or rename it to explicitly indicate all-thread semantics.

The two overloads (resetPC(Addr) and resetPC(ThreadID, Addr)) create API ambiguity: the first resets all threads, while the second resets a specific thread. This allows incorrect call sites—such as src/cpu/o3/trace/TraceFetch.cc:374—to inadvertently invoke all-threads semantics when only intending to affect a single thread's recovery path in SMT configurations. Either remove the single-arg form if it's only needed for global initialization, or rename it to resetPCAllThreads() to make its semantics explicit. Verify that any remaining single-arg calls are only in intentional global-reset paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/btb/decoupled_bpred.hh` around lines 705 - 706, The overloaded
resetPC(Addr) is ambiguous (it resets all threads) and conflicts with
resetPC(ThreadID, Addr) (per-thread); remove the single-arg overload or rename
it to resetPCAllThreads(Addr) to make semantics explicit, then update all call
sites that intended per-thread behavior to call resetPC(ThreadID, Addr) and only
keep/allow the single-arg form for true global-reset/init paths; search for and
verify every use of resetPC(Addr) and change them to either
resetPCAllThreads(new_pc) or resetPC(tid, new_pc) as appropriate.

std::vector<FullBTBPrediction> &stagePreds) override;

std::shared_ptr<void> getPredictionMeta() override;
std::shared_ptr<void> getPredictionMeta(ThreadID tid = 0) override;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's find and read the relevant file
cat -n src/cpu/pred/btb/microtage.hh | sed -n '120,135p'

Repository: OpenXiangShan/GEM5

Length of output: 1042


🏁 Script executed:

# Also check the other mentioned lines
cat -n src/cpu/pred/btb/microtage.hh | sed -n '352,368p'

Repository: OpenXiangShan/GEM5

Length of output: 1107


🏁 Script executed:

# Search for all getPredictionMeta calls in the codebase
rg "getPredictionMeta\(" -A 2 -B 2

Repository: OpenXiangShan/GEM5

Length of output: 24057


🏁 Script executed:

# Search for all generateSinglePrediction calls in the codebase
rg "generateSinglePrediction\(" -A 2 -B 2

Repository: OpenXiangShan/GEM5

Length of output: 6089


Remove thread-0 defaults from getPredictionMeta() and generateSinglePrediction() signatures.

The tid = 0 defaults allow old call sites to implicitly use thread 0 without explicit thread specification, creating a risk for SMT systems where predictor state updates may silently affect the wrong thread. Removing these defaults will require explicit thread IDs at all call sites, serving as a safety guardrail. Note that this change will affect test files and documentation examples that currently omit the tid parameter.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/btb/microtage.hh` at line 127, Remove the implicit ThreadID
default from the method signatures so callers must pass an explicit thread id:
update the declaration of getPredictionMeta(ThreadID tid = 0) to
getPredictionMeta(ThreadID tid) and do the same for
generateSinglePrediction(...) (remove any = 0 on ThreadID parameters). Then
search for all call sites and tests that relied on the default and update them
to supply the correct ThreadID explicitly (including unit tests and
documentation examples) to avoid accidental use of thread-0 state; ensure the
updated signatures are reflected in any overridden implementations and
declarations (matching function names getPredictionMeta and
generateSinglePrediction).

Comment on lines +88 to +90
BTBuRAS::getPredictionMeta(ThreadID tid)
{
(void)tid;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't ignore tid here.

This API became thread-aware, but the underlying uRAS state is still global. With shared specStack/nonSpecStack/meta, returns from one thread will poison another thread's RAS history and prediction metadata under SMT.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/btb/uras.cc` around lines 88 - 90, BTBuRAS::getPredictionMeta
currently ignores the ThreadID tid which causes shared uRAS state (specStack,
nonSpecStack, meta) to leak/poison across SMT threads; update getPredictionMeta
(and related state in BTBuRAS) to index or select the per-thread RAS state using
tid (e.g., maintain/lookup per-thread specStack/nonSpecStack/meta containers or
make those structures thread-local) and return prediction metadata from the
tid-specific stacks/meta instead of the global ones so each thread's return
history and prediction metadata remain isolated.

Comment on lines +522 to +529
BaseSimpleCPU::readGem5Regs(ThreadID tid)
{
auto diffAllStates = this->diffAllStates[tid];
for (int i = 0; i < 32; i++) {
diffAllStates->gem5RegFile[i] =
threadContexts[curThread]->getReg(RegId(IntRegClass, i));
threadContexts[tid]->getReg(RegId(IntRegClass, i));
diffAllStates->gem5RegFile[i + 32] =
threadContexts[curThread]->getReg(RegId(FloatRegClass, i));
threadContexts[tid]->getReg(RegId(FloatRegClass, i));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add defensive checks for tid and per-thread state pointers.

Line 524 and Line 527 index per-thread containers directly. A bad tid (or uninitialized per-thread difftest state) will crash without context, which is painful in SMT debugging.

Proposed defensive fix
 void
 BaseSimpleCPU::readGem5Regs(ThreadID tid)
 {
+    assert(tid < threadContexts.size());
+    assert(tid < diffAllStates.size());
     auto diffAllStates = this->diffAllStates[tid];
+    assert(diffAllStates);
     for (int i = 0; i < 32; i++) {
         diffAllStates->gem5RegFile[i] =
             threadContexts[tid]->getReg(RegId(IntRegClass, i));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
BaseSimpleCPU::readGem5Regs(ThreadID tid)
{
auto diffAllStates = this->diffAllStates[tid];
for (int i = 0; i < 32; i++) {
diffAllStates->gem5RegFile[i] =
threadContexts[curThread]->getReg(RegId(IntRegClass, i));
threadContexts[tid]->getReg(RegId(IntRegClass, i));
diffAllStates->gem5RegFile[i + 32] =
threadContexts[curThread]->getReg(RegId(FloatRegClass, i));
threadContexts[tid]->getReg(RegId(FloatRegClass, i));
BaseSimpleCPU::readGem5Regs(ThreadID tid)
{
assert(tid < threadContexts.size());
assert(tid < diffAllStates.size());
auto diffAllStates = this->diffAllStates[tid];
assert(diffAllStates);
for (int i = 0; i < 32; i++) {
diffAllStates->gem5RegFile[i] =
threadContexts[tid]->getReg(RegId(IntRegClass, i));
diffAllStates->gem5RegFile[i + 32] =
threadContexts[tid]->getReg(RegId(FloatRegClass, i));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/simple/base.cc` around lines 522 - 529, BaseSimpleCPU::readGem5Regs
currently indexes diffAllStates[tid] and threadContexts[tid] directly which can
crash if tid is out of range or the per-thread pointers are null; add defensive
checks at the start of readGem5Regs to verify tid is within diffAllStates and
threadContexts bounds and that diffAllStates[tid] and threadContexts[tid] are
non-null, and if any check fails return early (or log/raise a descriptive error)
instead of dereferencing; update references to diffAllStates and threadContexts
in the function body to assume these validated pointers are safe.

@github-actions
Copy link
Copy Markdown

🚀 Coremark Smoke Test Results

Branch IPC Change
Base (xs-dev) 2.2665 -
This PR 2.2566 📉 -0.0099 (-0.44%)

✅ Difftest smoke test passed!

Change-Id: I7690e69545b01ca4a8ba3e751f6cab7665f8767e
@github-actions
Copy link
Copy Markdown

🚀 Coremark Smoke Test Results

Branch IPC Change
Base (xs-dev) 2.2665 -
This PR 2.2566 📉 -0.0099 (-0.44%)

✅ Difftest smoke test passed!

@github-actions
Copy link
Copy Markdown

🚀 Coremark Smoke Test Results

Branch IPC Change
Base (xs-dev) 2.2665 -
This PR 2.1840 📉 -0.0825 (-3.64%)

✅ Difftest smoke test passed!

@github-actions
Copy link
Copy Markdown

🚀 Coremark Smoke Test Results

Branch IPC Change
Base (xs-dev) 2.2665 -
This PR 2.2350 📉 -0.0315 (-1.39%)

✅ Difftest smoke test passed!

@tastynoob tastynoob force-pushed the kmh-smt branch 2 times, most recently from 8a5e41e to 1391a92 Compare March 27, 2026 10:31
Change-Id: Ifbeb947f5f27ebdc9dc39dcfe0172eaa308f8e6f
@github-actions
Copy link
Copy Markdown

🚀 Coremark Smoke Test Results

Branch IPC Change
Base (xs-dev) 2.2665 -
This PR 2.2560 📉 -0.0105 (-0.46%)

✅ Difftest smoke test passed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants