Conversation
|
I'm ok with the PR but, it would be nice to get an homogeneous way to compute SF flags, and in the fastest/smaller way possible (so the extract/insert bit way, that is 2 opcodes?) Also, make sure Hardware Memory doesn't have an alignment constraint on PPC64LE, else, the optimise REP MOVSB will need an unligned path like on Arm64 & RV64. Last point: if PPC64LE have flags, then you might want to look at ARM64 way of handling native flags as a match for x86 flags. It can come later of course, but it's a good source of speedup. But again, that can comes much later, as it can be complex to get stable. |
ptitSeb
left a comment
There was a problem hiding this comment.
Please get SF computation the fastest possible and always the same method if possible.
7922289 to
0520bc6
Compare
PR amended with suggested changes
It has been done in my downstream branch but I will submit them in a separate PRs. |
0520bc6 to
ae97b0a
Compare
It seems PPC64LE doesn't have flagless conditional jump. In that case, I'm not sure implementing the Fuse opcode mecanism as in RV64 & LA64 will give any benefit in generated code. I'm fine with it being implemented, it's just I'm unsure it will generate visible gains in performances without flagless conditionnal jump. |
PPC64LE does not. It also does not have equivalent register to that of ARM64 NZCV register therefore I went with Fuse opcode adapted from LA64 backend. Even without flagless conditional jump, it does offer perf gain, it does offer ~3x reduction in instruction count for those common |
I don't understand; if there is no flagless conditionnal, that means there is a flags register somewhere, probably handling some NZVC form of flag internaly? |
ksco
left a comment
There was a problem hiding this comment.
Thank you for the PR! Some changes are requested. I didn't finish the review because this appears to be an unsupervised vibe-coding PR with prompt: "Translate the LoongArch DynaRec to PPC64LE". Per the Box64 contribution policy, I suggest a complete human review for this PR and any subsequent ones before submission.
| } | ||
| MARK; // Part with DF==0 | ||
| LBZ(x1, 0, xRSI); | ||
| STB(x1, 0, xRDI); |
There was a problem hiding this comment.
The order of the operands of the direct macros, like LBZ/STB here, and the LDxw macros, is different. To me, this is a confusing design.
There was a problem hiding this comment.
The operand order difference is intentional. The low-level emitter macros (LBZ, STB, LHZ, STH, LWZ, STW, LD, STD) in ppc64le_emitter.h follow the Power ISA assembly convention: lbz RT, D(RA) — displacement before base register. This is the canonical form that anyone reading POWER assembly would expect.
The higher-level wrappers (LDxw, SDxw, LDz, SDz) in dynarec_ppc64le_helper.h use (reg, base, offset) to match the cross-platform convention shared by ARM64, LA64, and RV64.
Note that neither LA64 nor RV64 needed this wrapper layer because their ISA instruction syntax already uses (reg, base, offset) natively — the asymmetry is specific to Power ISA.
I initially considered adding byte/halfword wrappers (LDB/SDB/etc.) to unify everything to (reg, base, offset), but ultimately decided it's not worth the effort — anyone working on the PPC64LE backend is already familiar with the Power ISA operand ordering, and the convention is consistent within each layer.
| x87_forget(dyn, ninst, x3, x4, 0); | ||
| sse_purge07cache(dyn, ninst, x3); | ||
| // Partially support isSimpleWrapper | ||
| tmp = isSimpleWrapper(*(wrapper_t*)(addr)); |
There was a problem hiding this comment.
This is always false, so call_n will never be used. In my opinion, it would be better not to implement call_n for now.
In the call_n function, I saw:
for (int i = 0; i < 6; i++) {
if (sextw_mask & (1 << i)) {
SEXT_W(A0 + i, A0 + i);
}
}Which I believe is a verbatim copy from LoongArch backend. However, I checked the ABI document here: https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#PARAM-PASS, I saw:
Simple integer types (char, short, int, long, enum) are mapped to a single doubleword. Values shorter than a doubleword are sign or zero extended as necessary.
Which is definitely not the same design as LoongArch.
There was a problem hiding this comment.
I went back to check whether the logic actually holds on PPC64LE. From what I can tell, isSimpleWrapper returns values where (tmp & 15) == 1 for integer-only wrappers (e.g., vFi → 17, iFii → 49), which survive the filter at line 2557. The filter only zeroes out wrappers involving floats/doubles, so call_n should still be reachable for integer-only native calls. The TODO comment reflects that FP support is not yet in place — same as LA64 and RV64.
For the sextw_mask / SEXT_W part — this was also carried over from LA64, but I believe it's still correct here. The PPC64LE ELFv2 ABI requires 32-bit integer args to be sign-extended in 64-bit registers, and x86-64 leaves the upper 32 bits undefined, so the cleanup should be necessary. RV64 uses the same approach.
There was a problem hiding this comment.
Can you point me to where the isSimpleWrapper is defined for PPC64LE?
There was a problem hiding this comment.
@ksco my bad, I did not push the changes. You could find it at https://github.com/ptitSeb/box64/pull/3720/changes#diff-d5711d13261a7ce86fc0158fdfc54bf1c515953b57790d2fa0852a431eb67805R15599
| FORCE_DFNONE(); | ||
| RESTORE_EFLAGS(x3); | ||
| IFX (X_ZF | X_PF) LI(x6, 1); | ||
| IFX (X_OF) BF_INSERT(xFlags, xZR, F_OF, F_OF); |
There was a problem hiding this comment.
I'm confused by the xZR usage here.
// PPC64LE has no zero register; r0 acts as 0 only in specific instruction forms // We define xZR as 0 for compatibility, but it ONLY works in D-form load/store base #define xZR 0
There was a problem hiding this comment.
xZR (defined as register 0) is indeed not a true zero register on PPC64LE — r0 only acts as literal 0 in specific instruction forms (D-form load/store base operand). However, every macro that accepts xZR has a guard:
BF_INSERT(_helper.h:276-284): checksif ((Rs) == xZR)and loads 0 intox4viaLI(x4, 0)before theRLDIMI, instead of usingr0directly.CMPD_ZR(_helper.h:1527): checksif ((r2) == xZR)and emitsCMPDI(r1, 0)instead ofCMPD(r1, r2).BNE_MARK3,BEQ_MARK, etc. all go throughBxx_gen→CMPD_ZR, which handlesxZRsafely.
So the generated code is correct — xZR is used as a semantic constant meaning "zero value", and the macros ensure it never reaches an instruction where r0 would be misinterpreted.
That said, I agree it can be confusing for readers. An alternative would be to replace the BF_INSERT(xFlags, xZR, ...) calls with an explicit LI + BF_INSERT pattern, or introduce a small helper like BF_CLEAR(Rd, hi, lo). Happy to hear your preference.
a4e7dfe to
0421773
Compare
- MOVS, CMPS, STOS, LODS, SCAS (with REP/REPNZ) - TEST/NOT/NEG/MUL/IMUL/DIV/IDIV - INC/DEC/CALL/JMP/PUSH - PUSHF/POPF, SAHF/LAHF, CMC/CLC/STC/CLD/STD - Interrupts: INT 3, INT n, INTO - CLI/STI - New emit helpers: emit_neg8/16/32, emit_inc8/16/32, emit_dec8/16/32 - New helper macros: GETEDz, GETEDH, GETSED, GETDIR - Implement SIGBUS instruction emulation in sigbus_specialcases() Other changes: - Add isSimpleWrapper support and regenerate wrappers - Add uFppiuup format to x64printer.c. - Fix r2 (TOC) corruption in ppc64le_next.S
0421773 to
6b98e14
Compare
|
Is this ready for another round of review? When you're ready, you should click the Re-request review button. Note that we're closing for new code submission for a new release soon-ish (except for bug fixing). |
|
Uhm, just would like to give you guys from box64 @ptitSeb @ksco some warning that this user runlevl5 is using AI to his commits, he already get caught here be careful and take care. |
Yes, we are aware of this, but thanks for the warning. |
Well, my suggestions above was clearly ignored as the replies were still made by an LLM. I don't really enjoy my time talking to an LLM in this way. |
Related #242