[PPC64LE_DYNAREC] Add more opcodes#3720

Open
runlevel5 wants to merge 1 commit into ptitSeb:main from runlevel5:ppc64le-more-opcodes

Conversation

@runlevel5
Contributor

@runlevel5 runlevel5 commented Mar 27, 2026

Related #242

  • MOVS, CMPS, STOS, LODS, SCAS (with REP/REPNZ)
  • TEST/NOT/NEG/MUL/IMUL/DIV/IDIV
  • INC/DEC/CALL/JMP/PUSH
  • PUSHF/POPF, SAHF/LAHF, CMC/CLC/STC/CLD/STD
  • Interrupts: INT 3, INT n, INTO
  • CLI/STI
  • emit helpers: emit_neg8/16/32, emit_inc8/16/32, emit_dec8/16/32
  • helper macros: GETEDz, GETEDH, GETSED, GETDIR

@ptitSeb ptitSeb requested a review from ksco March 27, 2026 12:47
Comment thread src/dynarec/ppc64le/dynarec_ppc64le_00.c Outdated
Comment thread src/dynarec/ppc64le/dynarec_ppc64le_emit_math.c Outdated
Comment thread src/dynarec/ppc64le/dynarec_ppc64le_emit_math.c Outdated
Comment thread src/dynarec/ppc64le/dynarec_ppc64le_emit_math.c Outdated
Comment thread src/dynarec/ppc64le/dynarec_ppc64le_emit_math.c Outdated
@ptitSeb
Owner

ptitSeb commented Mar 28, 2026

I'm OK with the PR, but it would be nice to have a homogeneous way to compute the SF flag, and in the fastest/smallest way possible (so the extract/insert bit way, which is 2 opcodes?).

Also, make sure hardware memory access doesn't have an alignment constraint on PPC64LE; otherwise, the optimized REP MOVSB will need an unaligned path like on ARM64 & RV64.

Last point: if PPC64LE has flags, then you might want to look at the ARM64 way of handling native flags as a match for x86 flags. It can come later of course, but it's a good source of speedup. But again, that can come much later, as it can be complex to get stable.
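
To illustrate what I mean by the extract/insert way, a minimal sketch (the SRWI macro name is an assumption about the PPC64LE emitter; BF_INSERT is the helper this PR already uses):

    IFX (X_SF) {
        SRWI(x5, ed, 31);                  // move the sign bit of the 32-bit result down to bit 0
        BF_INSERT(xFlags, x5, F_SF, F_SF); // insert it at F_SF in xFlags: 2 opcodes total
    }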

Owner

@ptitSeb ptitSeb left a comment


Please make the SF computation as fast as possible, and always use the same method where possible.

@runlevel5 runlevel5 marked this pull request as draft March 28, 2026 22:48
@runlevel5 runlevel5 force-pushed the ppc64le-more-opcodes branch 4 times, most recently from 7922289 to 0520bc6 Compare March 29, 2026 02:56
@runlevel5
Contributor Author

I'm OK with the PR, but it would be nice to have a homogeneous way to compute the SF flag, and in the fastest/smallest way possible (so the extract/insert bit way, which is 2 opcodes?).
Also, make sure hardware memory access doesn't have an alignment constraint on PPC64LE; otherwise, the optimized REP MOVSB will need an unaligned path like on ARM64 & RV64.

PR amended with suggested changes

Last point: if PPC64LE has flags, then you might want to look at the ARM64 way of handling native flags as a match for x86 flags. It can come later of course, but it's a good source of speedup. But again, that can come much later, as it can be complex to get stable.

It has been done in my downstream branch, but I will submit those changes in separate PRs.

@runlevel5 runlevel5 force-pushed the ppc64le-more-opcodes branch 2 times, most recently from 0520bc6 to ae97b0a Compare March 29, 2026 05:09
@runlevel5 runlevel5 marked this pull request as ready for review March 29, 2026 05:13
@ptitSeb
Owner

ptitSeb commented Mar 29, 2026

Last point: if PPC64LE has flags, then you might want to look at the ARM64 way of handling native flags as a match for x86 flags. It can come later of course, but it's a good source of speedup. But again, that can come much later, as it can be complex to get stable.

It has been done in my downstream branch, but I will submit those changes in separate PRs.

It seems PPC64LE doesn't have flagless conditional jumps. In that case, I'm not sure implementing the Fuse opcode mechanism as in RV64 & LA64 will give any benefit in the generated code. I'm fine with it being implemented; I'm just unsure it will produce visible performance gains without flagless conditional jumps.

@runlevel5
Contributor Author

It seems PPC64LE doesn't have flagless conditional jumps. In that case, I'm not sure implementing the Fuse opcode mechanism as in RV64 & LA64 will give any benefit in the generated code. I'm fine with it being implemented; I'm just unsure it will produce visible performance gains without flagless conditional jumps.

PPC64LE does not. It also does not have an equivalent of the ARM64 NZCV register, so I went with the Fuse opcode mechanism adapted from the LA64 backend. Even without flagless conditional jumps, it does offer a perf gain: roughly a 3x reduction in instruction count for those common sequences.
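
To make the gain concrete, here is an illustrative sketch (not code lifted from the PR) of what the fused path emits for an x86 CMP immediately followed by a Jcc:

    CMPD(gd, ed);   // native compare of the two operand registers, sets CR0 directly
    // ...followed by a single conditional branch on CR0 to the x86 jump target,
    // instead of first writing CF/PF/AF/ZF/SF/OF into xFlags and then
    // re-extracting one bit of xFlags for the Jcc.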

@ptitSeb
Owner

ptitSeb commented Mar 30, 2026

It seems PPC64LE doesn't have flagless conditional jumps. In that case, I'm not sure implementing the Fuse opcode mechanism as in RV64 & LA64 will give any benefit in the generated code. I'm fine with it being implemented; I'm just unsure it will produce visible performance gains without flagless conditional jumps.

PPC64LE does not. It also does not have an equivalent of the ARM64 NZCV register, so I went with the Fuse opcode mechanism adapted from the LA64 backend. Even without flagless conditional jumps, it does offer a perf gain: roughly a 3x reduction in instruction count for those common sequences.

I don't understand; if there is no flagless conditional, that means there is a flags register somewhere, probably handling some NZCV form of flags internally?

Collaborator

@ksco ksco left a comment


Thank you for the PR! Some changes are requested. I didn't finish the review because this appears to be an unsupervised vibe-coding PR with prompt: "Translate the LoongArch DynaRec to PPC64LE". Per the Box64 contribution policy, I suggest a complete human review for this PR and any subsequent ones before submission.

Comment thread src/dynarec/ppc64le/dynarec_ppc64le_00.c Outdated
Comment thread src/dynarec/ppc64le/dynarec_ppc64le_00.c Outdated
Comment thread src/dynarec/ppc64le/dynarec_ppc64le_00.c Outdated
Comment thread src/dynarec/ppc64le/dynarec_ppc64le_00.c Outdated
}
MARK; // Part with DF==0
LBZ(x1, 0, xRSI);
STB(x1, 0, xRDI);
Collaborator


The operand order of the direct macros, like LBZ/STB here, differs from that of the LDxw macros. To me, this is a confusing design.

Contributor Author


The operand order difference is intentional. The low-level emitter macros (LBZ, STB, LHZ, STH, LWZ, STW, LD, STD) in ppc64le_emitter.h follow the Power ISA assembly convention: lbz RT, D(RA) — displacement before base register. This is the canonical form that anyone reading POWER assembly would expect.

The higher-level wrappers (LDxw, SDxw, LDz, SDz) in dynarec_ppc64le_helper.h use (reg, base, offset) to match the cross-platform convention shared by ARM64, LA64, and RV64.

Note that neither LA64 nor RV64 needed this wrapper layer because their ISA instruction syntax already uses (reg, base, offset) natively — the asymmetry is specific to Power ISA.

I initially considered adding byte/halfword wrappers (LDB/SDB/etc.) to unify everything to (reg, base, offset), but ultimately decided it's not worth the effort — anyone working on the PPC64LE backend is already familiar with the Power ISA operand ordering, and the convention is consistent within each layer.
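
As a concrete side-by-side of the two call-site shapes (the LDxw line is illustrative only, not taken from this diff):

    LBZ(x1, 0, xRSI);    // low level: Power ISA order, i.e. lbz x1, 0(xRSI)
    LDxw(x1, xRSI, 0);   // high level: shared (reg, base, offset) order, 32/64-bit load per rex.w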

x87_forget(dyn, ninst, x3, x4, 0);
sse_purge07cache(dyn, ninst, x3);
// Partially support isSimpleWrapper
tmp = isSimpleWrapper(*(wrapper_t*)(addr));
Collaborator


This is always false, so call_n will never be used. In my opinion, it would be better not to implement call_n for now.

In the call_n function, I saw:

    for (int i = 0; i < 6; i++) {
        if (sextw_mask & (1 << i)) {
            SEXT_W(A0 + i, A0 + i);
        }
    }

This, I believe, is a verbatim copy from the LoongArch backend. However, I checked the ABI document here: https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#PARAM-PASS, and saw:

Simple integer types (char, short, int, long, enum) are mapped to a single doubleword. Values shorter than a doubleword are sign or zero extended as necessary.

This is definitely not the same design as LoongArch.

Contributor Author


I went back to check whether the logic actually holds on PPC64LE. From what I can tell, isSimpleWrapper returns values where (tmp & 15) == 1 for integer-only wrappers (e.g., vFi → 17, iFii → 49), which survive the filter at line 2557. The filter only zeroes out wrappers involving floats/doubles, so call_n should still be reachable for integer-only native calls. The TODO comment reflects that FP support is not yet in place — same as LA64 and RV64.

For the sextw_mask / SEXT_W part — this was also carried over from LA64, but I believe it's still correct here. The PPC64LE ELFv2 ABI requires 32-bit integer args to be sign-extended in 64-bit registers, and x86-64 leaves the upper 32 bits undefined, so the cleanup should be necessary. RV64 uses the same approach.
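
For reference, the same cleanup written with the native PPC64LE instruction would read as below (a sketch: I'm assuming an EXTSW emitter macro wrapping extsw, and that A0..A5 map to the ELFv2 integer argument registers r3..r8):

    for (int i = 0; i < 6; i++) {
        if (sextw_mask & (1 << i)) {
            EXTSW(A0 + i, A0 + i);  // sign-extend the 32-bit x86 argument to 64 bits
        }
    }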

Collaborator


Can you point me to where the isSimpleWrapper is defined for PPC64LE?

Contributor Author


Comment thread src/dynarec/ppc64le/dynarec_ppc64le_00.c Outdated
Comment thread src/dynarec/ppc64le/dynarec_ppc64le_00.c Outdated
FORCE_DFNONE();
RESTORE_EFLAGS(x3);
IFX (X_ZF | X_PF) LI(x6, 1);
IFX (X_OF) BF_INSERT(xFlags, xZR, F_OF, F_OF);
Collaborator


I'm confused by the xZR usage here.

// PPC64LE has no zero register; r0 acts as 0 only in specific instruction forms
// We define xZR as 0 for compatibility, but it ONLY works in D-form load/store base
#define xZR     0

Contributor Author


xZR (defined as register 0) is indeed not a true zero register on PPC64LE — r0 only acts as literal 0 in specific instruction forms (D-form load/store base operand). However, every macro that accepts xZR has a guard:

  • BF_INSERT (_helper.h:276-284): checks if ((Rs) == xZR) and loads 0 into x4 via LI(x4, 0) before the RLDIMI, instead of using r0 directly.
  • CMPD_ZR (_helper.h:1527): checks if ((r2) == xZR) and emits CMPDI(r1, 0) instead of CMPD(r1, r2).
  • BNE_MARK3, BEQ_MARK, etc. all go through Bxx_genCMPD_ZR, which handles xZR safely.

So the generated code is correct — xZR is used as a semantic constant meaning "zero value", and the macros ensure it never reaches an instruction where r0 would be misinterpreted.

That said, I agree it can be confusing for readers. An alternative would be to replace the BF_INSERT(xFlags, xZR, ...) calls with an explicit LI + BF_INSERT pattern, or introduce a small helper like BF_CLEAR(Rd, hi, lo). Happy to hear your preference.
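
To spell out what the guard does for the line quoted above (a sketch of the expansion; the exact RLDIMI operands are omitted):

    // BF_INSERT(xFlags, xZR, F_OF, F_OF) does not emit "rldimi xFlags, r0, ...",
    // since r0 would be read as an ordinary register there; instead it expands to:
    LI(x4, 0);                     // li x4, 0  -- materialize the zero value
    RLDIMI(xFlags, x4, /* ... */); // insert that zero at bit F_OF, clearing OF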

Comment thread src/dynarec/ppc64le/dynarec_ppc64le_00.c Outdated
@runlevel5 runlevel5 force-pushed the ppc64le-more-opcodes branch 3 times, most recently from a4e7dfe to 0421773 Compare April 1, 2026 11:49
- MOVS, CMPS, STOS, LODS, SCAS (with REP/REPNZ)
- TEST/NOT/NEG/MUL/IMUL/DIV/IDIV
- INC/DEC/CALL/JMP/PUSH
- PUSHF/POPF, SAHF/LAHF, CMC/CLC/STC/CLD/STD
- Interrupts: INT 3, INT n, INTO
- CLI/STI
- New emit helpers: emit_neg8/16/32, emit_inc8/16/32, emit_dec8/16/32
- New helper macros: GETEDz, GETEDH, GETSED, GETDIR
- Implement SIGBUS instruction emulation in sigbus_specialcases()

Other changes:

- Add isSimpleWrapper support and regenerate wrappers
- Add uFppiuup format to x64printer.c.
- Fix r2 (TOC) corruption in ppc64le_next.S
@runlevel5 runlevel5 force-pushed the ppc64le-more-opcodes branch from 0421773 to 6b98e14 Compare April 1, 2026 12:30
@ksco
Collaborator

ksco commented Apr 8, 2026

Is this ready for another round of review? When you're ready, you should click the Re-request review button.

Note that we're closing new code submissions for the next release soon-ish (except for bug fixes).

@runlevel5 runlevel5 requested a review from ksco April 9, 2026 08:44
@psyalemao

psyalemao commented Apr 11, 2026

Uhm, just would like to give you guys from Box64 @ptitSeb @ksco a warning that this user runlevel5 is using AI for his commits; he already got caught here
dosbox-staging/dosbox-staging#4794
and here
dosbox-staging/dosbox-staging#4793
and even used AI to respond to the developers of dosbox-staging.

Be careful and take care.

@ptitSeb
Owner

ptitSeb commented Apr 11, 2026

Uhm, just would like to give you guys from Box64 @ptitSeb @ksco a warning that this user runlevel5 is using AI for his commits; he already got caught here dosbox-staging/dosbox-staging#4794 and here dosbox-staging/dosbox-staging#4793 and even used AI to respond to the developers of dosbox-staging.

Be careful and take care.

Yes, we are aware of this, but thanks for the warning.

@ksco
Collaborator

ksco commented Apr 11, 2026

I didn't finish the review because this appears to be an unsupervised vibe-coding PR with prompt: "Translate the LoongArch DynaRec to PPC64LE". Per the Box64 contribution policy, I suggest a complete human review for this PR and any subsequent ones before submission.

Well, my suggestions above were clearly ignored, as the replies were still made by an LLM. I don't really enjoy spending my time talking to an LLM in this way.
