Encode immediate def-use in the JIT specializer #148080
Description
Feature or enhancement
Proposal:
Our JIT IR naturally encodes liveness by virtue of being a stack IR. However, we're missing a crucial piece of info needed for stronger optimizations: def-use.
Background: Normal SSA (static single assignment form) IR already naturally encodes def-use, and LLVM's SSA flavor also includes use-def as a property.
As the name suggests, def-use information records where a value is defined and where it is used.
We need def-use for the following optimizations:
- Eliminating redundant ops after constant folding combined with refcount elimination. The problem is that the current refcount elimination complicates the stack effect, so the current remove_unneeded_uops pass cannot do its job properly for all of those, as the ref-shuffling form of uops blocks it. This issue is growing as we constant fold more in the JIT and record more info, which leaves a bunch of redundant ops lying around that we could really do away with. E.g. we can constant fold most LOAD_ATTR, but we leave their operand loads lying around even when no guards need them.
- To feed info to the future partial evaluator (PE). In my past experiments, partial evaluation is not worth it if the unboxed operand goes straight into a boxed consumer (e.g. a C API call). This mostly wrecks the perf of PE. We need a good heuristic to determine whether something is worth unboxing or should stay boxed. As a reminder, the PE pass is planned to take info from the current specializer pass.
Design (def-use):
typedef union {
    uintptr_t bits;
} JitOptRef;
becomes
typedef struct {
    uintptr_t bits;
    _PyUOpInstruction *originator;
} JitOptRef;
At selected uops, we then assign originator to be the originating instruction this symbol comes from. E.g.
[0xfefefefe] LOAD_FAST x
The symbol x's originator will point to 0xfefefefe (LOAD_FAST).
Note that this encodes only the most recent def-use as a property of the reference, which should be safe. We do not need full def-use chains further back: an interesting property of a stack IR that I've observed, but not yet proven, is that local rewrites can generalize to a whole-trace optimization if done correctly. E.g. see how remove_unneeded_uops, just by performing local rewrites, is able to remove dead code for a larger-than-local chunk of the trace.
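To make the intended local rewrite concrete, here is a minimal, self-contained sketch of how an originator pointer could let a constant-folded LOAD_ATTR also remove the load that fed it. Everything in it is hypothetical and for illustration only: the `Sketch*` types, the tiny opcode enum, and the `sketch_*` helpers are not CPython's real definitions, and the sketch assumes the LOAD_FAST result is consumed only by the folded instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; the real uop set is far larger. */
typedef enum {
    OP_NOP,
    OP_LOAD_FAST,
    OP_LOAD_ATTR,
    OP_LOAD_CONST_INLINE
} SketchOpcode;

/* Stand-in for _PyUOpInstruction (assumed shape, not the real one). */
typedef struct {
    SketchOpcode opcode;
    uintptr_t operand;
} SketchUOp;

/* Stand-in for the proposed JitOptRef: the symbol bits plus the
 * instruction that most recently defined the value. */
typedef struct {
    uintptr_t bits;
    SketchUOp *originator; /* NULL when the definition is unknown */
} SketchRef;

/* Abstract interpretation of LOAD_FAST: the pushed reference
 * remembers which instruction produced it. */
static SketchRef sketch_load_fast(SketchUOp *inst)
{
    SketchRef ref = { .bits = inst->operand, .originator = inst };
    return ref;
}

/* Fold a LOAD_ATTR whose result is known to be `constant`.
 * If the operand came from a side-effect-free LOAD_FAST, that load
 * is turned into a NOP as well. The net stack effect is unchanged:
 * before, +1 then (-1, +1); after, 0 then +1.
 * Returns 1 if the originating load was removed, 0 otherwise. */
static int sketch_fold_load_attr(SketchUOp *inst, SketchRef owner,
                                 uintptr_t constant)
{
    assert(inst->opcode == OP_LOAD_ATTR);
    inst->opcode = OP_LOAD_CONST_INLINE;
    inst->operand = constant;
    if (owner.originator != NULL &&
        owner.originator->opcode == OP_LOAD_FAST) {
        owner.originator->opcode = OP_NOP;
        return 1;
    }
    return 0;
}

/* Two-instruction trace: LOAD_FAST x; LOAD_ATTR.
 * Returns 1 if both rewrites happened as described above. */
static int sketch_demo(void)
{
    SketchUOp trace[2] = { { OP_LOAD_FAST, 0 }, { OP_LOAD_ATTR, 0 } };
    SketchRef x = sketch_load_fast(&trace[0]);
    int removed = sketch_fold_load_attr(&trace[1], x, 42);
    return removed
        && trace[0].opcode == OP_NOP
        && trace[1].opcode == OP_LOAD_CONST_INLINE
        && trace[1].operand == 42;
}
```

When the originator is unknown (NULL), the fold in this sketch still rewrites LOAD_ATTR to the constant load but leaves earlier instructions alone. A real pass would additionally have to pop the now-unused operand in that case, which this sketch omits.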
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
No response