Encode immediate def-use in the JIT specializer #148080

@Fidget-Spinner

Feature or enhancement

Proposal:

Our JIT IR naturally encodes liveness by virtue of being a stack IR. However, we're missing a crucial piece of information needed for stronger optimizations: def-use.

Background: Normal SSA (static single assignment form) IR already naturally encodes def-use, and LLVM's SSA flavor also includes use-def as a property.

The idea of def-use is, as the name suggests, to record where a value is defined and where it is used.

We need def-use for the following optimizations:

  1. Eliminating redundant ops after constant folding combined with refcount elimination. The problem is that the current refcount elimination complicates the stack effect, so the current remove_unneeded_uops pass cannot do its job properly in all of those cases: the ref-shuffling form of the uops blocks it. This is becoming a bigger issue as we constant fold more in the JIT and record more info, which leaves a bunch of redundant ops lying around that we could do away with. E.g. we can constant fold most LOAD_ATTR, but we leave their operand loads lying around even when no guards need them.
  2. Feeding info to the future partial evaluator (PE). In my past experiments, partial evaluation is not worth it if the unboxed operand goes straight into a boxed consumer (e.g. a C API call); this mostly wrecks the perf of PE. We need a good heuristic to determine whether something is worth unboxing or should stay boxed. A reminder that the PE pass is planned to take info from the current specializer pass.
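
To make the first item more concrete, here is a minimal sketch of the kind of rewrite def-use would enable. It is not the CPython implementation: the opcodes, `ToyUop`, `ToyRef`, and `toy_fold_consumer` are all hypothetical names made up for illustration. The sketch assumes the consumer pops exactly one input and pushes one value, that the input reference was not duplicated elsewhere on the stack, and that only a plain local load is safe to erase.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes; not CPython's real uop set. */
enum { TOY_NOP, TOY_LOAD_FAST, TOY_LOAD_ATTR, TOY_LOAD_CONST };

typedef struct {
    int opcode;
    uintptr_t operand;
} ToyUop;

/* A reference on the abstract stack, plus the uop that pushed it. */
typedef struct {
    uintptr_t bits;
    ToyUop *originator;   /* NULL when the producer is unknown */
} ToyRef;

/* Fold `consumer` (pops `input`, pushes one value) into a constant load.
 * The rewrite is only applied when the producer of `input` is known and
 * is a side-effect-free local load, so the stack stays balanced:
 *     LOAD_FAST x; LOAD_ATTR a   ->   NOP; LOAD_CONST value
 * Returns 1 if the rewrite was applied, 0 if the trace was left alone. */
static int
toy_fold_consumer(ToyUop *consumer, ToyRef input, uintptr_t value)
{
    assert(consumer != NULL);
    ToyUop *producer = input.originator;
    if (producer == NULL || producer->opcode != TOY_LOAD_FAST) {
        /* Unknown or non-removable producer: do nothing in this sketch. */
        return 0;
    }
    producer->opcode = TOY_NOP;         /* the operand load is now dead */
    producer->operand = 0;
    consumer->opcode = TOY_LOAD_CONST;  /* push the folded value instead */
    consumer->operand = value;
    return 1;
}
```

Without the `originator` pointer, the folding step would have no direct way to find the load that fed it, which is the situation the first item describes.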

Design (def-use):

typedef union {
    uintptr_t bits;
} JitOptRef;

becomes

typedef struct {
    uintptr_t bits;
    _PyUOpInstruction *originator;
} JitOptRef;

At selected uops, we then assign originator to be the originating instruction this symbol comes from. E.g.

[0xfefefefe] LOAD_FAST x

symbol x's originator will point to 0xfefefefe (LOAD_FAST).
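
As a rough illustration of the assignment described above, the following sketch records the originator while abstractly interpreting a toy trace. Everything here (`ToyUop`, `ToyRef`, `ToyFrame`, `toy_step`, the two opcodes, and the fixed stack and locals sizes) is a hypothetical stand-in, not the specializer's real data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes; not CPython's real uop set. */
enum { TOY_LOAD_FAST = 1, TOY_POP_TOP = 2 };

typedef struct {
    int opcode;
    uintptr_t operand;    /* local index for TOY_LOAD_FAST */
} ToyUop;

typedef struct {
    uintptr_t bits;
    ToyUop *originator;   /* most recent defining uop, or NULL */
} ToyRef;

#define TOY_STACK_MAX 8
#define TOY_NLOCALS 4

typedef struct {
    ToyRef locals[TOY_NLOCALS];
    ToyRef stack[TOY_STACK_MAX];
    int depth;
} ToyFrame;

/* Abstractly interpret one uop.
 * Returns 0 on success, -1 on an unsupported opcode or a stack error. */
static int
toy_step(ToyFrame *f, ToyUop *inst)
{
    assert(f != NULL && inst != NULL);
    switch (inst->opcode) {
    case TOY_LOAD_FAST: {
        if (f->depth >= TOY_STACK_MAX || inst->operand >= TOY_NLOCALS) {
            return -1;
        }
        ToyRef ref = f->locals[inst->operand];
        ref.originator = inst;   /* only the most recent def is kept */
        f->stack[f->depth++] = ref;
        return 0;
    }
    case TOY_POP_TOP:
        if (f->depth == 0) {
            return -1;
        }
        f->depth--;
        return 0;
    default:
        return -1;
    }
}
```

Note that the copy pushed onto the stack carries the originator, while the local slot itself is left unchanged; whether the real design should tag locals as well is not something this sketch tries to settle.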

Note that this encodes only the most recent def-use as a property of the reference, which should be safe. We do not need full def-use chains further back: an interesting property of a stack IR that I've observed, but not yet proven, is that local rewrites can generalize to a whole-trace optimization if done correctly. E.g. see how remove_unneeded_uops, just by performing local rewrites, is able to remove dead code for a larger-than-local chunk of the trace.
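
The cascading effect of local rewrites can be seen in a toy model. The sketch below is only loosely inspired by remove_unneeded_uops and does not reproduce it: the opcodes and the helpers `toy_is_pure_push` and `toy_remove_dead_pairs` are hypothetical, and the trace is reduced to a bare array of opcodes. Each rewrite looks only at a pop and the nearest preceding non-NOP uop, yet a nested run of dead pushes and pops disappears in a single pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes; not CPython's real uop set. */
enum { TOY_NOP, TOY_LOAD_FAST, TOY_LOAD_CONST, TOY_POP_TOP, TOY_CALL };

/* Pushes exactly one value and has no side effects. */
static int
toy_is_pure_push(int opcode)
{
    return opcode == TOY_LOAD_FAST || opcode == TOY_LOAD_CONST;
}

/* One forward pass of purely local rewrites: each POP_TOP cancels the
 * nearest preceding non-NOP uop if that uop is a pure push.
 * Returns the number of uops turned into NOPs. */
static size_t
toy_remove_dead_pairs(int *trace, size_t len)
{
    assert(trace != NULL || len == 0);
    size_t removed = 0;
    for (size_t i = 0; i < len; i++) {
        if (trace[i] != TOY_POP_TOP) {
            continue;
        }
        /* Skip back over NOPs left behind by earlier rewrites. */
        size_t j = i;
        while (j > 0 && trace[j - 1] == TOY_NOP) {
            j--;
        }
        if (j > 0 && toy_is_pure_push(trace[j - 1])) {
            trace[j - 1] = TOY_NOP;
            trace[i] = TOY_NOP;
            removed += 2;
        }
    }
    return removed;
}
```

For a trace such as three loads followed by three pops, the innermost pair is erased first, and the NOPs it leaves behind let the next pop reach the next load, so the whole run is removed even though no single rewrite looked at more than one pair.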

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

Metadata

Assignees: No one assigned

Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs), topic-JIT, type-feature (A feature request or enhancement)