llvm-obfus is an out-of-tree LLVM 21+ pass plugin for policy-driven IR obfuscation.
The project applies native LLVM IR transforms to selected functions. The main production entry point is `obf-safe-pipeline`, which composes virtualization, structural rewrites, string and constant protection, late indirect dispatch, and final artifact cleanup.
The design goal is simple: make static recovery materially harder while staying inside normal LLVM semantics. The project does not rely on malformed objects, inline-asm traps, EH spoofing, or target-specific parser breaks.
- Protection levels are `none`, `light`, `strong`, `vm`, and `strong_vm`.
- `vm` and `strong_vm` lower selected functions into VM-backed execution paths.
- `strong_vm` implementation bodies continue through later hardening stages, not just the public wrapper.
- MBA rewriting is used both directly and as part of other transforms such as constant reconstruction.
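
To make "VM-backed execution path" concrete, here is a minimal C sketch of the general idea, not the encoding used in `lib/vm/`: the original function body becomes bytecode for a small stack machine, and a public wrapper with the original signature forwards into the interpreter. The opcode set, `struct insn`, `vm_run`, and `protected_compute` are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode set for illustration; not the encoding used in lib/vm/. */
enum { OP_ARG, OP_IMM, OP_ADD, OP_MUL, OP_XOR, OP_RET };
struct insn { uint8_t op; uint32_t operand; };

/* Bytecode for the original body `return ((a + 7) * b) ^ 3;`. */
static const struct insn kBody[] = {
    {OP_ARG, 0}, {OP_IMM, 7}, {OP_ADD, 0}, {OP_ARG, 1},
    {OP_MUL, 0}, {OP_IMM, 3}, {OP_XOR, 0}, {OP_RET, 0}};

/* Stack-machine interpreter that executes the bytecode instead of native code. */
static uint32_t vm_run(const struct insn *code, const uint32_t *args) {
  uint32_t stack[16];
  size_t sp = 0;
  for (size_t pc = 0;; pc++) {
    const struct insn in = code[pc];
    switch (in.op) {
      case OP_ARG: stack[sp++] = args[in.operand]; break;
      case OP_IMM: stack[sp++] = in.operand; break;
      case OP_ADD: sp--; stack[sp - 1] += stack[sp]; break;
      case OP_MUL: sp--; stack[sp - 1] *= stack[sp]; break;
      case OP_XOR: sp--; stack[sp - 1] ^= stack[sp]; break;
      case OP_RET: return stack[sp - 1];
      default: return 0; /* unknown opcode: the sketch simply bails out */
    }
  }
}

/* Public wrapper keeps the original signature and forwards into the VM. */
uint32_t protected_compute(uint32_t a, uint32_t b) {
  const uint32_t args[2] = {a, b};
  return vm_run(kBody, args);
}
```

A real lowering would also disguise the opcode encoding and, for `strong_vm`, keep hardening the interpreter body itself; the sketch leaves both out.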
- `indirect_dispatch` is a late pass in the safe pipeline.
- It rewrites supported conditional branches and switch dispatch sites into per-site masked `blockaddress` + arithmetic + `indirectbr` sequences.
- Each dispatch site derives its masking material from the protected function seed and site index.
- The implementation reconstructs targets from same-function deltas in SSA instead of emitting absolute dispatch tables in globals.
- This pass does not use the authenticated BLAKE2s runtime used by strings and constant pools.
- Unsupported shapes are skipped conservatively: EH personalities, EH pads, `invoke`, `callbr`, existing `indirectbr`, `catchswitch`, `catchret`, `cleanupret`, `resume`, `musttail` calls, and non-integral program address spaces.
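
The masked-delta idea can be approximated in C with the GNU labels-as-values extension, which Clang lowers to `blockaddress` and `indirectbr`. This is a hand-written sketch under stated assumptions, not the pass's output: `site_mask` is a hypothetical splitmix64-style mixer standing in for the real per-site derivation, and the `anchor` label plays the role of the same-function base that deltas are measured from.

```c
#include <stdint.h>

/* Hypothetical per-site mask: a splitmix64-style mix of the function seed and
   site index (stand-in for the pass's real derivation). */
static uint64_t site_mask(uint64_t function_seed, uint32_t site_index) {
  uint64_t x = function_seed + 0x9E3779B97F4A7C15ULL * ((uint64_t)site_index + 1);
  x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
  x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
  return x ^ (x >> 31);
}

/* Behaves like `return cond ? 1 : 0;`, but the branch is a masked computed goto.
   Requires the GNU labels-as-values extension (GCC, Clang). */
int masked_branch(int cond, uint64_t function_seed) {
  const uintptr_t mask = (uintptr_t)site_mask(function_seed, 0);
  const uintptr_t base = (uintptr_t)&&anchor; /* same-function reference point */
  /* Successors exist only as masked deltas from the anchor, never as absolute addresses. */
  const uintptr_t masked_true = ((uintptr_t)&&on_true - base) ^ mask;
  const uintptr_t masked_false = ((uintptr_t)&&on_false - base) ^ mask;
  const uintptr_t chosen = cond ? masked_true : masked_false;
  goto *(void *)(base + (chosen ^ mask)); /* unmask, rebase, jump (indirectbr) */
anchor:
  return -1; /* never executed; only its address is used */
on_true:
  return 1;
on_false:
  return 0;
}
```

Only masked deltas flow through ordinary values here, so there is no global table of absolute block addresses to read; the mask cancels out only at the jump itself.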
- String encoding is configured under `string_encoding`.
- `authenticated_mode` enables the keyed and integrity-checked runtime decode path.
- The runtime support lives in `runtime/string_auth_runtime.c` and handles keyed string and constant-pool recovery.
- Lazy decode, eager decode, constructor fallback, and forwarded-pointer cases are handled in the transform.
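
The difference between lazy and load-time (constructor) decoding can be sketched in C as below. Assumptions: a toy repeating-key XOR (`toy_decode`, `kToyKey`) stands in for the real authenticated BLAKE2s keystream, and every name is hypothetical; the sketch only shows when plaintext is materialized, not how the transform emits it.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in cipher (NOT the real runtime): XOR with a repeating 4-byte key. */
static const uint8_t kToyKey[4] = {0x5A, 0xC3, 0x19, 0x7E};

static void toy_decode(char *dst, const uint8_t *src, size_t n) {
  for (size_t i = 0; i < n; i++) dst[i] = (char)(src[i] ^ kToyKey[i % 4]);
  dst[n] = '\0';
}

/* Encrypted bytes of "hello" under the toy key. */
static const uint8_t kEncHello[5] = {'h' ^ 0x5A, 'e' ^ 0xC3, 'l' ^ 0x19, 'l' ^ 0x7E, 'o' ^ 0x5A};

/* Lazy decode: plaintext appears on first use and is then cached (not thread-safe as written). */
static char g_hello_lazy[6];
static int g_hello_lazy_ready;
const char *get_hello_lazy(void) {
  if (!g_hello_lazy_ready) {
    toy_decode(g_hello_lazy, kEncHello, sizeof kEncHello);
    g_hello_lazy_ready = 1;
  }
  return g_hello_lazy;
}

/* Constructor decode: runs once at load time, before main (GCC/Clang attribute). */
static char g_hello_eager[6];
__attribute__((constructor)) static void decode_hello_at_load(void) {
  toy_decode(g_hello_eager, kEncHello, sizeof kEncHello);
}
const char *get_hello_eager(void) { return g_hello_eager; }
```

The lazy variant keeps plaintext out of memory until first use at the price of a check on every access; the constructor variant moves the whole cost to startup.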
- Constant encoding modes are `off`, `mba_inline`, `keyed_pool`, `auto`, and `all`.
- `mba_inline` reconstructs constants directly in IR.
- `keyed_pool` moves constants into keyed, integrity-checked pools recovered at use sites.
- `auto` chooses a strategy per use site.
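
The idea behind rebuilding a constant from seed-dependent pieces can be illustrated with the MBA identity `x + y == (x ^ y) + 2 * (x & y)`, which holds modulo 2^32. In this hypothetical sketch a build-time step splits a constant `C` into shares `a` and `b = C - a`, and the use site recomputes `C` without the literal appearing; the real `mba_inline` split strategy and MBA depth are not shown.

```c
#include <stdint.h>

/* Hypothetical build-time split of a constant C into two shares with a + b == C (mod 2^32).
   In a real transform `share` would be seed-derived; here the caller supplies it. */
struct mba_shares { uint32_t a, b; };

static struct mba_shares mba_split(uint32_t constant, uint32_t share) {
  struct mba_shares s = {share, constant - share};
  return s;
}

/* Use-site reconstruction through the MBA identity x + y == (x ^ y) + 2 * (x & y). */
static uint32_t mba_rebuild(struct mba_shares s) {
  return (s.a ^ s.b) + 2u * (s.a & s.b);
}
```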
- The top-level `seed` is the root build input. Function-selective passes such as `indirect_dispatch` derive per-function seeds from the module name, function name, and top-level seed; the keyed string and keyed-pool runtime currently uses the top-level seed directly.
- `authenticated_mode` and `keyed_pool` use a domain-separated BLAKE2s schedule implemented in `include/obf/support/auth_encoding.h`.
- The schedule is `build_key(seed)` -> `function_key(module_id, function_id)` -> per-site or per-pool key -> labeled `enc` and `mac` subkeys.
- Authenticated strings derive distinct keys from descriptor metadata including `module_id`, a derived `function_id`, and `site_id`. Keyed constant pools derive distinct keys from `module_id` and `pool_id`.
- Authentication uses a keyed BLAKE2s tag over descriptor metadata plus ciphertext, and encryption uses a BLAKE2s-derived XOR keystream with a derived nonce. It does not use AES, ChaCha20, HMAC, or SipHash.
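
The shape of the derivation chain can be sketched in C with a compact one-shot BLAKE2s-256 following RFC 7693. Everything layered on top of the hash is an assumption for illustration: the labels (`"obf.build"`, `"obf.module"`, `"obf.function"`, `"obf.site"`, `"obf.enc"`, `"obf.mac"`), the `label || little-endian id` message layout, the all-zero key for the first step, and the split of `function_key` into a module step and a function step are hypothetical and do not describe `include/obf/support/auth_encoding.h`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* BLAKE2s-256 per RFC 7693 (one-shot, optionally keyed). */
static const uint32_t IV[8] = {0x6A09E667u, 0xBB67AE85u, 0x3C6EF372u, 0xA54FF53Au,
                               0x510E527Fu, 0x9B05688Cu, 0x1F83D9ABu, 0x5BE0CD19u};
static const uint8_t SIGMA[10][16] = {
    {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15},
    {14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3},
    {11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4},
    {7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8},
    {9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13},
    {2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9},
    {12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11},
    {13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10},
    {6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5},
    {10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0}};
#define ROTR32(w, n) (((w) >> (n)) | ((w) << (32 - (n))))
#define G(a, b, c, d, x, y)                             \
  do {                                                  \
    v[a] += v[b] + (x); v[d] = ROTR32(v[d] ^ v[a], 16); \
    v[c] += v[d];       v[b] = ROTR32(v[b] ^ v[c], 12); \
    v[a] += v[b] + (y); v[d] = ROTR32(v[d] ^ v[a], 8);  \
    v[c] += v[d];       v[b] = ROTR32(v[b] ^ v[c], 7);  \
  } while (0)

/* One compression call; t is the byte counter (inputs are assumed < 4 GiB). */
static void compress(uint32_t h[8], const uint8_t blk[64], uint32_t t, int last) {
  uint32_t v[16], m[16];
  for (int i = 0; i < 16; i++)
    m[i] = (uint32_t)blk[4 * i] | (uint32_t)blk[4 * i + 1] << 8 |
           (uint32_t)blk[4 * i + 2] << 16 | (uint32_t)blk[4 * i + 3] << 24;
  for (int i = 0; i < 8; i++) { v[i] = h[i]; v[i + 8] = IV[i]; }
  v[12] ^= t;
  if (last) v[14] = ~v[14];
  for (int r = 0; r < 10; r++) {
    const uint8_t *s = SIGMA[r];
    G(0, 4, 8, 12, m[s[0]], m[s[1]]);   G(1, 5, 9, 13, m[s[2]], m[s[3]]);
    G(2, 6, 10, 14, m[s[4]], m[s[5]]);  G(3, 7, 11, 15, m[s[6]], m[s[7]]);
    G(0, 5, 10, 15, m[s[8]], m[s[9]]);  G(1, 6, 11, 12, m[s[10]], m[s[11]]);
    G(2, 7, 8, 13, m[s[12]], m[s[13]]); G(3, 4, 9, 14, m[s[14]], m[s[15]]);
  }
  for (int i = 0; i < 8; i++) h[i] ^= v[i] ^ v[i + 8];
}

static void blake2s(uint8_t out[32], const uint8_t *key, size_t keylen,
                    const uint8_t *in, size_t inlen) {
  uint32_t h[8], t = 0;
  uint8_t blk[64];
  size_t off = 0;
  memcpy(h, IV, sizeof h);
  h[0] ^= 0x01010000u ^ ((uint32_t)keylen << 8) ^ 32u; /* depth/fanout 1, key length, 32-byte digest */
  if (keylen > 0) { /* the key, zero-padded to a full block, is compressed first */
    memset(blk, 0, sizeof blk);
    memcpy(blk, key, keylen);
    t += 64;
    compress(h, blk, t, inlen == 0);
  }
  while (inlen - off > 64) { /* every full block except the last one */
    t += 64;
    compress(h, in + off, t, 0);
    off += 64;
  }
  if (inlen > 0 || keylen == 0) { /* final, zero-padded block */
    memset(blk, 0, sizeof blk);
    if (inlen > off) memcpy(blk, in + off, inlen - off);
    t += (uint32_t)(inlen - off);
    compress(h, blk, t, 1);
  }
  for (int i = 0; i < 32; i++) out[i] = (uint8_t)(h[i / 4] >> (8 * (i % 4)));
}

/* Hypothetical domain-separated step: child = BLAKE2s(key = parent, msg = label || LE64(id)). */
static void derive(uint8_t child[32], const uint8_t parent[32], const char *label, uint64_t id) {
  uint8_t msg[64];
  size_t n = strlen(label); /* labels are assumed to be shorter than 56 bytes */
  memcpy(msg, label, n);
  for (int i = 0; i < 8; i++) msg[n + i] = (uint8_t)(id >> (8 * i));
  blake2s(child, parent, 32, msg, n + 8);
}

/* Hypothetical chain: build key -> function key -> site key -> labeled enc/mac subkeys. */
static void site_subkeys(uint8_t enc[32], uint8_t mac[32], uint64_t seed, uint64_t module_id,
                         uint64_t function_id, uint64_t site_id) {
  static const uint8_t zero[32] = {0};
  uint8_t build[32], module[32], function[32], site[32];
  derive(build, zero, "obf.build", seed);                 /* build_key(seed) */
  derive(module, build, "obf.module", module_id);         /* function_key, step 1 */
  derive(function, module, "obf.function", function_id); /* function_key, step 2 */
  derive(site, function, "obf.site", site_id);            /* per-site key */
  derive(enc, site, "obf.enc", 0);
  derive(mac, site, "obf.mac", 0);
}
```

Each label acts as a domain separator, so the `enc` and `mac` subkeys of one site are computationally independent of each other and of every other site's keys (assuming BLAKE2s behaves as a PRF), even though all of them descend from one 32-byte build key.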
- The emitted artifacts store the 32-byte `build_key` in internal globals and reconstruct derived keys at runtime from descriptor metadata. This is an embedded-key, self-contained runtime: no hardware token, remote service, white-box key split, or entropy-anchor binding is involved.
- Integrity verification is fail-closed: descriptor mismatches, tag mismatches, and length mismatches trap in the runtime instead of returning tampered plaintext.
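
A minimal sketch of a fail-closed gate, assuming a hypothetical descriptor layout (`struct auth_desc`) and a tag that the caller has already computed over descriptor metadata plus ciphertext. It shows the ordering that matters: every check runs before any plaintext is produced, the tag comparison is constant-time, and each failure traps via `__builtin_trap()` (GCC/Clang) rather than returning an error code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor; field names are illustrative, not the runtime's layout. */
struct auth_desc {
  uint32_t module_id, function_id, site_id;
  uint32_t length;  /* expected ciphertext length */
  uint8_t tag[32];  /* stored authentication tag */
};

/* Constant-time comparison: inspects every byte regardless of where a mismatch is. */
static int tag_equal(const uint8_t *a, const uint8_t *b, size_t n) {
  uint8_t diff = 0;
  for (size_t i = 0; i < n; i++) diff |= (uint8_t)(a[i] ^ b[i]);
  return diff == 0;
}

/* Runs before decryption; returns only if everything matches, otherwise traps. */
static void verify_or_trap(const struct auth_desc *d, uint32_t module_id, uint32_t function_id,
                           uint32_t site_id, size_t ct_len, const uint8_t computed_tag[32]) {
  if (d->module_id != module_id || d->function_id != function_id || d->site_id != site_id)
    __builtin_trap();                                         /* descriptor mismatch */
  if (d->length != ct_len) __builtin_trap();                  /* length mismatch */
  if (!tag_equal(d->tag, computed_tag, 32)) __builtin_trap(); /* tag mismatch */
}
```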
- `runtime/entropy_anchor.c` supports opaque arithmetic and MBA-style transforms; it is separate from the keyed string and constant-pool key schedule.
- Public runtime ABI names are generated at build time in `build/include/obf/support/runtime_abi_generated.h`.
- The default public prefix is `rt_core_`.
- Final cleanup strips marker attributes, removes annotation metadata, anonymizes local/internal obfuscation artifacts, and strips local SSA names.
- Security gates can fail the build on leaked public `obf` symbols.
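
One plausible shape for a generated ABI header is a single prefix token pasted onto logical names by the preprocessor. This is a hypothetical sketch (`RT_ABI_PREFIX`, `RT_ABI`, and `abi_version` are invented names); the actual contents of `runtime_abi_generated.h` are not described here.

```c
#include <stdint.h>

/* Hypothetical generated-header shape: one configurable prefix token. */
#define RT_ABI_PREFIX rt_core_
#define RT_ABI_PASTE(prefix, name) prefix##name
#define RT_ABI_EXPAND(prefix, name) RT_ABI_PASTE(prefix, name) /* expands RT_ABI_PREFIX first */
#define RT_ABI(name) RT_ABI_EXPAND(RT_ABI_PREFIX, name)

/* Source refers to RT_ABI(abi_version); the object file exports rt_core_abi_version. */
uint32_t RT_ABI(abi_version)(void) { return 1u; }
```

The extra `RT_ABI_EXPAND` level is needed because `##` pastes unexpanded arguments; pasting directly would yield `RT_ABI_PREFIXabi_version`.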
- YAML loading and config parsing live in `lib/frontend/`.
- Profiles are `fast`, `standard`, `guarded`, `fortress`, and `lab`.
- Profile defaults are applied first; explicit top-level YAML sections override them; `--obf-seed` overrides the final seed after config loading.
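
The three-layer precedence can be sketched as follows. Assumptions: only `seed` and `mba.depth` are modeled, `struct obf_config` and the optional-value structs are hypothetical stand-ins for whatever `lib/frontend/` actually uses, and the depth defaults are copied from the `mba.depth` row of the profile table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily reduced config: only two settings are modeled. */
struct obf_config { uint64_t seed; int mba_depth; };
struct opt_u64 { bool set; uint64_t value; };
struct opt_int { bool set; int value; };
struct yaml_sections { struct opt_u64 seed; struct opt_int mba_depth; };

/* Layer 1: profile defaults (mba.depth values from the profile table). */
static struct obf_config profile_defaults(const char *profile) {
  static const struct { const char *name; int mba_depth; } kProfiles[] = {
      {"fast", 1}, {"standard", 1}, {"guarded", 2}, {"fortress", 3}, {"lab", 4}};
  struct obf_config c = {0, 1}; /* placeholder fallback for unknown profiles */
  for (size_t i = 0; i < sizeof kProfiles / sizeof kProfiles[0]; i++)
    if (strcmp(kProfiles[i].name, profile) == 0) c.mba_depth = kProfiles[i].mba_depth;
  return c;
}

static struct obf_config resolve_config(const char *profile, struct yaml_sections yaml,
                                        struct opt_u64 cli_seed) {
  struct obf_config c = profile_defaults(profile);
  /* Layer 2: explicit top-level YAML sections override profile defaults. */
  if (yaml.seed.set) c.seed = yaml.seed.value;
  if (yaml.mba_depth.set) c.mba_depth = yaml.mba_depth.value;
  /* Layer 3: --obf-seed overrides the final seed after config loading. */
  if (cli_seed.set) c.seed = cli_seed.value;
  return c;
}
```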
- Per-function feature extraction lives in `lib/analysis/`.
- Policy selection lives in `lib/policy/`.
- The pipeline is function-selective rather than blanket-on for the whole module.
- Core transforms live in `lib/transforms/`.
- VM lowering lives in `lib/vm/`.
- Pass registration and safe-pipeline orchestration live in `lib/plugin/`.
- `runtime/entropy_anchor.c` provides the entropy anchor support object used by builds and tests.
- `runtime/string_auth_runtime.c` provides keyed and integrity-checked decode support for strings and constant pools.
`obf-safe-pipeline` is the integrated pipeline used by the benchmarks and lit coverage. Its current high-level order is:
- entropy initialization
- VM lowering and call rewriting for `vm`
- VM lowering and call rewriting for `strong_vm`
- post-VM string encoding
- constant encoding
- opaque GEP
- instruction substitution
- opaque predicates
- control flattening
- function outlining
- bogus control flow
- block splitting
- additional hardening on `strong_vm` implementation functions
- CFG state cleanup
- indirect dispatch
- security gate enforcement
- artifact cleanup
The late ordering matters. Indirect dispatch runs after the major structural passes so it can rewrite the final dispatch-heavy CFG shapes, including VM implementation functions.
Top-level sections currently supported by the loader:
- `profile`
- `seed`
- `default_level`
- `overrides`
- `targets`
- `block_split`
- `string_encoding`
- `constant_encoding`
- `mba`
- `indirect_dispatch`
- `security`
- `debug_preserve_generated_names`
`overrides` entries match exact function names; `targets` entries support glob-style wildcard patterns (e.g., `"verify_*"`).
| Setting | `fast` | `standard` | `guarded` | `fortress` | `lab` |
|---|---|---|---|---|---|
| `mba.depth` | 1 | 1 | 2 | 3 | 4 |
| `block_split.max_splits_per_function` | 1 | 1 | 2 | 4 | 8 |
| `string_encoding.min_string_length` | 3 | 2 | 2 | 1 | 1 |
| `string_encoding.max_strings_per_module` | 32 | 128 | 256 | 512 | 1024 |
| `string_encoding.prefer_lazy_decode` | true | true | true | false | false |
| `string_encoding.allow_ctor_fallback` | true | true | false | false | false |
| `constant_encoding.max_constants_per_function` | 2 | 4 | 8 | 16 | 32 |
| `security.fail_on_public_obf_symbol` | false | true | true | true | true |
Unless noted otherwise, every profile defaults to `authenticated_mode: false`, `indirect_dispatch.enabled: false`, `min_instructions_per_block: 2` (`fortress` and `lab` use 1), `min_bit_width: 8`, `default_level: none`, and `constant_encoding.mode: mba_inline`. Explicit top-level YAML keys override profile defaults.
Protection levels can also be set directly in source with Clang's `annotate` attribute. The annotation value must be `"obf:<level>"`, where `<level>` is one of `none`, `light`, `strong`, `vm`, or `strong_vm`.
```c
__attribute__((annotate("obf:strong_vm")))
void sensitive_routine(void) { /* ... */ }
```
Annotations rank below explicit `overrides` entries but above `targets` rule matching. The automatic security floor applies independently and may raise the level further.
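
The resolution order for a single function can be sketched in C, using POSIX `fnmatch()` as a stand-in for the loader's glob matching. Assumptions: the first matching `targets` entry wins, levels are plain strings, the automatic security floor is not modeled, and `struct rule` and `resolve_level` are hypothetical names.

```c
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule shape mirroring YAML `overrides` / `targets` entries. */
struct rule { const char *pattern; const char *level; };

/* Order: exact `overrides` match, then an `obf:<level>` annotation, then the first
   glob-matching `targets` entry, then `default_level`. Security floor not modeled. */
static const char *resolve_level(const char *fn, const char *annotation,
                                 const struct rule *overrides, size_t n_overrides,
                                 const struct rule *targets, size_t n_targets,
                                 const char *default_level) {
  for (size_t i = 0; i < n_overrides; i++)
    if (strcmp(overrides[i].pattern, fn) == 0) return overrides[i].level;
  if (annotation != NULL && strncmp(annotation, "obf:", 4) == 0) return annotation + 4;
  for (size_t i = 0; i < n_targets; i++)
    if (fnmatch(targets[i].pattern, fn, 0) == 0) return targets[i].level;
  return default_level;
}
```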
Minimal example:
```yaml
profile: fortress
seed: 20260601
default_level: none
targets:
  - match: "verify_*"
    level: strong_vm
  - match: "license_*"
    level: strong_vm
string_encoding:
  authenticated_mode: true
  prefer_lazy_decode: true
  allow_ctor_fallback: false
constant_encoding:
  mode: auto
  max_constants_per_function: 8
  min_bit_width: 8
mba:
  depth: 3
indirect_dispatch:
  enabled: true
  max_sites_per_function: 4
  max_switch_targets: 8
  target_vm_dispatchers: true
  target_flattened_headers: true
security:
  fail_on_public_obf_symbol: true
  strip_release_markers: true
```
Requirements:
- CMake 3.24+
- C++23 compiler
- LLVM 21+
- Python 3
- `lit`
- LLVM tools: `opt`, `clang`, `clang++`, `llvm-link`, `llc`, `llvm-strip`, `llvm-nm`, `llvm-objdump`
- Optional: `strings` for benchmark string audits
Configure and build:
```sh
cmake -S . -B build -DLLVM_DIR="$(llvm-config --cmakedir)"
cmake --build build
```
Useful cache variables:
- `OBF_BENCHMARK_SEED`
- `OBF_RUNTIME_ABI_PREFIX`
- `OBF_BENCHMARK_CLEAN_IR`
- `OBF_BENCHMARK_CLEANUP_PASSES`
Feature report:
`obf-feature-report` is read-only and emits `obf.feature_report.v3` JSON with per-function policy decisions and per-transform strategy details.
```sh
opt -load-pass-plugin build/obf_plugin.so \
  --obf-config=config.yaml \
  -passes=obf-feature-report \
  -disable-output input.ll
```
Policy audit:
`obf-audit` prints a policy-resolution table and can also write `obf.audit.v1` JSON with `--obf-audit-out`.
```sh
opt -load-pass-plugin build/obf_plugin.so \
  --obf-config=config.yaml \
  --obf-audit-out=audit.json \
  -passes=obf-audit \
  -disable-output input.ll
```
Full safe pipeline:
```sh
opt -load-pass-plugin build/obf_plugin.so \
  --obf-config=config.yaml \
  -passes=obf-safe-pipeline \
  -S input.ll -o output.ll
```
Isolated indirect dispatch:
```sh
opt -load-pass-plugin build/obf_plugin.so \
  --obf-config=config.yaml \
  -passes=obf-indirect-dispatch \
  -S input.ll -o indirect.ll
```
Other standalone passes:
- Read-only/reporting: `obf-feature-report`, `obf-audit`.
- Transform stages: `obf-entropy-init`, `obf-vm`, `obf-block-split`, `obf-string-encode`, `obf-constant-encode`, `obf-opaque-gep`, `obf-instruction-substitute`, `obf-control-flatten`, `obf-function-outline`, `obf-opaque-preds`, `obf-bogus-cf`, `obf-indirect-dispatch`, `obf-cfg-state-cleanup`, and `obf-artifact-cleanup`.
`obf-driver` currently loads a config and prints a summary. It is not a full compile driver.
Benchmark targets build paired baseline and obfuscated artifacts under `build/benchmarks/<name>/`. The benchmark build passes `--obf-seed=${OBF_EFFECTIVE_BENCHMARK_SEED}` to `opt`, so `OBF_BENCHMARK_SEED` controls the effective benchmark seed for the whole build tree even when a sample benchmark config contains its own `seed:` entry.
Build benchmark pairs:
```sh
cmake --build build --target obf-benchmarks
```
Per-benchmark artifacts:
- `<name>.baseline.ll`
- `<name>.obfuscated.ll`
- `<name>.obfuscated.cleaned.ll` when `OBF_BENCHMARK_CLEAN_IR=ON`
- `<name>.baseline`
- `<name>.obfuscated`
Benchmark and analysis targets:
- `obf-benchmarks` builds stripped baseline and obfuscated pairs for the full corpus.
- `obf-benchmarks-mir` emits MIR snapshots for linked benchmark targets such as `wpo_demo`.
- `obf-audit-benchmarks` audits stripped obfuscated benchmark binaries for leaked symbols and, when `strings` is available, residual strings.
- `obf-re-harness` scores how much VM structure is recoverable from obfuscated benchmark IR and writes `build/re-harness/vm_recovery.json`.
- `obf-seed-diversity` verifies seed-driven IR diversity and writes `build/diversity/diversity.json`.
Current benchmark corpus:
- `license_demo`
- `config_demo`
- `vm_workflow_demo`
- `wpo_demo`
Measure keyed string decode overhead:
```sh
python tools/obf-bench/measure_string_auth_overhead.py --build-dir build
```
The helper writes temporary inputs under `build/string-auth-bench/` and reports lazy first-decode cost, lazy steady-state helper cost, and constructor startup impact.
Requested release sweep:
```sh
cmake --build build --target obf-benchmarks obf-seed-diversity obf-unit-tests
ctest --test-dir build --output-on-failure -R "obf-lit|obf-unit-tests"
```
Repository layout:
```text
include/obf/      public headers
lib/analysis/     feature extraction
lib/frontend/     config loading and annotations
lib/plugin/       pass registration and pipeline wiring
lib/policy/       function-level policy selection
lib/report/       reporting
lib/transforms/   IR transforms
lib/vm/           VM lowering and dispatch
runtime/          runtime support objects
tests/lit/        lit coverage
tests/unit/       unit tests
benchmarks/       corpus, configs, and build targets
tools/            helper tools and scripts
```