20 commits
df2f733
add compiler refactoring backlog tasks (TASK-032 through TASK-036)
benswift Feb 26, 2026
e951e0b
replace compiler assoc-list caches with hash tables
benswift Feb 26, 2026
1536f10
split llvmti.xtm into separate compiler modules
benswift Feb 26, 2026
7929c18
add explicit AST accessor layer for xtlang compiler (TASK-034)
benswift Feb 26, 2026
4830759
add compiler-internal unit tests for xtlang compiler passes (TASK-035)
benswift Feb 26, 2026
d9bde7d
thread type inference vars explicitly through check functions (TASK-036)
benswift Feb 26, 2026
b5b2f2d
extend compiler unit tests with transform, typecheck, and pipeline co…
benswift Feb 27, 2026
64c0d91
refactor first-transform into sub-dispatchers with AST accessors (TAS…
benswift Feb 27, 2026
395fd85
add structured compiler error handling with shared primitives (TASK-039)
benswift Feb 27, 2026
f44bc45
compiler performance: replace regex with string ops, cache symbol con…
benswift Feb 27, 2026
4022aba
remove Testing directory from git and add to .gitignore
benswift Feb 27, 2026
1452b96
add expr_problem and extempore_lang as example tests, split IPC secti…
benswift Feb 27, 2026
889c414
add occurs check to type-unify (TASK-037.01)
benswift Feb 27, 2026
af42d74
replace vars hash table with union-find unification (TASK-037.02)
benswift Feb 27, 2026
9de6901
separate constraint generation from solving, decompose nativef-generi…
benswift Feb 28, 2026
f7295f6
formalise bidirectional checking and synthesis modes (TASK-037.04)
benswift Feb 28, 2026
dcef0cd
fix closure redefinition propagation in ORC JIT
benswift Feb 28, 2026
e11e500
close TASK-037: bidirectional type inference migration complete
benswift Feb 28, 2026
4bf05be
add closure redefinition propagation test to extempore_lang example
benswift Feb 28, 2026
d1456a4
Set CMP0168 to new for cmake 3.30+ to avoid sub-build issues on macOS
dr-offig Mar 1, 2026
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -13,6 +13,7 @@ libextempore.so
 # cmake & other build tools
 /__cmake_systeminformation
 /build
+/Testing
 /buildlib
 /cmake-build
 /out # this is where MSVS puts the CMake build stuff
13 changes: 12 additions & 1 deletion CMakeLists.txt
@@ -132,6 +132,11 @@ message(STATUS "LLVM target architecture: ${LLVM_TARGET_ARCH}")

 include(FetchContent)

+# Use direct FetchContent population (CMake 3.30+) to avoid sub-build issues on macOS
+if(POLICY CMP0168)
+  cmake_policy(SET CMP0168 NEW)
+endif()
+
 set(LLVM_TARGETS_TO_BUILD ${LLVM_TARGET_ARCH} CACHE STRING "" FORCE)
 set(LLVM_ENABLE_TERMINFO OFF CACHE BOOL "" FORCE)
 set(LLVM_ENABLE_ZLIB OFF CACHE BOOL "" FORCE)
@@ -202,7 +207,13 @@ if(EXT_DYLIB)
 runtime/init.ll
 runtime/init.xtm
 runtime/llvmir.xtm
-runtime/llvmti.xtm
+runtime/llvmti-globals.xtm
+runtime/llvmti-caches.xtm
+runtime/llvmti-aot.xtm
+runtime/llvmti-transforms.xtm
+runtime/llvmti-ast.xtm
+runtime/llvmti-typecheck.xtm
+runtime/llvmti-bind.xtm
 runtime/scheme.xtm)
 add_library(extempore SHARED ${EXTEMPORE_SOURCES})
 target_link_libraries(extempore PRIVATE rc_xtm)
@@ -0,0 +1,34 @@
---
id: TASK-032
title: Replace compiler assoc-list caches with hash tables
status: Done
assignee: []
created_date: '2026-02-26 09:44'
updated_date: '2026-02-26 10:10'
labels:
- compiler
- performance
dependencies: []
priority: high
---

## Description

<!-- SECTION:DESCRIPTION:BEGIN -->
The nine xtlang compiler caches in runtime/llvmti.xtm (closure-cache, nativefunc-cache, polyfunc-cache, etc.) all use association lists with assoc-strcmp lookup, giving O(n) per lookup. Replace them with Extempore's built-in hash tables (make-hashtable, hashtable-ref, hashtable-set!) for O(1) average-case lookup. This is the lowest-risk, highest-impact performance improvement.
<!-- SECTION:DESCRIPTION:END -->

## Acceptance Criteria
<!-- AC:BEGIN -->
- [x] #1 All nine caches in llvmti.xtm use hash tables instead of assoc lists
- [x] #2 Cache API functions (register-new-*, *-exists?, get-*-type, set-*-type) updated to use hash table operations
- [x] #3 reset-*-cache and print-*-cache functions work correctly with hash tables
- [x] #4 Core library tests pass (ctest -L libs-core)
- [x] #5 AOT compilation works (build aot_external_audio target)
<!-- AC:END -->

## Implementation Notes

<!-- SECTION:NOTES:BEGIN -->
Implemented C FFI hash table primitives in src/ffi/hashtable.inc (DJB2 hashing, Scheme vector buckets for GC safety). Converted 9 of 11 compiler caches from assoc-lists to hash tables. Left genericfunc-cache and generictype-cache as alists (multimap semantics, symbol keys, complex iteration patterns). All 6 core tests pass, AOT compilation succeeds for both core and external audio.
<!-- SECTION:NOTES:END -->
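
For readers unfamiliar with the data structures involved, here is a minimal Python sketch of the idea: an association-list lookup next to a bucketed table keyed by a DJB2 string hash. It is not the code in src/ffi/hashtable.inc; `alist_ref`, `BucketTable`, the bucket count, and the example type strings are all hypothetical.

```python
def djb2(key: str) -> int:
    """DJB2 string hash (h = h * 33 + byte), truncated to 32 bits."""
    h = 5381
    for byte in key.encode("utf-8"):
        h = ((h << 5) + h + byte) & 0xFFFFFFFF
    return h


def alist_ref(alist, key):
    """Association-list lookup: scans every pair, so O(n) per lookup."""
    for k, v in alist:
        if k == key:
            return v
    return None


class BucketTable:
    """Fixed-size bucket array; each bucket is a short list of (key, value) pairs."""

    def __init__(self, nbuckets: int = 64):
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key: str):
        return self.buckets[djb2(key) % len(self.buckets)]

    def set(self, key: str, value) -> None:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def ref(self, key: str, default=None):
        # Only the one bucket selected by the hash is scanned.
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default


cache = BucketTable()
cache.set("my_closure", "[i64,i64]*")
cache.set("my_closure", "[double,double]*")  # replaces the first entry
```

The lookup cost depends on bucket length rather than on the total number of cached names, which is where the average-case O(1) behaviour comes from.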
@@ -0,0 +1,29 @@
---
id: TASK-033
title: Split llvmti.xtm into separate compiler modules
status: Done
assignee:
- '@ben'
created_date: '2026-02-26 09:44'
updated_date: '2026-02-26 10:10'
labels:
- compiler
- architecture
dependencies:
- TASK-032
priority: medium
---

## Description

<!-- SECTION:DESCRIPTION:BEGIN -->
runtime/llvmti.xtm is 12,517 lines containing caches, transforms, type inference, closure conversion, generics, AOT support, and bind-form macros. Split it along natural boundaries into separate files: caches.xtm (~2000 lines), transforms.xtm (~300 lines), type-inference.xtm (~4600 lines), closure-convert.xtm (~600 lines), generics.xtm (~1500 lines), aot.xtm (~500 lines), bind-forms.xtm (~2000 lines). Dependencies are mostly linear: caches → transforms → type-inference → closure-convert → IR generation → bind-forms.
<!-- SECTION:DESCRIPTION:END -->

## Acceptance Criteria
<!-- AC:BEGIN -->
- [x] #1 llvmti.xtm split into at least 4 separate files along phase boundaries
- [x] #2 Load order defined and documented in a top-level loader or scheme.xtm
- [x] #3 No change in compiler behaviour (core tests pass)
- [x] #4 AOT compilation works (build aot_external_audio target)
<!-- AC:END -->
@@ -0,0 +1,30 @@
---
id: TASK-034
title: Define explicit AST representation for xtlang compiler
status: Done
assignee:
- '@ben'
created_date: '2026-02-26 09:44'
updated_date: '2026-02-27 07:00'
labels:
- compiler
- architecture
dependencies:
- TASK-033
priority: medium
---

## Description

<!-- SECTION:DESCRIPTION:BEGIN -->
The xtlang compiler operates on raw s-expressions with car/cdr pattern matching --- there is no explicit AST type. Introduce a tagged AST representation (e.g. vectors with tag fields or tagged lists) with accessor functions. This gives each AST-consuming function an explicit contract and enables validation between passes. Start with the output of first-transform and input to type-check, since that is the most important boundary.
<!-- SECTION:DESCRIPTION:END -->

## Acceptance Criteria
<!-- AC:BEGIN -->
- [x] #1 AST node types defined with constructors and accessors (at minimum: let, lambda, if, call, var, lit, set!)
- [x] #2 first-transform produces the new AST representation
- [x] #3 type-check consumes the new AST representation
- [x] #4 AST validator function exists and runs between passes in debug mode
- [x] #5 Core library tests pass
<!-- AC:END -->
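
A small Python sketch of what a tagged representation with constructors, accessors, and a validator could look like. The node shapes, the `NODE_ARITY` table, and all function names are hypothetical and only cover a subset of the node types listed above; the actual accessor layer lives in runtime/llvmti-ast.xtm and works on s-expressions.

```python
# Hypothetical tagged-tuple AST: the first element of every node is its tag.
NODE_ARITY = {"lit": 1, "var": 1, "if": 3, "lambda": 2, "call": 2}


def make_lit(value):
    return ("lit", value)


def make_var(name):
    return ("var", name)


def make_if(test, then, alt):
    return ("if", test, then, alt)


def make_lambda(params, body):
    return ("lambda", tuple(params), body)


def make_call(fn, args):
    return ("call", fn, tuple(args))


def node_tag(node):
    return node[0]


def if_test(node):
    """Accessor with an explicit contract: only valid on 'if' nodes."""
    assert node_tag(node) == "if"
    return node[1]


def validate(node) -> bool:
    """Recursively check tags and field counts; intended to run between passes."""
    if not isinstance(node, tuple) or not node or node[0] not in NODE_ARITY:
        return False
    tag, fields = node[0], node[1:]
    if len(fields) != NODE_ARITY[tag]:
        return False
    if tag in ("lit", "var"):
        return True
    if tag == "if":
        return all(validate(f) for f in fields)
    if tag == "lambda":
        return all(isinstance(p, str) for p in fields[0]) and validate(fields[1])
    # Remaining tag is "call": validate the callee and every argument.
    return validate(fields[0]) and all(validate(a) for a in fields[1])
```

A malformed node such as `("if", make_var("x"))` is rejected by `validate` before it reaches the next pass, instead of failing later inside a car/cdr chain.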
@@ -0,0 +1,29 @@
---
id: TASK-035
title: Add compiler-internal unit tests for xtlang compiler passes
status: Done
assignee: []
created_date: '2026-02-26 09:44'
updated_date: '2026-02-26 09:44'
labels:
- compiler
- testing
dependencies:
- TASK-033
priority: medium
---

## Description

<!-- SECTION:DESCRIPTION:BEGIN -->
Currently all compiler tests are end-to-end (.xtm files that compile and run). Add unit tests for individual compiler passes: first-transform desugaring, type unification, type-check on small expressions, and IR generation for individual constructs. These tests provide a safety net for subsequent refactoring (especially the vars-threading change) and document expected compiler behaviour.
<!-- SECTION:DESCRIPTION:END -->

## Acceptance Criteria
<!-- AC:BEGIN -->
- [x] #1 Test file exists for first-transform with at least 10 desugaring cases (and/or/cond, println, n-ary operators, dot notation)
- [x] #2 Test file exists for type unification with at least 8 cases (simple types, closures, tuples, pointers, failure cases)
- [x] #3 Test file exists for type-check on small expressions (literals, let, lambda, if, arithmetic)
- [x] #4 Tests runnable via ctest with a new label (e.g. compiler-unit)
- [x] #5 All new tests pass
<!-- AC:END -->
@@ -0,0 +1,36 @@
---
id: TASK-036
title: Thread type inference vars explicitly through compiler passes
status: Done
assignee: []
created_date: '2026-02-26 09:44'
updated_date: '2026-02-27 00:59'
labels:
- compiler
- architecture
dependencies:
- TASK-035
priority: medium
---

## Description

<!-- SECTION:DESCRIPTION:BEGIN -->
Currently impc:ti:type-check and all *-check functions mutate a shared vars assoc list via set-cdr!. Change to threading vars explicitly through the pipeline --- either returning updated vars from each function or using a clear mutation protocol with an explicit state object. This removes hidden shared mutable state, makes data flow visible, enables future parallelism, and makes it possible to snapshot/rollback type state for speculative typing of overloaded functions. This is the biggest refactoring and should only be attempted after the compiler has unit tests as a safety net.
<!-- SECTION:DESCRIPTION:END -->

## Acceptance Criteria
<!-- AC:BEGIN -->
- [x] #1 Type inference vars are passed explicitly (not accessed via shared mutable global)
- [x] #2 All *-check functions receive and return (or explicitly mutate) the vars structure
- [ ] #3 impc:ti:run-type-check threads vars through rather than relying on side effects
- [x] #4 Core library tests pass
- [ ] #5 AOT compilation works
- [x] #6 Compiler-internal unit tests pass
<!-- AC:END -->

## Implementation Notes

<!-- SECTION:NOTES:BEGIN -->
Implemented across 6 phases: added tc-result vector return type (#(type vars)), functional vars helpers (vars-update, vars-force, vars-add, vars-snapshot, vars-clear, tc-unwrap), converted all ~35 *-check functions to return tc-result, added dispatcher compatibility shim, replaced all set-cdr! mutations in check functions with functional wrappers, updated 3 call sites in llvmti-transforms.xtm, removed dead code (clean-fvars). The old update-var/force-var still back the functional wrappers during transition. run-type-check* retry logic still uses clear-all-vars for in-place clearing. All compiler-unit (3/3) and libs-core (6/6) tests pass.
<!-- SECTION:NOTES:END -->
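
To illustrate the shape of the change, the following Python sketch shows a check function that returns a (type, vars) result instead of mutating a shared structure. `TcResult`, `vars_update`, `check_literal`, and `check_let` are hypothetical stand-ins loosely modelled on the tc-result and vars-update helpers named above, not a translation of them.

```python
from typing import NamedTuple


class TcResult(NamedTuple):
    """Pair of inferred type and the vars mapping after checking."""
    type: object
    vars: dict


def vars_update(vars: dict, name: str, ty: str) -> dict:
    """Return a new vars mapping with ty added to name's candidate types."""
    updated = dict(vars)
    candidates = updated.get(name, ())
    if ty not in candidates:
        updated[name] = candidates + (ty,)
    return updated


def check_literal(value, vars: dict) -> TcResult:
    """Literals do not touch vars; they are passed through unchanged."""
    ty = "i64" if isinstance(value, int) else "double"
    return TcResult(ty, vars)


def check_let(name: str, value, vars: dict) -> TcResult:
    """Check a single binding of name to a literal and record its type."""
    init = check_literal(value, vars)
    return TcResult(init.type, vars_update(init.vars, name, init.type))


before = {"x": ()}
result = check_let("x", 42, before)
```

Because `before` is left untouched, a caller can keep it as a snapshot and discard `result.vars` if a speculative check fails.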
@@ -0,0 +1,49 @@
---
id: TASK-037
title: >-
  Migrate xtlang type inference to bidirectional local type inference with
  union-find
status: Done
assignee: []
created_date: '2026-02-27 21:43'
updated_date: '2026-02-28 06:37'
labels:
- compiler
- type-inference
dependencies: []
priority: high
---

## Description

<!-- SECTION:DESCRIPTION:BEGIN -->
Replace the current ad-hoc iterative constraint propagation algorithm in the xtlang compiler with a principled bidirectional local type inference algorithm (Pierce & Turner 2000) using union-find unification.

The current algorithm (in runtime/llvmti-typecheck.xtm and runtime/llvmti-transforms.xtm) has these problems:
- Iterative retry loop with no formal convergence guarantee (run-type-check* walks the AST 1-N times)
- No occurs check (infinite types not detected)
- O(n²) list dedup per variable per pass in vars-update
- Full hash table copy (vars-snapshot) for every generic function check
- ~400-line nativef-generics function that's hard to reason about
- Pervasive mutation of the vars hash table during the AST walk

Migration is done in 4 incremental stages (subtasks), each independently testable. Must not change language semantics (monomorphic-by-default, no let-polymorphism). Must preserve bind-poly/bind-func overloading, !bang generics, and type inference / IR generation separation. Numeric coercion defaulting rules must be replicated exactly.

Key files: runtime/llvmti-typecheck.xtm, runtime/llvmti-transforms.xtm, runtime/llvmti-bind.xtm, runtime/llvmti-caches.xtm, runtime/llvmti-globals.xtm, runtime/llvmti-ast.xtm

References: Pierce & Turner (Local Type Inference, 2000), Dunfield & Krishnaswami (Bidirectional Typing, 2021), Conchon & Filliâtre (A Persistent Union-Find Data Structure, 2007)
<!-- SECTION:DESCRIPTION:END -->

## Acceptance Criteria
<!-- AC:BEGIN -->
- [x] #1 All 4 stages completed as subtasks
- [x] #2 All existing tests pass (ctest -L libs-core, libs-external, examples)
- [x] #3 No change to language semantics
- [x] #4 Compiler performance equal or better than current
<!-- AC:END -->

## Implementation Notes

<!-- SECTION:NOTES:BEGIN -->
All 4 subtasks completed: occurs check (037.01), union-find (037.02), constraint store (037.03), bidirectional modes (037.04). All tests pass. Remaining potential improvements (retry loop elimination, subsume wiring, scope chain) tracked implicitly.
<!-- SECTION:NOTES:END -->
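
As background for the checking and synthesis modes mentioned above, here is a toy bidirectional checker in Python. The expression forms (`"ann"`, `"if"`, `"lambda"`), the `("fn", arg, ret)` type encoding, and the function names are assumptions made for the sketch; it does not model xtlang's numeric coercion or overloading.

```python
def synth(expr, env: dict):
    """Synthesis mode: compute a type from the expression itself."""
    if isinstance(expr, bool):         # must come before int: bool is an int subclass
        return "i1"
    if isinstance(expr, int):
        return "i64"
    if isinstance(expr, float):
        return "double"
    if isinstance(expr, str):          # variable reference
        return env[expr]
    op = expr[0]
    if op == "ann":                    # ("ann", expr, type): switch to checking mode
        check(expr[1], expr[2], env)
        return expr[2]
    if op == "if":                     # ("if", test, then, else)
        check(expr[1], "i1", env)
        ty = synth(expr[2], env)
        check(expr[3], ty, env)        # else branch must match the then branch
        return ty
    raise TypeError(f"cannot synthesise a type for {expr!r}")


def check(expr, expected, env: dict) -> None:
    """Checking mode: push an expected type down into the expression."""
    if isinstance(expr, tuple) and expr[0] == "lambda":   # ("lambda", param, body)
        if not (isinstance(expected, tuple) and expected[0] == "fn"):
            raise TypeError(f"lambda checked against non-function type {expected!r}")
        _, arg_ty, ret_ty = expected
        check(expr[2], ret_ty, {**env, expr[1]: arg_ty})
        return
    actual = synth(expr, env)
    if actual != expected:
        raise TypeError(f"expected {expected!r}, got {actual!r}")
```

An unannotated lambda has no synthesis rule here, so its parameter type must arrive from the context through `check`; that is the sense in which the inference is local.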
40 changes: 40 additions & 0 deletions backlog/tasks/task-037.01 - Add-occurs-check-to-type-unify.md
@@ -0,0 +1,40 @@
---
id: TASK-037.01
title: Add occurs check to type-unify
status: Done
assignee: []
created_date: '2026-02-27 21:43'
updated_date: '2026-02-28 05:42'
labels:
- compiler
- type-inference
dependencies: []
parent_task_id: TASK-37
priority: high
---

## Description

<!-- SECTION:DESCRIPTION:BEGIN -->
Add an occurs check to the existing impc:ti:type-unify function in runtime/llvmti-transforms.xtm. This prevents infinite types from being constructed during unification --- currently nothing detects when a type variable appears in its own solution.

Implementation: during type-unify, before resolving a symbol by looking it up in vars, check whether the symbol being resolved appears anywhere in the type being constructed. If it does, signal a type error rather than looping.

This is a small, high-value change that requires no architectural changes to the existing algorithm. It prepares the ground for union-find (stage 2) where occurs check is a standard component.

Key file: runtime/llvmti-transforms.xtm (impc:ti:type-unify, ~line 1809)
<!-- SECTION:DESCRIPTION:END -->

## Acceptance Criteria
<!-- AC:BEGIN -->
- [ ] #1 occurs check detects self-referential type variables during unification
- [ ] #2 type error is raised when an infinite type is detected
- [ ] #3 all existing tests pass unchanged (ctest -L libs-core, libs-external)
- [ ] #4 no change to inference results for well-typed programs
<!-- AC:END -->

## Implementation Notes

<!-- SECTION:NOTES:BEGIN -->
Implemented occurs check in type-unify. See commit 889c414d.
<!-- SECTION:NOTES:END -->
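
For reference, a textbook occurs check can be sketched in a few lines of Python. This is an illustration of the general technique, not the code from the commit: types are encoded as plain strings (base types or variables) or tuples whose first element is a constructor name, and `occurs`, `bind`, and `InfiniteTypeError` are hypothetical names.

```python
class InfiniteTypeError(Exception):
    """Raised when a type variable would be bound to a type containing itself."""


def occurs(var: str, ty, subst: dict) -> bool:
    """True if var appears in ty once existing bindings in subst are followed."""
    if isinstance(ty, str):
        if ty == var:
            return True
        if ty in subst:
            return occurs(var, subst[ty], subst)
        return False
    # Compound type: ty[0] is the constructor name, the rest are component types.
    return any(occurs(var, part, subst) for part in ty[1:])


def bind(var: str, ty, subst: dict) -> dict:
    """Extend subst with var := ty, refusing bindings that create a cycle."""
    if ty == var:
        return subst
    if occurs(var, ty, subst):
        raise InfiniteTypeError(f"{var} occurs in {ty}")
    return {**subst, var: ty}
```

With `a` already bound to `("closure", "i64", "b")`, an attempt to bind `b` to `("ptr", "a")` raises `InfiniteTypeError` rather than producing a type that never finishes expanding.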
@@ -0,0 +1,48 @@
---
id: TASK-037.02
title: Replace vars hash table with union-find unification
status: Done
assignee: []
created_date: '2026-02-27 21:43'
updated_date: '2026-02-28 05:42'
labels:
- compiler
- type-inference
dependencies: []
parent_task_id: TASK-37
priority: high
---

## Description

<!-- SECTION:DESCRIPTION:BEGIN -->
Replace the mutable vars hash table (mapping symbols to lists of candidate types) with a union-find data structure. Each type variable becomes a union-find cell. The current impc:ti:vars-update (which appends to a list and deduplicates) becomes a union! operation. The current impc:ti:type-unify (which traverses lists of candidates) becomes find + path compression.

This eliminates the iterative retry loop in run-type-check* (runtime/llvmti-typecheck.xtm, ~line 3479). Currently the algorithm walks the entire AST 1-N times, retrying until types stabilise. With union-find, a single pass suffices because unification eagerly merges equivalence classes.

Implementation steps:
1. Implement union-find in Scheme (make-uf-cell, find!, union!, snapshot)
2. Replace vars hash table creation (impc:ti:find-all-vars) with union-find cell allocation
3. Replace vars-update calls with union! calls
4. Replace vars-snapshot (full hash table copy) with union-find snapshot (for generic function checking)
5. Replace the retry loop in run-type-check* with a single-pass walk
6. Update impc:ti:unify (the final pass) to read from union-find cells

Key files: runtime/llvmti-typecheck.xtm (run-type-check*, vars-update, vars-snapshot), runtime/llvmti-transforms.xtm (type-unify, unify)
<!-- SECTION:DESCRIPTION:END -->

## Acceptance Criteria
<!-- AC:BEGIN -->
- [ ] #1 union-find data structure implemented with find!, union!, and path compression
- [ ] #2 vars hash table replaced with union-find cells throughout type-check dispatch
- [ ] #3 iterative retry loop in run-type-check* eliminated (single AST pass)
- [ ] #4 vars-snapshot for generic checking uses efficient union-find snapshot
- [ ] #5 all existing tests pass (ctest -L libs-core, libs-external, examples)
- [ ] #6 compiler performance equal or better on representative programs
<!-- AC:END -->

## Implementation Notes

<!-- SECTION:NOTES:BEGIN -->
Replaced vars hash table with union-find unification. See commit af42d746.
<!-- SECTION:NOTES:END -->
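
A generic union-find with path compression and union by rank, sketched in Python for readers who want the data structure in front of them. The class and method names are hypothetical (the task names make-uf-cell, find!, union!, and snapshot on the Scheme side), and the snapshot here is a naive full copy, which a persistent structure in the style of Conchon & Filliâtre would avoid.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make_cell(self, x) -> None:
        """Create a singleton equivalence class for x if it is new."""
        if x not in self.parent:
            self.parent[x] = x
            self.rank[x] = 0

    def find(self, x):
        """Return the representative of x's class, compressing the path."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second walk: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        """Merge the classes of a and b; the shallower tree goes under the deeper."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return ra

    def snapshot(self):
        """Naive snapshot: copy both tables (assumption, not the efficient variant)."""
        return dict(self.parent), dict(self.rank)

    def restore(self, snap) -> None:
        self.parent, self.rank = dict(snap[0]), dict(snap[1])
```

Snapshot and restore give the rollback needed for speculative checking of a generic function: take a snapshot, attempt the unifications, and restore if the attempt fails.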
@@ -0,0 +1,50 @@
---
id: TASK-037.03
title: Separate constraint generation from solving
status: Done
assignee: []
created_date: '2026-02-27 21:43'
updated_date: '2026-02-28 04:19'
labels:
- compiler
- type-inference
dependencies: []
parent_task_id: TASK-37
priority: medium
---

## Description

<!-- SECTION:DESCRIPTION:BEGIN -->
Extract constraint emission from the type-check dispatch (impc:ti:type-check, ~50 branches in runtime/llvmti-typecheck.xtm) into an explicit constraint data structure, then solve constraints in a separate pass.

Currently, type-check both generates constraints and solves them (via mutation of the union-find / vars structure) in a single interleaved walk. Separating these concerns makes the algorithm easier to reason about, debug, and extend.

Implementation steps:
1. Define a constraint representation: equality constraints (α = τ), overload constraints (x ∈ {f1, f2, ...} given arg types), and coercion constraints (numeric defaulting)
2. Modify type-check dispatch to emit constraints into a list/queue instead of calling union! directly
3. Implement a constraint solver that processes the constraint list:
   - Equality constraints: union! on union-find cells
   - Overload constraints: match against poly/adhoc caches, emit further equality constraints
   - Coercion constraints: apply numeric defaulting rules (replicate current (apply min res) behaviour exactly)
4. Decompose nativef-generics (~400 lines) into constraint emission (small) + the existing specialisation machinery
5. Ensure the solver handles constraint ordering correctly (some constraints depend on others being solved first)

Key files: runtime/llvmti-typecheck.xtm (type-check dispatch, nativef-generics), runtime/llvmti-transforms.xtm (type-unify, unify), runtime/llvmti-bind.xtm (pipeline orchestration)
<!-- SECTION:DESCRIPTION:END -->

## Acceptance Criteria
<!-- AC:BEGIN -->
- [x] #1 explicit constraint data structure defined (equality, overload, coercion)
- [x] #2 type-check dispatch emits constraints instead of solving inline
- [x] #3 separate constraint solver processes all constraints
- [x] #4 nativef-generics decomposed into constraint emitter + specialisation
- [x] #5 numeric coercion defaulting produces identical results to current algorithm
- [x] #6 all existing tests pass (ctest -L libs-core, libs-external, examples)
<!-- AC:END -->

## Implementation Notes

<!-- SECTION:NOTES:BEGIN -->
Implemented dual-write constraint store (emit + solve eagerly) in runtime/llvmti-typecheck.xtm. Constraint types: eq, force, union stored as 3-element vectors. Emission points in update-var and force-var. Replay solver (impc:ti:solve-constraints) processes constraint log. Decomposed nativef-generics into 4 focused functions: early-exit, inject-missing-vars, check-constraint, emit-final. Added 8 unit tests in tests/compiler/constraints.xtm (all pass). All existing tests produce identical results to parent commit (2 pre-existing failures in adt.xtm/generics.xtm unrelated to this change).
<!-- SECTION:NOTES:END -->
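
The emit-then-replay split can be sketched in Python as follows. This is a toy under stated assumptions: constraints are (kind, var, type) triples, only the `eq` and `force` kinds are modelled, and their meanings here ("add a candidate" and "replace all candidates") are guesses for illustration rather than the semantics of impc:ti:solve-constraints.

```python
def emit(log: list, kind: str, var: str, ty: str) -> None:
    """Record a constraint as a (kind, var, type) triple instead of solving inline."""
    log.append((kind, var, ty))


def generate(bindings, log: list) -> None:
    """Walk a toy list of (name, literal) bindings and emit one constraint each."""
    for name, value in bindings:
        if isinstance(value, float):
            emit(log, "eq", name, "double")
        else:
            emit(log, "eq", name, "i64")


def solve(log: list) -> dict:
    """Replay the log in order and build the candidate types per variable."""
    candidates = {}
    for kind, var, ty in log:
        if kind == "eq":
            current = candidates.setdefault(var, [])
            if ty not in current:
                current.append(ty)      # assumed meaning: add a candidate
        elif kind == "force":
            candidates[var] = [ty]      # assumed meaning: replace all candidates
        else:
            raise ValueError(f"unknown constraint kind: {kind}")
    return candidates


log = []
generate([("x", 1), ("y", 2.0), ("x", 3.5)], log)
emit(log, "force", "y", "float")
solved = solve(log)
```

Since the log is plain data, it can be printed, diffed between compiler versions, or replayed in a unit test without walking an AST.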