string processing by pointbazaar · Pull Request #164 · pointbazaar/espl1000

pointbazaar · 2025-04-18T22:24:02Z

Summary by CodeRabbit

New Features
- Added a string equality function (streq) and a simplified print function for strings.
- Introduced an example demonstrating heap allocation and struct usage.
- Enhanced support for array and pointer arithmetic and type inference in expressions.
Bug Fixes
- Improved type checks and error handling for pointer and array dereferencing.
- Fixed logical NOT operation to perform a true logical (not bitwise) inversion.
Improvements
- Register allocation table (RAT) output can now be redirected to files or specific streams.
- Enhanced error diagnostics for type inference failures.
- Improved handling of file open flags using symbolic constants.
Documentation
- Added comments clarifying syscall error returns.
Tests
- Added test cases for string equality and struct heap allocation examples.
Chores
- Updated .gitignore to exclude register allocation table dump files.

various small fixes

When debugging nontrivial programs it is impractical to dump register allocation table to stdout/stderr. Dump it to a separate file 'function_name.rat'

Pointer variable needs to be dereferenced before adding the offset for struct member.

The previously emitted instruction sequence was incorrect.

e.g. char c = '4'; uint64 x = c - '0';

coderabbitai · 2025-04-18T22:24:10Z

Walkthrough

This set of changes introduces enhancements and bug fixes across the compiler, standard library, and supporting infrastructure. Key updates include improved type inference and well-formedness checks for array and pointer operations, expanded support for x86 instruction encoding and mnemonic mapping, and refactoring of register allocation table (RAT) printing to allow output to arbitrary file streams. The standard library sees the addition of string comparison and print utilities, as well as updates to type signatures for memory allocation and system calls. Several new example and test files demonstrate heap-allocated structs and string equality. Numerous internal functions are updated for better type safety, error reporting, and extensibility.

Changes

File(s) / Path(s)	Change Summary
.gitignore	Added pattern to ignore `.rat` files (register allocation table dumps).
ast/util/equals_ast.c	`eq_type` now returns true if either type has `is_anytype` set, not just both.
ast/visitor/visitor.c	`visit_while_stmt` now visits the condition expression before the statement block.
compiler/main/avr_code_gen/cg_avr_basic_block.c compiler/main/x86_code_gen/allocate_registers_x86.c	`rat_print` now called with explicit `stderr` stream for debug output.
compiler/main/cli/main.c	`rat_print` updated to take an explicit `stdout` stream.
compiler/main/derefll/derefll.c	Improved handling of member access on pointer types in dereference construction and annotation.
compiler/main/gen_tac/gen_tac_assignstmt.c	Added type inference for variables in member assignment; renamed buffer parameter.
compiler/main/gen_tac/helper_gen_tac_derefll.c	Improved struct type resolution and dynamic load width for dereference operations.
compiler/main/typechecker/tc_expr.c	Well-formedness check now allows array-integer operations.
compiler/main/typeinference/infer_in_context.c	Member access through pointer types now handled; improved error messages.
compiler/main/typeinference/typeinfer.c	Error reporting in `unwrap` now includes actual type string.
compiler/main/typeinference/typeinfer_deref.c	`infer_type_deref` now supports both pointer and array types.
compiler/main/typeinference/typeinfer_expr.c	Added support for array arithmetic in type inference; new function for array arithmetic handling.
compiler/main/x86_code_gen/cg_x86_single_function.c	RAT output redirected to a `.rat` file named after the function, with message to stdout.
compiler/main/x86_code_gen/compile_ir/compile_tac_unary_op.c	x86 codegen for unary NOT changed from bitwise to logical operation.
examples/stdlib/base/string/streq/test_streq.dg examples/stdlib/base/string/streq/test_streq.exitcode	New test for string equality (`streq`), with expected exit code.
examples/struct/struct_on_heap/struct_on_heap.dg examples/struct/struct_on_heap/struct_on_heap.exitcode examples/struct/struct_on_heap/struct_on_heap.stdout	New example demonstrating heap allocation of a struct, with expected output and exit code.
examples/stdlib/syscalls/write_file/write_file.dg	Replaced numeric open flags with symbolic constants (`O_RDWR
ibuffer/ibuffer_write.c ibuffer/ibuffer_x86.h ibuffer/ikey.h ibuffer/mnem.c	Added support for new x86 instructions: zero-extend moves, set-if-equal, and compare-with-constant; updated mnemonics and macros.
lexer/src/lexer.c	Tab character escape sequence (`'\t'`) now recognized in character constants.
rat/rat.c rat/rat.h rat/rat_x86.c rat/rat_x86.h	RAT printing refactored to use `FILE*` streams; function signatures updated accordingly.
stdlib/base/allocator.dg	`malloc` return type changed from pointer to slice/array; local variable updated similarly.
stdlib/base/string.dg	Added `streq` string comparison, `print` wrapper, and file descriptor enum.
stdlib/syscalls/syscalls.dg	Added error comments for `open`/`read`; `mmap` return type changed to slice/array.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Compiler
    participant RAT
    participant File

    User->>Compiler: Compile source code
    Compiler->>RAT: Generate register allocation table
    Compiler->>File: Open .rat file for writing
    Compiler->>RAT: rat_print(rat, file)
    RAT->>File: Write allocation table to file
    File-->>Compiler: Close .rat file
    Compiler->>User: Print message: RAT dumped to file

sequenceDiagram
    participant Parser
    participant TypeChecker
    participant TypeInfer
    participant CodeGen

    Parser->>TypeChecker: Check expression well-formedness
    TypeChecker->>TypeInfer: If array or pointer arithmetic, infer type
    TypeInfer->>TypeChecker: Return inferred type (supports arrays and pointers)
    TypeChecker->>CodeGen: Pass checked and inferred types
    CodeGen->>Parser: Generate code for valid expressions

Possibly related PRs

⛏️ Allocator ⛏️ #151: Modifies the eq_type function to handle the is_anytype flag, similar to this PR, but with a different logic for equality checking involving is_anytype. Both PRs update the same function with related logic concerning type equality semantics.

Tip

⚡💬 Agentic Chat (Pro Plan, General Availability)

We're introducing multi-step agentic chat in review comments and issue comments, within and outside of PR's. This feature enhances review and issue discussions with the CodeRabbit agentic chat by enabling advanced interactions, including the ability to create pull requests directly from comments and add commits to existing pull requests.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a6b3725 and 7c3dc8b.

📒 Files selected for processing (4)

compiler/main/typeinference/infer_in_context.c (2 hunks)
compiler/main/typeinference/typeinfer_expr.c (4 hunks)
compiler/main/x86_code_gen/cg_x86_single_function.c (1 hunks)
rat/rat.c (4 hunks)

🚧 Files skipped from review as they are similar to previous changes (4)

compiler/main/x86_code_gen/cg_x86_single_function.c
compiler/main/typeinference/infer_in_context.c
compiler/main/typeinference/typeinfer_expr.c
rat/rat.c

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

netlify · 2025-04-18T22:25:12Z

✅ Deploy Preview for espl1000 ready!

Name	Link
🔨 Latest commit	`7c3dc8b`
🔍 Latest deploy log	https://app.netlify.com/sites/espl1000/deploys/6804013993a71f0008e8e844
😎 Deploy Preview	https://deploy-preview-164--espl1000.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

coderabbitai

Actionable comments posted: 8

🧹 Nitpick comments (10)

rat/rat_x86.c (1)
203-207: Updated function signature to include file output parameter

The rat_print_regname_x86 function has been updated to accept a FILE* fout parameter and use fprintf instead of printf to direct output to the specified stream. This change is part of a broader refactoring of RAT printing functions to support flexible output redirection.

Consider adding a NULL check for the fout parameter to ensure robust error handling, similar to how it's implemented in rat_print:
void rat_print_regname_x86(struct RAT* rat, FILE* fout, size_t i) {
+    if (fout == NULL) {
+        return;
+    }
    
    assert(i > SD_REG_START_X86);
    assert(i < SD_REG_END_X86);
    fprintf(fout, "%s", rat_regname_x86(i));
}
compiler/main/cli/main.c (1)
24-24: Updated rat_print call to use the new signature.

This call has been correctly updated to pass stdout as the output stream parameter, matching the new function signature in rat/rat.h.

Consider checking the return value of rat_print to handle potential errors:
-rat_print(rat, stdout);
+if (!rat_print(rat, stdout)) {
+  fprintf(stderr, "Error printing RAT\n");
+  return EXIT_FAILURE;
+}
compiler/main/x86_code_gen/cg_x86_single_function.c (1)
123-123: Add error handling for rat_print.

The rat_print function returns a boolean success indicator, but you're not checking its return value. This could result in silent failures.

Consider this improvement:
-rat_print(rat, fout_rat);
+if (!rat_print(rat, fout_rat)) {
+    fprintf(stderr, "Failed to write RAT for function '%s'\n", m->decl->name);
+    success = false;
+}
ibuffer/ibuffer_x86.h (1)
67-69: Improve comment for SETE instruction.

The current comment "set bit if equals" is slightly misleading. The SETE instruction sets the entire byte register to 1 if the zero flag is set (ZF=1), not just a single bit.
-// set bit if equals
+// set byte to 1 if equals (ZF=1)
#define sete(dest, c) ibu1(X86_SETE, dest, c)
ast/util/equals_ast.c (1)
10-12: Consider documenting the anytype behavior.

The logic for handling is_anytype is a significant aspect of the type system but lacks comments explaining its purpose or expected behavior.

Add a comment explaining the semantics of is_anytype and why either type having this flag set should result in equality:
+	// If either type is marked as anytype, they are considered equal
+	// This enables more flexible type comparisons in generics and type inference
 	if (a->is_anytype || b->is_anytype) {
 		return true;
 	}
examples/stdlib/syscalls/write_file/write_file.dg (1)

5-5: Good improvement using symbolic constants.

Replacing numeric literals (2 | 0x40) with symbolic constants (O_RDWR | O_CREAT) significantly improves code readability and maintainability. This change makes the code's intent much clearer to future readers.

Consider also using symbolic constants for the file permission mode (0x1b6) to further improve readability - perhaps using a combination of constants like S_IRUSR | S_IWUSR | S_IRGRP, etc.

examples/struct/struct_on_heap/struct_on_heap.dg (1)

18-26: Zero‑difference printout may confuse newcomers

x1 and x2 are assigned the same pointer value; printing their difference will always yield 0.
Consider demonstrating something more illustrative (e.g., pointer arithmetic over array indices) so readers see a non‑trivial result.

compiler/main/typeinference/typeinfer_expr.c (1)

25-26: Array‑arithmetic handler shadowed by pointer handler

Because infer_type_expr checks pointer_type before array_type, an expression like pointer + array will be routed through the pointer‑arithmetic path, which produces an opaque error instead of “unsupported operand types”.
You may want to detect “mixed pointer/array” early and emit a dedicated diagnostic to avoid misleading messages.

Also applies to: 53-58
compiler/main/derefll/derefll.c (2)
129-138: Double LVST lookup & potential NULL‑dereference

lvst_contains is followed immediately by lvst_get, duplicating work and opening a window for lvst_get to return NULL (e.g., if the table mutated between the two calls).
You can fold both into a single lvst_get and guard against NULL.
-	if (lvst_contains(lvst, v->simple_var->name)) {
-		...
-		struct LVSTLine* line = lvst_get(lvst, v->simple_var->name);
-		if (line->type && line->type->pointer_type && (v->member_access != NULL)) {
+	struct LVSTLine* line = lvst_get(lvst, v->simple_var->name);
+	if (line && line->type && line->type->pointer_type && v->member_access) {
 		derefll_append(res, derefll_deref());
 	}
This removes redundant traversal and hardens the code against edge cases.

211-227: Struct‑member resolution on pointer types works, but assertions are brittle

The new logic extracts underlying from a pointer and asserts it has a basic_type → simple_type → struct_type.
If we ever support pointer → typedef → struct or pointer → array → struct, these assertions may fire unexpectedly.

Consider replacing the chained asserts with a single graceful error path:
if (!underlying->basic_type ||
    !underlying->basic_type->simple_type ||
    !underlying->basic_type->simple_type->struct_type) {
        fprintf(stderr,"[derefll] invalid member access on non‑struct pointer\n");
        abort();
}
This keeps debug friendliness while avoiding undefined behaviour in release builds compiled without assert.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7e97b8f and a6b3725.

📒 Files selected for processing (34)

.gitignore (1 hunks)
ast/util/equals_ast.c (1 hunks)
ast/visitor/visitor.c (1 hunks)
compiler/main/avr_code_gen/cg_avr_basic_block.c (1 hunks)
compiler/main/cli/main.c (1 hunks)
compiler/main/derefll/derefll.c (2 hunks)
compiler/main/gen_tac/gen_tac_assignstmt.c (2 hunks)
compiler/main/gen_tac/helper_gen_tac_derefll.c (2 hunks)
compiler/main/typechecker/tc_expr.c (1 hunks)
compiler/main/typeinference/infer_in_context.c (2 hunks)
compiler/main/typeinference/typeinfer.c (2 hunks)
compiler/main/typeinference/typeinfer_deref.c (1 hunks)
compiler/main/typeinference/typeinfer_expr.c (4 hunks)
compiler/main/x86_code_gen/allocate_registers_x86.c (1 hunks)
compiler/main/x86_code_gen/cg_x86_single_function.c (1 hunks)
compiler/main/x86_code_gen/compile_ir/compile_tac_unary_op.c (2 hunks)
examples/stdlib/base/string/streq/test_streq.dg (1 hunks)
examples/stdlib/base/string/streq/test_streq.exitcode (1 hunks)
examples/stdlib/syscalls/write_file/write_file.dg (1 hunks)
examples/struct/struct_on_heap/struct_on_heap.dg (1 hunks)
examples/struct/struct_on_heap/struct_on_heap.exitcode (1 hunks)
examples/struct/struct_on_heap/struct_on_heap.stdout (1 hunks)
ibuffer/ibuffer_write.c (3 hunks)
ibuffer/ibuffer_x86.h (2 hunks)
ibuffer/ikey.h (2 hunks)
ibuffer/mnem.c (2 hunks)
lexer/src/lexer.c (1 hunks)
rat/rat.c (4 hunks)
rat/rat.h (2 hunks)
rat/rat_x86.c (1 hunks)
rat/rat_x86.h (2 hunks)
stdlib/base/allocator.dg (2 hunks)
stdlib/base/string.dg (3 hunks)
stdlib/syscalls/syscalls.dg (2 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (12)

compiler/main/avr_code_gen/cg_avr_basic_block.c (2)

rat/rat.h (1)

rat_print (25-25)

rat/rat.c (1)

rat_print (123-137)

lexer/src/lexer.c (2)

lexer/src/driver.h (1)

out_length (10-10)

lexer/src/driver.c (1)

out_length (30-41)

compiler/main/x86_code_gen/allocate_registers_x86.c (2)

rat/rat.h (1)

rat_print (25-25)

rat/rat.c (1)

rat_print (123-137)

rat/rat_x86.c (1)

rat/rat_x86.h (2)

rat_print_regname_x86 (29-29)

rat_regname_x86 (20-20)

compiler/main/typechecker/tc_expr.c (2)

compiler/main/typechecker/util/tc_utils.h (1)

is_integer_type (10-10)

compiler/main/typechecker/util/tc_utils.c (1)

is_integer_type (20-25)

compiler/main/cli/main.c (2)

rat/rat.h (1)

rat_print (25-25)

rat/rat.c (1)

rat_print (123-137)

compiler/main/x86_code_gen/cg_x86_single_function.c (2)

rat/rat.h (1)

rat_print (25-25)

rat/rat.c (1)

rat_print (123-137)

compiler/main/typeinference/infer_in_context.c (1)

ast/util/str_ast.c (2)

str_type (286-302)

str_simple_type (210-216)

ibuffer/ibuffer_write.c (2)

rat/rat_x86.h (2)

rat_regname_x86 (20-20)

rat_regname_x86_width (27-27)

rat/rat_x86.c (2)

rat_regname_x86 (172-181)

rat_regname_x86_width (183-201)

compiler/main/gen_tac/gen_tac_assignstmt.c (6)

tac/tacbuffer.c (2)

tacbuffer_last_dest (116-120)

tacbuffer_append (19-27)

compiler/main/gen_tac/gen_tac.h (1)

tac_variable_addr (45-45)

compiler/main/gen_tac/gen_tac_variable.c (1)

tac_variable_addr (37-67)

compiler/main/typeinference/typeinfer.h (1)

infer_type_variable (31-31)

tac/tac.h (1)

ctx_tables (14-14)

tac/tac_ctor.c (1)

makeTACStore (276-285)

compiler/main/derefll/derefll.c (3)

tables/lvst/lvst.h (2)

lvst_contains (52-52)

lvst_get (39-39)

tables/lvst/lvst.c (2)

lvst_contains (187-196)

lvst_get (127-139)

compiler/main/derefll/derefll.h (2)

derefll_append (51-51)

derefll_deref (53-53)

compiler/main/typeinference/typeinfer_expr.c (4)

compiler/main/typeinference/util/type_str.h (1)

typeFromStrPrimitive (5-5)

compiler/main/typeinference/util/type_str.c (1)

typeFromStrPrimitive (75-83)

ast/util/str_ast.c (1)

str_type (286-302)

ast/util/str_ast.h (1)

str_type (32-32)

🔇 Additional comments (51)

lexer/src/lexer.c (1)

156-159: Appropriate addition of tab escape sequence support.

This change adds handling for the '\t' escape sequence in character constants, following the same pattern as the existing code for '\n'. The implementation is correct, using the out_length function to create a token with the tab character and returning the appropriate length.

compiler/main/avr_code_gen/cg_avr_basic_block.c (1)

49-49: Updated RAT printing to use explicit stderr output stream

The function call has been updated to pass stderr as the output stream to rat_print, which aligns with the refactored RAT printing interface that now requires an explicit FILE* parameter.

.gitignore (1)

17-18: Added pattern to ignore register allocation table dump files

This addition to the .gitignore file excludes .rat files from version control, which are generated during the compiler's operation as register allocation table dumps. This is good practice for excluding temporary/generated files.

compiler/main/x86_code_gen/allocate_registers_x86.c (1)

74-74: Updated RAT printing to use explicit stderr output stream

The function call has been updated to pass stderr as the output stream to rat_print, which aligns with the refactored RAT printing interface that now requires an explicit FILE* parameter.

compiler/main/x86_code_gen/compile_ir/compile_tac_unary_op.c (2)

7-7: Including ibuffer_x86.h to support new instruction generation functions.

This header file is needed for the new x86 instruction generation functions (cmp_const, sete, movzx_regs) used in the updated implementation of the unary NOT operation.

22-24: Logical NOT implementation replaced bitwise NOT.

The implementation of TAC_OP_UNARY_NOT has been changed from bitwise negation to logical negation:

cmp_const compares the source register with 0

sete sets the destination register to 1 if equal (source is 0), 0 otherwise

movzx_regs zero-extends the result to the full register width

This implementation correctly produces 1 for zero inputs and 0 for non-zero inputs, matching C's logical NOT (!) behavior.

rat/rat.h (2)

6-6: Including stdio.h for FILE type.

This header is required for the FILE pointer type used in the updated rat_print function signature.

24-25: Enhanced rat_print function now accepts an output stream and reports errors.

The function signature has been improved to:

Accept a FILE pointer parameter, allowing output to arbitrary streams instead of hardcoding to stdout

Return a boolean status to indicate success/failure

Include clear documentation that false is returned on error

This change improves flexibility and error handling.

rat/rat_x86.h (2)

6-6: Including stdio.h for FILE type.

This header is required for the FILE pointer type used in the updated rat_print_regname_x86 function signature.

29-29: Updated rat_print_regname_x86 function signature to accept an output stream.

The function now takes a FILE pointer parameter, allowing output to be directed to arbitrary streams instead of implicitly using stdout. This is consistent with the broader refactoring of RAT printing functions throughout the codebase.

ibuffer/mnem.c (3)

162-162: LGTM: Added mnemonic for new zero-extend register instruction.

The mapping for X86_MOVZX_REGS_WIDTH to "movzx" is correct and follows the x86 assembly convention.

178-178: LGTM: Added mnemonic for new compare with constant instruction.

The mapping for X86_CMP_CONST to "cmp" is correct and follows the x86 assembly convention.

180-180: LGTM: Added mnemonic for new set-if-equal instruction.

The mapping for X86_SETE to "sete" is correct and follows the x86 assembly convention.
ibuffer/ibuffer_write.c (3)

190-192: LGTM: Added formatting for zero-extend move between registers.

The implementation correctly formats the zero-extend instruction for registers with width specification.

Consider adding validation similar to other cases:
case X86_MOVZX_REGS_WIDTH:
+   assert(nbytes == 1 || nbytes == 2 || nbytes == 4);
    sprintf(s, "%s, %s", rat_regname_x86(x1), rat_regname_x86_width(x2, nbytes));
    break;
209-211: LGTM: Added formatting for set-if-equal instruction.

The formatting of the SETE instruction is correct, using the byte-sized register name.

Consider adding validation to ensure the register is valid for a byte operation:
case X86_SETE:
+   assert(nbytes == 1);
    sprintf(s, "%s", rat_regname_x86_width(x1, 1));
    break;
234-236: LGTM: Added formatting for compare with constant instruction.

The formatting is correct and consistent with similar instructions like X86_MOV_CONST.
ibuffer/ibuffer_x86.h (2)

40-40: LGTM: Added macro for zero-extend move between registers.

The macro is correctly defined using ibu3 with the appropriate instruction key and parameters.

64-64: LGTM: Added macro for compare with constant instruction.

The macro is correctly defined using ibu2 with the appropriate instruction key and parameters.

ibuffer/ikey.h (3)

184-186: X86 Instruction Keys Added Correctly

The addition of separate instruction keys for X86_MOVZX_LOAD_WIDTH and X86_MOVZX_REGS_WIDTH with distinct values properly distinguishes between these two different instruction variants.

205-205: New instruction key for constant comparison

The addition of X86_CMP_CONST improves code clarity by explicitly distinguishing comparisons with constants from regular comparisons.

208-208: SETE instruction key added

Adding the X86_SETE instruction key extends the instruction set to support the "set if equal" operation, which is a useful addition for equality comparison implementations.

rat/rat.c (3)

85-121: Function improved with better error handling

The refactored rat_dump_reg function now returns a boolean status and accepts a file stream parameter, which improves both flexibility and error handling. This allows callers to determine if the operation was successful.

The early return on line 91 indicates successful processing when skipping invalid registers, which maintains consistency with the function's return type.

123-137: Improved file stream handling with NULL check

The updated rat_print function now handles NULL file stream parameters gracefully by returning false, preventing potential segmentation faults. It's also more flexible by allowing output to any file stream.

164-164: Updated function calls to explicitly use stdout

All calls to rat_print have been correctly updated to explicitly pass stdout as the output file stream, maintaining backward compatibility while leveraging the new interface.

Also applies to: 173-173, 328-328

stdlib/base/allocator.dg (2)

19-19: Type change affects pointer arithmetic.

Changing ptr from *uint64 to [uint64] may require adjustments to how pointer arithmetic is performed on line 25 (return ptr + 16). Ensure that slice indexing behaves as expected compared to the previous pointer arithmetic.

Verify that the arithmetic operations on the slice type have the same semantics as the previous pointer-based operations, particularly when returning ptr + 16.

25-25: Verify slice/pointer compatibility.

The code returns ptr + 16, which assumes that adding an offset to a slice works the same way as with pointers. This may not be the case in all languages/implementations.

Check that your language implementation supports offset addition on slices in the same way as pointers, or modify this to use proper slice indexing if needed.
ast/util/equals_ast.c (1)

10-12:
Details

❓ Verification inconclusive

Type equality logic has been loosened.

The change makes type equality checking more permissive by returning true if either type has is_anytype set, rather than requiring both to have it set. This could potentially allow more code to pass type checking that would have been rejected before.

Verify that this change is intentional and aligns with the language's type system semantics. Are there any test cases that should now pass or fail differently because of this change?

🏁 Script executed:
#!/bin/bash
# Look for existing tests that involve anytype equality
find . -name "*.c" -o -name "*.h" | xargs grep -l "is_anytype" | xargs grep -l "eq_type"
Length of output: 110
Verify anytype equality semantics and add tests
The change makes eq_type return true if either operand has is_anytype set, widening the match compared to the previous “both‑sides” requirement. We haven’t found any existing tests covering this case, so please:

Confirm that permitting one-sided anytype equality aligns with the language’s type‑system design.

Add or update test cases exercising:

anytype == concreteType

concreteType == anytype

anytype == anytype

mismatches where neither side is anytype (existing behavior)

Update any relevant documentation or comments to reflect this looser matching rule.
examples/struct/struct_on_heap/struct_on_heap.exitcode (1)

1-1: LGTM: Exit code aligns with example program behavior.

The exit code value of 43 correctly corresponds to the expected behavior of the struct_on_heap example program, which sets a count_members field to 43 and returns it.

examples/stdlib/base/string/streq/test_streq.exitcode (1)

1-1: LGTM: Expected exit code for successful test.

The zero exit code correctly indicates successful execution of the string equality test program.

examples/struct/struct_on_heap/struct_on_heap.stdout (1)

1-2: Expected output file looks good.

This output file correctly captures the expected stdout content for the struct_on_heap example program (a single "0" character). It serves as a verification baseline for the test framework.

compiler/main/typeinference/typeinfer.c (2)

5-5: Added include for string representation of AST.

Including the str_ast.h header is necessary for the enhanced error reporting added below.

34-37: Enhanced error reporting with type information.

This improvement significantly enhances debugging by showing the actual type that's missing an array_type, rather than just a generic error message. This will help users and developers quickly identify the cause of type inference issues.

Nice attention to memory management - properly freeing the allocated string after use.

compiler/main/typechecker/tc_expr.c (1)

66-68: Improved support for array indexing expressions

The changes extend expressions to allow array types combined with integer types, which follows the same pattern as pointer types. This is a good enhancement for handling array operations consistently with pointer operations.

examples/stdlib/base/string/streq/test_streq.dg (1)

1-17: Well-structured test for string equality

The test covers both positive and negative cases for string comparison, providing different error codes for each failure scenario. This allows for precise identification of which test case failed.

compiler/main/typeinference/typeinfer_deref.c (1)

17-24: Enhanced dereferencing to support array types

Good extension of dereferencing capabilities to include array types alongside pointer types. This complements the changes in the type checker and provides a consistent approach to array and pointer operations.

compiler/main/gen_tac/gen_tac_assignstmt.c (3)

14-14: Added required header for type-checking utilities

Good addition of the header file needed for type-checking utilities.

21-21: Improved parameter naming consistency

Renamed parameter from buffer to buf for better consistency with its usage within the function.

131-146: Enhanced type information in member assignment

The function now retrieves the variable type and consistently uses the renamed parameter. This improves type safety during TAC generation for member assignments.

stdlib/base/string.dg (3)

1-5: Good addition of standardized file descriptor constants

Adding symbolic constants for standard file descriptors improves code readability and maintainability by replacing magic numbers with meaningful names. This follows best practices for systems programming.

21-39: Well-implemented string comparison function

The streq function implementation is correct and efficient:

Compares lengths first as an early optimization

Only performs character-by-character comparison if lengths match

Returns early on the first mismatch

This follows standard string comparison patterns and handles the edge cases appropriately.

95-97: Good addition of convenience print function

The print function provides a simpler interface for printing to standard output by wrapping fprint with the STDOUT constant. This follows the common pattern seen in languages like C (printf/fprintf) and provides a more convenient API.

stdlib/syscalls/syscalls.dg (3)

37-39: Good documentation of error return value

Adding documentation about the -1 error return value is a helpful addition for developers using this syscall. Clear error handling documentation is important for system-level functions.

44-46: Good documentation of error return value

Similar to the open syscall, documenting the error return value for read improves the API's usability by making error handling requirements explicit.

54-54: Improved return type for mmap syscall

Changing the return type from *uint8 to [uint8] better represents what mmap actually does - it returns a memory region that can be treated as an array of bytes. This provides better type safety and makes the API more intuitive.

compiler/main/typeinference/infer_in_context.c (4)

6-6: Good addition of stdlib.h include

Adding the stdlib.h include is necessary for the free() function used in the error handling code.

18-22: Good addition of pointer dereferencing logic

This enhancement enables member access through pointers (e.g., myStruct->field), making the type system more robust. The code correctly dereferences pointer types to access their element types before proceeding with member access.

24-29: Improved error handling for non-struct types

Replacing an assertion with proper error handling improves compiler robustness. The code now prints a helpful error message and returns NULL when a type is not a struct type.

31-35: Improved code readability with local variables

Introducing local variables for intermediate values improves code readability and maintainability by breaking down complex chains of pointer dereferences into more manageable pieces.

Also applies to: 42-49

compiler/main/gen_tac/helper_gen_tac_derefll.c (2)

62-74: Improved robustness for struct type determination

This change adds proper handling for pointer types when determining struct type names. The code now:

Checks if prev_type->basic_type exists directly

Falls back to extracting it from prev_type->pointer_type->element_type if not

Adds appropriate assertions to validate the type structure

This enhancement correctly handles both direct struct access and pointer-to-struct member access.

86-87: Improved architecture compatibility

Replacing the hardcoded width value with an architecture-dependent variable improves cross-platform compatibility. Using the pre-calculated width value ensures correct pointer size handling across different architectures.

The TODO comment is also useful, indicating that ideally the width should be determined by the specific type being dereferenced rather than a general architecture width.

compiler/main/typeinference/typeinfer_expr.c (1)

151-158: Why allow array ⬌ pointer comparisons?

if (t2->pointer_type) mirrors pointer arithmetic rules, yet comparing / computing between an array value and a raw pointer seems semantically dubious in most languages.
Double‑check whether the specification intends to support this combination; otherwise delete the block to prevent accidental acceptance.

compiler/main/x86_code_gen/cg_x86_single_function.c

rat/rat.c

stdlib/base/allocator.dg

ast/visitor/visitor.c

compiler/main/typeinference/infer_in_context.c

examples/struct/struct_on_heap/struct_on_heap.dg

compiler/main/typeinference/typeinfer_expr.c

fixes for issues found in code review

pointbazaar added 5 commits April 18, 2025 20:32

compiler: fixes for typeinference, typechecker

2132f87

various small fixes

rat: rat_print should print to file

264073a

When debugging nontrivial programs it is impractical to dump register allocation table to stdout/stderr. Dump it to a separate file 'function_name.rat'

compiler: fix struct member access through pointer

c682852

Pointer variable needs to be dereferenced before adding the offset for struct member.

compiler: x86: fix unary negation

1880038

The previously emitted instruction sequence was incorrect.

typeinference: char arithmetic has integer type

a6b3725

e.g. char c = '4'; uint64 x = c - '0';

coderabbitai bot reviewed Apr 18, 2025

View reviewed changes

compiler: fixes for PR 164

7c3dc8b

fixes for issues found in code review

pointbazaar merged commit 986067b into master Apr 19, 2025
15 checks passed

pointbazaar deleted the string-processing branch April 19, 2025 20:28

coderabbitai bot mentioned this pull request Apr 20, 2025

octal notation, -Wall -Wextra -Werror -march=native #165

Merged

coderabbitai bot mentioned this pull request May 1, 2025

typechecker + debug improvements #167

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

string processing#164

string processing#164
pointbazaar merged 6 commits intomasterfrom
string-processing

pointbazaar commented Apr 18, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Apr 18, 2025 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

Uh oh!

netlify bot commented Apr 18, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

pointbazaar commented Apr 18, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Apr 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Possibly related PRs

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

netlify bot commented Apr 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for espl1000 ready!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

pointbazaar commented Apr 18, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 18, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

netlify bot commented Apr 18, 2025 •

edited

Loading