Skip to content

string processing#164

Merged
pointbazaar merged 6 commits intomasterfrom
string-processing
Apr 19, 2025
Merged

string processing#164
pointbazaar merged 6 commits intomasterfrom
string-processing

Conversation

@pointbazaar
Copy link
Owner

@pointbazaar pointbazaar commented Apr 18, 2025

Summary by CodeRabbit

  • New Features

    • Added a string equality function (streq) and a simplified print function for strings.
    • Introduced an example demonstrating heap allocation and struct usage.
    • Enhanced support for array and pointer arithmetic and type inference in expressions.
  • Bug Fixes

    • Improved type checks and error handling for pointer and array dereferencing.
    • Fixed logical NOT operation to perform a true logical (not bitwise) inversion.
  • Improvements

    • Register allocation table (RAT) output can now be redirected to files or specific streams.
    • Enhanced error diagnostics for type inference failures.
    • Improved handling of file open flags using symbolic constants.
  • Documentation

    • Added comments clarifying syscall error returns.
  • Tests

    • Added test cases for string equality and struct heap allocation examples.
  • Chores

    • Updated .gitignore to exclude register allocation table dump files.

When debugging nontrivial programs it is impractical to dump register
allocation table to stdout/stderr.

Dump it to a separate file 'function_name.rat'
Pointer variable needs to be dereferenced before adding the offset for
struct member.
The previously emitted instruction sequence was incorrect.
e.g.

char c = '4';
uint64 x = c - '0';
@coderabbitai
Copy link

coderabbitai bot commented Apr 18, 2025

Walkthrough

This set of changes introduces enhancements and bug fixes across the compiler, standard library, and supporting infrastructure. Key updates include improved type inference and well-formedness checks for array and pointer operations, expanded support for x86 instruction encoding and mnemonic mapping, and refactoring of register allocation table (RAT) printing to allow output to arbitrary file streams. The standard library sees the addition of string comparison and print utilities, as well as updates to type signatures for memory allocation and system calls. Several new example and test files demonstrate heap-allocated structs and string equality. Numerous internal functions are updated for better type safety, error reporting, and extensibility.

Changes

File(s) / Path(s) Change Summary
.gitignore Added pattern to ignore .rat files (register allocation table dumps).
ast/util/equals_ast.c eq_type now returns true if either type has is_anytype set, not just both.
ast/visitor/visitor.c visit_while_stmt now visits the condition expression before the statement block.
compiler/main/avr_code_gen/cg_avr_basic_block.c
compiler/main/x86_code_gen/allocate_registers_x86.c
rat_print now called with explicit stderr stream for debug output.
compiler/main/cli/main.c rat_print updated to take an explicit stdout stream.
compiler/main/derefll/derefll.c Improved handling of member access on pointer types in dereference construction and annotation.
compiler/main/gen_tac/gen_tac_assignstmt.c Added type inference for variables in member assignment; renamed buffer parameter.
compiler/main/gen_tac/helper_gen_tac_derefll.c Improved struct type resolution and dynamic load width for dereference operations.
compiler/main/typechecker/tc_expr.c Well-formedness check now allows array-integer operations.
compiler/main/typeinference/infer_in_context.c Member access through pointer types now handled; improved error messages.
compiler/main/typeinference/typeinfer.c Error reporting in unwrap now includes actual type string.
compiler/main/typeinference/typeinfer_deref.c infer_type_deref now supports both pointer and array types.
compiler/main/typeinference/typeinfer_expr.c Added support for array arithmetic in type inference; new function for array arithmetic handling.
compiler/main/x86_code_gen/cg_x86_single_function.c RAT output redirected to a .rat file named after the function, with message to stdout.
compiler/main/x86_code_gen/compile_ir/compile_tac_unary_op.c x86 codegen for unary NOT changed from bitwise to logical operation.
examples/stdlib/base/string/streq/test_streq.dg
examples/stdlib/base/string/streq/test_streq.exitcode
New test for string equality (streq), with expected exit code.
examples/struct/struct_on_heap/struct_on_heap.dg
examples/struct/struct_on_heap/struct_on_heap.exitcode
examples/struct/struct_on_heap/struct_on_heap.stdout
New example demonstrating heap allocation of a struct, with expected output and exit code.
examples/stdlib/syscalls/write_file/write_file.dg Replaced numeric open flags with symbolic constants (`O_RDWR
ibuffer/ibuffer_write.c
ibuffer/ibuffer_x86.h
ibuffer/ikey.h
ibuffer/mnem.c
Added support for new x86 instructions: zero-extend moves, set-if-equal, and compare-with-constant; updated mnemonics and macros.
lexer/src/lexer.c Tab character escape sequence ('\t') now recognized in character constants.
rat/rat.c
rat/rat.h
rat/rat_x86.c
rat/rat_x86.h
RAT printing refactored to use FILE* streams; function signatures updated accordingly.
stdlib/base/allocator.dg malloc return type changed from pointer to slice/array; local variable updated similarly.
stdlib/base/string.dg Added streq string comparison, print wrapper, and file descriptor enum.
stdlib/syscalls/syscalls.dg Added error comments for open/read; mmap return type changed to slice/array.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Compiler
    participant RAT
    participant File

    User->>Compiler: Compile source code
    Compiler->>RAT: Generate register allocation table
    Compiler->>File: Open .rat file for writing
    Compiler->>RAT: rat_print(rat, file)
    RAT->>File: Write allocation table to file
    File-->>Compiler: Close .rat file
    Compiler->>User: Print message: RAT dumped to file
Loading
sequenceDiagram
    participant Parser
    participant TypeChecker
    participant TypeInfer
    participant CodeGen

    Parser->>TypeChecker: Check expression well-formedness
    TypeChecker->>TypeInfer: If array or pointer arithmetic, infer type
    TypeInfer->>TypeChecker: Return inferred type (supports arrays and pointers)
    TypeChecker->>CodeGen: Pass checked and inferred types
    CodeGen->>Parser: Generate code for valid expressions
Loading

Possibly related PRs

  • ⛏️ Allocator ⛏️  #151: Modifies the eq_type function to handle the is_anytype flag, similar to this PR, but with a different logic for equality checking involving is_anytype. Both PRs update the same function with related logic concerning type equality semantics.

Tip

⚡💬 Agentic Chat (Pro Plan, General Availability)
  • We're introducing multi-step agentic chat in review comments and issue comments, within and outside of PR's. This feature enhances review and issue discussions with the CodeRabbit agentic chat by enabling advanced interactions, including the ability to create pull requests directly from comments and add commits to existing pull requests.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a6b3725 and 7c3dc8b.

📒 Files selected for processing (4)
  • compiler/main/typeinference/infer_in_context.c (2 hunks)
  • compiler/main/typeinference/typeinfer_expr.c (4 hunks)
  • compiler/main/x86_code_gen/cg_x86_single_function.c (1 hunks)
  • rat/rat.c (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • compiler/main/x86_code_gen/cg_x86_single_function.c
  • compiler/main/typeinference/infer_in_context.c
  • compiler/main/typeinference/typeinfer_expr.c
  • rat/rat.c

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@netlify
Copy link

netlify bot commented Apr 18, 2025

Deploy Preview for espl1000 ready!

Name Link
🔨 Latest commit 7c3dc8b
🔍 Latest deploy log https://app.netlify.com/sites/espl1000/deploys/6804013993a71f0008e8e844
😎 Deploy Preview https://deploy-preview-164--espl1000.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (10)
rat/rat_x86.c (1)

203-207: Updated function signature to include file output parameter

The rat_print_regname_x86 function has been updated to accept a FILE* fout parameter and use fprintf instead of printf to direct output to the specified stream. This change is part of a broader refactoring of RAT printing functions to support flexible output redirection.

Consider adding a NULL check for the fout parameter to ensure robust error handling, similar to how it's implemented in rat_print:

void rat_print_regname_x86(struct RAT* rat, FILE* fout, size_t i) {
+    if (fout == NULL) {
+        return;
+    }
    
    assert(i > SD_REG_START_X86);
    assert(i < SD_REG_END_X86);
    fprintf(fout, "%s", rat_regname_x86(i));
}
compiler/main/cli/main.c (1)

24-24: Updated rat_print call to use the new signature.

This call has been correctly updated to pass stdout as the output stream parameter, matching the new function signature in rat/rat.h.

Consider checking the return value of rat_print to handle potential errors:

-rat_print(rat, stdout);
+if (!rat_print(rat, stdout)) {
+  fprintf(stderr, "Error printing RAT\n");
+  return EXIT_FAILURE;
+}
compiler/main/x86_code_gen/cg_x86_single_function.c (1)

123-123: Add error handling for rat_print.

The rat_print function returns a boolean success indicator, but you're not checking its return value. This could result in silent failures.

Consider this improvement:

-rat_print(rat, fout_rat);
+if (!rat_print(rat, fout_rat)) {
+    fprintf(stderr, "Failed to write RAT for function '%s'\n", m->decl->name);
+    success = false;
+}
ibuffer/ibuffer_x86.h (1)

67-69: Improve comment for SETE instruction.

The current comment "set bit if equals" is slightly misleading. The SETE instruction sets the entire byte register to 1 if the zero flag is set (ZF=1), not just a single bit.

-// set bit if equals
+// set byte to 1 if equals (ZF=1)
#define sete(dest, c) ibu1(X86_SETE, dest, c)
ast/util/equals_ast.c (1)

10-12: Consider documenting the anytype behavior.

The logic for handling is_anytype is a significant aspect of the type system but lacks comments explaining its purpose or expected behavior.

Add a comment explaining the semantics of is_anytype and why either type having this flag set should result in equality:

+	// If either type is marked as anytype, they are considered equal
+	// This enables more flexible type comparisons in generics and type inference
 	if (a->is_anytype || b->is_anytype) {
 		return true;
 	}
examples/stdlib/syscalls/write_file/write_file.dg (1)

5-5: Good improvement using symbolic constants.

Replacing numeric literals (2 | 0x40) with symbolic constants (O_RDWR | O_CREAT) significantly improves code readability and maintainability. This change makes the code's intent much clearer to future readers.

Consider also using symbolic constants for the file permission mode (0x1b6) to further improve readability - perhaps using a combination of constants like S_IRUSR | S_IWUSR | S_IRGRP, etc.

examples/struct/struct_on_heap/struct_on_heap.dg (1)

18-26: Zero‑difference printout may confuse newcomers

x1 and x2 are assigned the same pointer value; printing their difference will always yield 0.
Consider demonstrating something more illustrative (e.g., pointer arithmetic over array indices) so readers see a non‑trivial result.

compiler/main/typeinference/typeinfer_expr.c (1)

25-26: Array‑arithmetic handler shadowed by pointer handler

Because infer_type_expr checks pointer_type before array_type, an expression like pointer + array will be routed through the pointer‑arithmetic path, which produces an opaque error instead of “unsupported operand types”.
You may want to detect “mixed pointer/array” early and emit a dedicated diagnostic to avoid misleading messages.

Also applies to: 53-58

compiler/main/derefll/derefll.c (2)

129-138: Double LVST lookup & potential NULL‑dereference

lvst_contains is followed immediately by lvst_get, duplicating work and opening a window for lvst_get to return NULL (e.g., if the table mutated between the two calls).
You can fold both into a single lvst_get and guard against NULL.

-	if (lvst_contains(lvst, v->simple_var->name)) {
-		...
-		struct LVSTLine* line = lvst_get(lvst, v->simple_var->name);
-		if (line->type && line->type->pointer_type && (v->member_access != NULL)) {
+	struct LVSTLine* line = lvst_get(lvst, v->simple_var->name);
+	if (line && line->type && line->type->pointer_type && v->member_access) {
 		derefll_append(res, derefll_deref());
 	}

This removes redundant traversal and hardens the code against edge cases.


211-227: Struct‑member resolution on pointer types works, but assertions are brittle

The new logic extracts underlying from a pointer and asserts it has a basic_typesimple_typestruct_type.
If we ever support pointer → typedef → struct or pointer → array → struct, these assertions may fire unexpectedly.

Consider replacing the chained asserts with a single graceful error path:

if (!underlying->basic_type ||
    !underlying->basic_type->simple_type ||
    !underlying->basic_type->simple_type->struct_type) {
        fprintf(stderr,"[derefll] invalid member access on non‑struct pointer\n");
        abort();
}

This keeps debug friendliness while avoiding undefined behaviour in release builds compiled without assert.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7e97b8f and a6b3725.

📒 Files selected for processing (34)
  • .gitignore (1 hunks)
  • ast/util/equals_ast.c (1 hunks)
  • ast/visitor/visitor.c (1 hunks)
  • compiler/main/avr_code_gen/cg_avr_basic_block.c (1 hunks)
  • compiler/main/cli/main.c (1 hunks)
  • compiler/main/derefll/derefll.c (2 hunks)
  • compiler/main/gen_tac/gen_tac_assignstmt.c (2 hunks)
  • compiler/main/gen_tac/helper_gen_tac_derefll.c (2 hunks)
  • compiler/main/typechecker/tc_expr.c (1 hunks)
  • compiler/main/typeinference/infer_in_context.c (2 hunks)
  • compiler/main/typeinference/typeinfer.c (2 hunks)
  • compiler/main/typeinference/typeinfer_deref.c (1 hunks)
  • compiler/main/typeinference/typeinfer_expr.c (4 hunks)
  • compiler/main/x86_code_gen/allocate_registers_x86.c (1 hunks)
  • compiler/main/x86_code_gen/cg_x86_single_function.c (1 hunks)
  • compiler/main/x86_code_gen/compile_ir/compile_tac_unary_op.c (2 hunks)
  • examples/stdlib/base/string/streq/test_streq.dg (1 hunks)
  • examples/stdlib/base/string/streq/test_streq.exitcode (1 hunks)
  • examples/stdlib/syscalls/write_file/write_file.dg (1 hunks)
  • examples/struct/struct_on_heap/struct_on_heap.dg (1 hunks)
  • examples/struct/struct_on_heap/struct_on_heap.exitcode (1 hunks)
  • examples/struct/struct_on_heap/struct_on_heap.stdout (1 hunks)
  • ibuffer/ibuffer_write.c (3 hunks)
  • ibuffer/ibuffer_x86.h (2 hunks)
  • ibuffer/ikey.h (2 hunks)
  • ibuffer/mnem.c (2 hunks)
  • lexer/src/lexer.c (1 hunks)
  • rat/rat.c (4 hunks)
  • rat/rat.h (2 hunks)
  • rat/rat_x86.c (1 hunks)
  • rat/rat_x86.h (2 hunks)
  • stdlib/base/allocator.dg (2 hunks)
  • stdlib/base/string.dg (3 hunks)
  • stdlib/syscalls/syscalls.dg (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (12)
compiler/main/avr_code_gen/cg_avr_basic_block.c (2)
rat/rat.h (1)
  • rat_print (25-25)
rat/rat.c (1)
  • rat_print (123-137)
lexer/src/lexer.c (2)
lexer/src/driver.h (1)
  • out_length (10-10)
lexer/src/driver.c (1)
  • out_length (30-41)
compiler/main/x86_code_gen/allocate_registers_x86.c (2)
rat/rat.h (1)
  • rat_print (25-25)
rat/rat.c (1)
  • rat_print (123-137)
rat/rat_x86.c (1)
rat/rat_x86.h (2)
  • rat_print_regname_x86 (29-29)
  • rat_regname_x86 (20-20)
compiler/main/typechecker/tc_expr.c (2)
compiler/main/typechecker/util/tc_utils.h (1)
  • is_integer_type (10-10)
compiler/main/typechecker/util/tc_utils.c (1)
  • is_integer_type (20-25)
compiler/main/cli/main.c (2)
rat/rat.h (1)
  • rat_print (25-25)
rat/rat.c (1)
  • rat_print (123-137)
compiler/main/x86_code_gen/cg_x86_single_function.c (2)
rat/rat.h (1)
  • rat_print (25-25)
rat/rat.c (1)
  • rat_print (123-137)
compiler/main/typeinference/infer_in_context.c (1)
ast/util/str_ast.c (2)
  • str_type (286-302)
  • str_simple_type (210-216)
ibuffer/ibuffer_write.c (2)
rat/rat_x86.h (2)
  • rat_regname_x86 (20-20)
  • rat_regname_x86_width (27-27)
rat/rat_x86.c (2)
  • rat_regname_x86 (172-181)
  • rat_regname_x86_width (183-201)
compiler/main/gen_tac/gen_tac_assignstmt.c (6)
tac/tacbuffer.c (2)
  • tacbuffer_last_dest (116-120)
  • tacbuffer_append (19-27)
compiler/main/gen_tac/gen_tac.h (1)
  • tac_variable_addr (45-45)
compiler/main/gen_tac/gen_tac_variable.c (1)
  • tac_variable_addr (37-67)
compiler/main/typeinference/typeinfer.h (1)
  • infer_type_variable (31-31)
tac/tac.h (1)
  • ctx_tables (14-14)
tac/tac_ctor.c (1)
  • makeTACStore (276-285)
compiler/main/derefll/derefll.c (3)
tables/lvst/lvst.h (2)
  • lvst_contains (52-52)
  • lvst_get (39-39)
tables/lvst/lvst.c (2)
  • lvst_contains (187-196)
  • lvst_get (127-139)
compiler/main/derefll/derefll.h (2)
  • derefll_append (51-51)
  • derefll_deref (53-53)
compiler/main/typeinference/typeinfer_expr.c (4)
compiler/main/typeinference/util/type_str.h (1)
  • typeFromStrPrimitive (5-5)
compiler/main/typeinference/util/type_str.c (1)
  • typeFromStrPrimitive (75-83)
ast/util/str_ast.c (1)
  • str_type (286-302)
ast/util/str_ast.h (1)
  • str_type (32-32)
🔇 Additional comments (51)
lexer/src/lexer.c (1)

156-159: Appropriate addition of tab escape sequence support.

This change adds handling for the '\t' escape sequence in character constants, following the same pattern as the existing code for '\n'. The implementation is correct, using the out_length function to create a token with the tab character and returning the appropriate length.

compiler/main/avr_code_gen/cg_avr_basic_block.c (1)

49-49: Updated RAT printing to use explicit stderr output stream

The function call has been updated to pass stderr as the output stream to rat_print, which aligns with the refactored RAT printing interface that now requires an explicit FILE* parameter.

.gitignore (1)

17-18: Added pattern to ignore register allocation table dump files

This addition to the .gitignore file excludes .rat files from version control, which are generated during the compiler's operation as register allocation table dumps. This is good practice for excluding temporary/generated files.

compiler/main/x86_code_gen/allocate_registers_x86.c (1)

74-74: Updated RAT printing to use explicit stderr output stream

The function call has been updated to pass stderr as the output stream to rat_print, which aligns with the refactored RAT printing interface that now requires an explicit FILE* parameter.

compiler/main/x86_code_gen/compile_ir/compile_tac_unary_op.c (2)

7-7: Including ibuffer_x86.h to support new instruction generation functions.

This header file is needed for the new x86 instruction generation functions (cmp_const, sete, movzx_regs) used in the updated implementation of the unary NOT operation.


22-24: Logical NOT implementation replaced bitwise NOT.

The implementation of TAC_OP_UNARY_NOT has been changed from bitwise negation to logical negation:

  1. cmp_const compares the source register with 0
  2. sete sets the destination register to 1 if equal (source is 0), 0 otherwise
  3. movzx_regs zero-extends the result to the full register width

This implementation correctly produces 1 for zero inputs and 0 for non-zero inputs, matching C's logical NOT (!) behavior.

rat/rat.h (2)

6-6: Including stdio.h for FILE type.

This header is required for the FILE pointer type used in the updated rat_print function signature.


24-25: Enhanced rat_print function now accepts an output stream and reports errors.

The function signature has been improved to:

  1. Accept a FILE pointer parameter, allowing output to arbitrary streams instead of hardcoding to stdout
  2. Return a boolean status to indicate success/failure
  3. Include clear documentation that false is returned on error

This change improves flexibility and error handling.

rat/rat_x86.h (2)

6-6: Including stdio.h for FILE type.

This header is required for the FILE pointer type used in the updated rat_print_regname_x86 function signature.


29-29: Updated rat_print_regname_x86 function signature to accept an output stream.

The function now takes a FILE pointer parameter, allowing output to be directed to arbitrary streams instead of implicitly using stdout. This is consistent with the broader refactoring of RAT printing functions throughout the codebase.

ibuffer/mnem.c (3)

162-162: LGTM: Added mnemonic for new zero-extend register instruction.

The mapping for X86_MOVZX_REGS_WIDTH to "movzx" is correct and follows the x86 assembly convention.


178-178: LGTM: Added mnemonic for new compare with constant instruction.

The mapping for X86_CMP_CONST to "cmp" is correct and follows the x86 assembly convention.


180-180: LGTM: Added mnemonic for new set-if-equal instruction.

The mapping for X86_SETE to "sete" is correct and follows the x86 assembly convention.

ibuffer/ibuffer_write.c (3)

190-192: LGTM: Added formatting for zero-extend move between registers.

The implementation correctly formats the zero-extend instruction for registers with width specification.

Consider adding validation similar to other cases:

case X86_MOVZX_REGS_WIDTH:
+   assert(nbytes == 1 || nbytes == 2 || nbytes == 4);
    sprintf(s, "%s, %s", rat_regname_x86(x1), rat_regname_x86_width(x2, nbytes));
    break;

209-211: LGTM: Added formatting for set-if-equal instruction.

The formatting of the SETE instruction is correct, using the byte-sized register name.

Consider adding validation to ensure the register is valid for a byte operation:

case X86_SETE:
+   assert(nbytes == 1);
    sprintf(s, "%s", rat_regname_x86_width(x1, 1));
    break;

234-236: LGTM: Added formatting for compare with constant instruction.

The formatting is correct and consistent with similar instructions like X86_MOV_CONST.

ibuffer/ibuffer_x86.h (2)

40-40: LGTM: Added macro for zero-extend move between registers.

The macro is correctly defined using ibu3 with the appropriate instruction key and parameters.


64-64: LGTM: Added macro for compare with constant instruction.

The macro is correctly defined using ibu2 with the appropriate instruction key and parameters.

ibuffer/ikey.h (3)

184-186: X86 Instruction Keys Added Correctly

The addition of separate instruction keys for X86_MOVZX_LOAD_WIDTH and X86_MOVZX_REGS_WIDTH with distinct values properly distinguishes between these two different instruction variants.


205-205: New instruction key for constant comparison

The addition of X86_CMP_CONST improves code clarity by explicitly distinguishing comparisons with constants from regular comparisons.


208-208: SETE instruction key added

Adding the X86_SETE instruction key extends the instruction set to support the "set if equal" operation, which is a useful addition for equality comparison implementations.

rat/rat.c (3)

85-121: Function improved with better error handling

The refactored rat_dump_reg function now returns a boolean status and accepts a file stream parameter, which improves both flexibility and error handling. This allows callers to determine if the operation was successful.

The early return on line 91 indicates successful processing when skipping invalid registers, which maintains consistency with the function's return type.


123-137: Improved file stream handling with NULL check

The updated rat_print function now handles NULL file stream parameters gracefully by returning false, preventing potential segmentation faults. It's also more flexible by allowing output to any file stream.


164-164: Updated function calls to explicitly use stdout

All calls to rat_print have been correctly updated to explicitly pass stdout as the output file stream, maintaining backward compatibility while leveraging the new interface.

Also applies to: 173-173, 328-328

stdlib/base/allocator.dg (2)

19-19: Type change affects pointer arithmetic.

Changing ptr from *uint64 to [uint64] may require adjustments to how pointer arithmetic is performed on line 25 (return ptr + 16). Ensure that slice indexing behaves as expected compared to the previous pointer arithmetic.

Verify that the arithmetic operations on the slice type have the same semantics as the previous pointer-based operations, particularly when returning ptr + 16.


25-25: Verify slice/pointer compatibility.

The code returns ptr + 16, which assumes that adding an offset to a slice works the same way as with pointers. This may not be the case in all languages/implementations.

Check that your language implementation supports offset addition on slices in the same way as pointers, or modify this to use proper slice indexing if needed.

ast/util/equals_ast.c (1)

10-12:

Details

❓ Verification inconclusive

Type equality logic has been loosened.

The change makes type equality checking more permissive by returning true if either type has is_anytype set, rather than requiring both to have it set. This could potentially allow more code to pass type checking that would have been rejected before.

Verify that this change is intentional and aligns with the language's type system semantics. Are there any test cases that should now pass or fail differently because of this change?


🏁 Script executed:

#!/bin/bash
# Look for existing tests that involve anytype equality
find . -name "*.c" -o -name "*.h" | xargs grep -l "is_anytype" | xargs grep -l "eq_type"

Length of output: 110


Verify anytype equality semantics and add tests
The change makes eq_type return true if either operand has is_anytype set, widening the match compared to the previous “both‑sides” requirement. We haven’t found any existing tests covering this case, so please:

  • Confirm that permitting one-sided anytype equality aligns with the language’s type‑system design.
  • Add or update test cases exercising:
    • anytype == concreteType
    • concreteType == anytype
    • anytype == anytype
    • mismatches where neither side is anytype (existing behavior)
  • Update any relevant documentation or comments to reflect this looser matching rule.
examples/struct/struct_on_heap/struct_on_heap.exitcode (1)

1-1: LGTM: Exit code aligns with example program behavior.

The exit code value of 43 correctly corresponds to the expected behavior of the struct_on_heap example program, which sets a count_members field to 43 and returns it.

examples/stdlib/base/string/streq/test_streq.exitcode (1)

1-1: LGTM: Expected exit code for successful test.

The zero exit code correctly indicates successful execution of the string equality test program.

examples/struct/struct_on_heap/struct_on_heap.stdout (1)

1-2: Expected output file looks good.

This output file correctly captures the expected stdout content for the struct_on_heap example program (a single "0" character). It serves as a verification baseline for the test framework.

compiler/main/typeinference/typeinfer.c (2)

5-5: Added include for string representation of AST.

Including the str_ast.h header is necessary for the enhanced error reporting added below.


34-37: Enhanced error reporting with type information.

This improvement significantly enhances debugging by showing the actual type that's missing an array_type, rather than just a generic error message. This will help users and developers quickly identify the cause of type inference issues.

Nice attention to memory management - properly freeing the allocated string after use.

compiler/main/typechecker/tc_expr.c (1)

66-68: Improved support for array indexing expressions

The changes extend expressions to allow array types combined with integer types, which follows the same pattern as pointer types. This is a good enhancement for handling array operations consistently with pointer operations.

examples/stdlib/base/string/streq/test_streq.dg (1)

1-17: Well-structured test for string equality

The test covers both positive and negative cases for string comparison, providing different error codes for each failure scenario. This allows for precise identification of which test case failed.

compiler/main/typeinference/typeinfer_deref.c (1)

17-24: Enhanced dereferencing to support array types

Good extension of dereferencing capabilities to include array types alongside pointer types. This complements the changes in the type checker and provides a consistent approach to array and pointer operations.

compiler/main/gen_tac/gen_tac_assignstmt.c (3)

14-14: Added required header for type-checking utilities

Good addition of the header file needed for type-checking utilities.


21-21: Improved parameter naming consistency

Renamed parameter from buffer to buf for better consistency with its usage within the function.


131-146: Enhanced type information in member assignment

The function now retrieves the variable type and consistently uses the renamed parameter. This improves type safety during TAC generation for member assignments.

stdlib/base/string.dg (3)

1-5: Good addition of standardized file descriptor constants

Adding symbolic constants for standard file descriptors improves code readability and maintainability by replacing magic numbers with meaningful names. This follows best practices for systems programming.


21-39: Well-implemented string comparison function

The streq function implementation is correct and efficient:

  • Compares lengths first as an early optimization
  • Only performs character-by-character comparison if lengths match
  • Returns early on the first mismatch

This follows standard string comparison patterns and handles the edge cases appropriately.


95-97: Good addition of convenience print function

The print function provides a simpler interface for printing to standard output by wrapping fprint with the STDOUT constant. This follows the common pattern seen in languages like C (printf/fprintf) and provides a more convenient API.

stdlib/syscalls/syscalls.dg (3)

37-39: Good documentation of error return value

Adding documentation about the -1 error return value is a helpful addition for developers using this syscall. Clear error handling documentation is important for system-level functions.


44-46: Good documentation of error return value

Similar to the open syscall, documenting the error return value for read improves the API's usability by making error handling requirements explicit.


54-54: Improved return type for mmap syscall

Changing the return type from *uint8 to [uint8] better represents what mmap actually does - it returns a memory region that can be treated as an array of bytes. This provides better type safety and makes the API more intuitive.

compiler/main/typeinference/infer_in_context.c (4)

6-6: Good addition of stdlib.h include

Adding the stdlib.h include is necessary for the free() function used in the error handling code.


18-22: Good addition of pointer dereferencing logic

This enhancement enables member access through pointers (e.g., myStruct->field), making the type system more robust. The code correctly dereferences pointer types to access their element types before proceeding with member access.


24-29: Improved error handling for non-struct types

Replacing an assertion with proper error handling improves compiler robustness. The code now prints a helpful error message and returns NULL when a type is not a struct type.


31-35: Improved code readability with local variables

Introducing local variables for intermediate values improves code readability and maintainability by breaking down complex chains of pointer dereferences into more manageable pieces.

Also applies to: 42-49

compiler/main/gen_tac/helper_gen_tac_derefll.c (2)

62-74: Improved robustness for struct type determination

This change adds proper handling for pointer types when determining struct type names. The code now:

  1. Checks if prev_type->basic_type exists directly
  2. Falls back to extracting it from prev_type->pointer_type->element_type if not
  3. Adds appropriate assertions to validate the type structure

This enhancement correctly handles both direct struct access and pointer-to-struct member access.


86-87: Improved architecture compatibility

Replacing the hardcoded width value with an architecture-dependent variable improves cross-platform compatibility. Using the pre-calculated width value ensures correct pointer size handling across different architectures.

The TODO comment is also useful, indicating that ideally the width should be determined by the specific type being dereferenced rather than a general architecture width.

compiler/main/typeinference/typeinfer_expr.c (1)

151-158: Why allow array ⬌ pointer comparisons?

if (t2->pointer_type) mirrors pointer arithmetic rules, yet comparing / computing between an array value and a raw pointer seems semantically dubious in most languages.
Double‑check whether the specification intends to support this combination; otherwise delete the block to prevent accidental acceptance.

fixes for issues found in code review
@pointbazaar pointbazaar merged commit 986067b into master Apr 19, 2025
15 checks passed
@pointbazaar pointbazaar deleted the string-processing branch April 19, 2025 20:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant