feat(cc): volatile keyword + prefix ++/-- on pointer deref #495

Merged
bboe merged 3 commits into main from bboe/cc-volatile-prefix-incdec
May 24, 2026
Conversation

Owner

@bboe bboe commented May 24, 2026

Summary

Three small cc.py feature additions that each unblock a Phase 6 libbboeos source file.

Commit 1 — volatile no-op qualifier (cc/tokens.py, cc/parser.py:2120)

Adds volatile to KEYWORDS / TYPE_TOKENS and extends the leading qualifier loop in parse_type so const / signed / volatile all drop without semantics. Unblocks user/libbboeos/signal.c (failed at line 5 on typedef volatile int sig_atomic_t;).
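As a rough illustration of the qualifier loop described above, here is a minimal sketch. It is not cc.py's actual parse_type: the token representation (plain strings), the NOOP_QUALIFIERS and BASE_TYPES sets, and the return shape are all assumptions made for the example.

```python
# Hypothetical miniature of a parse_type-style qualifier loop; not cc.py's code.
NOOP_QUALIFIERS = {"const", "signed", "volatile"}
BASE_TYPES = {"int", "char", "void"}


def parse_type(tokens, pos=0):
    """Return (base_type, next_pos), skipping leading no-op qualifiers."""
    while pos < len(tokens) and tokens[pos] in NOOP_QUALIFIERS:
        pos += 1  # the qualifier is accepted and then dropped
    if pos >= len(tokens) or tokens[pos] not in BASE_TYPES:
        got = tokens[pos] if pos < len(tokens) else "EOF"
        raise SyntaxError(f"expected type, got {got!r}")
    return tokens[pos], pos + 1


# typedef volatile int sig_atomic_t;  (tokens after the typedef keyword)
base, next_pos = parse_type(["volatile", "int", "sig_atomic_t", ";"])
```

In this sketch the qualifiers leave no trace in the result, which mirrors the "drop without semantics" behavior the commit describes.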

Commit 2 — prefix *++p / *--p (cc/parser.py:1379, cc/parser.py:1587, cc/codegen/x86/emission.py:1808, cc/codegen/x86/emission.py:2978)

Replaces the two parser rejection sites with real codegen. Lowering reuses DerefIncrement / DerefIncrementAssign with is_postfix=False; the x86 emitter calls _emit_pointer_bump before the load / store (and clears the accumulator before the prefix-read reload). Read forms (*++p, *--p), write form (*++p = expr;), and char * / int * widths are all covered. Unblocks user/libbboeos/string.c (failed around line 166 on *++p = *src++; inside strncat-style routines).
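The ordering difference between the prefix and postfix forms can be sketched with a toy memory model. This is only an illustration of the semantics, not the x86 emitter: the function name deref_increment_read, the bytearray memory, and the little-endian load are assumptions for the example, and width stands in for sizeof(*p).

```python
# Toy model: memory is a bytearray, a pointer is an integer byte offset.
def deref_increment_read(memory, ptr, width, delta, is_postfix):
    """Return (value, new_ptr) for *p++ / *++p / *p-- / *--p.

    delta is +1 for ++ and -1 for --; width plays the role of sizeof(*p).
    """
    step = delta * width  # the pointer bump is scaled by the pointee size
    if not is_postfix:
        ptr += step  # prefix: bump first, then load through the new pointer
    value = int.from_bytes(memory[ptr:ptr + width], "little")
    if is_postfix:
        ptr += step  # postfix: load through the old pointer, then bump
    return value, ptr


memory = bytearray([10, 20, 30, 40])
prefix_value, prefix_ptr = deref_increment_read(memory, 0, 1, +1, is_postfix=False)
postfix_value, postfix_ptr = deref_increment_read(memory, 0, 1, +1, is_postfix=True)
```

Both calls leave the pointer at offset 1, but the prefix form reads the byte at the new offset while the postfix form reads the byte at the old one.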

Commit 3 — integer-literal suffix L / U / LL / UL / ULL (cc/tokens.py:91, cc/lexer.py)

NUMBER regex now consumes an optional trailing u / l combination (any case); the lexer strips the suffix so the token text stays a plain integer string. cc.py has no real long / unsigned distinction at the literal level — the suffix just needs to lex. Unblocks user/libbboeos/stdio.c (failed at line 451 on pos == (off_t)-1 ? -1L : (long)pos).
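A minimal sketch of the match-then-strip approach, assuming a standalone regex rather than cc.py's real NUMBER pattern. The pattern, the lex_number helper, and the use of int(..., 0) to handle hex are all hypothetical choices for this example.

```python
import re

# Hypothetical pattern: hex or decimal digits, then an optional C integer
# suffix (u, l, ll, ul, ull, lu, llu in either case).
NUMBER = re.compile(
    r"(0[xX][0-9a-fA-F]+|[0-9]+)"
    r"(?:[uU](?:ll|LL|[lL])?|(?:ll|LL|[lL])[uU]?)?"
)


def lex_number(text):
    """Return the literal's digits with any suffix removed, or None."""
    match = NUMBER.fullmatch(text)
    if match is None:
        return None
    return match.group(1)  # group 1 excludes the suffix


# Base 0 lets int() accept both decimal and 0x-prefixed text.
values = [int(lex_number(s), 0) for s in ("1L", "0xFFFFFFFFu", "1ULL", "42")]
```

Because the suffix is discarded, every literal ends up as the same plain integer it would have been without one, which matches the note that the suffix only needs to lex.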

Investigation findings (not fixed in this PR)

  • user/libbboeos/dirent.c:315 fails on DIR *directory = malloc(sizeof *directory); because cc.py only supports sizeof(type) / sizeof(var). Standard sizeof applied to an unparenthesized unary expression is a substantive parser feature (real sizeof <unary-expression> parsing, with size inference from the expression's type). Deferred.
  • user/libbboeos/stdio.c: once past line 451, compilation fails at line 621 on a double literal in a default-argument-like construct (expected expression, got DOUBLE). cc.py has no floating-point support, which is a separate, much larger gap.

Test plan

  • python3 -m pytest tests/unit/ tests/test_ccobj.py tests/test_ccld.py tests/test_ccar.py -x -q — 572 passed
  • python3 tests/test_cc_local_structs.py — 9 / 9 pass
  • python3 tests/test_cc_bits.py — 112 / 112 pass
  • python3 tests/test_asm.py — 42 / 42 pass
  • ./make_os.sh — clean build
  • Probed each commit individually against its target libbboeos file: signal.c past line 5, string.c past line 166, stdio.c past line 451.

Three new unit tests cover volatile parsing across typedef / global / parameter / local; four new unit tests cover prefix *++p / *--p (read char, read int, write char, decrement read) with assertions pinning the bump-before-load/store ordering; one new test covers integer-literal suffix lexing across L / u / UL / LL / uLL.

Do not merge — leaving the PR open for review.

🤖 Generated with Claude Code

bboe and others added 3 commits May 24, 2026 05:54
Unblocks user/libbboeos/signal.c (and any other libbboeos file that
includes <signal.h>): the standard typedef `volatile int sig_atomic_t;`
was failing with `expected type, got IDENT ('volatile')`.  cc.py has no
memory-model semantics, so `volatile` simply joins `const` / `signed`
in the leading no-op qualifier loop in parse_type.

Also adds `volatile` to KEYWORDS and TYPE_TOKENS so it lexes as its own
token kind and starts a declaration.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Replaces the parser rejection of *++p / *--p (and *++p = expr; /
*--p = expr;) with proper codegen.  Semantics: prefix bumps the
pointer by sizeof(*p) bytes first, then derefs through the updated
pointer — what standard C requires for the `*++p = *src++;` copy /
catenate idiom used in libbboeos/string.c.

Lowering: the existing DerefIncrement / DerefIncrementAssign nodes
now carry `is_postfix=False` for the prefix shapes.  The x86 emitter
calls `_emit_pointer_bump` *before* the load/store (and clears the
accumulator before the prefix-read reload) so the rest of the
codegen path is shared with the postfix forms.

Unblocks user/libbboeos/string.c (failed at line ~166 on
`*++p = *src++;` inside strncat-style routines).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

cc.py's NUMBER regex matched only bare digits, so any C source that
spelled an integer literal with a width / signedness suffix
(``-1L``, ``0xFFFFFFFFu``, ``1ULL``) failed to tokenize — the suffix
letter lexed as a separate IDENT and the parser then choked.

The fix extends the NUMBER pattern to consume an optional trailing
``u`` / ``l`` combination (any case) and strips it in the lexer so
the token text stays a plain integer string that ``int(...)`` can
parse.  cc.py has no real long / unsigned distinction at the literal
level; the suffix just needs to lex.

Unblocks user/libbboeos/stdio.c (failed at line 451 on
``pos == (off_t)-1 ? -1L : (long)pos``).

Also documents the remaining dirent.c gap (line 315: ``sizeof
*directory`` — ``sizeof`` of a unary-expression operand without
parens, requires real sizeof-expression parsing).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@bboe bboe merged commit 7dfde5a into main May 24, 2026
27 checks passed
@bboe bboe deleted the bboe/cc-volatile-prefix-incdec branch May 24, 2026 13:41