feat(parser): add OCaml (.ml / .mli) support #622
👋 Thanks for the contribution! Quick heads-up: this repo lands changes on the active release branch, so please retarget this PR via Edit → base branch. (Automated hint: reply here if you need a hand.)
Add language detection, line-based outline parsing, nested (* ... *) comment handling, and tests for OCaml.
- Detect .ml and .mli files as Language.ocaml
- Parse open/include/module/type/external/exception/val/let/and
- Classify let/and as function when -> or fun/function is present
- Handle nested OCaml block comments in the line loop
- Add parser test covering modules, types, let/rec, external, and imports
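For illustration only, the extension-based detection described above might look roughly like the sketch below. The `Language` enum here is a cut-down stand-in and `detectLanguage` is a hypothetical name; neither is taken from the repository.

```zig
const std = @import("std");

// Hypothetical stand-in for the repo's language enum; only the variants
// needed for this sketch are listed.
const Language = enum { ocaml, unknown };

// Maps a file path to a language by its extension.
// std.fs.path.extension returns the extension including the leading dot.
fn detectLanguage(path: []const u8) Language {
    const ext = std.fs.path.extension(path);
    if (std.mem.eql(u8, ext, ".ml") or std.mem.eql(u8, ext, ".mli")) return .ocaml;
    return .unknown;
}
```

Both implementation (.ml) and interface (.mli) files map to the same language value, which matches the first bullet above.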
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4d76167b80
    if (std.mem.startsWith(u8, trimmed[i..], "(*")) {
        ocaml_comment_depth += 1;
Skip OCaml comment markers inside strings
For valid OCaml such as `let pattern = "(*"`, this scan treats the string contents as a real comment opener, leaves `ocaml_comment_depth` nonzero, and then skips all subsequent definitions until a later `*)`. OCaml ignores comment delimiters inside string and char literals, so files containing these characters in strings lose most of their outline. The comment scanner should skip quoted literals before updating the depth.
    const kind: SymbolKind = if (std.mem.indexOf(u8, code, "->") != null) .function else blk: {
        if (std.mem.indexOfScalar(u8, code, '=')) |eq_pos| {
            const rhs = std.mem.trimStart(u8, code[eq_pos + 1 ..], " \t");
            if (startsWith(rhs, "fun ") or startsWith(rhs, "function")) break :blk .function;
        }
        break :blk .constant;
Classify curried OCaml lets as functions
In OCaml, the common function form is `let make name age = ...` or `let rec fib n = ...`, but this heuristic only marks bindings as functions when the line contains `->` or the RHS starts with `fun`/`function`. As a result, most ordinary OCaml functions are indexed as constants, so the new language support reports misleading symbol kinds for normal .ml files. After parsing the binding name, check whether another identifier or pattern precedes the `=` before falling back to `.constant`.
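One possible shape for that check is sketched below. It is not the PR's code: `bindingHasParameters` and its helpers are hypothetical names, and the sketch deliberately ignores operator definitions such as `let ( + ) a b` and bindings that span several lines. It would be consulted in addition to the existing `->` and `fun`/`function` checks rather than replacing them.

```zig
const std = @import("std");

fn isIdentChar(c: u8) bool {
    return std.ascii.isAlphanumeric(c) or c == '_' or c == '\'';
}

fn skipSpaces(s: []const u8) []const u8 {
    var i: usize = 0;
    while (i < s.len and (s[i] == ' ' or s[i] == '\t')) : (i += 1) {}
    return s[i..];
}

// Strips `kw` from the front of `s` if it appears as a whole word.
fn stripKeyword(s: []const u8, kw: []const u8) ?[]const u8 {
    if (!std.mem.startsWith(u8, s, kw)) return null;
    const rest = s[kw.len..];
    // Require a word boundary so `letter` is not read as `let`.
    if (rest.len > 0 and isIdentChar(rest[0])) return null;
    return skipSpaces(rest);
}

// Hypothetical helper: true when a `let`/`and` binding has at least one
// parameter between the bound name and `=`, e.g. `let rec fib n = ...`.
fn bindingHasParameters(line: []const u8) bool {
    const trimmed = skipSpaces(line);
    var rest: []const u8 = undefined;
    if (stripKeyword(trimmed, "let")) |r| {
        rest = r;
    } else if (stripKeyword(trimmed, "and")) |r| {
        rest = r;
    } else return false;
    if (stripKeyword(rest, "rec")) |after_rec| rest = after_rec;

    // Binding name: a run of identifier characters. An empty run means a
    // pattern such as `let (a, b) = ...` or `let () = ...`.
    var n: usize = 0;
    while (n < rest.len and isIdentChar(rest[n])) : (n += 1) {}
    if (n == 0) return false;

    const after_name = skipSpaces(rest[n..]);
    if (after_name.len == 0) return false;
    // `=` ends a plain value binding; `:` starts a type annotation on it.
    return after_name[0] != '=' and after_name[0] != ':';
}
```

Under these assumptions, `let make name age = ...` and `let f () = ...` count as functions, while `let x = 5`, `let x : int = 5`, and `let (a, b) = pair` do not.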
Branch updated from 4d76167 to d3d988a.
Comment delimiters (* and *) inside quoted strings were incorrectly treated as real comment openers, causing the scanner to skip subsequent definitions until a later close. Now the scanner tracks string and char literal state and ignores comment tokens inside quoted regions.
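A minimal sketch of such a scanner is shown below, as an illustration rather than the code in this commit. The function name `ocamlCommentDepthAfter` is hypothetical, string state is not carried across lines, and only simple char literals (`'x'` and two-character escapes such as `'\n'`) are recognized.

```zig
const std = @import("std");

// Returns the comment nesting depth after scanning one line, given the
// depth carried over from the previous line.
fn ocamlCommentDepthAfter(line: []const u8, start_depth: usize) usize {
    var depth = start_depth;
    var in_string = false;
    var i: usize = 0;
    while (i < line.len) {
        const c = line[i];
        if (in_string) {
            if (c == '\\' and i + 1 < line.len) {
                i += 2; // skip the escaped character, e.g. \" or \\
                continue;
            }
            if (c == '"') in_string = false;
            i += 1;
            continue;
        }
        if (c == '"') {
            in_string = true;
            i += 1;
            continue;
        }
        // Simple char literals: 'x' and two-character escapes such as '\n'.
        if (c == '\'') {
            if (i + 2 < line.len and line[i + 1] != '\\' and line[i + 2] == '\'') {
                i += 3;
                continue;
            }
            if (i + 3 < line.len and line[i + 1] == '\\' and line[i + 3] == '\'') {
                i += 4;
                continue;
            }
        }
        if (std.mem.startsWith(u8, line[i..], "(*")) {
            depth += 1;
            i += 2;
            continue;
        }
        if (depth > 0 and std.mem.startsWith(u8, line[i..], "*)")) {
            depth -= 1;
            i += 2;
            continue;
        }
        i += 1;
    }
    return depth;
}
```

A line loop would carry the returned depth into the next line and skip outline extraction while it is nonzero. With this sketch, `let pattern = "(*"` leaves the depth at 0, so the definitions after it are still indexed.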
Adds OCaml language support to the line-based parser.
- `.ml`/`.mli` files
- `open`/`include`/`module`/`type`/`external`/`exception`/`val`/`let`/`and`
- `(* ... *)` comments
- `let`/`and` as function when `->` or `fun`/`function` is present

`zig build test-parser`: 70/70 passed.