ruby · ydah · Feb 10, 2026
diff --git a/doc/Index.md b/doc/Index.md
@@ -3,7 +3,6 @@
 [![Gem Version](https://badge.fury.io/rb/lrama.svg)](https://badge.fury.io/rb/lrama)
 [![build](https://github.com/ruby/lrama/actions/workflows/test.yaml/badge.svg)](https://github.com/ruby/lrama/actions/workflows/test.yaml)
 
-
 ## Overview
 
 Lrama is LALR (1) parser generator written by Ruby. The first goal of this project is providing error tolerant parser for CRuby with minimal changes on CRuby parse.y file.
@@ -47,6 +46,29 @@ Enter the formula:
 => 9
 ```
 
+## Documentation (Draft)
+
+Chapters are split into individual files under `doc/chapters/` so that the structure is easy to extend.
+
+1. [Concepts](chapters/01-concepts.md)
+2. [Examples](chapters/02-examples.md)
+3. [Grammar Files](chapters/03-grammar-files.md)
+4. [Parser Interface](chapters/04-parser-interface.md)
+5. [Parser Algorithm](chapters/05-parser-algorithm.md)
+6. [Error Recovery](chapters/06-error-recovery.md)
+7. [Handling Context Dependencies](chapters/07-context-dependencies.md)
+8. [Debugging](chapters/08-debugging.md)
+9. [Invoking Lrama](chapters/09-invoking-lrama.md)
+10. [Parsers in Other Languages](chapters/10-other-languages.md)
+11. [History](chapters/11-history.md)
+12. [Version Compatibility](chapters/12-version-compatibility.md)
+13. [FAQ](chapters/13-faq.md)
+
+## Development
+
+1. [Compressed State Table](development/compressed_state_table/main.md)
+2. [Profiling](development/profiling.md)
+
 ## Supported Ruby version
 
 Lrama is executed with BASERUBY when building ruby from source code. Therefore Lrama needs to support BASERUBY, currently 3.1, or later version.

diff --git a/doc/chapters/01-concepts.md b/doc/chapters/01-concepts.md
@@ -0,0 +1,47 @@
+# Concepts
+
+This chapter introduces the ideas behind Lrama and how it differs from GNU Bison.
+Lrama is an LALR(1) parser generator written in Ruby. It was built to replace
+GNU Bison in the CRuby build while staying compatible with Bison-style
+grammars.
+
+## Lrama at a glance
+
+- **LALR(1) parser generator**: Lrama produces C parsers from grammar files.
+- **Bison-style grammar files**: Most Bison directives are accepted, but there
+  are compatibility constraints (see below).
+- **Error tolerant parsing**: Lrama can generate parsers that attempt recovery
+  using a subset of the algorithm described in *Repairing Syntax Errors in LR
+  Parsers*.
+- **Ruby-focused**: Lrama is written in Ruby and is used in the CRuby build
+  process.
+
+## Compatibility assumptions
+
+Lrama is not a full Bison reimplementation. It intentionally assumes the
+following Bison configuration when reading a grammar file:
+
+- `b4_locations_if` is always true (location tracking is enabled).
+- `b4_pure_if` is always true (pure parser).
+- `b4_pull_if` is always false (no pull parser interface).
+- `b4_lac_if` is always false (no LAC).
+
+These assumptions simplify the code generation path and reflect how CRuby uses
+a Bison-compatible parser.
+
+## Inputs and outputs
+
+A typical Lrama run takes a `.y` grammar file and produces:
+
+- A parser implementation in C (default `y.tab.c`, or the file passed by `-o`).
+- A header file (`y.tab.h`) when `-d` or `-H` is provided.
+- Optional reports (`--report` / `--report-file`).
+- Optional syntax diagram output (`--diagram`).
+
+## Workflow stages
+
+1. Write a grammar file (`.y`) using Bison-compatible syntax.
+2. Run Lrama to generate the parser C code.
+3. Compile the generated C code with the rest of your project.
+
+For worked examples, see the [Examples](02-examples.md) chapter.
diff --git a/doc/chapters/02-examples.md b/doc/chapters/02-examples.md
@@ -0,0 +1,41 @@
+# Examples
+
+This chapter mirrors the structure of the Bison manual examples, but focuses on
+what is present in the Lrama repository today.
+
+## Calculator example (sample/calc.y)
+
+The [`sample/calc.y`](../../sample/calc.y) grammar is the canonical example
+for running Lrama.
+
+```shell
+$ lrama -d sample/calc.y -o calc.c
+$ gcc -Wall calc.c -o calc
+$ ./calc
+```
+
+The grammar demonstrates:
+
+- Declaring tokens and precedence.
+- Attaching semantic actions in C.
+- Generating a header file with `-d`.
+
+## Minimal parser example (sample/parse.y)
+
+[`sample/parse.y`](../../sample/parse.y) is a smaller grammar intended to be
+used by the build instructions and smoke tests.
+
+```shell
+$ lrama -d sample/parse.y
+```
+
+## Additional grammars
+
+The `sample/` directory includes additional grammars that cover different
+syntax styles:
+
+- [`sample/json.y`](../../sample/json.y)
+- [`sample/sql.y`](../../sample/sql.y)
+
+These are good starting points when verifying compatibility or experimenting
+with new directives.
diff --git a/doc/chapters/03-grammar-files.md b/doc/chapters/03-grammar-files.md
@@ -0,0 +1,113 @@
+# Grammar Files
+
+Lrama reads Bison-style grammar files. Each grammar file has four sections in
+order:
+
+1. **Prologue**: C code enclosed in `%{ ... %}`, copied verbatim into the generated parser.
+2. **Declarations**: Bison-style directives such as `%token` and `%start`.
+3. **Grammar rules**: The productions and semantic actions, introduced by `%%`.
+4. **Epilogue**: C code after a second `%%`, appended to the end of the generated parser.
+
+A minimal grammar looks like this:
+
+```yacc
+%token INTEGER
+%%
+input: INTEGER '\n';
+%%
+```
+
+## Symbols
+
+- **Terminals** are tokens returned by the lexer.
+- **Nonterminals** are syntactic groupings defined by rules.
+
+Lrama accepts the common `%token`, `%type`, `%left`, `%right`, and
+`%precedence` declarations in the declarations section.
+
+## Rules and actions
+
+Grammar rules use the standard Bison syntax. Semantic actions are C code blocks
+that run when a rule is reduced.
+
+```yacc
+expr:
+    expr '+' expr { $$ = $1 + $3; }
+  | INTEGER       { $$ = $1; }
+  ;
+```
+
+## Parameterized rules
+
+Lrama extends Bison-style rules with parameterization. A nonterminal definition
+may accept other symbols as parameters, allowing you to reuse rule templates.
+Parameterized rules are defined with `%rule` and invoked like a nonterminal.
+
+```yacc
+%rule option(X)
+  : /* empty */
+  | X
+  ;
+
+%%
+
+program:
+    option(statement)
+  ;
+```
+
+When Lrama expands a parameterized rule, it creates a concrete nonterminal
+whose name encodes the parameters. The example above expands to a rule named
+`option_statement`.
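+
+As a rough picture of that expansion (a conceptual sketch, not literal Lrama
+output), the grammar above behaves as if it had been written by hand like
+this:
+
+```yacc
+/* Concrete nonterminal generated for option(statement). */
+option_statement
+  : /* empty */
+  | statement
+  ;
+
+program:
+    option_statement
+  ;
+```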
+
+### Parameterized rules in the standard library
+
+Lrama ships a standard library of reusable parameterized rules in
+[`lib/lrama/grammar/stdlib.y`](../../lib/lrama/grammar/stdlib.y). Common
+patterns include:
+
+- `option(X)`: optional symbol.
+- `list(X)`: zero or more repetitions.
+- `nonempty_list(X)`: one or more repetitions.
+- `separated_list(separator, X)`: separated list with optional empty case.
+- `separated_nonempty_list(separator, X)`: separated list with at least one
+  element.
+- `delimited(opening, X, closing)`: wrap a symbol with delimiters.
+
+You can reference these directly by including the standard library in your
+grammar or copy them into your own grammar file.
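+
+The sketch below shows one way these templates might be combined. It assumes
+the standard library rules are in scope and that a parameterized rule may be
+passed as an argument to another one; `IDENT` and `args` are made-up names
+used only for illustration.
+
+```yacc
+%token IDENT
+
+%%
+
+/* Intended to match "()", "(a)" and "(a, b, c)". */
+args:
+    delimited('(', separated_list(',', IDENT), ')')
+  ;
+```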
+
+### Semantic values and locations
+
+Parameterized rules support the same semantic action syntax as ordinary rules.
+If you add actions to a parameterized rule, the generated nonterminal keeps the
+action and location references intact. When you call a parameterized rule, the
+resulting nonterminal can be used like any other symbol in subsequent rules.
+
+## Inlining
+
+The `%inline` directive replaces all references to a symbol with its
+definition. It is useful for eliminating extra nonterminals, removing
+shift/reduce conflicts, or keeping small helper rules from polluting the symbol
+list.
+
+```yacc
+%rule %inline opt_newline
+  : /* empty */
+  | '\n'
+  ;
+
+%%
+
+lines:
+    lines opt_newline line
+  | line
+  ;
+```
+
+An inline rule does not create a standalone nonterminal in the output. Instead,
+its productions are substituted wherever the inline symbol is referenced. This
+is why `%inline` is often paired with parameterized rules (for example,
+`%rule %inline ioption(X)` in the standard library) to build reusable templates
+without growing the symbol table.
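+
+To make the substitution concrete, here is a hand-written sketch of what the
+`lines` rule above is equivalent to once `opt_newline` has been inlined. It is
+an illustration of the idea, not literal Lrama output.
+
+```yacc
+/* No opt_newline nonterminal remains; each alternative is spliced in. */
+lines:
+    lines line         /* opt_newline took the empty alternative */
+  | lines '\n' line    /* opt_newline took the '\n' alternative */
+  | line
+  ;
+```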
+
+## Error recovery
+
+Use `error` tokens in rules and enable recovery with `-e` when generating the
+parser. For guidance, see the [Error Recovery](06-error-recovery.md) chapter.
diff --git a/doc/chapters/04-parser-interface.md b/doc/chapters/04-parser-interface.md
@@ -0,0 +1,35 @@
+# Parser Interface
+
+Lrama generates a C parser that follows the same API style as Bison’s default
+C interface. The entry point is `yyparse`, which calls `yylex` to obtain tokens
+from the lexer and uses `yyerror` for error reporting.
+
+## Required functions
+
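+- `yylex` returns the next token and sets its semantic value and location.
+  Because Lrama assumes a pure parser with locations, the usual form is
+  `int yylex(YYSTYPE *lvalp, YYLTYPE *llocp)`.
+- `int yyparse(void)` drives the parser.
+- `yyerror` reports syntax errors. With locations enabled it receives the
+  error location first: `void yyerror(YYLTYPE *llocp, const char *message)`.
+
+These signatures gain extra arguments if you configure `%parse-param` or
+`%lex-param` in your grammar.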
+
+## Location tracking
+
+Location tracking is always enabled in Lrama’s compatibility model. Use `@n`
+for the location of a right-hand side symbol and `@$` for the location of the
+left-hand side. Define a location type via `%define api.location.type` or by
+customizing the generated code.
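+
+As an illustration, the action below spells out the span that `@$` would
+normally receive by default. It assumes the default Bison-style location
+struct with `first_line`, `first_column`, `last_line`, and `last_column`
+fields; a custom location type would need its own field names.
+
+```yacc
+expr:
+    expr '+' expr
+      {
+        $$ = $1 + $3;
+        /* Span from the start of the first operand to the end of the last. */
+        @$.first_line   = @1.first_line;
+        @$.first_column = @1.first_column;
+        @$.last_line    = @3.last_line;
+        @$.last_column  = @3.last_column;
+      }
+  ;
+```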
+
+## Header generation
+
+Use `-d` or `-H` to emit a header file containing token definitions and shared
+structures:
+
+```shell
+$ lrama -d sample/parse.y
+```
+
+## Pure parser assumptions
+
+Lrama assumes a pure parser (`b4_pure_if` is always true). This means semantic
+value and location information are passed explicitly rather than using globals.
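+
+A minimal sketch of what this means for hand-written support code is shown
+below. It assumes the Bison pure-parser calling convention, in which `yylex`
+and `yyerror` receive pointers, and it assumes the header was generated with
+`-d` under the default name `y.tab.h`. The grammar and the stub lexer are
+placeholders, not code from the Lrama samples.
+
+```yacc
+%{
+#include <stdio.h>
+#include "y.tab.h"  /* assumed header name; provides YYSTYPE and YYLTYPE */
+
+/* Values and locations are passed by pointer, not through globals. */
+static int yylex(YYSTYPE *lvalp, YYLTYPE *llocp);
+static void yyerror(YYLTYPE *llocp, const char *message);
+%}
+
+%%
+
+input: /* empty */ ;
+
+%%
+
+/* Stub lexer: reports end of input immediately. */
+static int
+yylex(YYSTYPE *lvalp, YYLTYPE *llocp)
+{
+    (void)lvalp;
+    llocp->first_line = llocp->last_line = 1;
+    llocp->first_column = llocp->last_column = 1;
+    return 0;  /* 0 signals end of input */
+}
+
+static void
+yyerror(YYLTYPE *llocp, const char *message)
+{
+    fprintf(stderr, "%d:%d: %s\n",
+            llocp->first_line, llocp->first_column, message);
+}
+```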
diff --git a/doc/chapters/05-parser-algorithm.md b/doc/chapters/05-parser-algorithm.md
@@ -0,0 +1,32 @@
+# Parser Algorithm
+
+Lrama produces LALR(1) parsers. The generated parser is a table-driven LR
+automaton; shift/reduce and reduce/reduce conflicts are resolved while Lrama
+builds the tables, not at parse time.
+
+## Conflicts and precedence
+
+Use `%left`, `%right`, and `%precedence` declarations to resolve
+shift/reduce conflicts. Lrama reports conflicts in the `--report` output and
+with `-v` (alias for `--report=state`).
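+
+The classic arithmetic grammar is a convenient way to see these declarations
+at work. The sketch below follows the usual Bison idiom; the token names `NUM`
+and `NEG` are illustrative and do not come from the Lrama samples.
+
+```yacc
+%token NUM
+%left '+' '-'      /* lowest precedence, left-associative */
+%left '*' '/'      /* binds tighter than '+' and '-' */
+%precedence NEG    /* highest precedence, used only through %prec */
+
+%%
+
+expr:
+    expr '+' expr
+  | expr '-' expr
+  | expr '*' expr
+  | expr '/' expr
+  | '-' expr %prec NEG   /* unary minus borrows the precedence of NEG */
+  | NUM
+  ;
+```
+
+Without the precedence declarations, every binary alternative would produce a
+shift/reduce conflict, because the grammar alone does not say how `1 + 2 * 3`
+should be grouped.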
+
+## Reports and diagnostics
+
+Lrama can emit detailed state and conflict reports during parser generation.
+Common report options include:
+
+- `--report=state`: state machine summary (also `-v`).
+- `--report=counterexamples`: generate conflict counterexamples.
+- `--report=all`: include all reports.
+
+You can write the report to a file with `--report-file`.
+
+```shell
+$ lrama -v --report-file=parser.report sample/parse.y
+```
+
+## Error tolerant parsing
+
+When `-e` is supplied, Lrama enables its error recovery extensions. This uses a
+subset of the algorithm described in *Repairing Syntax Errors in LR Parsers*.
+Refer to [Error Recovery](06-error-recovery.md) for guidance on structuring
+rules.
diff --git a/doc/chapters/06-error-recovery.md b/doc/chapters/06-error-recovery.md
@@ -0,0 +1,29 @@
+# Error Recovery
+
+Lrama supports error tolerant parsing inspired by the algorithm described in
+*Repairing Syntax Errors in LR Parsers*.
+
+## Enabling recovery
+
+Pass `-e` when generating the parser to enable recovery support.
+
+```shell
+$ lrama -e sample/parse.y
+```
+
+## Writing recovery rules
+
+Use the special `error` token in grammar rules to specify recovery points. A
+common pattern is to skip to a statement terminator or newline.
+
+```yacc
+statement:
+    expr ';'
+  | error ';' { /* discard the rest of the statement */ }
+  ;
+```
+
+## Handling recovery in actions
+
+Make sure semantic actions can cope with partially parsed input. Keep actions
+small and defensively check inputs for null values when necessary.
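+
+One way to apply this advice is sketched below. It assumes pointer-valued
+semantic values, that the `yyerrok` macro from the Bison skeleton is
+available, and two hypothetical helpers, `new_statement` and
+`append_statement`, that build the syntax tree.
+
+```yacc
+statements:
+    /* empty */           { $$ = NULL; }
+  | statements statement  { $$ = ($2 != NULL) ? append_statement($1, $2) : $1; }
+  ;
+
+statement:
+    expr ';'   { $$ = new_statement($1); }
+  | error ';'  { $$ = NULL; yyerrok; }  /* no node for the broken statement */
+  ;
+```
+
+The recovery alternative yields `NULL`, and the list-building action checks
+for it, so a discarded statement never reaches the tree.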
diff --git a/doc/chapters/07-context-dependencies.md b/doc/chapters/07-context-dependencies.md
@@ -0,0 +1,23 @@
+# Handling Context Dependencies
+
+Some grammars are difficult to express with pure context-free rules.
+In these cases, the typical approach is to make the lexer or semantic actions
+context aware.
+
+## Token-level context
+
+Emit different tokens depending on parser state. For example, you can track
+whether you are inside a type declaration and return a distinct token for
+identifiers in that context.
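+
+A minimal sketch of this idea follows. The flag `in_type_decl`, the token
+names, and the helper `identifier_token` are all hypothetical. The sketch
+relies on the mid-rule action running before the lexer is asked for the next
+token, which should be checked for any grammar with more alternatives.
+
+```yacc
+%{
+/* Hypothetical flag: set by the parser, read by the lexer. */
+static int in_type_decl = 0;
+%}
+
+%token KW_TYPE TYPE_NAME IDENT
+
+%%
+
+type_decl:
+    KW_TYPE { in_type_decl = 1; } TYPE_NAME ';' { in_type_decl = 0; }
+  ;
+
+%%
+
+/* Called by the lexer once it has scanned an identifier. */
+static int
+identifier_token(void)
+{
+    return in_type_decl ? TYPE_NAME : IDENT;
+}
+```
+
+In a pure parser it is usually cleaner to keep such state in a struct passed
+through `%parse-param` and `%lex-param` rather than in a file-level variable.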
+
+## Semantic predicates
+
+Lrama does not provide Bison's GLR semantic predicates (`%?{ ... }`). Instead,
+use regular semantic actions and explicit tokens to keep state.
+
+## Parameterized rules
+
+Parameterized rules can help express repeated patterns without introducing
+ambiguity. Use them to factor context-specific constructs while keeping the
+grammar readable. See the [Grammar Files](03-grammar-files.md) chapter.
diff --git a/doc/chapters/08-debugging.md b/doc/chapters/08-debugging.md
@@ -0,0 +1,32 @@
+# Debugging
+
+Lrama offers both generation-time and runtime diagnostics.
+
+## Generator traces
+
+Use `--trace` to print internal generation traces. Useful values are:
+
+- `automaton`: print state transitions.
+- `rules`: print grammar rules.
+- `actions`: print rules with semantic actions.
+- `time`: report generation time.
+- `all`: enable all traces.
+
+```shell
+$ lrama --trace=automaton,rules sample/parse.y
+```
+
+## Reports
+
+`--report` produces structured reports about states, conflicts, and unused
+rules/terminals. See [Parser Algorithm](05-parser-algorithm.md) for details.
+
+## Syntax diagrams
+
+Use `--diagram` to emit an HTML diagram of the grammar rules.
+
+```shell
+$ lrama --diagram=diagram.html sample/calc.y
+```
+
+The repository includes a sample output in [`sample/diagram.html`](../../sample/diagram.html).