diff --git a/README.md b/README.md index e1a525b..4e67bea 100644 --- a/README.md +++ b/README.md @@ -13,17 +13,16 @@ `mq-db` treats Markdown documents as **structured, hierarchical databases** rather than plain text. It parses Markdown into a flat block list with an **interval index** (Nested Set / Pre-Post Order), enabling O(1) section hierarchy queries. Documents can be queried with **SQL** or **[mq](https://github.com/harehare/mq)** and persisted to a compact custom page-file format. -``` -[Markdown File] - │ - ▼ CST Parser (mq-markdown) -[Block Tree] ─── (heading, paragraph, code, list, …) - │ - ▼ Interval Index + Secondary Indexes -[Flat Block Vector] pre/post integers + BitmapIndex / BTreeIndex / HashIndex - │ - ├── SQL Engine (sqlparser — custom native evaluator, no SQLite) - └── mq Engine (mq-lang evaluator) +```mermaid +flowchart TD + A["Markdown File(s)"] -->|"CST Parser (mq-markdown)"| B["Block Tree\n(heading · paragraph · code · list …)"] + B -->|"Interval Index + Secondary Indexes"| C["Flat Block Vector\n(pre/post integers)"] + C --> D["BitmapIndex\n(block_type)"] + C --> E["BTreeIndex\n(pre / post)"] + C --> F["HashIndex\n(content / lang / depth)"] + C --> G["Zone Maps\n(per-document stats)"] + C --> H["SQL Engine\n(sqlparser — custom native evaluator)"] + C --> I["mq Engine\n(mq-lang evaluator)"] ``` > [!IMPORTANT] @@ -36,6 +35,8 @@ - **Three-layer secondary indexes** — `BitmapIndex` (block type), `BTreeIndex` (pre/post), `HashIndex` (content/lang/depth) for fast SQL predicate pushdown - **Zone Maps** — per-document statistics skip irrelevant files before scanning any blocks - **Dual query engines** — SQL via a custom `sqlparser`-based evaluator, and `mq` via `mq-lang` +- **DDL support** — `CREATE TABLE`, `INSERT INTO`, `DROP TABLE` for in-memory custom tables +- **`mq()` scalar function** — run an mq program against Markdown content inline in SQL - **Custom page-file persistence** — 8 KB fixed pages, checksums, atomic writes - **CLI + interactive 
REPL + TUI** — full terminal experience @@ -114,6 +115,30 @@ mq-db sql " " --db store.mq-db ``` +**`mq()` scalar function** — run an mq program against Markdown content inline: + +```bash +mq-db sql "SELECT mq('.h1 | to_text', content) AS title FROM blocks WHERE block_type = 'code'" --db store.mq-db +``` + +### DDL — custom in-memory tables + +```bash +# Create from a SELECT result +mq-db sql "CREATE TABLE headings AS SELECT content, depth FROM blocks WHERE block_type = 'heading'" --db store.mq-db + +# Create with explicit schema, then insert +mq-db sql "CREATE TABLE notes (id TEXT, body TEXT)" --db store.mq-db +mq-db sql "INSERT INTO notes VALUES ('1', 'Hello world')" --db store.mq-db + +# Inspect +mq-db sql "SHOW TABLES" --db store.mq-db +mq-db sql "DESC notes" --db store.mq-db + +# Drop +mq-db sql "DROP TABLE notes" --db store.mq-db +``` + ### mq queries ```bash @@ -331,10 +356,22 @@ SELECT id, document_id, block_type, content, pre, post, | Function | Description | |---|---| | `under(pre, post, anc_pre, anc_post)` | O(1) interval ancestor check | +| `mq(program, content)` | Run an mq program against Markdown content | | `json_extract(json, path)` | Extract a value from a JSON string | | `count(*) / min / max / sum / avg` | Aggregate functions | | `lower / upper / length / coalesce` | Scalar utilities | +### DDL statements + +| Statement | Description | +|---|---| +| `CREATE TABLE name AS SELECT …` | Create a custom table from a query result | +| `CREATE TABLE name (col TYPE, …)` | Create an empty custom table with explicit schema | +| `INSERT INTO name VALUES (…)` | Insert a row into a custom table | +| `DROP TABLE name` | Drop a custom table | +| `SHOW TABLES` | List all custom tables | +| `DESC name` | Show schema of a custom table | + ### Example queries ```sql @@ -347,6 +384,11 @@ WHERE under(b.pre, b.post, AND b.block_type IN ('paragraph', 'code') ORDER BY b.pre; +-- Extract H1 title from code block content via the mq() scalar function +SELECT mq('.h1 | 
to_text', content) AS title +FROM blocks +WHERE block_type = 'code' AND lang = 'markdown'; + -- H2 headings immediately followed by a list (structural lint) SELECT d.path, h.content AS heading FROM blocks h @@ -390,6 +432,16 @@ struct Block { mq-db applies three complementary index layers, cheapest-first. +```mermaid +flowchart LR + Q["SQL Query"] --> ZM["Layer 1\nZone Maps\n(document skip)"] + ZM -->|"relevant docs"| II["Layer 2\nInterval Index\n(section scope)"] + II -->|"candidate blocks"| SI["Layer 3\nSecondary Indexes\n(block lookup)"] + SI -->|"BitmapIndex\nBTreeIndex\nHashIndex"| R["Result Rows"] + ZM -->|"skip"| X1["✗ irrelevant docs"] + SI -->|"no hint"| FS["Full Scan"] +``` + #### Layer 1 — Zone Maps (document-level skip) Built once per document and stored in the `.mq-db` file. Checked before any block is read: @@ -405,13 +457,20 @@ Built once per document and stored in the `.mq-db` file. Checked before any bloc Heading hierarchy encoded as `(pre, post)` pairs via Pre-Post Order (Nested Set) traversal: -``` -# Doc pre=0, post=11 -## Section A pre=2, post=7 - Paragraph pre=3, post=4 - Code pre=5, post=6 -## Section B pre=8, post=11 - Paragraph pre=9, post=10 +```mermaid +graph TD + doc["# Doc\npre=0 · post=11"] + secA["## Section A\npre=2 · post=7"] + para1["Paragraph\npre=3 · post=4"] + code1["Code\npre=5 · post=6"] + secB["## Section B\npre=8 · post=11"] + para2["Paragraph\npre=9 · post=10"] + + doc --> secA + doc --> secB + secA --> para1 + secA --> code1 + secB --> para2 ``` `A is_under B` ↔ `B.pre < A.pre AND A.post < B.post` — O(1), no tree traversal. 
@@ -426,24 +485,28 @@ Heading hierarchy encoded as `(pre, post)` pairs via Pre-Post Order (Nested Set) SQL predicate pushdown picks an `IndexHint`: -``` -WHERE block_type = 'heading' → BitmapIndex -WHERE pre = 42 → BTreeIndex (point) -WHERE pre BETWEEN 10 AND 50 → BTreeIndex (range) -WHERE content = 'Architecture' → HashIndex -WHERE lang = 'rust' → HashIndex -WHERE depth = 2 → HashIndex -(other) → FullScan +```mermaid +flowchart TD + P["SQL WHERE predicate"] + P -->|"block_type = '...'"| B["BitmapIndex"] + P -->|"pre = N"| BT1["BTreeIndex (point)"] + P -->|"pre BETWEEN N AND M"| BT2["BTreeIndex (range)"] + P -->|"content = '...'"| H1["HashIndex"] + P -->|"lang = '...'"| H2["HashIndex"] + P -->|"depth = N"| H3["HashIndex"] + P -->|"other"| FS["Full Scan"] ``` ### Storage format Custom 8 KB page file: -``` -Page 0 │ File header (magic 0x4D514442, version, page count) -Page 1 │ Catalog (doc_id → first_block_page, num_blocks, ZoneMaps) -Page 2+ │ Block data (linked page chains, overflow pages) +```mermaid +block-beta + columns 1 + block:header["Page 0 — File Header\nmagic 0x4D514442 · version · page count"] + block:catalog["Page 1 — Catalog\ndoc_id → first_block_page · num_blocks · ZoneMaps"] + block:blocks["Page 2+ — Block Data\nlinked page chains · overflow pages"] ``` Writes are atomic: data goes to `.tmp` then renamed to `` on success. diff --git a/src/query.rs b/src/query.rs index 63307e1..8f66a9d 100644 --- a/src/query.rs +++ b/src/query.rs @@ -4,10 +4,6 @@ use crate::{ store::DocumentStore, }; -// ───────────────────────────────────────────────────────────────────────────── -// Section anchor: how to locate the enclosing heading section -// ───────────────────────────────────────────────────────────────────────────── - enum SectionAnchor { /// Directly supply a known (pre, post) interval. 
Interval { pre: u32, post: u32 }, @@ -15,20 +11,12 @@ enum SectionAnchor { Heading { content: String, depth: Option }, } -// ───────────────────────────────────────────────────────────────────────────── -// QueryResult: a matched block with its owning document -// ───────────────────────────────────────────────────────────────────────────── - /// A query result pairing a matched [`Block`] with its parent [`Document`]. pub struct QueryResult<'a> { pub block: &'a Block, pub document: &'a Document, } -// ───────────────────────────────────────────────────────────────────────────── -// Query builder -// ───────────────────────────────────────────────────────────────────────────── - /// Chainable, lazy query builder over a [`DocumentStore`]. /// /// Filters are applied in evaluation order (cheapest first): @@ -73,8 +61,6 @@ impl<'store> Query<'store> { } } - // ── Document-level filters ─────────────────────────────────────────────── - /// Skip documents for which `predicate` returns `false`. /// /// Use this to leverage zone-map statistics before scanning blocks: @@ -92,8 +78,6 @@ impl<'store> Query<'store> { self } - // ── Section scope (UNDER) ──────────────────────────────────────────────── - /// Restrict results to blocks that fall within the heading section /// identified by `content` and optional `depth`. /// @@ -117,8 +101,6 @@ impl<'store> Query<'store> { self } - // ── Block-level filters ────────────────────────────────────────────────── - /// Keep only blocks for which `predicate` returns `true`. pub fn filter(mut self, f: F) -> Self where @@ -156,16 +138,12 @@ impl<'store> Query<'store> { self.filter(move |b| b.content.to_lowercase().contains(pat.as_str())) } - // ── Limiting ───────────────────────────────────────────────────────────── - /// Stop collecting after `n` results. 
pub fn limit(mut self, n: usize) -> Self { self.limit = Some(n); self } - // ── Execution ──────────────────────────────────────────────────────────── - /// Execute the query, returning matched (block, document) pairs in /// document order. pub fn collect(&self) -> Vec> { @@ -243,8 +221,6 @@ impl<'store> Query<'store> { self.collect().len() } - // ── Linter helpers ─────────────────────────────────────────────────────── - /// Find all (heading, next_sibling) pairs where the heading matches the /// given depth and the immediately following sibling has one of the /// `forbidden_types`. @@ -310,10 +286,6 @@ pub struct LintViolation<'a> { pub document: &'a Document, } -// ───────────────────────────────────────────────────────────────────────────── -// Tests -// ───────────────────────────────────────────────────────────────────────────── - #[cfg(test)] mod tests { use super::*; diff --git a/src/sql.rs b/src/sql.rs index f548806..cf4a819 100644 --- a/src/sql.rs +++ b/src/sql.rs @@ -22,6 +22,7 @@ //! |---|---| //! | `under(pre, post, anc_pre, anc_post)` | O(1) interval ancestor check | //! | `json_extract(json, path)` | Extract value from JSON string | +//! | `mq(program, content)` | Run an mq program against Markdown content | //! | `count(*)`/`min`/`max`/`sum`/`avg` | Aggregates | //! | `lower`/`upper`/`length`/`coalesce` | Scalar utilities | //! 
@@ -54,6 +55,8 @@ use sqlparser::{ parser::Parser, }; +use mq_lang::{DefaultEngine, parse_markdown_input}; + use crate::{ DocumentStore, MqdbError, block::{Block, BlockType, Properties, PropertyValue}, @@ -61,10 +64,6 @@ use crate::{ indexes::{DocumentIndex, IndexHint}, }; -// ───────────────────────────────────────────────────────────────────────────── -// Value — runtime value type -// ───────────────────────────────────────────────────────────────────────────── - #[derive(Debug, Clone, PartialEq)] pub enum Value { Str(String), @@ -127,10 +126,6 @@ impl Value { } } -// ───────────────────────────────────────────────────────────────────────────── -// Row — named tuple -// ───────────────────────────────────────────────────────────────────────────── - #[derive(Debug, Clone)] struct Row { columns: Vec, @@ -160,10 +155,6 @@ impl Row { } } -// ───────────────────────────────────────────────────────────────────────────── -// Output format helpers -// ───────────────────────────────────────────────────────────────────────────── - fn json_value_str(s: &str) -> String { if let Ok(n) = s.parse::() { return n.to_string(); @@ -203,10 +194,6 @@ pub fn html_escape(s: &str) -> String { .replace('"', "&quot;") } -// ───────────────────────────────────────────────────────────────────────────── -// QueryOutput -// ───────────────────────────────────────────────────────────────────────────── - /// The tabular output of a SQL query. 
#[derive(Debug)] pub struct QueryOutput { @@ -409,10 +396,6 @@ impl QueryOutput { } } -// ───────────────────────────────────────────────────────────────────────────── -// Serialisation helpers -// ───────────────────────────────────────────────────────────────────────────── - fn pv_to_json(pv: &PropertyValue) -> String { match pv { PropertyValue::String(s) => { @@ -445,10 +428,6 @@ fn properties_to_json(props: &Properties) -> String { format!("{{{}}}", pairs.join(",")) } -// ───────────────────────────────────────────────────────────────────────────── -// Virtual table materialisation -// ───────────────────────────────────────────────────────────────────────────── - fn block_to_row(doc_id: u32, block: &Block, block_idx: u32) -> Row { Row { columns: vec![ @@ -531,10 +510,6 @@ fn cross_join(left: Vec, right: Vec) -> Vec { out } -// ───────────────────────────────────────────────────────────────────────────── -// Expression evaluation -// ───────────────────────────────────────────────────────────────────────────── - fn eval_sql_value(v: &SqlValue) -> Value { match v { SqlValue::Number(n, _) => { @@ -764,10 +739,48 @@ fn eval_scalar_function(name: &str, args: &[Value]) -> Value { .find(|v| !matches!(v, Value::Null)) .cloned() .unwrap_or(Value::Null), + "mq" => { + if args.len() < 2 { + return Value::Null; + } + let program = match args[0].as_str() { + Some(s) => s.to_string(), + None => return Value::Null, + }; + let content = match args[1].as_str() { + Some(s) => s.to_string(), + None => return Value::Null, + }; + eval_mq_scalar(&program, &content) + } _ => Value::Null, } } +fn eval_mq_scalar(program: &str, content: &str) -> Value { + let mut engine = DefaultEngine::default(); + engine.load_builtin_module(); + let input = match parse_markdown_input(content) { + Ok(i) => i, + Err(_) => return Value::Null, + }; + match engine.eval(program, input.into_iter()) { + Ok(output) => { + let parts: Vec = output + .compact() + .into_iter() + .map(|v| v.to_string()) + 
.collect(); + if parts.is_empty() { + Value::Null + } else { + Value::Str(parts.join("\n")) + } + } + Err(_) => Value::Null, + } +} + fn extract_json_key(json: &str, key: &str) -> Value { let s = json.trim(); if !s.starts_with('{') { @@ -833,10 +846,6 @@ fn like_dp(s: &[char], p: &[char], si: usize, pi: usize) -> bool { matches && like_dp(s, p, si + 1, pi + 1) } -// ───────────────────────────────────────────────────────────────────────────── -// SqlEngine -// ───────────────────────────────────────────────────────────────────────────── - /// Custom SQL execution engine backed by a [`DocumentStore`] reference. /// /// Secondary indexes are built once on construction (O(n) in total block count) @@ -1469,10 +1478,6 @@ impl<'a> SqlEngine<'a> { } } -// ───────────────────────────────────────────────────────────────────────────── -// Projection helpers -// ───────────────────────────────────────────────────────────────────────────── - fn projection_columns(projection: &[SelectItem], first_row: Option<&Row>) -> Vec { if projection.len() == 1 && matches!(projection[0], SelectItem::Wildcard(_)) { return first_row @@ -1686,10 +1691,6 @@ fn apply_limit(mut rows: Vec>, limit: Option<&Expr>) -> Vec IndexHint { } } -// ───────────────────────────────────────────────────────────────────────────── -// BlockType::from_str helper -// ───────────────────────────────────────────────────────────────────────────── - impl BlockType { fn from_str(s: &str) -> Option { match s { @@ -1882,10 +1879,6 @@ impl BlockType { } } -// ───────────────────────────────────────────────────────────────────────────── -// Tests -// ───────────────────────────────────────────────────────────────────────────── - #[cfg(test)] mod tests { use super::*; @@ -2230,4 +2223,30 @@ mod tests { assert!(names.contains(&"documents")); assert!(names.contains(&"extra")); } + + // mq() scalar function applied to a literal markdown string + #[test] + fn test_mq_scalar_function() { + let store = make_store(); + let engine = 
SqlEngine::new(&store).unwrap(); + let out = engine + .execute( + "SELECT mq('.h1 | to_text', '# Hello\n\nWorld\n') AS title FROM blocks LIMIT 1", + ) + .unwrap(); + assert_eq!(out.rows.len(), 1); + assert_eq!(out.rows[0][0], "Hello"); + } + + // mq() returns NULL when program produces no output + #[test] + fn test_mq_scalar_null_on_no_match() { + let store = make_store(); + let engine = SqlEngine::new(&store).unwrap(); + let out = engine + .execute("SELECT mq('.h1', '## No h1 here\n') FROM blocks LIMIT 1") + .unwrap(); + assert_eq!(out.rows.len(), 1); + assert_eq!(out.rows[0][0], "NULL"); + } }
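
Review note: the README hunk states the containment rule as `A is_under B` ↔ `B.pre < A.pre AND A.post < B.post`, which is what `under(pre, post, anc_pre, anc_post)` is documented to check. The sketch below illustrates that rule on a small tree shaped like the README example. It is not mq-db's code: `Interval`, `Node`, `sample_tree`, `number_tree`, and `is_under` are hypothetical names, and the numbering (one shared counter, incremented on entry and on exit) is one common nested-set convention that may differ from the integers mq-db actually assigns.

```rust
/// Hypothetical stand-in for the (pre, post) pair stored on each block.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Interval {
    pre: u32,
    post: u32,
}

/// Hypothetical tree node; the real block tree carries more fields.
struct Node {
    label: &'static str,
    children: Vec<Node>,
}

/// Strict containment test taken from the README rule:
/// `a` is under `b` iff b.pre < a.pre AND a.post < b.post.
fn is_under(a: Interval, b: Interval) -> bool {
    b.pre < a.pre && a.post < b.post
}

/// Depth-first walk: take `pre` on entry and `post` on exit from one counter.
fn assign(node: &Node, counter: &mut u32, out: &mut Vec<(&'static str, Interval)>) {
    let pre = *counter;
    *counter += 1;
    let slot = out.len();
    out.push((node.label, Interval { pre, post: 0 })); // post is patched below
    for child in &node.children {
        assign(child, counter, out);
    }
    out[slot].1.post = *counter;
    *counter += 1;
}

/// Flatten the tree into document order with assigned intervals.
fn number_tree(root: &Node) -> Vec<(&'static str, Interval)> {
    let mut out = Vec::new();
    let mut counter = 0;
    assign(root, &mut counter, &mut out);
    out
}

/// Doc -> [Section A -> [Paragraph, Code], Section B -> [Paragraph]]
fn sample_tree() -> Node {
    let leaf = |label| Node { label, children: Vec::new() };
    Node {
        label: "# Doc",
        children: vec![
            Node { label: "## Section A", children: vec![leaf("Paragraph"), leaf("Code")] },
            Node { label: "## Section B", children: vec![leaf("Paragraph")] },
        ],
    }
}
```

Under this convention every descendant's interval is strictly inside its ancestor's, so the two comparisons decide ancestry without walking the tree. Because the test is strict, a block is never reported as being under itself.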