prostojs/parser

Build a parser for anything — in minutes, not months.

Stop writing ad-hoc regex spaghetti or reaching for heavyweight parser generators. @prostojs/parser gives you composable building blocks: define your nodes, wire them together, and get a working parser with structured output — fast.

Why This Parser?

It's LEGO for parsers. Each node is a self-contained piece — a tag, a string, a comment, an attribute. Snap them together and you have a full grammar. Need to change something? Swap one block, everything else stays.

Output is built during parsing. Hooks fire as tokens are matched — onOpen, onClose, onContent, onChild. Your data is in its final shape the moment parsing ends. No AST-to-output conversion step. No tree walking.

Near-zero boilerplate. Write data: { tag: '', attrs: {} } and it just works — auto-cloned per match, regex named groups auto-mapped to fields. A full XML-to-JSON parser is ~400 lines.

Competitive performance. Although it is a general-purpose toolkit, its XML parser runs within 4-36% of fast-xml-parser, a dedicated XML-only library. Most formats you'll parse have no dedicated alternative, and for those this is fast enough.

Install

npm install @prostojs/parser

30-Second Overview

Every parser is a tree of Nodes. Each node knows how to start, how to end, and what it can contain:

import { Node, parse } from '@prostojs/parser'

// A string: starts with a quote, ends with the same quote
const string = new Node<{ quote: string }>({
  name: 'string',
  start: { token: /(?<quote>["'])/, omit: true },
  end: { token: (ctx) => ctx.node.data.quote, omit: true },
  data: { quote: '' },
})

// A key=value pair: key captured from regex, value from content
const pair = new Node<{ key: string; value: string }>({
  name: 'pair',
  start: { token: /(?<key>\w+)\s*=\s*/, omit: true },
  end: { token: /\n|$/, omit: true },
  recognizes: [string],
  data: { key: '', value: '' },
  mapContent: 'value',
})

// Root: contains pairs, closes at EOF
const root = new Node({ name: 'root', eofClose: true, recognizes: [pair] })

const result = parse(root, 'name = "Alice"\nage = "30"')
// result.content → [ParsedNode{key:'name', value:'Alice'}, ...]

That's a working config file parser. No grammar files, no build step, no code generation.

How It Works

1. Define Nodes

A node is a pattern with a start token, an end token, and typed data:

const comment = new Node<{ text: string }>({
  name: 'comment',
  start: { token: '<!--', omit: true },
  end: { token: '-->', omit: true },
  data: { text: '' },
  mapContent: 'text',  // auto-joins text content into data.text
})

Tokens can be strings, RegExps (with named capture groups), or dynamic functions:

// String — exact match
start: '{'

// RegExp — captures data automatically
start: { token: /<(?<tag>\w+)/, omit: true }

// Dynamic — computed from current node's data
end: { token: (ctx) => `</${ctx.node.data.tag}>`, omit: true }
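
To make the three token forms concrete, here is a minimal standalone sketch of how a string, RegExp, or function token could be resolved and then located in the source. It does not use or reproduce the library's internals: `TokenSketch` and `findToken` are hypothetical names, and the function form here receives the node data directly rather than the `ctx` object shown above.

```typescript
// Hypothetical types and helper for illustration only; not the @prostojs/parser API.
type TokenSketch<D> = string | RegExp | ((data: D) => string | RegExp)

// Return the index where the token first matches at or after `from`, or -1.
function findToken<D>(token: TokenSketch<D>, data: D, src: string, from: number): number {
  // A function token is resolved against the current node's data first.
  const resolved = typeof token === 'function' ? token(data) : token
  if (typeof resolved === 'string') return src.indexOf(resolved, from)
  // Copy the RegExp with exactly one `g` flag so that lastIndex is honored.
  const re = new RegExp(resolved.source, resolved.flags.replace('g', '') + 'g')
  re.lastIndex = from
  const m = re.exec(src)
  return m ? m.index : -1
}

const stringHit = findToken('{', null, 'a{b', 0)          // 1
const regexHit = findToken(/\d+/, null, 'ab12cd34', 4)    // 6
const dynamicHit = findToken(
  (d: { tag: string }) => `</${d.tag}>`,
  { tag: 'div' },
  '<div>hi</div>',
  0,
)                                                          // 7
```

The dynamic case is what lets a closing token depend on what the opening token captured, as in the `</tag>` example above.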

Token modifiers:

  • omit — strip the token from node content
  • eject — don't consume the match, let the parent handle it
  • backslash — ignore the token if preceded by \
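
As a rough illustration of the rule behind the `backslash` modifier only, the standalone sketch below finds the first occurrence of a token that is not immediately preceded by a backslash. `findUnescaped` is a hypothetical helper, not a library function, and it checks a single preceding character; how the library treats sequences such as an escaped backslash is not covered here.

```typescript
// Hypothetical helper for illustration; not part of @prostojs/parser.
// Returns the index of the first `token` at or after `from` that is not
// directly preceded by a backslash, or -1 if there is none.
function findUnescaped(src: string, token: string, from = 0): number {
  let i = src.indexOf(token, from)
  while (i > 0 && src[i - 1] === '\\') {
    i = src.indexOf(token, i + 1) // skip the escaped occurrence
  }
  return i
}

// Source text: a \" b" c   (the first quote is escaped)
const escapedSample = 'a \\" b" c'
const closingQuote = findUnescaped(escapedSample, '"') // 6
```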

2. Compose Them

Tell each node what children it can contain:

const root = new Node({ name: 'root', eofClose: true })
root.recognize(comment, tag, cdata)
tag.recognize(attribute, innerContent)
innerContent.recognize(comment, tag, cdata)

That's your grammar. No separate DSL — it's just JavaScript.

3. Add Hooks to Shape Output

Hooks fire during parsing — use them to build your output in its final format:

tag
  .onOpen((node, match) => {
    // start token matched — node.data is ready (named groups already mapped)
    // return false to reject this match
  })
  .onChild((child, node) => {
    // a child node was fully parsed
    // route its data wherever you need it
    if (child.node === attribute) {
      node.data.attrs[child.data.key] = child.data.value
    }
  })
  .onContent((text, node) => {
    // text is about to be added — transform or suppress it
    return text.trim()
  })
  .onClose((node) => {
    // end token matched — finalize the output
  })
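
The idea of shaping output while scanning, rather than building a tree and converting it afterwards, can be sketched without the library. The scanner below is a hypothetical, hand-written stand-in (the names `scanTemplate`, `Hooks`, and `Part` are assumptions of this sketch): it calls a callback for each text run and each closed `{{...}}` placeholder, and whatever the callbacks return goes straight into the final result.

```typescript
// Standalone illustration of hook-driven output; not the @prostojs/parser API.
type Part = string | { variable: string }

interface Hooks {
  onContent: (text: string) => string // transform plain text
  onClose: (raw: string) => Part      // build the final value for a placeholder
}

function scanTemplate(src: string, hooks: Hooks): Part[] {
  const out: Part[] = []
  let pos = 0
  while (pos < src.length) {
    const open = src.indexOf('{{', pos)
    if (open === -1) {
      out.push(hooks.onContent(src.slice(pos))) // trailing text
      break
    }
    if (open > pos) out.push(hooks.onContent(src.slice(pos, open)))
    const close = src.indexOf('}}', open + 2)
    if (close === -1) throw new Error(`Unclosed placeholder at offset ${open}`)
    out.push(hooks.onClose(src.slice(open + 2, close)))
    pos = close + 2
  }
  return out
}

const parts = scanTemplate('Hello, {{ name }}!', {
  onContent: (text) => text,
  onClose: (raw) => ({ variable: raw.trim() }),
})
// parts: ['Hello, ', { variable: 'name' }, '!']
```

Because the callbacks produce the final values, no second pass over an intermediate tree is needed.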

4. Parse

import { parse } from '@prostojs/parser'

const result = parse(root, sourceString)
// result: ParsedNode with .content, .data, .start, .end

Key Features

Named Group Auto-Mapping

Regex named groups map directly to data fields — available before onOpen fires:

const tag = new Node<{ tag: string }>({
  start: { token: /<(?<tag>\w+)/ },
  data: { tag: '' },
})
.onOpen((node) => {
  console.log(node.data.tag) // already populated
})
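
For readers unfamiliar with named capture groups, the following standalone sketch shows the underlying mechanism using only the built-in `RegExp` API. `mapGroups` is a hypothetical helper; copying only the groups whose names already exist in the template is a choice made for this sketch, not a documented library rule.

```typescript
// Standalone illustration; not the library's implementation.
// Copies named capture groups into a fresh copy of a data template.
function mapGroups<T extends Record<string, string>>(
  template: T,
  re: RegExp,
  src: string,
): T | undefined {
  const m = re.exec(src)
  if (!m) return undefined
  const data: Record<string, string> = { ...template } // never mutate the template
  for (const [key, value] of Object.entries(m.groups ?? {})) {
    if (key in data && value !== undefined) data[key] = value
  }
  return data as T
}

const tagData = mapGroups({ tag: '' }, /<(?<tag>\w+)/, '<div class="x">')
// tagData?.tag === 'div'
```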

Plain Data Templates

No factory function is required. Declare a plain object and it is auto-cloned per match with an optimized cloner:

data: { tag: '', attrs: {}, children: [] }
// primitives → spread clone
// objects/arrays → shallow clone
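
Why a template has to be cloned per match is easiest to see in isolation. In the standalone sketch below, `cloneTemplate` is a hypothetical cloner written for this one data shape; the library's own cloner is not shown and may differ. Each match gets its own `attrs` object and `children` array, so writing to one match's data leaves the template and other matches untouched.

```typescript
// Standalone illustration; the cloner below is an assumption of this sketch.
interface TagData {
  tag: string
  attrs: Record<string, string>
  children: string[]
}

const template: TagData = { tag: '', attrs: {}, children: [] }

// Copy top-level fields, then copy the nested object and array one level deep.
function cloneTemplate(t: TagData): TagData {
  return { ...t, attrs: { ...t.attrs }, children: [...t.children] }
}

const first = cloneTemplate(template)
const second = cloneTemplate(template)

first.tag = 'div'
first.attrs.id = 'main'
first.children.push('text')
// `second` and `template` still hold the empty defaults.
```

Without the copy of `attrs` and `children`, every match would share and mutate the same nested containers.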

mapContent

Auto-join all text content into a data field on node close. Replaces the most common onClose pattern:

data: { text: '' },
mapContent: 'text',
// equivalent to: .onClose(node => { node.data.text = textContent(node) })
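
The pattern that `mapContent` replaces can be sketched on a simplified node shape. Everything here is hypothetical (`SketchNode`, `joinText`, `closeWithMappedText`); in particular, this sketch joins only the node's direct string items, and whether the library's `textContent` also descends into child nodes is not stated in this README.

```typescript
// Hypothetical node shape for this sketch only: content mixes text and child nodes.
interface SketchNode {
  content: Array<string | SketchNode>
  data: { text: string }
}

// Join the node's direct string items; child nodes are skipped in this sketch.
function joinText(node: SketchNode): string {
  return node.content
    .filter((item): item is string => typeof item === 'string')
    .join('')
}

// What a mapContent-style step could do when the node closes.
function closeWithMappedText(node: SketchNode): void {
  node.data.text = joinText(node)
}

const commentNode: SketchNode = { content: [' first ', 'second '], data: { text: '' } }
closeWithMappedText(commentNode)
// commentNode.data.text is now ' first second '
```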

Utilities

import { textContent, children, findChild, findChildren, walk, printTree } from '@prostojs/parser'

textContent(node)              // joined string content
children(node)                 // child ParsedNodes (no strings)
findChild(node, targetNode)    // first child of a specific node type
findChildren(node, targetNode) // all children of a specific node type
walk(node, (child, depth) => { ... })  // depth-first walk
printTree(node)                // debug visualization
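
As a rough picture of what a depth-first walk with a `depth` argument does, here is a standalone sketch over a simplified tree. `TreeNode` and `walkTree` are hypothetical; the sketch visits each child node before its descendants and skips string items, which may or may not match the library's `walk` in every detail.

```typescript
// Hypothetical tree shape for illustration; not the library's ParsedNode type.
interface TreeNode {
  name: string
  content: Array<string | TreeNode>
}

// Pre-order, depth-first walk over child nodes (strings are skipped).
function walkTree(
  node: TreeNode,
  visit: (child: TreeNode, depth: number) => void,
  depth = 0,
): void {
  for (const item of node.content) {
    if (typeof item === 'string') continue
    visit(item, depth)
    walkTree(item, visit, depth + 1)
  }
}

const tree: TreeNode = {
  name: 'root',
  content: [
    { name: 'tag', content: [{ name: 'attribute', content: [] }, 'text', { name: 'tag', content: [] }] },
    { name: 'comment', content: [' note '] },
  ],
}

const visited: string[] = []
walkTree(tree, (child, depth) => visited.push(`${depth}:${child.name}`))
// visited: ['0:tag', '1:attribute', '1:tag', '0:comment']
```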

Node Options Reference

| Option | Type | Description |
| --- | --- | --- |
| name | string | Identifier (for debugging / printTree) |
| start | TokenDef \| TokenDef[] | Start token(s) |
| end | TokenDef \| TokenDef[] | End token(s) |
| recognizes | Node[] | Child nodes this node can contain |
| skip | Token \| Token[] | Tokens to silently skip (e.g. whitespace) |
| bad | Token \| Token[] | Tokens that trigger a parse error |
| eofClose | boolean | Allow this node to close at end of input |
| data | T \| () => T | Data template (auto-cloned) or factory |
| mapContent | string | Auto-join text content into this data field |
| hooks | NodeHooks<T> | Inline hook definitions |

Error Handling

import { ParseError } from '@prostojs/parser'

try {
  parse(root, source)
} catch (e) {
  if (e instanceof ParseError) {
    console.log(e.message) // includes line, column, and context
  }
}

The parser throws on unclosed nodes and bad tokens, and the error reports the precise source position.
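
Reporting a line and column usually means converting a character offset into a position in the source. The standalone sketch below shows one common way to do that; `lineColumn` is a hypothetical helper, and this is not necessarily how `ParseError` computes its positions.

```typescript
// Standalone illustration; not the library's implementation.
// Converts a 0-based character offset into a 1-based line and column.
function lineColumn(src: string, offset: number): { line: number; column: number } {
  const end = Math.min(Math.max(offset, 0), src.length) // clamp to the source
  let line = 1
  let lastBreak = -1 // index of the most recent '\n' before `end`
  for (let i = 0; i < end; i++) {
    if (src[i] === '\n') {
      line++
      lastBreak = i
    }
  }
  return { line, column: end - lastBreak }
}

const sample = 'a = 1\nb = "x'
const pos = lineColumn(sample, sample.indexOf('"'))
// pos: { line: 2, column: 5 } (the unclosed quote on the second line)
```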

Examples

Each example is a standalone parser showcasing different aspects of the API. All source is in examples/.

| Example | What it parses | Highlights |
| --- | --- | --- |
| XML-to-JSON | Full XML → JSON (fast-xml-parser compatible) | Dynamic end tokens, hooks-based output, entity decoding, ~400 lines |
| JSON | JSON strings → JS values | onContent for bare primitives, state tracking for key/value disambiguation |
| Math Evaluator | 2 + 3 * (4 - 1) → 11 | Recursive group nodes; result computed during parsing, no AST |
| Template String | Hello, {{name}}! → parts array | Minimal 2-node parser, mapContent for zero-hook data capture |
| CSS Selector | div.cls > span:hover → structured parts | Dynamic quote matching, regex tokenization in onContent |
| URL Parser | URLs → protocol/host/path/query/hash | Named group auto-mapping, eject for boundary detection |
| ESM Analyzer | JS/TS source → imports, exports, unused | String/comment nodes as "shields" against false positives |

Migration from v0.5

See MIGRATION.md for a comprehensive guide.

License

MIT
