This library provides multiple JSON streaming parser implementations, each optimized for different use cases:
- JSONParser (TypeScript Transform Stream) - For Node.js streams
- Native Parser (C++ Background Thread) - For file descriptors
- Worker Parser (JavaScript Worker Thread) - Pure JS alternative
┌─────────────────────────────────────────────────────────┐
│  Node.js Stream (TCP, stdin, pipe, etc.)                │
│                                                         │
│  Data: "{\"foo\":1}\n{\"bar\":2}\n"                     │
└────────────────────────────┬────────────────────────────┘
                             │
                             │ .pipe()
                             ▼
┌─────────────────────────────────────────────────────────┐
│  JSONParser (Transform Stream)                          │
│                                                         │
│  _transform(chunk, encoding, cb) {                      │
│    // Split by delimiter                                │
│    // Parse each JSON string                            │
│    // Push parsed objects                               │
│  }                                                      │
└────────────────────────────┬────────────────────────────┘
                             │
                             │ .on('data')
                             ▼
┌─────────────────────────────────────────────────────────┐
│  JavaScript Event Loop                                  │
│                                                         │
│  parser.on('data', (obj) => {                           │
│    // Process parsed object                             │
│  });                                                    │
└─────────────────────────────────────────────────────────┘
Characteristics:
- ✅ Works with any Node.js stream
- ✅ Simple, pure JavaScript
- ✅ Fast when main thread is idle
- ❌ Blocks main thread during parsing
- ❌ Performance degrades under load
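The three steps inside `_transform` (split by delimiter, parse each JSON string, push parsed objects) can be sketched without the stream plumbing. This is a minimal illustration rather than the library's implementation: the class `DelimitedJsonSplitter` and its methods are hypothetical names, and the newline delimiter is an assumption taken from the sample data in the diagram.

```typescript
// Hypothetical helper: the per-chunk work a delimiter-based Transform stream performs.
class DelimitedJsonSplitter {
  private remainder = "";
  private delimiter: string;

  constructor(delimiter: string = "\n") {
    this.delimiter = delimiter;
  }

  // Accepts one chunk and returns every object that the chunk completes.
  feed(chunk: string): unknown[] {
    const pieces = (this.remainder + chunk).split(this.delimiter);
    // The last piece stays incomplete until the next delimiter arrives.
    this.remainder = pieces.pop() ?? "";
    const out: unknown[] = [];
    for (const piece of pieces) {
      if (piece.trim() === "") continue; // skip empty lines
      out.push(JSON.parse(piece)); // synchronous: runs on the calling thread
    }
    return out;
  }

  // Parses whatever is left once the input has ended.
  flush(): unknown[] {
    const rest = this.remainder.trim();
    this.remainder = "";
    return rest === "" ? [] : [JSON.parse(rest)];
  }
}

const splitter = new DelimitedJsonSplitter();
const firstObjects = splitter.feed('{"foo":1}\n{"ba'); // second record is still partial
const secondObjects = splitter.feed('r":2}\n'); // completes the second record
console.log(firstObjects, secondObjects);
```

Because `JSON.parse()` runs inside `feed()`, all parsing happens on the thread that delivers the chunk, which is why this design blocks the main thread while it parses.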
┌─────────────────────────────────────────────────────────┐
│  Node.js Main Thread                                    │
│                                                         │
│  const fd = fs.openSync('/path/to/file', 'r');          │
│  const parser = createJsonParserNativeFromFd(fd);       │
│                                                         │
│  ┌───────────────────────────────────────────────────┐  │
│  │  Native Addon (N-API)                             │  │
│  │                                                   │  │
│  │  1. Receives fd from JS                           │  │
│  │  2. Duplicates: fd_dup = dup(fd)                  │  │
│  │  3. Starts C++ background thread                  │  │
│  └─────────────────────────┬─────────────────────────┘  │
│                            │                            │
│                            │ fd_dup                     │
│                            ▼                            │
└─────────────────────────────────────────────────────────┘
                             │
                             │ Direct syscall
                             ▼
┌─────────────────────────────────────────────────────────┐
│  C++ Background Thread (std::thread)                    │
│                                                         │
│  while (!stop) {                                        │
│    // Direct read from kernel                           │
│    ssize_t n = read(fd_dup, buf, BUF_SZ);               │
│                                                         │
│    // Process data in C++                               │
│    // Split by delimiter                                │
│    // Prepare batches                                   │
│                                                         │
│    // Send to main thread via TSFN                      │
│    napi_call_threadsafe_function(tsfn, batch);          │
│  }                                                      │
│                                                         │
│  Data path: Kernel → C++ buffer → Zero-copy → JS        │
└────────────────────────────┬────────────────────────────┘
                             │
                             │ Thread-Safe Function (TSFN)
                             │ Zero-copy external buffers
                             ▼
┌─────────────────────────────────────────────────────────┐
│  Node.js Main Thread (TSFN Callback)                    │
│                                                         │
│  call_js_from_tsfn_with_instance(env, cb, ctx, data) {  │
│    // Receive batch from C++ thread                     │
│    // If passRawBuffers: true                           │
│    //   - Buffers are zero-copy (external buffers)      │
│    //   - Parse with V8's JSON.parse()                  │
│    // If passRawBuffers: false                          │
│    //   - Convert C++ JValue to JS objects              │
│    // Emit 'data' events                                │
│  }                                                      │
└────────────────────────────┬────────────────────────────┘
                             │
                             │ .on('data')
                             ▼
┌─────────────────────────────────────────────────────────┐
│  Application Code                                       │
│                                                         │
│  parser.on('data', (obj) => {                           │
│    // Process parsed object                             │
│  });                                                    │
└─────────────────────────────────────────────────────────┘
Characteristics:
- ✅ Direct kernel access (no Node.js stream layer)
- ✅ Background I/O thread (doesn't block main thread)
- ✅ Zero-copy buffers (efficient data transfer)
- ✅ Resilient under load (only 1.5x slower at 90% CPU)
- ✅ Works with file descriptors (files, stdin, sockets)
- ❌ Requires native addon build
- ❌ Only works with file descriptors (not all streams)
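Zero-copy here means handing JavaScript a view onto memory that already holds the data instead of duplicating it. The same distinction can be shown with plain typed arrays. This is an analogy in TypeScript, not the addon's code, and all variable names are illustrative.

```typescript
// One block of memory standing in for the C++ read buffer.
const backing = new ArrayBuffer(4);

// A view wraps the existing memory: no bytes are copied.
const sharedView = new Uint8Array(backing);
// slice() allocates new memory and copies the bytes into it.
const copiedBytes = sharedView.slice();

// Writing through the view changes the backing memory, but not the copy.
sharedView[0] = 42;
const sharesMemory = new Uint8Array(backing)[0] === 42;
const copyUnchanged = copiedBytes[0] === 0;
console.log({ sharesMemory, copyUnchanged });
```

An external buffer behaves like `sharedView`: the JavaScript side reads the bytes where the native side put them, so the cost of the hand-off does not grow with the size of the data.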
┌─────────────────────────────────────────────────────────┐
│  Node.js Main Thread                                    │
│                                                         │
│  const fd = fs.openSync('/path/to/file', 'r');          │
│  const parser = createJsonParserWorkerFromFd(fd);       │
│                                                         │
│  ┌───────────────────────────────────────────────────┐  │
│  │  Worker Thread Manager                            │  │
│  │                                                   │  │
│  │  1. Spawns worker thread                          │  │
│  │  2. Passes fd to worker                           │  │
│  │  3. Sets up postMessage handlers                  │  │
│  └─────────────────────────┬─────────────────────────┘  │
│                            │                            │
│                            │ fd                         │
│                            ▼                            │
└─────────────────────────────────────────────────────────┘
                             │
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│  Worker Thread (Separate V8 Isolate)                    │
│                                                         │
│  // json-parser-worker-thread.ts                        │
│  while (true) {                                         │
│    // Read from fd using fs.readSync()                  │
│    const n = fs.readSync(fd, buf, {...});               │
│                                                         │
│    // Split by delimiter                                │
│    // Parse JSON in worker thread                       │
│    const parsed = JSON.parse(candidate);                │
│                                                         │
│    // Add to batch (complete POJSOs)                    │
│    batch.push(parsed);                                  │
│                                                         │
│    // Send to main thread via postMessage               │
│    parentPort!.postMessage({                            │
│      type: 'data',                                      │
│      batch: batch  // Structured cloning                │
│    });                                                  │
│  }                                                      │
│                                                         │
│  Data path: Kernel → Worker → Structured Clone → Main   │
└────────────────────────────┬────────────────────────────┘
                             │
                             │ postMessage() (structured cloning)
                             │ Full object graph serialization
                             ▼
┌─────────────────────────────────────────────────────────┐
│  Node.js Main Thread (Message Handler)                  │
│                                                         │
│  worker.on('message', (msg) => {                        │
│    if (msg.type === 'data') {                           │
│      // Objects arrive fully parsed                     │
│      // Structured cloning reconstructs object graph    │
│      this.pending.push(...msg.batch);                   │
│      // Emit 'data' events                              │
│    }                                                    │
│  });                                                    │
└────────────────────────────┬────────────────────────────┘
                             │
                             │ .on('data')
                             ▼
┌─────────────────────────────────────────────────────────┐
│  Application Code                                       │
│                                                         │
│  parser.on('data', (obj) => {                           │
│    // Process parsed object                             │
│  });                                                    │
└─────────────────────────────────────────────────────────┘
Characteristics:
- ✅ Pure JavaScript (no native addon)
- ✅ Parsing offloaded to worker thread
- ✅ Complete objects arrive (no re-parsing needed)
- ❌ Structured cloning overhead (serialization/deserialization)
- ❌ Slower than native parser (especially for nested objects)
- ❌ More memory overhead (object graph copying)
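The copying cost of structured cloning can be observed directly. The global `structuredClone()` (available in recent Node.js versions) applies the same structured clone algorithm that `postMessage()` uses, so it serves as a stand-in for the thread hop in this sketch. The sample object is made up for illustration.

```typescript
// An object as the worker would produce it with JSON.parse().
const parsedInWorker = JSON.parse('{"user":{"id":1,"tags":["a","b"]}}');

// Stand-in for postMessage(): the receiver gets a structured clone.
const receivedOnMain = structuredClone(parsedInWorker);

// The clone has equal content but is an entirely separate object graph.
const sameContent = JSON.stringify(receivedOnMain) === JSON.stringify(parsedInWorker);
const sameIdentity =
  receivedOnMain === parsedInWorker || receivedOnMain.user === parsedInWorker.user;
console.log({ sameContent, sameIdentity });
```

Every nested object and array is rebuilt on the receiving side, which is why the overhead grows with the depth and size of the parsed objects.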
Main Thread (Node.js)               Background Thread (C++)
─────────────────────               ───────────────────────
Create parser ──┐
                │
                ├─> dup(fd)
                ├─> Start std::thread ──────────┐
                │                               │
                │                               ├─> read(fd_dup, buf)
                │                               ├─> Process data
                │                               ├─> Prepare batch
                │                               │
                │<── TSFN callback ─────────────┘
                │    (zero-copy buffers)
                │
                ├─> JSON.parse() (if passRawBuffers)
                ├─> Emit 'data' events
                │
                └─> Application receives objects

Main Thread (Node.js)               Worker Thread (V8 Isolate)
─────────────────────               ──────────────────────────
Create parser ──┐
                │
                ├─> Spawn worker_threads ───────┐
                │                               │
                │                               ├─> fs.readSync(fd, buf)
                │                               ├─> JSON.parse()
                │                               ├─> Create POJSOs
                │                               │
                │<── postMessage() ─────────────┘
                │    (structured cloning)
                │
                ├─> Receive objects (already parsed)
                ├─> Emit 'data' events
                │
                └─> Application receives objects

Baseline timings (5K objects):

| Parser | Time (5K objects) | Throughput | Notes |
|---|---|---|---|
| JSONParser | ~16ms | 312,500 obj/sec | Fastest - no thread overhead |
| Native-optimized | ~21ms | 238,000 obj/sec | Thread overhead, but zero-copy |
| Worker | ~31ms | 161,000 obj/sec | Structured cloning overhead |

Slowdown under CPU load:

| Parser | Slowdown | Notes |
|---|---|---|
| JSONParser | ~2-3x | Main thread blocked by parsing |
| Native-optimized | ~1.37x | Background I/O helps |
| Worker | ~1.2-1.5x | Parsing offloaded to worker |

Slowdown under heavy CPU load (about 90%):

| Parser | Slowdown | Notes |
|---|---|---|
| JSONParser | ~4-5x | Severe degradation |
| Native-optimized | ~1.51x | Resilient under load |
| Worker | ~1.3-1.8x | Good but structured cloning overhead |
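
The throughput column is the object count divided by the elapsed time, and the slowdown factors can be applied to the 5K-object timings to estimate times under load. The sketch below does that arithmetic using the 5K-object timings and the factors from the last slowdown table. It assumes the slowdown factors multiply those timings directly, which the tables do not state explicitly.

```typescript
// Figures copied from the tables; slowdown ranges are [low, high].
interface ParserRow {
  parser: string;
  baselineMs: number;
  heavySlowdown: [number, number];
}

const OBJECT_COUNT = 5000;
const parserRows: ParserRow[] = [
  { parser: "JSONParser", baselineMs: 16, heavySlowdown: [4, 5] },
  { parser: "Native-optimized", baselineMs: 21, heavySlowdown: [1.51, 1.51] },
  { parser: "Worker", baselineMs: 31, heavySlowdown: [1.3, 1.8] },
];

// Objects per second for `count` objects processed in `ms` milliseconds.
function throughput(count: number, ms: number): number {
  return Math.round((count * 1000) / ms);
}

// Estimated elapsed time range when a slowdown range is applied to a baseline.
function loadedRange(baselineMs: number, slowdown: [number, number]): [number, number] {
  return [baselineMs * slowdown[0], baselineMs * slowdown[1]];
}

for (const row of parserRows) {
  const [lowMs, highMs] = loadedRange(row.baselineMs, row.heavySlowdown);
  console.log(
    `${row.parser}: ${throughput(OBJECT_COUNT, row.baselineMs)} obj/sec baseline, ` +
      `${lowMs.toFixed(1)}-${highMs.toFixed(1)} ms under heavy load`,
  );
}
```

Under that assumption the native parser, second at baseline, has the shortest estimated time under heavy load (about 32 ms against 64 ms or more for JSONParser).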

Native parser (zero-copy external buffers):

// C++ side: allocate a buffer owned by the ParsedItem and copy the record into it
item.external_data = std::make_unique<uint8_t[]>(size);
std::memcpy(item.external_data.get(), data, size);
// Create external buffer (zero-copy: JS sees this memory directly)
napi_create_external_buffer(env, size, item.external_data.get(),
                            nullptr, nullptr, &buffer);
// Buffer lifetime:
// - Owned by unique_ptr in ParsedItem
// - ParsedItem lives in BatchMsg
// - BatchMsg deleted after JS callback processes it
// - Safe: the TSFN callback runs synchronously on the main thread and is done
//   with the Buffer before BatchMsg is deleted; JS must not retain the Buffer
//   after the callback returns

Worker parser (structured cloning):

// Worker thread: Create object
const parsed = JSON.parse(json); // POJSO created
// Main thread: Receive via structured cloning
// - Entire object graph is serialized
// - Transferred across thread boundary
// - Deserialized on main thread
// - New objects created (memory copied)

- Performance: Eliminates memory copying overhead
- Efficiency: V8's JSON.parse() is highly optimized
- Scalability: Better for high-throughput scenarios
- Non-blocking: Main thread stays responsive
- Throughput: Can read large buffers efficiently
- Resilience: Performance maintained under load
- Efficiency: No serialization overhead
- Zero-copy: Direct memory transfer
- Lower latency: Synchronous callbacks
- passRawBuffers: true (default): Best performance, uses V8's optimized JSON.parse()
- passRawBuffers: false: Useful for debugging, C++ JSON parsing for comparison
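
With passRawBuffers: true, the main thread receives raw bytes and does the parsing itself. A minimal sketch of that step follows. The `RawBatch` type and `parseRawBatch` function are hypothetical, and UTF-8 encoded records are assumed. Node.js `Buffer` instances are `Uint8Array`s, so they would fit this shape.

```typescript
// Hypothetical shape of a batch delivered to the main thread in raw-buffer mode.
type RawBatch = Uint8Array[];

const utf8Decoder = new TextDecoder("utf-8");

// Decodes each raw record and parses it with V8's JSON.parse() on the main thread.
function parseRawBatch(rawBatch: RawBatch): unknown[] {
  return rawBatch.map((bytes) => JSON.parse(utf8Decoder.decode(bytes)));
}

// Stand-in for records produced by the background thread.
const utf8Encoder = new TextEncoder();
const sampleBatch: RawBatch = ['{"foo":1}', '{"bar":2}'].map((s) => utf8Encoder.encode(s));
const parsedBatch = parseRawBatch(sampleBatch);
console.log(parsedBatch);
```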

// Duplicate fd so we can:
// 1. Close it independently to break blocking read
// 2. Read from background thread safely
inst->fd_dup = dup(inst->fd);
// Background thread uses fd_dup
read(inst->fd_dup, buf, size);
// Main thread can close original fd
// Background thread continues with fd_dup

// Inside the read loop, when read() returns -1:
// Handle non-blocking FDs gracefully
if (errno == EAGAIN || errno == EWOULDBLOCK) {
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
    continue;
}
// Handle interrupts
if (errno == EINTR) {
    continue;
}

- Reduced overhead: Fewer callbacks = better performance
- Better throughput: Process multiple items at once
- Lower per-item cost: Callback overhead is amortized across the batch
- Small batches (64-128): Lower latency, more callbacks
- Large batches (2048+): Better throughput, higher latency
- Default (2048): Good balance for most use cases
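
The trade-off between batch size and callback count can be made concrete with a small batcher. This is an illustrative sketch, not the native implementation: `Batcher` and `countCallbacks` are hypothetical names, and the 5,000-item total is borrowed from the benchmark size used elsewhere in this document.

```typescript
// Hypothetical batcher: collects items and invokes the callback once per full batch.
class Batcher<T> {
  private items: T[] = [];
  private batchSize: number;
  private onBatch: (batch: T[]) => void;

  constructor(batchSize: number, onBatch: (batch: T[]) => void) {
    this.batchSize = batchSize;
    this.onBatch = onBatch;
  }

  add(item: T): void {
    this.items.push(item);
    if (this.items.length >= this.batchSize) this.flush();
  }

  // Delivers a partial batch, for example at end of input.
  flush(): void {
    if (this.items.length === 0) return;
    const batch = this.items;
    this.items = [];
    this.onBatch(batch);
  }
}

// Counts how many callbacks are needed to deliver `total` items.
function countCallbacks(total: number, batchSize: number): number {
  let calls = 0;
  const batcher = new Batcher<number>(batchSize, () => {
    calls += 1;
  });
  for (let i = 0; i < total; i++) batcher.add(i);
  batcher.flush();
  return calls;
}

console.log(countCallbacks(5000, 64), countCallbacks(5000, 2048));
```

For 5,000 items, a batch size of 64 needs 79 callbacks while 2048 needs 3. The first item of a large batch also waits longer before it is delivered, which is the latency cost noted in the list.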

Important: The location of JSON parsing depends on the mode.

With passRawBuffers: true (default):
- JSON Parsing: JS main thread using V8's JSON.parse()
- I/O: C++ background thread
- Why: V8's JSON.parse() is highly optimized, even on main thread

With passRawBuffers: false:
- JSON Parsing: C++ background thread using C++ JSON parser
- I/O: C++ background thread
- Why: Useful for debugging, but slower than V8's parser

See JSON Parsing Location for a detailed explanation.

The architecture is designed to:
- Minimize data copying - Zero-copy buffers where possible
- Offload I/O - Background threads for file operations
- Maintain responsiveness - Non-blocking main thread
- Scale under load - Resilient performance characteristics
- Support multiple use cases - Streams, FDs, different performance needs
Each parser implementation is optimized for its specific use case, providing the best performance characteristics for different scenarios.
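
The characteristics listed for each parser suggest a simple selection rule, sketched below. The `ParserInput` type and `chooseParser` function are hypothetical and not part of the library. The rule encodes only what the lists state: JSONParser works with any Node.js stream, the native parser needs file descriptors and a built addon, and the worker parser is the pure JS alternative.

```typescript
// Hypothetical description of the data source; not a library type.
type ParserInput =
  | { kind: "stream" }
  | { kind: "fd"; nativeAddonAvailable: boolean };

type ParserChoice = "JSONParser" | "Native Parser" | "Worker Parser";

function chooseParser(input: ParserInput): ParserChoice {
  // Arbitrary Node.js streams are only supported by the Transform stream parser.
  if (input.kind === "stream") return "JSONParser";
  // File descriptors: prefer the native addon when it could be built.
  return input.nativeAddonAvailable ? "Native Parser" : "Worker Parser";
}

console.log(chooseParser({ kind: "fd", nativeAddonAvailable: false }));
```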