
fix: use node-easyocr's bundled venv python instead of requiring global pip install #4

Merged
igorzheludkov merged 2 commits into igorzheludkov:main from cce:use-node-easyocr-venv
Mar 6, 2026


@cce cce commented Feb 25, 2026

I really think this is a great MCP server, but I don't want to install global Python packages using pip. While poking around, I noticed that node-easyocr's preinstall script already creates a venv of more than 800 MB (easyocr, PyTorch, cv2, etc.), hidden inside ~/.npm/_npx/ when this MCP is run with npx.

~$ du -sh .npm/_npx/a912162c7e77d339/node_modules/node-easyocr/venv/lib/python3.14/site-packages
866M	.npm/_npx/a912162c7e77d339/node_modules/node-easyocr/venv/lib/python3.14/site-packages
~$ du -sm .npm/_npx/a912162c7e77d339/node_modules/node-easyocr/venv/lib/python3.14/site-packages/* | sort -n  | tail -n10
13	.npm/_npx/a912162c7e77d339/node_modules/node-easyocr/venv/lib/python3.14/site-packages/pip
15	.npm/_npx/a912162c7e77d339/node_modules/node-easyocr/venv/lib/python3.14/site-packages/PIL
16	.npm/_npx/a912162c7e77d339/node_modules/node-easyocr/venv/lib/python3.14/site-packages/easyocr
18	.npm/_npx/a912162c7e77d339/node_modules/node-easyocr/venv/lib/python3.14/site-packages/networkx
29	.npm/_npx/a912162c7e77d339/node_modules/node-easyocr/venv/lib/python3.14/site-packages/skimage
33	.npm/_npx/a912162c7e77d339/node_modules/node-easyocr/venv/lib/python3.14/site-packages/numpy
76	.npm/_npx/a912162c7e77d339/node_modules/node-easyocr/venv/lib/python3.14/site-packages/sympy
98	.npm/_npx/a912162c7e77d339/node_modules/node-easyocr/venv/lib/python3.14/site-packages/scipy
119	.npm/_npx/a912162c7e77d339/node_modules/node-easyocr/venv/lib/python3.14/site-packages/cv2
402	.npm/_npx/a912162c7e77d339/node_modules/node-easyocr/venv/lib/python3.14/site-packages/torch

However, node-easyocr's runtime hardcodes pythonPath = 'python3', a bug that appears to bypass the venv entirely. This means OCR only works if easyocr is installed globally, which requires a second install of all these dependencies.

We can use the venv it creates (without a PR to fix node-easyocr) by looking up dirname(require.resolve("node-easyocr")) and setting the EasyOCR instance's pythonPath to the venv's Python. However, on first run easyocr downloads models and prints a progress bar to stdout, because node-easyocr's Python wrapper uses verbose=True, and that output breaks the JSON parsing node-easyocr relies on.

This PR contains two fixes to work around this issue by (1) looking up the venv node-easyocr has already made, to save disk space, and (2) running easyocr inside the venv once before calling node-easyocr, so that if the models haven't been downloaded yet, they will be ready when the call to node-easyocr is made.

This is all to work around https://github.com/techbyvj/node-easyocr, and it would probably be better to contribute this upstream (I've just done so in techbyvj/node-easyocr#3). An alternative is to vendor node-easyocr's functionality, wrap Python's easyocr lib directly in this MCP, and get rid of the dependency, but that seemed like a bigger lift. Anyway, this change is working for me, so I figured I would submit it back to you!

Thanks again for this super useful MCP!

…al pip install

node-easyocr has two bugs that break OCR for most users:

1. Its preinstall script creates a venv with easyocr/torch installed, but
   the runtime hardcodes `pythonPath = 'python3'`, ignoring the venv
   entirely. This means OCR only works if easyocr is installed globally,
   which homebrew python on macOS actively blocks.

2. On first run, easyocr downloads models and prints a progress bar to
   stdout. node-easyocr's IPC tries to JSON.parse every stdout chunk,
   so it chokes on "Progress: |...| 45% Complete" and rejects the init
   promise. The models download successfully in the background, but the
   cached rejected promise means all subsequent OCR calls in that session
   also fail.
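
To illustrate bug 2, here is a minimal sketch of what happens when a progress line is handed to JSON.parse. The `parseChunk` helper is hypothetical and is not node-easyocr's actual IPC code; it only shows that a non-JSON stdout chunk cannot be parsed.

```typescript
// Hypothetical helper (not from node-easyocr): attempts to parse one
// stdout chunk as JSON and reports whether parsing succeeded.
type ParseResult =
  | { ok: true; value: unknown }
  | { ok: false; error: string };

function parseChunk(chunk: string): ParseResult {
  try {
    return { ok: true, value: JSON.parse(chunk) };
  } catch (err) {
    // JSON.parse throws a SyntaxError for any non-JSON text.
    return { ok: false, error: (err as Error).message };
  }
}

// A well-formed JSON message parses; a progress bar line does not.
const good = parseChunk('{"status": "ok"}');
const bad = parseChunk("Progress: |...| 45% Complete");
console.log(good.ok, bad.ok);
```

A reader that treats every chunk as a JSON message has no way to tell a progress line from a malformed reply, which is why keeping the download output off stdout matters.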

Fix 1: Resolve node-easyocr's package location at runtime via
createRequire and point pythonPath at its bundled venv/bin/python.
This works regardless of where node_modules lives (local, global,
or npx cache under ~/.npm/_npx/). Uses platform-aware paths
(venv/bin/python on POSIX, venv/Scripts/python.exe on Windows),
matching the same convention in node-easyocr's own setup-python-env.js.
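
A minimal sketch of the lookup described in Fix 1, not the exact code in this PR. The function names and the `fromFile` parameter are illustrative, and it assumes node-easyocr's resolved entry file sits in the same directory as its `venv/` folder, as the dirname(require.resolve(...)) approach implies.

```typescript
import { createRequire } from "node:module";
import { dirname, join } from "node:path";

// Hypothetical helper: builds the interpreter path inside a venv that
// lives under `packageRoot`, using the platform-specific layout.
function venvPythonPath(packageRoot: string, platform: string = process.platform): string {
  return platform === "win32"
    ? join(packageRoot, "venv", "Scripts", "python.exe")
    : join(packageRoot, "venv", "bin", "python");
}

// Hypothetical helper: resolves node-easyocr relative to `fromFile`
// (an absolute file path or file URL, e.g. import.meta.url) and returns
// the Python interpreter inside its venv.
function resolveEasyOcrPython(fromFile: string): string {
  const require = createRequire(fromFile);
  // Assumption: the package entry file is located next to the venv directory.
  const packageRoot = dirname(require.resolve("node-easyocr"));
  return venvPythonPath(packageRoot);
}
```

Because the path is derived from wherever the module resolver finds the package, the same code covers local installs, global installs, and the npx cache.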

Fix 2: Before calling node-easyocr's init, pre-download models via a
separate subprocess with verbose=False. This runs with the full set of
configured languages (not just English) so all needed recognition models
are cached before node-easyocr's init runs. The subprocess exit code is
checked so download failures surface clearly instead of falling through
to a misleading error.
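
A rough sketch of the pre-download step in Fix 2, under stated assumptions. `buildPredownloadArgs` and `predownloadModels` are hypothetical names, and the language-code validation is an added precaution that the PR does not necessarily include. The sketch relies on a JSON array of plain strings also being a valid Python list literal.

```typescript
import { spawn } from "node:child_process";

// Hypothetical helper: builds the `python -c` arguments that construct an
// easyocr.Reader with verbose=False so nothing is printed while downloading.
function buildPredownloadArgs(languages: string[]): string[] {
  for (const lang of languages) {
    // Precaution (assumption): only allow simple codes such as "en" or "ch_sim".
    if (!/^[A-Za-z_]+$/.test(lang)) {
      throw new Error(`Unexpected language code: ${lang}`);
    }
  }
  const langArg = JSON.stringify(languages); // e.g. ["en","ch_sim"]
  return ["-c", `import easyocr; easyocr.Reader(${langArg}, verbose=False)`];
}

// Hypothetical helper: runs the pre-download and rejects on a non-zero
// exit code or if the interpreter cannot be started at all.
function predownloadModels(pythonPath: string, languages: string[]): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const proc = spawn(pythonPath, buildPredownloadArgs(languages), { stdio: "ignore" });
    proc.on("error", reject); // e.g. interpreter not found
    proc.on("close", (code) => {
      if (code === 0) resolve();
      else reject(new Error(`EasyOCR model pre-download exited with code ${code}`));
    });
  });
}
```

Running this with the full configured language list before init means the recognition models are already cached when node-easyocr starts its own Python process.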

Also clears easyOCRInitPromise on failure so transient errors don't
poison the process for the rest of the session, and updates error
messages and README to reflect automatic setup (no more "pip install
easyocr" instructions).
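
The promise-clearing behavior can be sketched as follows. `makeRetryableInit` is a hypothetical generic helper, not the code in this PR, which works with the easyOCRInitPromise variable directly; the idea is only that a failed init must not stay cached.

```typescript
// Hypothetical helper: caches the init promise so concurrent callers share
// one initialization, but drops the cache when init fails so that the next
// call starts a fresh attempt instead of reusing the rejected promise.
function makeRetryableInit<T>(init: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | null = null;
  return () => {
    if (cached === null) {
      cached = init().catch((err) => {
        cached = null; // clear on failure so later calls can retry
        throw err;
      });
    }
    return cached;
  };
}
```

Without the reset, one transient failure (for example a dropped network connection during model download) would make every later call in the same process fail immediately.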
@cce cce force-pushed the use-node-easyocr-venv branch from 56c1aa9 to b9d43f3 Compare February 25, 2026 21:38
@igorzheludkov (Owner)

@cce Hi, thanks for the PR, I'll look into it tomorrow.


@igorzheludkov igorzheludkov left a comment


Thanks for the excellent PR! The root cause analysis is thorough, and the fix works correctly — I verified it end-to-end locally.

Two small improvements I'd suggest before merging:

1. Add a timeout to the model pre-download subprocess

The spawn call to pre-download models has no timeout. If the network stalls or Python hangs, it blocks forever. The existing withTimeout helper is already used for ocr.init() — same pattern should wrap this subprocess:

await withTimeout(new Promise<void>((resolve, reject) => {
    const proc = spawn(pythonPath, [
        "-c", `import easyocr; easyocr.Reader(${langArg}, verbose=False)`
    ]);
    // ... handlers ...
}), 120000, "EasyOCR model download timeout — check your network connection");

Using 120s since first-run model downloads can be ~100MB+.
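
The repository's withTimeout helper is not shown in this conversation. For readers unfamiliar with the pattern, a helper with the same call shape (promise, milliseconds, message) could look roughly like the sketch below; this is an assumption about its behavior, not the project's actual implementation.

```typescript
// Sketch of a timeout wrapper: rejects with `message` if `promise` has not
// settled within `ms` milliseconds, otherwise passes its result through.
function withTimeout<T>(promise: Promise<T>, ms: number, message: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards
  // so it does not keep the event loop alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Note that a race-based timeout only stops waiting. It does not terminate the spawned Python process, so a caller that needs that would also have to kill the child when the timeout fires.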

2. Capture stderr from the subprocess

Currently if the pre-download fails, the only diagnostic is "exited with code ${code}". Capturing stderr gives much more actionable error messages:

let stderr = "";
proc.stderr.on("data", (d: Buffer) => { stderr += d; });
proc.on("close", (code) => {
    if (code === 0) resolve();
    else reject(new Error(`EasyOCR model setup failed (code ${code}): ${stderr.trim() || "unknown error"}. Ensure Python 3.6+ is installed.`));
});

Both changes are small and would make the error handling more robust. Happy to discuss if you have any questions!


cce commented Mar 6, 2026

Thanks for the review @igorzheludkov, added the changes to the PR!


@igorzheludkov igorzheludkov left a comment


Looks great, both suggestions addressed cleanly. Thanks for the quick update!

@igorzheludkov igorzheludkov merged commit 72a7ab6 into igorzheludkov:main Mar 6, 2026
@cce cce deleted the use-node-easyocr-venv branch March 6, 2026 21:45