fix: use node-easyocr's bundled venv python instead of requiring global pip install #4
…al pip install

node-easyocr has two bugs that break OCR for most users:

1. Its preinstall script creates a venv with easyocr/torch installed, but the runtime hardcodes `pythonPath = 'python3'`, ignoring the venv entirely. This means OCR only works if easyocr is installed globally, which homebrew python on macOS actively blocks.
2. On first run, easyocr downloads models and prints a progress bar to stdout. node-easyocr's IPC tries to JSON.parse every stdout chunk, so it chokes on "Progress: |...| 45% Complete" and rejects the init promise. The models download successfully in the background, but the cached rejected promise means all subsequent OCR calls in that session also fail.

Fix 1: Resolve node-easyocr's package location at runtime via createRequire and point pythonPath at its bundled venv/bin/python. This works regardless of where node_modules lives (local, global, or npx cache under ~/.npm/_npx/). Uses platform-aware paths (venv/bin/python on POSIX, venv/Scripts/python.exe on Windows), matching the same convention in node-easyocr's own setup-python-env.js.

Fix 2: Before calling node-easyocr's init, pre-download models via a separate subprocess with verbose=False. This runs with the full set of configured languages (not just English) so all needed recognition models are cached before node-easyocr's init runs. The subprocess exit code is checked so download failures surface clearly instead of falling through to a misleading error.

Also clears easyOCRInitPromise on failure so transient errors don't poison the process for the rest of the session, and updates error messages and README to reflect automatic setup (no more "pip install easyocr" instructions).
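The point about clearing the cached init promise on failure can be illustrated with a small sketch. The PR's actual code is not shown in this conversation, so `createRetryableInit` and its shape are hypothetical; only the idea (drop a cached rejection so the next call retries) comes from the commit message.

```typescript
// Hypothetical helper: wraps an async init function so that concurrent callers
// share one in-flight promise, but a failed attempt is not cached forever.
export function createRetryableInit(init: () => Promise<void>): () => Promise<void> {
  let cached: Promise<void> | null = null;
  return () => {
    if (cached === null) {
      cached = init().catch((err: unknown) => {
        // Forget the rejected promise so a later call starts a fresh attempt.
        cached = null;
        throw err;
      });
    }
    return cached;
  };
}
```

Without the reset inside `catch`, the first transient failure would be returned to every later caller for the lifetime of the process, which is the "poisoning" the commit message describes.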
56c1aa9 to b9d43f3
@cce Hi, thanks for the PR, I'll look into it tomorrow.
igorzheludkov
left a comment
Thanks for the excellent PR! The root cause analysis is thorough, and the fix works correctly — I verified it end-to-end locally.
Two small improvements I'd suggest before merging:
1. Add a timeout to the model pre-download subprocess
The spawn call that pre-downloads models has no timeout. If the network stalls or Python hangs, it blocks forever. The existing withTimeout helper is already used for ocr.init(); the same pattern should wrap this subprocess:
    await withTimeout(new Promise<void>((resolve, reject) => {
      const proc = spawn(pythonPath, [
        "-c", `import easyocr; easyocr.Reader(${langArg}, verbose=False)`
      ]);
      // ... handlers ...
    }), 120000, "EasyOCR model download timeout — check your network connection");

Using 120s since first-run model downloads can be ~100MB+.
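The project's own withTimeout helper is not shown in this thread. As a rough sketch of what a helper with that call shape (promise, milliseconds, message) might look like, assuming it simply races the promise against a timer:

```typescript
// Hypothetical implementation; the real helper in the repository may differ.
export function withTimeout<T>(promise: Promise<T>, ms: number, message: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms);
  });
  // Whichever settles first wins; the timer is cleared either way so it
  // does not keep the event loop alive after the work has finished.
  return Promise.race([promise, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

One caveat with this approach: racing against a timer only rejects the promise. The child process itself keeps running unless the caller also kills it when the timeout fires.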
2. Capture stderr from the subprocess
Currently if the pre-download fails, the only diagnostic is "exited with code ${code}". Capturing stderr gives much more actionable error messages:
    let stderr = "";
    proc.stderr.on("data", (d: Buffer) => { stderr += d; });
    proc.on("close", (code) => {
      if (code === 0) resolve();
      else reject(new Error(`EasyOCR model setup failed (code ${code}): ${stderr.trim() || "unknown error"}. Ensure Python 3.6+ is installed.`));
    });

Both changes are small and would make the error handling more robust. Happy to discuss if you have any questions!
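Put together, the pre-download step with stderr capture might look roughly like the sketch below. The names `preloadArgs`, `describeFailure`, and `preloadModels` are hypothetical, and how the PR actually builds `langArg` is not shown; here it is assumed that `JSON.stringify` on plain language codes yields a list literal Python also accepts. The `error` handler is an addition beyond the review suggestion, covering the case where the interpreter cannot be started at all.

```typescript
import { spawn } from "node:child_process";

// Hypothetical helper: arguments for a one-off `python -c` run that makes
// easyocr download its models quietly. Assumes simple codes like "en" or "fr".
export function preloadArgs(langs: string[]): string[] {
  return ["-c", `import easyocr; easyocr.Reader(${JSON.stringify(langs)}, verbose=False)`];
}

// Hypothetical helper: one readable message from exit code plus captured stderr.
export function describeFailure(code: number | null, stderr: string): string {
  return `EasyOCR model setup failed (code ${code}): ${stderr.trim() || "unknown error"}`;
}

export function preloadModels(pythonPath: string, langs: string[]): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const proc = spawn(pythonPath, preloadArgs(langs));
    let stderr = "";
    proc.stdout.resume(); // discard stdout so an unread pipe cannot stall the child
    proc.stderr.on("data", (chunk: Buffer) => { stderr += chunk.toString(); });
    proc.on("error", reject); // e.g. the interpreter path does not exist
    proc.on("close", (code) => {
      if (code === 0) resolve();
      else reject(new Error(describeFailure(code, stderr)));
    });
  });
}
```

A caller would then wrap `preloadModels(pythonPath, langs)` in the timeout helper before invoking node-easyocr's own init.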
Thanks for the review @igorzheludkov, added the changes to the PR!
igorzheludkov
left a comment
Looks great, both suggestions addressed cleanly. Thanks for the quick update!
I really think this is a great MCP server, but I don't want to install global Python packages using pip. So I was poking around and noticed that node-easyocr's preinstall script already creates a venv of more than 800MB (easyocr, pytorch, cv2, etc.), hidden inside `~/.npm/_npx/` when this MCP is run with npx.

However, node-easyocr's runtime hardcodes `pythonPath = 'python3'`, a bug that seems to ignore the venv entirely. This means OCR only works if easyocr is installed globally, which requires a second install of all these dependencies.

We can use the venv it makes (without a PR to fix node-easyocr) by looking up `dirname(require.resolve("node-easyocr"))` and setting the `EasyOCR` instance's `pythonPath` to the venv-specific Python. However, on first run, easyocr downloads models and prints a progress bar to stdout because `verbose=True` is being used by node-easyocr's Python wrapper, which breaks the JSON expectation.

This PR contains two fixes to work around this issue: (1) looking up the venv node-easyocr has already made, to save disk space, and (2) running easyocr inside the venv once before calling node-easyocr, so that if the models haven't been downloaded yet, they will be ready when the call to node-easyocr is made.
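The venv lookup in (1) could be sketched as follows. Both function names are hypothetical, and the sketch assumes node-easyocr's resolved entry file sits in the package root next to the `venv` directory; if the entry lives in a subdirectory, the package root would need to be located another way.

```typescript
import { createRequire } from "node:module";
import * as path from "node:path";

// Hypothetical helper: maps a package directory to the interpreter inside its
// venv, using the POSIX and Windows layouts named in the commit message.
export function venvPythonPath(packageDir: string, platform: string = process.platform): string {
  return platform === "win32"
    ? path.win32.join(packageDir, "venv", "Scripts", "python.exe")
    : path.posix.join(packageDir, "venv", "bin", "python");
}

// Hypothetical helper: resolves node-easyocr relative to the calling module
// (pass import.meta.url or __filename) and returns its bundled interpreter.
export function resolveBundledPython(from: string): string {
  const localRequire = createRequire(from);
  const entry = localRequire.resolve("node-easyocr");
  return venvPythonPath(path.dirname(entry)); // assumes entry is in the package root
}
```

Resolving relative to the calling module rather than the working directory is what lets the same lookup work for local installs, global installs, and the npx cache.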
This is all to work around https://github.com/techbyvj/node-easyocr, and it would probably be better to contribute this upstream (I've just done so in techbyvj/node-easyocr#3). An alternative is to vendor node-easyocr's functionality, wrap Python's easyocr lib directly in this MCP, and get rid of the dependency, but that seemed like a bigger lift. Anyway, this change is working for me so I figured I would submit it back to you!
Thanks again for this super useful MCP!