Skip to content

NimaChu/super-ocr

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🦸‍♂️ Super OCR

The Intelligent OCR Solution That Chooses the Best Engine for You

License OpenClaw Skill

Stop guessing which OCR engine to use. Super OCR automatically selects the optimal engine combination for your specific image, delivering superior accuracy across multiple languages while handling the complexity behind the scenes.

🌟 Why Super OCR?

Traditional OCR tools force you to choose a single engine and stick with it. But what if your document contains Chinese characters mixed with English? Or Japanese text in a complex layout? Super OCR solves this by running multiple engines in parallel and intelligently selecting the best result.

Key Advantages:

  • Automatic Engine Selection: No more manual configuration or guesswork
  • 📈 Higher Accuracy: Achieves up to 98%+ accuracy by leveraging the strengths of multiple engines
  • 🌍 Multi-Language Mastery: Seamlessly handles Chinese, English, Japanese, Korean, Thai, French, and more
  • Smart Performance: Balances speed and accuracy based on your content
  • 🧠 Confidence-Based Selection: Uses weighted scoring to pick the most reliable result

🚀 Quick Start

Installation

pip install paddleocr paddlepaddle pytesseract pillow opencv-python numpy

Basic Usage

# Navigate to skill directory
cd ~/.openclaw/workspace/skills/super-ocr

# Auto mode (recommended) - let Super OCR choose the best approach
python scripts/main.py --image your-document.png --engine all

# Force specific engine (if needed)
python scripts/main.py --image document.png --engine tesseract
python scripts/main.py --image menu.png --engine paddle

# Batch processing
python scripts/main.py --images ./images/*.png --output ./results

🎯 Intelligent Multi-Engine Strategy

Super OCR's core innovation is its parallel processing architecture combined with intelligent result selection:

How It Works:

  1. Parallel Execution: Multiple OCR engines process your image simultaneously
  2. Confidence Scoring: Each engine's output is evaluated with reliability metrics
  3. Weighted Selection: Our intelligent agent analyzes results and selects the optimal output
  4. Language-Aware Optimization: Automatically adapts to the detected language mix

Real-World Performance Benefits:

  • Chinese documents: 95%+ accuracy (vs 63% with single-engine approaches)
  • English documents: Consistent 99-100% accuracy with faster fallback options
  • Mixed-language content: Seamless handling without manual intervention
  • Complex layouts: Better structure preservation through multi-engine consensus

📊 Performance Comparison

Scenario Single Engine Super OCR (Multi-Engine)
Chinese Text 63-85% 95%+
English Text 99-100% 99-100%
Japanese Text 81% 97%+
Mixed Languages Variable Consistently High
Processing Speed Fast Optimized Balance

Based on comprehensive testing across diverse document types and languages

🏗️ Project Structure

super-ocr/
├── scripts/
│   ├── main.py              # Main entry point
│   ├── engine/
│   │   ├── selector.py      # Intelligent engine selection logic
│   │   ├── tesseract.py     # Tesseract engine wrapper
│   │   ├── paddle.py        # PaddleOCR engine wrapper
│   │   └── macvision.py     # MacVision engine (macOS only)
│   └── preprocessing/
│       └── preprocessor.py  # Image preprocessing
├── references/              # Documentation
│   ├── api-reference.md
│   ├── engine-comparison.md
│   └── troubleshooting.md
├── SKILL.md                 # OpenClaw Skill definition
├── _meta.json               # Skill metadata
└── LICENSE                  # MIT License

🛠️ Advanced Usage

Language-Specific Optimization

# For Thai documents (requires Thai language pack)
python scripts/main.py --image thai-document.png --engine all --lang th

Performance Tuning

# Prioritize speed over accuracy
python scripts/main.py --image quick-scan.png --engine tesseract

# Maximum accuracy (slower)
python scripts/main.py --image critical-document.png --engine all

macOS Specific (MacVision)

On macOS, Super OCR leverages the native Vision framework via Swift script for optimal performance:

  • MacVision requires Xcode command line tools
  • Swift script is executed automatically when available

📄 License

MIT License - see LICENSE for details.


Built with ❤️ by Nima AI Team
Super OCR v1.0.1 - Making OCR intelligent, one document at a time.

About

Skill for OpenClaw: Production-grade OCR with intelligent engine selection

Resources

License

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors