
perf: remove path validation delay #43

Merged
trsdn merged 3 commits into main from perf/remove-path-validation-delay
Jun 10, 2026

Conversation

@trsdn

@trsdn trsdn commented Jun 10, 2026

Owner

Summary

Removes the timing-normalization wrapper from validate_and_sanitize_path(). The wrapper added a minimum ~50ms delay to every path validation even though this code path does not compare secrets or perform authentication checks.

Measured locally:

  • Before: 20 validations in 1.070s (~53.5ms/call)
  • After: 20 validations in 0.000826s (~0.041ms/call)
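For context, a timing-normalization wrapper of the kind described above usually pads each call up to a fixed minimum duration. The following is a minimal sketch of that pattern and of a 20-call measurement. The names `normalize_timing`, `validate_plain`, `validate_padded`, and `time_calls` are hypothetical and the validation body is a stand-in, so this is not the project's actual code.

```python
import functools
import time


def normalize_timing(min_seconds):
    """Pad every call so it takes at least `min_seconds` (hypothetical helper)."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Sleep for whatever is left of the minimum duration.
                remaining = min_seconds - (time.perf_counter() - start)
                if remaining > 0:
                    time.sleep(remaining)

        return wrapper

    return decorator


def validate_plain(path):
    # Stand-in for the real checks; the work itself is cheap.
    return ".." not in path.split("/")


# Same function, padded to roughly 50 ms per call.
validate_padded = normalize_timing(0.05)(validate_plain)


def time_calls(func, count=20):
    """Return the total wall-clock seconds for `count` calls."""
    start = time.perf_counter()
    for _ in range(count):
        func("docs/report.pdf")
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"padded: {time_calls(validate_padded):.3f}s")
    print(f"plain:  {time_calls(validate_plain):.6f}s")
```

With a 50 ms floor, 20 padded calls cannot finish in under about one second, which is consistent with the "before" figure above. Padding of this kind is normally reserved for code that compares secrets, where response time could leak information.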

This improves convert_file latency and any workflow that validates many paths while preserving the existing path traversal, directory allow-list, dangerous path, and extension checks.
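As a rough illustration of the four checks that stay in place, the sketch below runs them in sequence with no added delay. The function name `check_path`, the extension set, and the list of dangerous directories are assumptions made for this example and may differ from what validate_and_sanitize_path() actually does.

```python
from pathlib import Path

# Hypothetical values for illustration; the project's real lists may differ.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md"}
DANGEROUS_DIRS = (Path("/etc"), Path("/proc"), Path("/sys"))


def check_path(raw_path, allowed_dirs):
    """Return the resolved path, or raise ValueError if any check fails."""
    # 1. Path traversal: reject any ".." component before resolving.
    if ".." in Path(raw_path).parts:
        raise ValueError("path traversal component")

    resolved = Path(raw_path).resolve()

    # 2. Dangerous paths: reject well-known system locations.
    if any(resolved.is_relative_to(d) for d in DANGEROUS_DIRS):
        raise ValueError("dangerous system path")

    # 3. Directory allow-list: the file must sit under an allowed directory.
    roots = [Path(d).resolve() for d in allowed_dirs]
    if not any(resolved.is_relative_to(root) for root in roots):
        raise ValueError("outside allowed directories")

    # 4. Extension check (case-insensitive).
    if resolved.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError("unsupported extension")

    return resolved
```

None of these steps depends on a secret value, so their running time does not need to be hidden from the caller. `Path.is_relative_to` requires Python 3.9 or newer.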

10 performance improvements identified

  1. Remove artificial path-validation delay. Highest impact / lowest effort; implemented in this PR.
  2. Lower validation-path logging from info to debug to reduce repeated conversion overhead and noisy stderr during batch operations.
  3. Reuse a resolved safe-directory list instead of resolving allowed directories on every validation call.
  4. Parallelize convert_directory conversions with bounded concurrency instead of processing supported files serially.
  5. Avoid dispatching simple output writes through the executor in convert_directory; write synchronously after conversion or batch writes.
  6. Add a direct fast path for plain text / markdown files to avoid MarkItDown overhead when no transformation is needed.
  7. Cache the supported-formats response string instead of rebuilding it for each list_supported_formats call.
  8. Stream or chunk output truncation for very large conversion results rather than holding full text before truncating.
  9. Replace full JSON read() + json.loads() validation with a streaming/depth-limited parser to reduce memory for large JSON files.
  10. Replace XML full-file regex validation with bounded pre-scan/streaming validation to reduce memory and regex cost on large XML files.
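Item 4 above could be approached along the following lines. This is a minimal sketch assuming the per-file conversion is a blocking callable. `convert_all` and `convert_one` are hypothetical names, and the real convert_directory may be structured differently.

```python
import asyncio


async def convert_all(paths, convert_one, max_concurrency=4):
    """Run `convert_one(path)` for every path, at most `max_concurrency` at a time.

    `convert_one` is a hypothetical blocking callable standing in for the
    per-file conversion; each call runs in a worker thread.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(path):
        # The semaphore caps how many conversions are in flight at once.
        async with semaphore:
            return await asyncio.to_thread(convert_one, path)

    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*(run(p) for p in paths))


if __name__ == "__main__":
    # str.upper stands in for a real conversion function.
    print(asyncio.run(convert_all(["a.txt", "b.txt", "c.txt"], str.upper, 2)))
```

The bound matters because unbounded fan-out over a large directory could exhaust memory or file handles. A suitable value for `max_concurrency` would need to be measured rather than assumed.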

Validation

  • ruff check markitdown_mcp/server.py
  • pytest tests/unit/test_convert_file_tool.py tests/unit/test_convert_directory_tool.py tests/security/test_path_traversal.py --quiet — 62 passed
  • pytest -m 'not performance and not slow and not security' --quiet — 190 passed, 2 skipped, 90 deselected
  • pytest tests/security/test_path_traversal.py --quiet — 18 passed

Note: the full pytest --quiet run currently fails in four pre-existing tests that are unrelated to this change: two memory-threshold tests in tests/performance/test_memory_usage.py, and two tests in tests/security/test_malicious_files.py where JSON bomb generation fails on Python 3.12 before the server handles the files.

Path validation was wrapped in a timing-normalization decorator that added at least 50ms to every file-path validation. That delay is unnecessary for path validation because no secret-dependent comparison happens there, and it materially slows convert_file calls and any workflow that validates many paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

github-actions Bot commented Jun 10, 2026

Contributor

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

trsdn and others added 2 commits June 10, 2026 12:55
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Contributor

🔍 PR Analysis Results

PR: #43 | Commit: 4a0709ea34506ef54feed53fb62a591d7b78b1e7

🎨 Code Formatting

All files properly formatted

🔧 Code Linting

No linting issues found

📝 Type Checking

Type checking issues found

usage: mypy [-h] [-v] [-V] [more options; see below]
            [-m MODULE] [-p PACKAGE] [-c PROGRAM_TEXT] [files ...]
mypy: error: unrecognized arguments: --json-report mypy_report.json

Fix: Add proper type annotations and resolve type errors

🔒 Security Analysis

No security issues detected

📊 Test Coverage Analysis

Coverage 82.17636022514071% meets 80% requirement

🧹 Dead Code Analysis

Dead code analysis completed

📋 Summary

⚠️ Found 1 issue(s) that should be addressed:

  • 📝 Types: Issues found

🔧 Quick Fix Commands:

# Fix formatting and auto-fixable linting issues
ruff format .
ruff check . --fix

# Run tests with coverage
pytest tests/unit/ --cov=markitdown_mcp --cov-report=term-missing

# Check security
bandit -r markitdown_mcp/

This analysis was automatically generated by the PR feedback workflow.
Report generated at 2026-06-10 11:01:35 UTC

@github-actions

Contributor

🔍 CI Quality Gates Summary

Overall Status: ✅ All Passed

| Check | Status | Details | Action Required |
|---|---|---|---|
| 🎨 Format | ✅ Passed | ruff format check | None |
| 🔧 Lint | ✅ Passed | ruff linting | None |
| 📝 Types | ✅ Passed | mypy type checking | None |
| 🧪 Tests | ✅ Passed | Unit tests | None |
| 📊 Coverage | 82.2% | Minimum: 80% | None |
| 🔌 MCP | ✅ Valid | Protocol compliance | None |
| 🔒 Security | ✅ Clean | Dependency audit | None |

🛠️ Quick Fix Commands

# Fix most issues automatically
ruff format .
ruff check . --fix

# Run tests locally
pytest tests/unit/ --cov=markitdown_mcp

# Check types
mypy markitdown_mcp

Last updated: 2026-06-10 11:02:26 UTC

@github-actions

Contributor

🔍 PR Quality Summary

CI Status

✅ Security: success
✅ Docs: success
✅ Tests: success
✅ Quality: success

Metrics

| Metric | Value | Trend |
|---|---|---|
| 📊 Coverage | N/A | - |
| 🧪 Tests | Test results unavailable | - |
| ⏱️ Performance | No performance data | - |

Quality Checks

  • Format & Lint: Ruff formatting and linting
  • Type Safety: MyPy strict type checking
  • Security: Bandit, Safety, GitLeaks scanning
  • MCP Protocol: Tool schema validation
  • Documentation: Docstring coverage (80%+)

MCP Tools

  • convert_file - Convert individual files to Markdown
  • convert_directory - Batch convert directories
  • list_supported_formats - Query supported file types

🤖 Auto-generated by CI • Last updated: 2026-06-10 11:04 UTC

@trsdn trsdn merged commit 2dc89d3 into main Jun 10, 2026
61 checks passed
@trsdn trsdn deleted the perf/remove-path-validation-delay branch June 10, 2026 11:05
@trsdn trsdn mentioned this pull request Jun 10, 2026
4 tasks
trsdn pushed a commit that referenced this pull request Jun 10, 2026
## 🚀 Version Bump: v1.2.2

This PR bumps the package version after the merged fixes and performance cleanup.

### 📊 Release Summary
- **Version Type**: patch
- **New Version**: v1.2.2
- **Commits Included**: consolidated fixes from #44 and performance cleanup from #43

### 📝 Changelog Preview
#### 🐛 Bug Fixes
- Resolve MCP protocol and file validation issues (#44)

#### ⚡ Performance
- Remove artificial path validation delay (#43)

### 🎯 What Happens Next
1. **Review**: Maintainers review this version bump
2. **Merge**: When merged, a git tag `v1.2.2` will be created
3. **Release**: The tag will trigger the automated release workflow
4. **Publish**: Package will be published to PyPI automatically

### ✅ Pre-Release Checklist
- [x] Version number looks correct
- [x] Changelog entries are accurate
- [x] No breaking changes in patch release
- [ ] All CI checks pass

---
*This PR was originally created automatically by the version bump workflow and updated after #43 merged.*
@github-actions github-actions Bot mentioned this pull request Jun 10, 2026
4 tasks