Split Python publishing: nlpo3 (base) + nlpo3-deepcut (optional ONNX package)#104

Draft
Copilot wants to merge 1 commit into main from copilot/split-nlpo3-packages

Copilot AI commented Apr 4, 2026

Bundling the deepcut ONNX model and the tract-onnx runtime into every nlpo3 wheel added about 7 MB that most users never need. This PR splits the Python publication into a lightweight base package and an opt-in deepcut package.

New packages

| Package | Directory | Wheel size | Contents |
|---|---|---|---|
| nlpo3 | nlpo3-python/ | ~1.1 MB | NewmmTokenizer, NewmmFstTokenizer, deepcut shim |
| nlpo3-deepcut | nlpo3-deepcut-python/ | ~8.5 MB | DeepcutTokenizer + ONNX model + tract-onnx |

Base package (nlpo3)

  • Compiled without the deepcut Cargo feature → no tract-onnx, no ndarray, no embedded ONNX bytes
  • DeepcutTokenizer remains importable from nlpo3 as a Python-level forwarding shim:
    • If nlpo3-deepcut is installed: nlpo3_deepcut.DeepcutTokenizer is constructed and returned
    • If it is absent: calling DeepcutTokenizer() raises ImportError with an install hint
```python
from nlpo3 import DeepcutTokenizer  # the import itself always succeeds

tok = DeepcutTokenizer()  # works if nlpo3-deepcut is installed; otherwise raises:
# ImportError: DeepcutTokenizer requires the nlpo3-deepcut package.
# Install it with:  pip install nlpo3-deepcut
```
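One way such a forwarding shim could be written is sketched below. This is a hypothetical illustration, not the code in this PR: it assumes the shim is a module-level callable named DeepcutTokenizer that imports nlpo3_deepcut lazily on each call and forwards all constructor arguments. The error text mirrors the message quoted above; `_DEEPCUT_MODULE` and `_INSTALL_HINT` are illustrative names.

```python
# Hypothetical sketch of a forwarding shim in the base nlpo3 package.
# Names other than DeepcutTokenizer and nlpo3_deepcut are illustrative.
import importlib

_DEEPCUT_MODULE = "nlpo3_deepcut"
_INSTALL_HINT = (
    "DeepcutTokenizer requires the nlpo3-deepcut package.\n"
    "Install it with:  pip install nlpo3-deepcut"
)


def DeepcutTokenizer(*args, **kwargs):
    """Construct nlpo3_deepcut.DeepcutTokenizer if the optional package is installed."""
    try:
        # Import lazily so that `from nlpo3 import DeepcutTokenizer` always succeeds.
        module = importlib.import_module(_DEEPCUT_MODULE)
    except ModuleNotFoundError as exc:
        # Only translate the error when nlpo3_deepcut itself is missing;
        # a missing dependency *inside* nlpo3_deepcut is re-raised unchanged.
        if exc.name != _DEEPCUT_MODULE:
            raise
        raise ImportError(_INSTALL_HINT) from exc
    # Forward every argument so both call sites behave identically.
    return module.DeepcutTokenizer(*args, **kwargs)
```

Python caches imported modules in sys.modules, so in this sketch only the first successful call pays the import cost. Because the shim is a function rather than a class, `isinstance(tok, nlpo3.DeepcutTokenizer)` would not work here; code that needs type checks would import the class from nlpo3_deepcut directly.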

New nlpo3-deepcut package

  • nlpo3-deepcut-python/: a complete package with Cargo.toml, pyproject.toml, src/lib.rs, the Python package, type stubs, and tests
  • Compiled with the deepcut feature (enabled by default); embeds model/deepcut.onnx via include_bytes!
  • Declares a runtime dependency on nlpo3~=2.0
  • DeepcutTokenizer can be imported directly from nlpo3_deepcut or through the base shim; the constructor call and the results are the same either way
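The nlpo3~=2.0 specifier uses the PEP 440 compatible-release operator: ~=2.0 is equivalent to >=2.0, ==2.*, so any nlpo3 2.x release satisfies it while 3.0 does not. The helper below is a simplified, hypothetical illustration of that rule for plain numeric versions only (no pre-release, post-release, dev, or local segments); real tooling should rely on a full PEP 440 implementation such as the packaging library.

```python
# Simplified PEP 440 compatible-release check ("~=") for numeric versions only.
# Hypothetical helper for illustration; it ignores pre/post/dev/local segments.


def _release(version: str) -> tuple[int, ...]:
    """Parse '2.1.3' into (2, 1, 3)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_compatible_release(version: str, spec: str) -> bool:
    """Return True if `version` satisfies a clause such as '~=2.0'."""
    if not spec.startswith("~="):
        raise ValueError("expected a '~=' clause")
    base = _release(spec[2:])
    if len(base) < 2:
        raise ValueError("'~=' needs at least two release components")
    candidate = _release(version)
    # Zero-pad both sides so that '2' compares equal to '2.0'.
    width = max(len(base), len(candidate))
    padded_base = base + (0,) * (width - len(base))
    padded_candidate = candidate + (0,) * (width - len(candidate))
    # '~=V.N' means '>= V.N' and '== V.*': drop the last component for the prefix.
    prefix = base[:-1]
    return (
        padded_candidate[: len(prefix)] == prefix
        and padded_candidate >= padded_base
    )
```

With the specifier from this PR, satisfies_compatible_release("2.7.1", "~=2.0") is True and satisfies_compatible_release("3.0", "~=2.0") is False, which is why a future nlpo3 3.0 release would need a matching nlpo3-deepcut release.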

CI

  • build-python-wheels.yml: adds parallel build_wheels_deepcut and build_sdist_deepcut jobs; the publish_pypi job uploads both packages
  • test-nlpo3-python.yml: adds a test-python-deepcut job; the base test job no longer depends on deepcut

Docs

  • docs/design.md: the "Proposed split packaging" section is marked as implemented and updated with the actual layout and behavior
  • docs/implementation.md: new "Split Python packaging" section documenting shim mechanics and module names
  • CHANGELOG.md: breaking-change note with migration instructions

Copilot AI assigned Copilot and bact Apr 4, 2026
bact added the refactoring, infrastructure, and nlpo3-python labels Apr 4, 2026