
Add FLOAT16 input/output format support for pure FP16 models #3

Merged
ChinChangYang merged 1 commit into master from feature/float16-io
Jan 18, 2026
Conversation

@ChinChangYang (Owner)

Implements a pure FP16 mode in which model inputs and outputs use FLOAT16 data types instead of FLOAT32, eliminating casts at the model boundaries for maximum performance on Apple Silicon while retaining the ~50% weight size reduction.

Key changes:

  • Add use_fp16_io option to ConversionOptions (requires FP16 compute)
  • Conditionally set input/output tensor types to FLOAT16 in MIL builder
  • Remove cast operations at boundaries when using pure FP16 mode
  • Update model description to use FLOAT16 for inputs/outputs
  • Auto-upgrade specification version to 7 (iOS 16+) for FP16 I/O
  • Add --float16-io CLI flag with validation
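The validation and spec-version rules in the list above can be sketched as follows. This is an illustrative Python sketch only; the class and field names are hypothetical and do not reflect the project's actual (C++) `ConversionOptions` API.

```python
# Illustrative sketch -- names are hypothetical, not the project's API.
class ConversionOptions:
    def __init__(self, use_fp16=False, use_fp16_io=False):
        # --float16-io is only valid together with FP16 compute (--float16).
        if use_fp16_io and not use_fp16:
            raise ValueError("--float16-io requires --float16 (FP16 compute)")
        self.use_fp16 = use_fp16
        self.use_fp16_io = use_fp16_io
        # FP16 I/O needs Core ML specification version 7 (iOS 16+);
        # the other two modes work with version 6 (iOS 15+).
        self.spec_version = 7 if use_fp16_io else 6
```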

Three precision modes now supported:

  1. FLOAT32 (default): FP32 everywhere, iOS 15+
  2. Mixed precision (--float16): FP32 I/O, FP16 compute, iOS 15+
  3. Pure FP16 (--float16 --float16-io): FP16 everywhere, iOS 16+

Testing:

  • All 13 C++ unit tests pass
  • Added 10 new Python integration tests for FP16 I/O functionality
  • Tests verify model compilation, data types, inference, and dynamic batch
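As an illustration of the boundary behavior these tests exercise: with pure FP16 I/O, the caller supplies float16 buffers directly instead of float32 buffers that the model would cast internally. A NumPy sketch (shapes are illustrative, not taken from the project's tests):

```python
import numpy as np

# With --float16-io there is no cast inside the model: the caller hands
# over float16 data directly. The shape here is illustrative only.
x32 = np.random.rand(1, 3, 19, 19).astype(np.float32)
x16 = x32.astype(np.float16)          # one-time, caller-side conversion

assert x16.dtype == np.float16
assert x16.nbytes == x32.nbytes // 2  # FP16 halves the I/O buffer size
```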

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@ChinChangYang ChinChangYang marked this pull request as ready for review January 18, 2026 11:41
@ChinChangYang ChinChangYang merged commit 63358cc into master Jan 18, 2026
1 check passed
@ChinChangYang ChinChangYang deleted the feature/float16-io branch January 18, 2026 11:48