feat(chunked-prefill): add chunked-prefill for multi gpu and mac #429
Open
wasamtc wants to merge 60 commits into GradientHQ:main from
The workflow has been disabled, preventing any actions from being performed.
…se send data use chunked_reqs+forward_reqs
Re-enable scheduled builds with a cron job.
…cong/gpu_chunk_v3
You can use chunked-prefill with the command:
I measured Time-To-First-Token (TTFT) latency for different chunked-prefill sizes, using a Mac Mini (64GB) and an RTX 4090 (24GB) as the test nodes, a 2,000-token input prompt, and the GPT-OSS-20B model. The results are as follows:
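As background for the measurement described above, here is a minimal sketch of what a "chunked-prefill size" means for a 2,000-token prompt: the prompt is split into fixed-size token ranges that are prefilled one after another. The function name `split_into_chunks` and the chunk size of 512 are hypothetical and chosen for illustration only; they are not taken from this PR's code or from the tested configurations.

```python
from typing import List, Tuple


def split_into_chunks(prompt_len: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return half-open [start, end) token ranges that cover the prompt.

    Hypothetical helper for illustration; not part of this PR.
    """
    if prompt_len < 0:
        raise ValueError("prompt_len must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, prompt_len))
        for start in range(0, prompt_len, chunk_size)
    ]


# A 2,000-token prompt with an assumed chunk size of 512 tokens.
chunks = split_into_chunks(2000, 512)
print(chunks)  # [(0, 512), (512, 1024), (1024, 1536), (1536, 2000)]
```

A smaller chunk size means more prefill steps for the same prompt, and the last chunk is shorter whenever the prompt length is not a multiple of the chunk size.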
📋 PR Title Format
The PR title should follow the format:
Where:
type is one of: feat, fix, docs, refactor, perf, test, chore.
scope is optional and describes the part of the codebase affected (e.g., auth, ui, api).
concise message is a short description of the change (max 50 chars).
📝 Change Type
Please select the type of change this PR introduces (choose one or more):
💡 Description
Briefly describe the change, its purpose, and the problem it solves.
Key Changes
🔗 Related Issues
List any issues this PR closes or relates to:
✅ Checklist
Please ensure the following points are addressed before merging: