fix: apply "model." prefix only for lm models and fix UTF-8 decoding of vocab.json/merges.txt #32
🧹 Nitpick comments (1)
convert.py (1)
114-149: Consider using a context manager for file handle safety.

The file opened at line 114 is closed manually at line 149. If an exception occurs during tensor processing (e.g., in `np.frombuffer` or `w.add_tensor`), the file handle will leak. While this is pre-existing code and the impact is limited for a conversion script, a `with` statement would ensure cleanup.

♻️ Proposed refactor

```diff
-    f = open(sf_path, "rb")
-    count = 0
-    total = 0
-
-    for name in names:
+    with open(sf_path, "rb") as f:
+        count = 0
+        total = 0
+
+        for name in names:
             # ... (indent loop body)
-
-    f.close()
-    return count, total
+    return count, total
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@convert.py` around lines 114 - 149, The code opens the snapshot file via f = open(sf_path, "rb") and closes it manually at the end, which can leak the file handle if an exception occurs during the loop (e.g., in np.frombuffer or w.add_tensor); wrap the file access in a context manager (use with open(sf_path, "rb") as f:) and keep the existing f.seek(...) and f.read(...) logic inside that block so the file is guaranteed to be closed on error; update references to f, sf_path, names, meta, BF16, np.frombuffer and w.add_tensor accordingly and remove the manual f.close() call.
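The leak the reviewer describes is easy to demonstrate in isolation. A minimal standalone sketch (not the converter's actual code; the exception simulates a failure inside the loop body):

```python
import os
import tempfile

# Create a throwaway file to read from.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"payload")

# A with-statement closes the file even when processing raises, whereas a
# manual open()/close() pair would leak the handle on this error path.
captured = {}
try:
    with open(path, "rb") as f:
        captured["f"] = f       # keep a reference so we can inspect it later
        raise ValueError("simulated failure during tensor processing")
except ValueError:
    pass

assert captured["f"].closed     # handle was released despite the exception
os.remove(path)
```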
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
convert.py (1)
118-149: ⚠️ Potential issue | 🔴 Critical

Fix loop indentation: current code processes only the final tensor.

The tensor processing logic (lines 121–147) is dedented outside the for-loop. Only the `info = meta[name]` assignment at line 119 executes inside the loop; lines 121–147 run once after the loop finishes, using only the last tensor's name and info. This drops all tensors except the final one in the sorted names list. The return at line 149 executes after processing that single tensor.

Indent lines 121–147 by four additional spaces to move them into the for-loop body, and move the return statement outside the with-block (after line 147).

Proposed fix

```diff
 def add_tensors_from_sf(w, sf_path, tag, model_type):
     meta, hdr_size = read_sf_header(sf_path)
     names = sorted(meta.keys())
     with open(sf_path, "rb") as f:
         count = 0
         total = 0

         for name in names:
             info = meta[name]

-        # normalize: some upstream checkpoints omit the "model." prefix
-        if model_type == "lm" and not name.startswith("model."):
-            name = "model." + name
-
-        dtype_str = info["dtype"]
-        shape = info["shape"]
-        off0, off1 = info["data_offsets"]
-        nbytes = off1 - off0
-
-        f.seek(hdr_size + off0)
-        raw = f.read(nbytes)
-
-        if dtype_str == "BF16":
-            arr = np.frombuffer(raw, dtype=np.uint16).reshape(shape)
-            w.add_tensor(name, arr, raw_dtype=BF16)
-        elif dtype_str == "F16":
-            arr = np.frombuffer(raw, dtype=np.float16).reshape(shape)
-            w.add_tensor(name, arr)
-        elif dtype_str == "F32":
-            arr = np.frombuffer(raw, dtype=np.float32).reshape(shape)
-            w.add_tensor(name, arr)
-        else:
-            log(tag, " skip %s: dtype %s" % (name, dtype_str))
-            continue
-
-        count += 1
-        total += nbytes
-
-        return count, total
+            # normalize: some upstream checkpoints omit the "model." prefix
+            if model_type == "lm" and not name.startswith("model."):
+                name = "model." + name
+
+            dtype_str = info["dtype"]
+            shape = info["shape"]
+            off0, off1 = info["data_offsets"]
+            nbytes = off1 - off0
+
+            f.seek(hdr_size + off0)
+            raw = f.read(nbytes)
+
+            if dtype_str == "BF16":
+                arr = np.frombuffer(raw, dtype=np.uint16).reshape(shape)
+                w.add_tensor(name, arr, raw_dtype=BF16)
+            elif dtype_str == "F16":
+                arr = np.frombuffer(raw, dtype=np.float16).reshape(shape)
+                w.add_tensor(name, arr)
+            elif dtype_str == "F32":
+                arr = np.frombuffer(raw, dtype=np.float32).reshape(shape)
+                w.add_tensor(name, arr)
+            else:
+                log(tag, " skip %s: dtype %s" % (name, dtype_str))
+                continue
+
+            count += 1
+            total += nbytes
+
+    return count, total
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@convert.py` around lines 118 - 149, The loop body is mis-indented so only info = meta[name] runs per-item; move the tensor processing block (everything that reads dtype_str/shape/off0/off1, seeks and reads raw, constructs arr for BF16/F16/F32, calls w.add_tensor(name,...), updates count and total) inside the for name in names: loop so each tensor is processed, and ensure the return count, total is placed after the with-block (outside the loop); check symbols: names, meta, model_type, name.startswith("model."), dtype_str = info["dtype"], f.seek(...), w.add_tensor(...), count and total for correct updates.
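The effect of the mis-indentation is easy to reproduce with a toy version of the loop (the names mirror the review, but the bodies are stand-ins for the real tensor processing):

```python
def buggy(names, meta):
    processed = []
    for name in names:
        info = meta[name]               # only this line is inside the loop
    processed.append((name, info))      # runs once, after the loop, with the
    return processed                    # last name/info that leaked out of it

def fixed(names, meta):
    processed = []
    for name in names:
        info = meta[name]
        processed.append((name, info))  # now runs once per item
    return processed

names = ["a", "b", "c"]
meta = {"a": 1, "b": 2, "c": 3}

assert buggy(names, meta) == [("c", 3)]                     # all but the last dropped
assert fixed(names, meta) == [("a", 1), ("b", 2), ("c", 3)]
```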
🧹 Nitpick comments (2)
convert.py (2)
38-38: Consider using context managers for consistency.

This file handle isn't explicitly closed. While CPython's garbage collector will handle it, using a context manager would be consistent with the improvements made elsewhere in this PR (lines 114, 173, 180).

♻️ Suggested fix

```diff
-    idx = json.load(open(index))
+    with open(index, "r", encoding="utf-8") as f:
+        idx = json.load(f)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@convert.py` at line 38, Replace the json.load(open(index)) call with a context-managed file open to ensure the file is closed properly: locate the assignment to idx that uses json.load(open(index)) and change it to open the file with a with statement (with open(index, 'r') as f:) and call json.load(f) inside that block so the file handle is deterministically closed, matching the context-manager usage elsewhere in this file.
205-205: Same pattern: consider using a context manager for consistency.

♻️ Suggested fix

```diff
-    cfg = json.load(open(cfg_path))
+    with open(cfg_path, "r", encoding="utf-8") as f:
+        cfg = json.load(f)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@convert.py` at line 205, Replace the direct open(...) usage when loading JSON to use a context manager: instead of calling json.load(open(cfg_path)) update the code around cfg and cfg_path to open the file with a with statement and pass the file object to json.load (ensuring the file is closed automatically); locate the occurrence of json.load(open(cfg_path)) in convert.py and change it to use a with-open context.
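The suggested pattern, sketched end-to-end (the file and config key below are made up for illustration, not taken from convert.py):

```python
import json
import os
import tempfile

# Create a stand-in config file to load.
fd, cfg_path = tempfile.mkstemp(suffix=".json")
os.close(fd)
with open(cfg_path, "w", encoding="utf-8") as f:
    json.dump({"model_type": "lm"}, f)

# The suggested fix: the file object is closed deterministically when the
# block exits, instead of relying on the garbage collector the way
# json.load(open(cfg_path)) does.
with open(cfg_path, "r", encoding="utf-8") as f:
    cfg = json.load(f)

assert cfg == {"model_type": "lm"}
os.remove(cfg_path)
```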
Thanks! Commit 1fa5cf4 blindly added "model." to all tensors to fix the 1.7B LM.
This PR adjusts the ACE‑Step Safetensors → GGUF converter to:

- Apply the `"model."` prefix normalization of tensor names only for `lm`‑type models (`acestep-lm`), leaving DiT, text encoder, and VAE checkpoints unchanged.
- Fix a `UnicodeDecodeError` when reading `vocab.json` and `merges.txt` on Windows by explicitly opening them with UTF‑8 encoding instead of relying on the system default (cp1251).

These changes allow the script to correctly handle ACE‑Step LM checkpoints (including `acestep-5Hz-lm-*`) while remaining compatible with other ACE‑Step model types.
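The encoding fix can be illustrated with a standalone sketch; the file and tokens below are stand-ins for the real vocab.json, not the converter's actual contents:

```python
import json
import os
import tempfile

# Write a stand-in vocab file containing non-ASCII tokens, as real BPE
# vocabularies do (GPT-2-style vocabs use bytes like "Ġ" in token strings).
fd, vocab_path = tempfile.mkstemp(suffix=".json")
os.close(fd)
with open(vocab_path, "w", encoding="utf-8") as f:
    json.dump({"Ġtoken": 0, "é": 1}, f, ensure_ascii=False)

# Without encoding="utf-8", open() falls back to the locale's preferred
# encoding, which on many Windows setups is a legacy code page (e.g. cp1251)
# and can raise UnicodeDecodeError on these bytes. Passing it explicitly
# makes the read deterministic on every platform.
with open(vocab_path, "r", encoding="utf-8") as f:
    vocab = json.load(f)

assert vocab["é"] == 1
os.remove(vocab_path)
```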