@codeflash-ai codeflash-ai bot commented Nov 13, 2025

📄 11% (0.11x) speedup for get_characters_loss in spacy/ml/models/multi_task.py

⏱️ Runtime : 4.33 milliseconds → 3.89 milliseconds (best of 142 runs)
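
The headline figure can be reproduced from the two runtimes. The sketch below assumes the speedup is defined as (original - optimized) / optimized, which is consistent with the 11% (0.11x) reported above; the helper name `relative_speedup` is made up for illustration.

```python
def relative_speedup(original: float, optimized: float) -> float:
    """Return the fractional speedup, assuming (original - optimized) / optimized."""
    if optimized <= 0:
        raise ValueError("optimized runtime must be positive")
    return (original - optimized) / optimized


# Overall runtimes reported for this PR, in milliseconds.
overall = relative_speedup(4.33, 3.89)
print(f"{overall:.2f}x ({overall:.0%})")  # prints "0.11x (11%)"
```

Under the same assumption, the per-test annotations follow the same pattern, for example 864μs -> 714μs gives roughly 21%.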

📝 Explanation and details

The optimization replaces numpy.vstack() with numpy.concatenate() for combining UTF-8 arrays from multiple documents, resulting in an 11% speedup.

Key optimization:

  • Changed array combination method: Instead of using numpy.vstack([doc.to_utf8_array(nr_char=nr_char) for doc in docs]), the optimized version first collects arrays in a list, then uses numpy.concatenate(arrays, axis=0).reshape(-1).

Why this is faster:

  • numpy.vstack() promotes every input to at least 2D before concatenating, so the stacked result is a 2D array that must then be flattened back to 1D; that per-array promotion is extra work on this code path
  • numpy.concatenate() with axis=0 joins the arrays as they are; for 1D inputs the result is already 1D, which eliminates the intermediate 2D step
  • This reduces the number of array operations, which matters most when many small per-document arrays are combined
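
To make the comparison concrete, the following sketch checks that both approaches produce the same flattened array. It uses hand-made `uint8` arrays as stand-ins for `doc.to_utf8_array(nr_char=nr_char)`; the array shapes are assumptions for illustration, not taken from spaCy.

```python
import numpy as np

# Stand-ins for per-document arrays: three "docs", nr_char = 2 (assumed shapes).
arrays_1d = [
    np.array([65, 66], dtype=np.uint8),
    np.array([67, 0], dtype=np.uint8),
    np.array([68, 69], dtype=np.uint8),
]

# Original approach: stack into a (3, 2) array, then flatten.
via_vstack = np.vstack(arrays_1d).reshape(-1)

# Optimized approach: concatenate along axis 0, then flatten
# (the reshape is a no-op when the inputs are 1D).
via_concat = np.concatenate(arrays_1d, axis=0).reshape(-1)

assert np.array_equal(via_vstack, via_concat)

# The same holds when each per-document array is already 2D, e.g. (n_rows, nr_char).
arrays_2d = [a.reshape(1, -1) for a in arrays_1d]
assert np.array_equal(
    np.vstack(arrays_2d).reshape(-1),
    np.concatenate(arrays_2d, axis=0).reshape(-1),
)

print(via_concat.tolist())  # [65, 66, 67, 0, 68, 69]
```

Keeping the trailing `.reshape(-1)` is what makes the replacement safe for both 1D and 2D inputs.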

Performance impact by test case:

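  • Small batches (1-3 docs): 8-13% speedup in most tests, a modest but consistent improvement
  • Large batches (100-500 docs): 12-21% speedup; the gain grows with the number of documents, and shrinks to roughly 2-6% when the work is dominated by a large nr_char rather than by document count
  • Edge cases: 3-85% speedup; the 85% figure comes from the empty-docs test, which only times the path that raises ValueError (8.09μs -> 4.37μs), so it says little about typical inputs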
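
A rough way to reproduce this kind of measurement locally is a `timeit` micro-benchmark. The sketch below is not the Codeflash harness: the batch sizes, `nr_char`, and repeat counts are arbitrary choices, `make_batch` and `best_time` are hypothetical helpers, and absolute timings will vary by machine and NumPy version.

```python
import timeit

import numpy as np


def make_batch(n_docs, nr_char=3):
    """Build n_docs fake per-document arrays (stand-ins for to_utf8_array output)."""
    rng = np.random.default_rng(0)
    return [rng.integers(0, 256, size=nr_char, dtype=np.uint8) for _ in range(n_docs)]


def best_time(fn, number=200, repeat=5):
    """Return the best per-call time in seconds over several repeats."""
    return min(timeit.repeat(fn, number=number, repeat=repeat)) / number


results = {}
for n_docs in (1, 100, 500):
    batch = make_batch(n_docs)
    t_vstack = best_time(lambda: np.vstack(batch).reshape(-1))
    t_concat = best_time(lambda: np.concatenate(batch, axis=0).reshape(-1))
    results[n_docs] = (t_vstack, t_concat)
    print(f"{n_docs:>4} docs: vstack {t_vstack * 1e6:.1f}us, "
          f"concatenate {t_concat * 1e6:.1f}us")
```

This isolates only the array-combination step, so the relative gap it shows will be larger than the end-to-end figures for `get_characters_loss`.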

Behavioral impact:
The function maintains identical output and interface - this is purely an internal optimization that reduces memory allocations during array construction. The improvement is most pronounced when processing larger batches of documents, making it particularly valuable for training scenarios where character-level losses are computed across many documents simultaneously.

The line profiler shows the critical optimization: the first line (array combination) drops from 41.8% to 31.9% of total execution time, demonstrating the efficiency gain from avoiding unnecessary intermediate array structures.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 25 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime

```python
import numpy as np

# imports
import pytest
from spacy.ml.models.multi_task import get_characters_loss


# Minimal to_categorical implementation for testing
def to_categorical(y, n_classes):
    y = np.array(y, dtype=int)
    out = np.zeros((y.size, n_classes), dtype=float)
    out[np.arange(y.size), y] = 1.0
    return out


# Minimal ops mock for testing
class Ops:
    def asarray(self, arr, dtype=None):
        return np.array(arr, dtype=dtype)


# Minimal Doc mock for testing
class Doc:
    def __init__(self, text):
        self.text = text

    def to_utf8_array(self, nr_char):
        # Returns the code points of the first nr_char characters
        # (equal to the UTF-8 bytes for ASCII text), padded with zeros
        arr = np.zeros(nr_char, dtype=int)
        for i, c in enumerate(self.text[:nr_char]):
            arr[i] = ord(c)
        return arr


# ------------------ UNIT TESTS ------------------

# 1. Basic Test Cases

def test_single_doc_single_char():
    # Basic: One doc, one character
    ops = Ops()
    docs = [Doc("A")]
    nr_char = 1
    # Target is ord('A')=65, so categorical is [0,...,1,...,0] at index 65
    prediction = np.zeros((1, 256 * nr_char), dtype=float)
    prediction[0, 65] = 1.0  # perfect prediction
    loss, d_target = get_characters_loss(ops, docs, prediction, nr_char)  # 69.7μs -> 63.0μs (10.5% faster)


def test_single_doc_multiple_chars():
    # Basic: One doc, multiple characters
    ops = Ops()
    docs = [Doc("AB")]
    nr_char = 2
    # Target is [65, 66]
    prediction = np.zeros((1, 256 * nr_char), dtype=float)
    prediction[0, 65] = 1.0
    prediction[0, 256 + 66] = 1.0
    loss, d_target = get_characters_loss(ops, docs, prediction, nr_char)  # 59.2μs -> 54.2μs (9.13% faster)


def test_multiple_docs_single_char():
    # Basic: Multiple docs, single character
    ops = Ops()
    docs = [Doc("A"), Doc("B")]
    nr_char = 1
    prediction = np.zeros((2, 256 * nr_char), dtype=float)
    prediction[0, 65] = 1.0
    prediction[1, 66] = 1.0
    loss, d_target = get_characters_loss(ops, docs, prediction, nr_char)  # 59.9μs -> 54.5μs (9.91% faster)


def test_imperfect_prediction():
    # Basic: Prediction not matching target
    ops = Ops()
    docs = [Doc("A")]
    nr_char = 1
    prediction = np.zeros((1, 256 * nr_char), dtype=float)
    prediction[0, 64] = 1.0  # wrong index
    loss, d_target = get_characters_loss(ops, docs, prediction, nr_char)  # 56.7μs -> 51.5μs (10.0% faster)


# 2. Edge Test Cases

def test_empty_doc():
    # Edge: Empty doc, nr_char > 0
    ops = Ops()
    docs = [Doc("")]
    nr_char = 3
    prediction = np.zeros((1, 256 * nr_char), dtype=float)
    # Target IDs should all be 0 (the padding value)
    loss, d_target = get_characters_loss(ops, docs, prediction, nr_char)  # 57.5μs -> 52.7μs (9.08% faster)
    # Target is one-hot at 0 for each char
    target = np.zeros((1, 256 * nr_char), dtype=float)
    target[0, 0] = 1.0
    target[0, 256] = 1.0
    target[0, 512] = 1.0
    expected_loss = (prediction - target) ** 2


def test_doc_shorter_than_nr_char():
    # Edge: Doc shorter than nr_char, should pad with zeros
    ops = Ops()
    docs = [Doc("A")]
    nr_char = 3
    prediction = np.zeros((1, 256 * nr_char), dtype=float)
    # Target: [65, 0, 0]
    target = np.zeros((1, 256 * nr_char), dtype=float)
    target[0, 65] = 1.0
    target[0, 256] = 1.0
    target[0, 512] = 1.0
    loss, d_target = get_characters_loss(ops, docs, prediction, nr_char)  # 56.7μs -> 52.1μs (8.90% faster)
    expected_loss = (prediction - target) ** 2


def test_multiple_docs_varied_length():
    # Edge: Multiple docs, varied length, nr_char > doc length
    ops = Ops()
    docs = [Doc("A"), Doc("BC")]
    nr_char = 3
    prediction = np.zeros((2, 256 * nr_char), dtype=float)
    # For doc "A": [65,0,0]; for "BC": [66,67,0]
    prediction[0, 65] = 1.0
    prediction[0, 256] = 1.0
    prediction[0, 512] = 1.0
    prediction[1, 66] = 1.0
    prediction[1, 256 + 67] = 1.0
    prediction[1, 512] = 1.0
    loss, d_target = get_characters_loss(ops, docs, prediction, nr_char)  # 82.4μs -> 76.1μs (8.26% faster)


# 3. Large Scale Test Cases

def test_large_batch():
    # Large scale: Many docs, single char
    ops = Ops()
    nr_char = 1
    docs = [Doc(chr(65 + i % 26)) for i in range(500)]  # 500 docs, cycling through A-Z
    prediction = np.zeros((500, 256 * nr_char), dtype=float)
    for i, doc in enumerate(docs):
        prediction[i, ord(doc.text[0])] = 1.0
    loss, d_target = get_characters_loss(ops, docs, prediction, nr_char)  # 864μs -> 714μs (21.0% faster)


def test_large_batch_multi_char():
    # Large scale: Many docs, multiple chars
    ops = Ops()
    nr_char = 3
    docs = [Doc("ABC") for _ in range(200)]  # 200 docs, all "ABC"
    prediction = np.zeros((200, 256 * nr_char), dtype=float)
    for i in range(200):
        prediction[i, 65] = 1.0
        prediction[i, 256 + 66] = 1.0
        prediction[i, 512 + 67] = 1.0
    loss, d_target = get_characters_loss(ops, docs, prediction, nr_char)  # 717μs -> 636μs (12.7% faster)


def test_large_batch_imperfect():
    # Large scale: Many docs, imperfect predictions
    ops = Ops()
    nr_char = 2
    docs = [Doc("AB") for _ in range(100)]
    prediction = np.zeros((100, 256 * nr_char), dtype=float)
    # All predictions off by one
    for i in range(100):
        prediction[i, 64] = 1.0
        prediction[i, 256 + 65] = 1.0
    loss, d_target = get_characters_loss(ops, docs, prediction, nr_char)  # 284μs -> 241μs (17.7% faster)


def test_large_nr_char():
    # Large scale: Large nr_char, single doc
    ops = Ops()
    nr_char = 100
    doc_text = "".join(chr(65 + i % 26) for i in range(nr_char))
    docs = [Doc(doc_text)]
    prediction = np.zeros((1, 256 * nr_char), dtype=float)
    for i, c in enumerate(doc_text):
        prediction[0, i * 256 + ord(c)] = 1.0
    loss, d_target = get_characters_loss(ops, docs, prediction, nr_char)  # 125μs -> 120μs (3.70% faster)
```

codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
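
The tests above rely on a particular target layout: each document's row has `256 * nr_char` entries, and the one-hot position for character `i` sits at index `i * 256 + byte`. The sketch below rebuilds such a target with plain NumPy and computes a squared-error loss against it. It is inferred from the test setup (for example the `expected_loss` lines), not copied from spaCy; `build_target` and `squared_error` are hypothetical helper names, and summing the squared differences is an assumption.

```python
import numpy as np


def build_target(byte_ids, nr_char):
    """One-hot encode a flat list of byte IDs into rows of 256 * nr_char."""
    ids = np.asarray(byte_ids, dtype=np.int64)
    one_hot = np.zeros((ids.size, 256), dtype=np.float32)
    one_hot[np.arange(ids.size), ids] = 1.0
    return one_hot.reshape(-1, 256 * nr_char)


def squared_error(prediction, target):
    """Sum of squared differences (assumed loss form, see the lead-in)."""
    return float(((prediction - target) ** 2).sum())


# Doc "A" with nr_char = 3 -> byte IDs [65, 0, 0] after zero padding.
target = build_target([65, 0, 0], nr_char=3)
hot = np.flatnonzero(target[0]).tolist()
print(hot)  # [65, 256, 512]

print(squared_error(target.copy(), target))          # 0.0 for a perfect prediction
print(squared_error(np.zeros_like(target), target))  # 3.0: one unit error per character
```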

#------------------------------------------------
```python
import numpy as np

# imports
import pytest
from spacy.ml.models.multi_task import get_characters_loss


# Helper: to_categorical (copied from thinc, simplified)
def to_categorical(y, n_classes=None):
    y = np.asarray(y, dtype="int32")
    if n_classes is None:
        n_classes = np.max(y) + 1
    categorical = np.zeros((y.size, n_classes), dtype="float32")
    categorical[np.arange(y.size), y] = 1.0
    return categorical


# Helper: mock ops object
class MockOps:
    def asarray(self, arr, dtype=None):
        return np.asarray(arr, dtype=dtype)


# Helper: mock Doc object with to_utf8_array
class MockDoc:
    def __init__(self, text):
        self.text = text

    def to_utf8_array(self, nr_char):
        # Truncate to nr_char UTF-8 bytes (values 0-255), then pad with zeros
        arr = np.frombuffer(self.text.encode('utf-8')[:nr_char], dtype=np.uint8)
        if len(arr) < nr_char:
            arr = np.pad(arr, (0, nr_char - len(arr)), constant_values=0)
        return arr


# ------------------- UNIT TESTS -------------------

# 1. Basic Test Cases

def test_single_doc_exact_chars():
    """Test with a single doc, prediction matches target exactly."""
    ops = MockOps()
    docs = [MockDoc("A")]
    nr_char = 1
    # Target: 'A' -> 65, so one-hot at 65
    target = to_categorical([65], n_classes=256).reshape(1, 256)
    prediction = target.copy()
    loss, d_target = get_characters_loss(ops, docs, prediction, nr_char)  # 61.7μs -> 56.1μs (9.88% faster)


def test_single_doc_wrong_prediction():
    """Test with a single doc, prediction is all zeros."""
    ops = MockOps()
    docs = [MockDoc("A")]
    nr_char = 1
    target = to_categorical([65], n_classes=256).reshape(1, 256)
    prediction = np.zeros_like(target)
    loss, d_target = get_characters_loss(ops, docs, prediction, nr_char)  # 52.5μs -> 47.6μs (10.3% faster)


def test_multiple_docs_basic():
    """Test with multiple docs, simple characters."""
    ops = MockOps()
    docs = [MockDoc("A"), MockDoc("B")]
    nr_char = 1
    target_ids = [65, 66]
    target = to_categorical(target_ids, n_classes=256).reshape(2, 256)
    prediction = target.copy()
    loss, d_target = get_characters_loss(ops, docs, prediction, nr_char)  # 55.7μs -> 49.0μs (13.8% faster)


def test_multiple_docs_wrong_prediction():
    """Test with multiple docs, prediction is zeros."""
    ops = MockOps()
    docs = [MockDoc("A"), MockDoc("B")]
    nr_char = 1
    target_ids = [65, 66]
    target = to_categorical(target_ids, n_classes=256).reshape(2, 256)
    prediction = np.zeros_like(target)
    loss, d_target = get_characters_loss(ops, docs, prediction, nr_char)  # 52.7μs -> 47.3μs (11.4% faster)


def test_multi_char_doc():
    """Test with a doc longer than one character."""
    ops = MockOps()
    docs = [MockDoc("AB")]
    nr_char = 2
    # 'A' = 65, 'B' = 66
    target_ids = [65, 66]
    target = to_categorical(target_ids, n_classes=256).reshape(1, 512)
    prediction = target.copy()
    loss, d_target = get_characters_loss(ops, docs, prediction, nr_char)  # 51.9μs -> 46.3μs (12.1% faster)


# 2. Edge Test Cases

def test_empty_doc():
    """Test with an empty doc (should pad with zeros)."""
    ops = MockOps()
    docs = [MockDoc("")]
    nr_char = 3
    # Should pad with three zeros (0)
    target_ids = [0, 0, 0]
    target = to_categorical(target_ids, n_classes=256).reshape(1, 768)
    # Prediction is zeros
    prediction = np.zeros_like(target)
    loss, d_target = get_characters_loss(ops, docs, prediction, nr_char)  # 87.7μs -> 84.4μs (3.95% faster)


def test_doc_shorter_than_nr_char():
    """Test with doc shorter than nr_char (should pad with zeros)."""
    ops = MockOps()
    docs = [MockDoc("A")]
    nr_char = 3
    # 'A' = 65, then two zeros
    target_ids = [65, 0, 0]
    target = to_categorical(target_ids, n_classes=256).reshape(1, 768)
    prediction = np.zeros_like(target)
    loss, d_target = get_characters_loss(ops, docs, prediction, nr_char)  # 82.3μs -> 77.8μs (5.79% faster)


def test_empty_docs_list():
    """Test with empty docs list."""
    ops = MockOps()
    docs = []
    nr_char = 2
    # Combining an empty list of arrays is expected to raise ValueError
    prediction = np.zeros((0, 512))
    with pytest.raises(ValueError):
        get_characters_loss(ops, docs, prediction, nr_char)  # 8.09μs -> 4.37μs (85.3% faster)


def test_prediction_wrong_shape():
    """Test with prediction shape not matching target."""
    ops = MockOps()
    docs = [MockDoc("A")]
    nr_char = 2
    # Target shape: (1, 512)
    prediction = np.zeros((1, 256))  # Wrong shape
    with pytest.raises(ValueError):
        # Should raise due to shape mismatch in subtraction or reshape
        get_characters_loss(ops, docs, prediction, nr_char)  # 101μs -> 97.4μs (4.06% faster)


# 3. Large Scale Test Cases

def test_many_docs_and_chars():
    """Test with many docs and many chars per doc."""
    ops = MockOps()
    num_docs = 100
    nr_char = 5
    # Each doc is "ABCDE"
    docs = [MockDoc("ABCDE") for _ in range(num_docs)]
    target_ids = [65, 66, 67, 68, 69] * num_docs
    target = to_categorical(target_ids, n_classes=256).reshape(num_docs, 256 * nr_char)
    prediction = target.copy()
    loss, d_target = get_characters_loss(ops, docs, prediction, nr_char)  # 308μs -> 265μs (16.1% faster)


def test_large_nr_char():
    """Test with a single doc and large nr_char."""
    ops = MockOps()
    nr_char = 999
    text = "A" * nr_char
    docs = [MockDoc(text)]
    target_ids = [65] * nr_char
    target = to_categorical(target_ids, n_classes=256).reshape(1, 256 * nr_char)
    prediction = target.copy()
    loss, d_target = get_characters_loss(ops, docs, prediction, nr_char)  # 379μs -> 370μs (2.42% faster)


def test_maximum_docs_and_chars():
    """Test with maximum allowed docs and chars (close to 1000 elements)."""
    ops = MockOps()
    num_docs = 25
    nr_char = 40  # 25*40 = 1000
    docs = [MockDoc("B" * nr_char) for _ in range(num_docs)]
    target_ids = [66] * (num_docs * nr_char)
    target = to_categorical(target_ids, n_classes=256).reshape(num_docs, 256 * nr_char)
    prediction = target.copy()
    loss, d_target = get_characters_loss(ops, docs, prediction, nr_char)  # 402μs -> 390μs (3.11% faster)


def test_large_random_predictions():
    """Test with large random predictions and targets."""
    ops = MockOps()
    num_docs = 10
    nr_char = 50
    # Random printable ASCII characters
    import random
    docs = []
    target_ids = []
    for _ in range(num_docs):
        chars = ''.join(chr(random.randint(32, 126)) for _ in range(nr_char))
        docs.append(MockDoc(chars))
        target_ids.extend([ord(c) for c in chars])
    target = to_categorical(target_ids, n_classes=256).reshape(num_docs, 256 * nr_char)
    # Random predictions, but ensure shape matches
    rng = np.random.RandomState(42)
    prediction = rng.rand(num_docs, 256 * nr_char).astype("f")
    loss, d_target = get_characters_loss(ops, docs, prediction, nr_char)  # 204μs -> 192μs (6.32% faster)
```

codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-get_characters_loss-mhwrion1` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 13, 2025 01:40
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 13, 2025