[OPTIMIZE] Implement lazy hashing cache for Transaction IDs#75
sanaica wants to merge 16 commits into StabilityNexus:main
Conversation
Walkthrough
Transaction now caches its computed tx_id.
Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Suggested labels
🚥 Pre-merge checks: ✅ Passed checks (4 passed)
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@minichain/p2p.py`:
- Around line 210-211: The current top-level message check uses
{"type","data"}.issubset(message) which allows extra unexpected keys; replace it
with a strict equality check (e.g., set(message.keys()) == {"type", "data"}) and
also validate that message is a dict and keys are strings before proceeding to
prevent malformed types; then move the injection of _peer_addr in
_listen_to_peer so it only happens after the strict validation passes (inject
into message['_peer_addr'] after the equality check), ensuring downstream
handlers never see unvalidated extra fields.
- Around line 339-345: Do not mutate the passed-in Block instance when preparing
the broadcast payload: remove the assignment to block.miner and instead build a
broadcast-ready payload dict from the block's canonical data (e.g., use
block.canonical_payload or block.to_dict()), set the miner value on that local
payload dict (or a shallow copy) and then json.dumps/json.loads that payload for
broadcast; update the code that currently sets block.miner and then reads
block.canonical_payload to use the local payload copy so block remains unchanged
for future use.
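The strict-equality validation from the first comment above can be sketched as follows. `validate_message` and `accept` are hypothetical names for illustration; the real handler in `minichain/p2p.py` may be shaped differently:

```python
def validate_message(message):
    """Accept only dicts whose keys are exactly {"type", "data"}."""
    if not isinstance(message, dict):
        return False
    if not all(isinstance(k, str) for k in message):
        return False
    # Strict equality rejects extra keys such as a forged "_peer_addr".
    return set(message.keys()) == {"type", "data"}

def accept(message, peer_addr):
    """Inject trusted metadata only after strict validation passes."""
    if not validate_message(message):
        return None
    # Downstream handlers never see unvalidated extra fields.
    message["_peer_addr"] = peer_addr
    return message
```

With this ordering, a malicious peer cannot smuggle `_peer_addr` (or any other key) past the check, because any extra key fails the equality test before injection happens.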
In `@minichain/transaction.py`:
- Line 22: The cached transaction id self._cached_tx_id can become stale because
only sign() clears it; ensure any mutation of sender, receiver, amount, nonce,
data, timestamp, or signature invalidates the cache by setting
self._cached_tx_id = None. Implement this by adding property setters for those
attributes (or a guarded __setattr__) that assign the new value and then set
self._cached_tx_id = None; update sign() to keep this behavior (it already
clears cache) and remove any other direct assignments that bypass these setters
so cached id is always invalidated when those fields change (affects the
initialization and lines around the existing memoization code at tx_id
getter/memoization).
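A minimal sketch of the guarded-`__setattr__` approach described above. Field names follow the comment, but the hash computation is a simplified stand-in for the project's real canonical hashing:

```python
import hashlib
import json

class Transaction:
    """Sketch: any write to a hash-defining field clears the memoized id."""
    _TX_FIELDS = frozenset(
        {"sender", "receiver", "amount", "nonce", "data", "timestamp", "signature"}
    )

    def __init__(self, sender, receiver, amount, nonce=0):
        self._cached_tx_id = None
        self.sender = sender
        self.receiver = receiver
        self.amount = amount
        self.nonce = nonce

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        # Invalidate the cache whenever a hash-defining field changes.
        if name in self._TX_FIELDS:
            object.__setattr__(self, "_cached_tx_id", None)

    @property
    def tx_id(self):
        if self._cached_tx_id is None:
            payload = json.dumps(
                {"sender": self.sender, "receiver": self.receiver,
                 "amount": self.amount, "nonce": self.nonce},
                sort_keys=True,
            )
            object.__setattr__(
                self, "_cached_tx_id",
                hashlib.sha256(payload.encode()).hexdigest(),
            )
        return self._cached_tx_id
```

Because every assignment funnels through `__setattr__`, no direct attribute write can bypass invalidation, which removes the need to audit individual call sites.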
In `@tests/test_cache.py`:
- Around line 5-42: Replace the diagnostic print statements in test_tx_caching
with concrete assertions: assert initial tx._cached_tx_id is None; after first
access assert first_id equals tx.tx_id and tx._cached_tx_id is not None (and
matches first_id); assert second access returns same id (first_id == second_id);
after tx.sign(sk) assert tx._cached_tx_id is cleared (None); then assert
signed_id != first_id and tx._cached_tx_id equals signed_id after accessing
tx.tx_id again. Use the existing Transaction, tx.tx_id, tx._cached_tx_id and
tx.sign methods and the generated SigningKey to drive these assertions.
In `@tests/test_serialization.py`:
- Around line 34-46: The test reuses the same Transaction instance for both
blocks, weakening cross-instance coverage; create two distinct Transaction
objects (e.g., tx1 and tx2) with identical field values and use tx1 in block1
and tx2 in block2 (referencing the Transaction constructor and block1/block2
creation) so the comparison verifies determinism across separate instances
rather than the same object reused.
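The cross-instance determinism the comment asks for can be illustrated with a toy id derived from canonical JSON; the class and helper below are hypothetical stand-ins, not the project's actual code:

```python
import hashlib
import json

def compute_tx_id(sender, receiver, amount, nonce):
    """Deterministic id over canonical (sorted-key, compact) JSON."""
    blob = json.dumps(
        {"sender": sender, "receiver": receiver, "amount": amount, "nonce": nonce},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(blob.encode()).hexdigest()

class Tx:
    def __init__(self, sender, receiver, amount, nonce):
        self.sender, self.receiver = sender, receiver
        self.amount, self.nonce = amount, nonce

    @property
    def tx_id(self):
        return compute_tx_id(self.sender, self.receiver, self.amount, self.nonce)

tx1 = Tx("a", "b", 5, 0)
tx2 = Tx("a", "b", 5, 0)  # distinct instance, identical field values
```

Asserting `tx1.tx_id == tx2.tx_id` on two separate objects verifies determinism of the serialization itself, which reusing one instance in both blocks cannot.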
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 61938287-4ff9-4bcf-8aa8-cad522364831
📒 Files selected for processing (5)
minichain/block.py, minichain/p2p.py, minichain/transaction.py, tests/test_cache.py, tests/test_serialization.py
Force-pushed from 3018af1 to 6faa9de (Compare)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@minichain/transaction.py`:
- Line 22: The timestamp assignment currently uses a truthiness check that
treats 0 as falsy; update the assignment in the Transaction initializer (the
self.timestamp = ... line) to only use the provided timestamp when it is not
None (e.g., check "timestamp is not None") and otherwise set it to
round(time.time() * 1000) so timestamp=0 is preserved.
- Line 9: _TX_FIELDS is defined as a mutable set which can be accidentally
modified and triggers Ruff RUF012; change its definition to an immutable
frozenset (replace the set literal with frozenset({...}) or frozenset([...])) so
the class-level constant is explicit and cannot be mutated, and verify any usage
of _TX_FIELDS treats it as an immutable collection (e.g., membership checks
iterate but do not attempt to add/remove).
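Both fixes above can be sketched in one hypothetical initializer fragment: a `frozenset` for the class-level field constant, and an `is not None` check so `timestamp=0` survives:

```python
import time

class Transaction:
    """Sketch combining the two comments; names mirror the review."""
    # frozenset: immutable class-level constant, satisfies Ruff RUF012.
    _TX_FIELDS = frozenset(
        {"sender", "receiver", "amount", "nonce", "data", "timestamp", "signature"}
    )

    def __init__(self, timestamp=None):
        # "is not None" preserves a legitimate timestamp of 0, which a
        # plain truthiness check would silently replace with "now".
        self.timestamp = (
            timestamp if timestamp is not None else round(time.time() * 1000)
        )
```

Membership checks like `name in self._TX_FIELDS` work unchanged on a `frozenset`; only mutation (`add`/`remove`) becomes impossible.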
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 90c87714-822a-40a8-ad23-90d7d78fc273
📒 Files selected for processing (2)
minichain/transaction.py, tests/test_cache.py
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@minichain/transaction.py`:
- Around line 57-65: The verify method currently returns True inside the try
block; change it to use a try/except/else pattern so that
VerifyKey(...).verify(...) runs in try, exceptions are caught in except
(returning False), and a single return True is placed in an else block to make
success explicit; update the verify function (referencing VerifyKey, verify,
self.sender, self.hash_payload, self.signature) accordingly.
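The try/except/else shape being requested looks like this. `BadSignatureError` and `check` are stand-ins for the real `nacl` exception and the `VerifyKey(...).verify(...)` call, so this is a sketch of the control flow only:

```python
class BadSignatureError(Exception):
    """Stand-in for the signing library's bad-signature exception."""

def verify_signature(check):
    """Run a verification callable; return False on failure, True on success."""
    try:
        check()  # stands in for VerifyKey(self.sender).verify(payload, sig)
    except BadSignatureError:
        return False
    else:
        # Success is explicit and only reachable when verify() did not raise.
        return True
```

Placing `return True` in `else` keeps the `try` body limited to the call that can actually raise, so an unexpected exception elsewhere is not silently treated as a verified signature.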
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 23305bf2-0968-47e1-b1dc-41eddd9caebb
📒 Files selected for processing (1)
minichain/transaction.py
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@minichain/transaction.py`:
- Around line 22-23: The constructor is accepting peer timestamps that may be
seconds or milliseconds which breaks to_signing_dict() and tx_id; update the
Transaction initializer (and the other assignment block at lines 37-40) to
normalize inbound timestamps to integer milliseconds: if a provided timestamp is
non-null, coerce to int and, if its value is plausibly in seconds (e.g. < 1e12),
multiply by 1000 before storing to self.timestamp; alternatively raise if you
prefer strictness, but normalizing to ms in the Transaction __init__/from_dict()
path (and ensuring to_signing_dict()/tx_id rely on that) will restore consistent
timestamp units.
- Around line 11-14: Prevent mutation of hash-defining fields after a
transaction is finalized: in __setattr__ keep the existing behavior of clearing
_cached_tx_id when a _TX_FIELDS member changes, but first check a sealed flag
(e.g. self._sealed or attributes set by the signing/publishing code such as
sign() or publish()) and if the transaction is sealed and name is in
self._TX_FIELDS raise an AttributeError (or ignore the write) instead of
allowing the change; also add/ensure the signing/publishing paths (e.g. sign(),
publish(), or the code paths that mark the tx admitted) set self._sealed = True
so subsequent writes to _TX_FIELDS are blocked and the tx_id remains consistent.
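The two comments above can be sketched together: normalize inbound timestamps to integer milliseconds in the constructor, and have `sign()` set a `_sealed` flag that blocks later writes to hash-defining fields. The threshold, field set, and fake signature are assumptions for illustration:

```python
import time

_MS_THRESHOLD = 10**12  # below this, a positive timestamp is plausibly seconds

class Transaction:
    """Sketch of timestamp normalization plus post-sign sealing."""
    _TX_FIELDS = frozenset({"sender", "receiver", "amount", "timestamp"})

    def __init__(self, sender, receiver, amount, timestamp=None):
        object.__setattr__(self, "_sealed", False)
        self.sender = sender
        self.receiver = receiver
        self.amount = amount
        ts = round(time.time() * 1000) if timestamp is None else int(timestamp)
        # Coerce plausible second-based inputs to milliseconds.
        self.timestamp = ts * 1000 if 0 < ts < _MS_THRESHOLD else ts

    def __setattr__(self, name, value):
        if getattr(self, "_sealed", False) and name in self._TX_FIELDS:
            raise AttributeError(f"sealed transaction: cannot modify {name!r}")
        object.__setattr__(self, name, value)

    def sign(self, signing_key):
        # Real signing elided; once a signature exists, the tx is sealed.
        object.__setattr__(self, "signature", f"sig:{signing_key}")
        object.__setattr__(self, "_sealed", True)
```

Note that a later review comment in this thread argues for rejecting second-based timestamps at the boundary rather than converting them; the conversion branch here shows the normalizing variant only.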
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a3fdee34-766a-4721-912e-8461ee117a70
📒 Files selected for processing (1)
minichain/transaction.py
@coderabbitai resume
✅ Actions performed: Reviews resumed.
♻️ Duplicate comments (3)
minichain/transaction.py (3)
31-32: ⚠️ Potential issue | 🟠 Major — Signed transactions are reopened as mutable during construction.
Line 32 unconditionally resets _sealed, so Transaction.from_dict() rebuilds signed transactions as mutable objects. That reintroduces the post-publication mutation problem for anything loaded from disk or the network. Seed _sealed from the incoming signature, or set it in from_dict() after construction.
🔒 Minimal fix
```diff
  self.signature = signature
  self._cached_tx_id = None
- self._sealed = False
+ self._sealed = bool(self.signature)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@minichain/transaction.py` around lines 31 - 32, Constructor currently unconditionally sets self._sealed = False which reopens signed transactions; update the constructor (or Transaction.__init__) to accept and apply a sealed/signature-derived flag (e.g., set self._sealed = bool(signature) or a new sealed parameter) or alternatively set self._sealed appropriately in Transaction.from_dict() immediately after construction using the incoming signature/metadata so that reconstructed signed transactions remain sealed and immutable.
18-21: ⚠️ Potential issue | 🟠 Major — Avoid rewriting signed timestamps during deserialization.
This helper changes a hash-defining field before verify() or tx_id read it. A payload stored or signed with timestamp=1700000000 is reinterpreted as 1700000000000, so the signature, tx_id, and any persisted Merkle root no longer describe the original transaction. Transaction.from_dict() on lines 44-48 and minichain/block.py lines 102-106 both pass through this path, while minichain/p2p.py lines 112-147 only requires an int. Reject second-based payloads at the boundary instead of silently rewriting them here, or gate this behind an explicit migration path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@minichain/transaction.py` around lines 18 - 21, The _normalize_ts(ts) helper currently rewrites second-based timestamps into milliseconds which mutates fields used to compute signatures and tx_id; change its behavior to reject ambiguous/second-based values instead of silently converting: update _normalize_ts to validate ts is an int and if ts < 1e12 raise a ValueError (or custom TimestampFormatError) with a clear message instructing callers to migrate/convert to milliseconds, and do not alter the value; update callers like Transaction.from_dict and any block deserialization call sites to either perform an explicit, opt-in migration/conversion before calling verify()/tx_id or propagate the error to the boundary that accepts external input so second-based timestamps are handled only at the I/O boundary.
11-16: ⚠️ Potential issue | 🟠 Major — Mutable data payloads still bypass the seal.
__setattr__ only catches self.data = .... If data is a dict/list, in-place edits never hit this hook, and the shallow payloads on lines 35-41 hand the same object reference back to callers. _cached_tx_id can then stay stale, and _sealed no longer prevents post-signature mutation. Freeze/copy data on write, or hash an immutable snapshot instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@minichain/transaction.py` around lines 11 - 16, The __setattr__ seal currently only prevents assignment to fields but allows in-place mutation of mutable payloads (e.g., the data attribute), so update the logic in __setattr__ and the code paths that expose payloads (the shallow payload getters/returns that currently hand out the same object) to enforce immutability: on writes to any field in _TX_FIELDS (notably "data") replace/marshal the value to an immutable snapshot (e.g., deep copy+freeze or convert lists/dicts to tuples/frozen-mapping) or compute/hash an immutable snapshot for use in the transaction ID, and on reads return an immutable/copy instead of the original mutable object; also ensure that when data is set you always clear _cached_tx_id (use _cached_tx_id attribute) so the cached ID cannot become stale and the _sealed flag truly prevents post-signature mutation via held references.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@minichain/transaction.py`:
- Around line 31-32: Constructor currently unconditionally sets self._sealed =
False which reopens signed transactions; update the constructor (or
Transaction.__init__) to accept and apply a sealed/signature-derived flag (e.g.,
set self._sealed = bool(signature) or a new sealed parameter) or alternatively
set self._sealed appropriately in Transaction.from_dict() immediately after
construction using the incoming signature/metadata so that reconstructed signed
transactions remain sealed and immutable.
- Around line 18-21: The _normalize_ts(ts) helper currently rewrites
second-based timestamps into milliseconds which mutates fields used to compute
signatures and tx_id; change its behavior to reject ambiguous/second-based
values instead of silently converting: update _normalize_ts to validate ts is an
int and if ts < 1e12 raise a ValueError (or custom TimestampFormatError) with a
clear message instructing callers to migrate/convert to milliseconds, and do not
alter the value; update callers like Transaction.from_dict and any block
deserialization call sites to either perform an explicit, opt-in
migration/conversion before calling verify()/tx_id or propagate the error to the
boundary that accepts external input so second-based timestamps are handled only
at the I/O boundary.
- Around line 11-16: The __setattr__ seal currently only prevents assignment to
fields but allows in-place mutation of mutable payloads (e.g., the data
attribute), so update the logic in __setattr__ and the code paths that expose
payloads (the shallow payload getters/returns that currently hand out the same
object) to enforce immutability: on writes to any field in _TX_FIELDS (notably
"data") replace/marshal the value to an immutable snapshot (e.g., deep
copy+freeze or convert lists/dicts to tuples/frozen-mapping) or compute/hash an
immutable snapshot for use in the transaction ID, and on reads return an
immutable/copy instead of the original mutable object; also ensure that when
data is set you always clear _cached_tx_id (use _cached_tx_id attribute) so the
cached ID cannot become stale and the _sealed flag truly prevents post-signature
mutation via held references.
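A small sketch of the "freeze on write" idea from the mutable-payload comment: recursively converting dicts and lists to tuples yields a hashable snapshot that held references cannot mutate. The helper name is hypothetical:

```python
def freeze(value):
    """Recursively convert dicts/lists into nested tuples.

    The result is immutable and hashable, so it can back a cached tx_id
    without risk of in-place mutation through a retained reference.
    """
    if isinstance(value, dict):
        # Sort by key so equal mappings freeze to equal tuples.
        return tuple(sorted((k, freeze(v)) for k, v in value.items()))
    if isinstance(value, list):
        return tuple(freeze(v) for v in value)
    return value
```

Storing `freeze(data)` (or hashing it) at assignment time means later edits to the caller's original dict can no longer desynchronize the cached id from the payload.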
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 24e5e695-12fd-4e9a-ae53-1eaf43683d6f
📒 Files selected for processing (1)
minichain/transaction.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_cache.py`:
- Around line 13-16: The test currently only checks equality of tx.tx_id values
but not that the expensive hash computation was skipped on repeated access;
update the test to assert the hashing function is called exactly once on first
access and not on the second (and again once after invalidation). Locate the
tx.tx_id property and the internal computation routine (e.g., a method like
_compute_tx_id or the module function that performs hashing) and wrap or
monkeypatch it with a call-counter/mock before reading tx.tx_id the first time,
assert the counter == 1, read tx.tx_id again and assert the counter remains 1,
then trigger the cache invalidation (e.g., by clearing tx._cached_tx_id or
calling tx.invalidate_cache) and read tx.tx_id again asserting the counter
increments to 2.
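The call-counting test the comment asks for can be sketched with `unittest.mock.Mock`. The `Transaction` and `hash_fn` below are simplified stand-ins for the real class and the patched `canonical_json_hash`:

```python
import hashlib
from unittest.mock import Mock

def _real_hash(payload):
    return hashlib.sha256(repr(payload).encode()).hexdigest()

# Wrap the hash in a Mock so call_count records how often it actually runs.
hash_fn = Mock(side_effect=_real_hash)

class Transaction:
    """Sketch of the memoizing tx_id property under test."""
    def __init__(self, amount):
        self.amount = amount
        self._cached_tx_id = None

    @property
    def tx_id(self):
        if self._cached_tx_id is None:
            self._cached_tx_id = hash_fn({"amount": self.amount})
        return self._cached_tx_id

tx = Transaction(5)
first = tx.tx_id
assert hash_fn.call_count == 1   # computed once on first access
assert tx.tx_id == first
assert hash_fn.call_count == 1   # cache hit: no recomputation
tx._cached_tx_id = None          # simulate cache invalidation
tx.tx_id
assert hash_fn.call_count == 2   # recomputed exactly once after invalidation
```

Counting calls, rather than comparing returned ids, is what proves the expensive computation was skipped on the cache hit.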
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: d460d1bb-aa68-43a5-bbff-5bd1bd09349a
📒 Files selected for processing (1)
tests/test_cache.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_cache.py`:
- Around line 39-48: Update the test_signed_tx_is_sealed test to also assert
cache invalidation and tx_id change when signing: record the pre-sign
tx._cached_tx_id and tx.tx_id (or compute initial tx_id), then call
tx.sign(SigningKey) and assert that tx._cached_tx_id is None (or cleared) and
that tx.tx_id != pre_sign_tx_id; keep the existing sealing assertion
(AttributeError on assignment) and reference the Transaction.sign method, the
_cached_tx_id attribute, and the tx_id property to locate where to add these
checks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 90cde7ed-ee91-47ce-bad0-95cdf47bc960
📒 Files selected for processing (1)
tests/test_cache.py
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_cache.py`:
- Around line 7-37: Add a parameterized loop to test_tx_caching_efficiency that
mutates each field in Transaction._TX_FIELDS (e.g., "amount", "receiver",
"nonce", "data") instead of only changing amount: after the initial access and
memoization check using patch('minichain.transaction.canonical_json_hash') and
mock_hash.call_count, iterate over the fields, change tx.<field> to a new value,
assert tx._cached_tx_id is None, then access tx.tx_id and assert
mock_hash.call_count incremented for each mutation; this ensures cache
invalidation is tested for all unique identifying symbols
(test_tx_caching_efficiency, tx.tx_id, tx._cached_tx_id, _TX_FIELDS, and the
patched canonical_json_hash).
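The per-field loop the comment describes can be sketched like this. The class is a simplified stand-in mirroring the `_TX_FIELDS`/`__setattr__` design discussed in this PR, not the actual `minichain` code:

```python
import hashlib

class Transaction:
    """Stand-in with the field set named in the comment."""
    _TX_FIELDS = frozenset({"amount", "receiver", "nonce", "data"})

    def __init__(self):
        self._cached_tx_id = None
        self.amount, self.receiver, self.nonce, self.data = 1, "r", 0, "d"

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name in self._TX_FIELDS:
            object.__setattr__(self, "_cached_tx_id", None)

    @property
    def tx_id(self):
        if self._cached_tx_id is None:
            blob = repr((self.amount, self.receiver, self.nonce, self.data))
            object.__setattr__(
                self, "_cached_tx_id",
                hashlib.sha256(blob.encode()).hexdigest(),
            )
        return self._cached_tx_id

tx = Transaction()
tx.tx_id  # warm the cache once
new_values = {"amount": 2, "receiver": "x", "nonce": 7, "data": "y"}
for field in sorted(Transaction._TX_FIELDS):
    setattr(tx, field, new_values[field])
    assert tx._cached_tx_id is None       # every field write invalidates
    tx.tx_id                              # recompute and re-cache
    assert tx._cached_tx_id is not None
```

Looping over `_TX_FIELDS` itself (rather than hard-coding `amount`) means the test automatically covers any field later added to the set.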
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: eb82b15c-e139-426f-b5bf-47021d75648b
📒 Files selected for processing (1)
tests/test_cache.py
Description
Currently, the Transaction.tx_id property re-computes a SHA-256 hash every time the identifier is requested. During high-load operations such as block validation or full-node syncing, the CPU repeats these expensive cryptographic operations many times for the same static data.

This PR introduces a caching pattern (memoization) that stores the result of the first hash calculation and reuses it for all subsequent calls. Accessing a transaction's ID then becomes a near-instant memory lookup rather than a repeated cryptographic operation.
Addressed Issues:
Fixes #74
Screenshots/Recordings:
Verified the logic with a dedicated test script (tests/test_cache.py).
AI Usage Disclosure:
We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact. AI slop is strongly discouraged and may lead to banning and blocking. Do not spam our repos with AI slop.
Check one of the checkboxes below:
I have used the following AI models and tools: Gemini 3 Flash
Checklist
Summary by CodeRabbit
New Features
Tests