Skip to content

TideSQL 4 PATCH (v4.5.4)#143

Merged
guycipher merged 12 commits into
masterfrom
454
Jun 2, 2026
Merged

TideSQL 4 PATCH (v4.5.4)#143
guycipher merged 12 commits into
masterfrom
454

Conversation

@guycipher
Copy link
Copy Markdown
Member

Clamp the fts query term to FTS_MAX_TERM_BYTES at the prefix and seek buffer boundaries so a 512 character multibyte token that packs more than 514 bytes cannot overrun the stack prefix that fts_build_key already truncates to on the write side.

Check the return of encryption_key_get in both encrypt and decrypt paths and fail closed when the keyring cannot satisfy the request. The local key buffer was otherwise fed uninitialised stack bytes into encryption_crypt, silently mis encrypting or returning empty rows.

Reject ADD FULLTEXT and ADD SPATIAL via ALGORITHM INPLACE in check_if_supported_inplace_alter. The inplace builder skips both key types during the row scan, leaving an empty CF that silently returns no rows for pre existing data. COPY fallback routes every row through write_row and populates the new CF correctly.

Abort the inplace ADD INDEX scan on per row put failure rather than logging once and continuing. The old behaviour shipped a silently incomplete index, matching the failure mode the batch commit guard already refused.

Wrap tidesdb_iter_new in tdb_iter_new_blocking so reader fd starvation backs off and retries via the existing backpressure helper instead of immediately surfacing HA_ERR_LOCK_WAIT_TIMEOUT. Thirteen call sites converted.

Stop pessimistic locking from X locking every row a secondary index scan walks past under UPDATE and DELETE. The lock is now taken on the actual mutation target inside update_row and delete_row. SELECT FOR UPDATE keeps the per row lock per the SQL contract.

Remove the semi consistent read overrides. The engine never implemented the optimistic skip path, so the previous code advertised a capability MariaDB then used to reorder UPDATE and DELETE in ways the engine could not honour.

Delete the trx_registered_ field. It was initialised false and never set true, so the gate it appeared to model encoded nothing.

Release plugin level row locks in maybe_bulk_commit on both the success and failure paths. The failure branch additionally freed the underlying txn while leaving request structs that referenced it on the held list.

Rewrite the deadlock walker as a fixed capacity DFS across every conflicting holder per hop, with a visited set sized to four times the depth cap. The old walker followed only the first conflicting holder and silently missed cycles that ran through later ones. Frontier overflow and depth cap both return true so a false positive triggers a cheap retry rather than a fifty second lock wait timeout.

Make tdb_lock_partition_t alignas sixty four so unrelated partitions never share a cache line. Size the partition count at init from hardware concurrency times eight, clamped between one twenty eight and sixty five thousand five hundred thirty six. On a sixteen thread box this drops from sixty five thousand five hundred thirty six partitions to one twenty eight while eliminating the false sharing that was making the larger array slower than the smaller one.

Anchor the backpressure deadline in external_lock F_WRLCK and consult it from tdb_with_backpressure_wait so a multi call statement cannot multiply the per call timeout by row count. A five thousand row INSERT with N secondary indexes previously had a worst case wall clock of one plus N times five thousand times sixty seconds and ignored MAX_EXECUTION_TIME.

Call tidesdb_cancel_background_work in tidesdb_deinit_func before tidesdb_close so shutdown with a multi gigabyte compaction backlog no longer blocks for minutes.

Route schema cf discover paths through tdb_with_backpressure_wait for transient BUSY and through tdb_rc_to_ha for hard IO. The previous code returned HA_ERR_NO_SUCH_TABLE for any tidesdb_txn_get failure, triggering the delete frm rediscover loop the inline comment warned about.

Return the actual rc from fts_load_meta and refuse the write back in fts_update_meta unless the load returned TDB_SUCCESS or TDB_ERR_NOT_FOUND. A transient read failure no longer clobbers BM25 totals to zero for the life of the index.

Map TDB_ERR_UNKNOWN to HA_ERR_LOCK_DEADLOCK in tdb_rc_to_ha. Unified mode commit returns it when the active memtable try_ref retry budget exhausts under heavy rotation contention, which is the same family as TDB_ERR_CONFLICT from the caller perspective and benefits from MariaDB triggering its retry path rather than surfacing an opaque ER_GET_ERRNO 1030.

Distinguish TDB_ERR_LOCKED from real failures in compact_range. Post DELETE auto compact absorbs it silently because another compaction will subsume the work. OPTIMIZE TABLE returns HA_ADMIN_TRY_ALTER with a NOTE so the operator knows to retry if they need the post optimize state.

Switch the S3 connector from the eight argument positional tidesdb_objstore_s3_create to the modern struct entry tidesdb_objstore_s3_create_config. Surface tls_ca_path, tls_insecure_skip_verify, multipart_threshold and multipart_part_size as sysvars. The insecure skip verify path also logs a warning at start so it cannot be flipped on silently.

Surface ten objstore tunables that were previously hardcoded or left at the library default with no way to override. cache_on_read, cache_on_write, max_concurrent_uploads and downloads, multipart_threshold and part_size, sync_manifest_to_object, wal_upload_sync, replicate_wal and replica_replay_wal. The last two were silently true before.

Replace the inline numeric literals in the per CF default sysvars with TIDESQL_DEFAULT mirror constants in ha_tidesdb.h. The mirror is necessary because the library tidesdb.h leaks a realloc macro that conflicts with MariaDB String realloc. A library default drift now shows up in one place.

Surface unified_memtable_skip_list_max_level and probability as sysvars, both defaulting to zero so the prior library default behaviour is preserved.

Open one fts iterator before the query term loop in ft_init_ext and reuse it across every term. A MATCH AGAINST with N terms now pays one merge heap construction instead of N.

Pre fold per term BM25 constants idf times k1 plus one, k1 times one minus b, and k1 times b times inv_avgdl so the per posting inner loop drops from four multiplies and one divide to one multiply add, one divide and one multiply. Reserve doc_scores and doc_required_hits to suppress hash growth across all the term matches.

Coalesce the always read both sites in the fts posting loops and the inplace ADD INDEX row scan to tidesdb_iter_key_value. Sites that take a key based early break before reading the value are left alone because eager fetching would regress them.

Cache the records_in_range full range normaliser per CF on the same TIDESDB_STATS_REFRESH_US window as scan_time. A plan probing N range alternatives now recomputes the normaliser once rather than N times. Per index storage uses unique_ptr atomic array because std atomic is not move constructible.

Cap spatial decomposition at sixteen times the grid axis and fall back to a single full range entry for wider query boxes. The MBR post filter still rejects non overlapping rows on the read side so correctness is unaffected, but a wide query no longer allocates and sorts sixty five thousand uint64_t. Add a static_assert so a future SPATIAL_DECOMP_BITS bump cannot underflow the shift.

Switch index_last from tidesdb_iter_seek_to_last to seek_for_prev against a 0xFF padded sentinel for both the PK and the secondary branches. seek_to_last walks every source maximum first while the sentinel approach lands in one seek.

Validate the prefix in the index_next PK branch when the prior index_read_map landed on a partial PK exact prefix. Gate via a pk_partial_exact_active_ flag set in index_read_map and cleared on index_end or any other index_read_map. The old code walked off the prefix and returned unrelated rows if a plan called index_next rather than index_next_same after a partial PK exact seek.

Guard move_field_offset by ptrdiff in fts_extract_and_tokenize and all four spatial maintenance sites. The autocommit hot path where buf equals record zero no longer pays a virtual dispatch pair for nothing. make_comparable_key was already guarded.

Use tidesdb_free at every site that releases a buffer from tidesdb_txn_get or tidesdb_list_column_families. Plain free works today because the library wrapper is itself a free, but a future allocator change would otherwise break every site silently.

Replace rec_buff_length with the explicit table record one minus record zero subtraction in recover_counters val_int_offset. rec_buff_length is not contractually equal to the gap.

Delete the stmt_was_dirty field, its three hot path stores and the F_UNLCK reset. It was never read.

Drop two unused inline helpers, tdb_is_blend_char and mbr_contains. The tokenizer reads tdb_blend_char_map directly under the rwlock and spatial CONTAIN routes through mbr_within deliberately.

Extend tidesdb_fulltext with a 1024 byte oversize query term case in BOOLEAN and BOOLEAN wildcard modes, both required to complete without overrun and return no rows. Extend tidesdb_online_ddl with ADD FULLTEXT and ADD SPATIAL sections asserting that ALGORITHM INPLACE is rejected with the expected reason and that the default algorithm fallback back fills pre existing rows so MATCH AGAINST and MBRWithin see them.

Bump version 4.5.3 to 4.5.4 and the matching hex from 0x40503 to 0x40504.

In install.sh, fall back to sudo for the /etc/ld.so.conf.d write and ldconfig invocation in register_mariadb_client_lib. The prior implementation gated on the install prefix needing sudo, which is wrong because /etc always does regardless of where the install lives.

Clamp the fts query term to FTS_MAX_TERM_BYTES at the prefix and seek
buffer boundaries so a 512 character multibyte token that packs more
than 514 bytes cannot overrun the stack prefix that fts_build_key
already truncates to on the write side.

Check the return of encryption_key_get in both encrypt and decrypt
paths and fail closed when the keyring cannot satisfy the request.
The local key buffer was otherwise fed uninitialised stack bytes
into encryption_crypt, silently mis encrypting or returning empty
rows.

Reject ADD FULLTEXT and ADD SPATIAL via ALGORITHM INPLACE in
check_if_supported_inplace_alter. The inplace builder skips both key
types during the row scan, leaving an empty CF that silently returns
no rows for pre existing data. COPY fallback routes every row
through write_row and populates the new CF correctly.

Abort the inplace ADD INDEX scan on per row put failure rather than
logging once and continuing. The old behaviour shipped a silently
incomplete index, matching the failure mode the batch commit guard
already refused.

Wrap tidesdb_iter_new in tdb_iter_new_blocking so reader fd
starvation backs off and retries via the existing backpressure
helper instead of immediately surfacing HA_ERR_LOCK_WAIT_TIMEOUT.
Thirteen call sites converted.

Stop pessimistic locking from X locking every row a secondary index
scan walks past under UPDATE and DELETE. The lock is now taken on
the actual mutation target inside update_row and delete_row. SELECT
FOR UPDATE keeps the per row lock per the SQL contract.

Remove the semi consistent read overrides. The engine never
implemented the optimistic skip path, so the previous code
advertised a capability MariaDB then used to reorder UPDATE and
DELETE in ways the engine could not honour.

Delete the trx_registered_ field. It was initialised false and
never set true, so the gate it appeared to model encoded nothing.

Release plugin level row locks in maybe_bulk_commit on both the
success and failure paths. The failure branch additionally freed
the underlying txn while leaving request structs that referenced
it on the held list.

Rewrite the deadlock walker as a fixed capacity DFS across every
conflicting holder per hop, with a visited set sized to four times
the depth cap. The old walker followed only the first conflicting
holder and silently missed cycles that ran through later ones.
Frontier overflow and depth cap both return true so a false
positive triggers a cheap retry rather than a fifty second lock
wait timeout.

Make tdb_lock_partition_t alignas sixty four so unrelated
partitions never share a cache line. Size the partition count at
init from hardware concurrency times eight, clamped between one
twenty eight and sixty five thousand five hundred thirty six. On a
sixteen thread box this drops from sixty five thousand five hundred
thirty six partitions to one twenty eight while eliminating the
false sharing that was making the larger array slower than the
smaller one.

Anchor the backpressure deadline in external_lock F_WRLCK and
consult it from tdb_with_backpressure_wait so a multi call
statement cannot multiply the per call timeout by row count. A five
thousand row INSERT with N secondary indexes previously had a worst
case wall clock of one plus N times five thousand times sixty
seconds and ignored MAX_EXECUTION_TIME.

Call tidesdb_cancel_background_work in tidesdb_deinit_func before
tidesdb_close so shutdown with a multi gigabyte compaction backlog
no longer blocks for minutes.

Route schema cf discover paths through tdb_with_backpressure_wait
for transient BUSY and through tdb_rc_to_ha for hard IO. The
previous code returned HA_ERR_NO_SUCH_TABLE for any tidesdb_txn_get
failure, triggering the delete frm rediscover loop the inline
comment warned about.

Return the actual rc from fts_load_meta and refuse the write back
in fts_update_meta unless the load returned TDB_SUCCESS or
TDB_ERR_NOT_FOUND. A transient read failure no longer clobbers BM25
totals to zero for the life of the index.

Map TDB_ERR_UNKNOWN to HA_ERR_LOCK_DEADLOCK in tdb_rc_to_ha.
Unified mode commit returns it when the active memtable try_ref
retry budget exhausts under heavy rotation contention, which is the
same family as TDB_ERR_CONFLICT from the caller perspective and
benefits from MariaDB triggering its retry path rather than
surfacing an opaque ER_GET_ERRNO 1030.

Distinguish TDB_ERR_LOCKED from real failures in compact_range.
Post DELETE auto compact absorbs it silently because another
compaction will subsume the work. OPTIMIZE TABLE returns
HA_ADMIN_TRY_ALTER with a NOTE so the operator knows to retry if
they need the post optimize state.

Switch the S3 connector from the eight argument positional
tidesdb_objstore_s3_create to the modern struct entry
tidesdb_objstore_s3_create_config. Surface tls_ca_path,
tls_insecure_skip_verify, multipart_threshold and multipart_part_size
as sysvars. The insecure skip verify path also logs a warning at
start so it cannot be flipped on silently.

Surface ten objstore tunables that were previously hardcoded or
left at the library default with no way to override. cache_on_read,
cache_on_write, max_concurrent_uploads and downloads,
multipart_threshold and part_size, sync_manifest_to_object,
wal_upload_sync, replicate_wal and replica_replay_wal. The last
two were silently true before.

Replace the inline numeric literals in the per CF default sysvars
with TIDESQL_DEFAULT mirror constants in ha_tidesdb.h. The mirror
is necessary because the library tidesdb.h leaks a realloc macro
that conflicts with MariaDB String realloc. A library default
drift now shows up in one place.

Surface unified_memtable_skip_list_max_level and probability as
sysvars, both defaulting to zero so the prior library default
behaviour is preserved.

Open one fts iterator before the query term loop in ft_init_ext
and reuse it across every term. A MATCH AGAINST with N terms now
pays one merge heap construction instead of N.

Pre fold per term BM25 constants idf times k1 plus one, k1 times
one minus b, and k1 times b times inv_avgdl so the per posting
inner loop drops from four multiplies and one divide to one
multiply add, one divide and one multiply. Reserve doc_scores and
doc_required_hits to suppress hash growth across all the term
matches.

Coalesce the always read both sites in the fts posting loops and
the inplace ADD INDEX row scan to tidesdb_iter_key_value. Sites
that take a key based early break before reading the value are
left alone because eager fetching would regress them.

Cache the records_in_range full range normaliser per CF on the
same TIDESDB_STATS_REFRESH_US window as scan_time. A plan probing
N range alternatives now recomputes the normaliser once rather
than N times. Per index storage uses unique_ptr atomic array
because std atomic is not move constructible.

Cap spatial decomposition at sixteen times the grid axis and fall
back to a single full range entry for wider query boxes. The MBR
post filter still rejects non overlapping rows on the read side so
correctness is unaffected, but a wide query no longer allocates
and sorts sixty five thousand uint64_t. Add a static_assert so a
future SPATIAL_DECOMP_BITS bump cannot underflow the shift.

Switch index_last from tidesdb_iter_seek_to_last to seek_for_prev
against a 0xFF padded sentinel for both the PK and the secondary
branches. seek_to_last walks every source maximum first while the
sentinel approach lands in one seek.

Validate the prefix in the index_next PK branch when the prior
index_read_map landed on a partial PK exact prefix. Gate via a
pk_partial_exact_active_ flag set in index_read_map and cleared
on index_end or any other index_read_map. The old code walked off
the prefix and returned unrelated rows if a plan called index_next
rather than index_next_same after a partial PK exact seek.

Guard move_field_offset by ptrdiff in fts_extract_and_tokenize and
all four spatial maintenance sites. The autocommit hot path where
buf equals record zero no longer pays a virtual dispatch pair for
nothing. make_comparable_key was already guarded.

Use tidesdb_free at every site that releases a buffer from
tidesdb_txn_get or tidesdb_list_column_families. Plain free works
today because the library wrapper is itself a free, but a future
allocator change would otherwise break every site silently.

Replace rec_buff_length with the explicit table record one minus
record zero subtraction in recover_counters val_int_offset.
rec_buff_length is not contractually equal to the gap.

Delete the stmt_was_dirty field, its three hot path stores and the
F_UNLCK reset. It was never read.

Drop two unused inline helpers, tdb_is_blend_char and mbr_contains.
The tokenizer reads tdb_blend_char_map directly under the rwlock
and spatial CONTAIN routes through mbr_within deliberately.

Extend tidesdb_fulltext with a 1024 byte oversize query term case
in BOOLEAN and BOOLEAN wildcard modes, both required to complete
without overrun and return no rows. Extend tidesdb_online_ddl with
ADD FULLTEXT and ADD SPATIAL sections asserting that ALGORITHM
INPLACE is rejected with the expected reason and that the default
algorithm fallback back fills pre existing rows so MATCH AGAINST
and MBRWithin see them.

Bump version 4.5.3 to 4.5.4 and the matching hex from 0x40503 to
0x40504.

In install.sh, fall back to sudo for the /etc/ld.so.conf.d write
and ldconfig invocation in register_mariadb_client_lib. The prior
implementation gated on the install prefix needing sudo, which is
wrong because /etc always does regardless of where the install
lives.
@guycipher guycipher self-assigned this May 30, 2026
@guycipher guycipher changed the title stability and lock manager rework against tidesdb 9.3.2, bump 4.5.4 TideSQL 4 PATCH (v4.5.4) May 30, 2026
@guycipher
Copy link
Copy Markdown
Member Author

Somethin seems up with library for the k8s tests reviewing the logs; I'm writing a probe locally with rust-fs to see if its something in v9.3.2 comparing to v9.3.1, something is up there, could be configuration related.

guycipher added 9 commits May 31, 2026 08:52
While in the headers add a short note above the handler.h block in
ha_tidesdb.h explaining the include order requirement that surfaces
on MariaDB 11.4 and newer. handler.h pulls in server headers that
depend on macros and typedefs defined in my_global.h, and the order
matters once 11.4 tightens its include hygiene. Future readers
chasing a missing declaration error on the next bump will know where
to look.

Also fix a spurious warning in the write_pressure MTR test. With
tidesdb_unified_memtable on the default sync mode FULL the engine
fires a one shot informational warning when a table is created with
SYNC_MODE NONE, since the shared WAL is what governs durability and
the table option is ignored. The test creates two such tables so
the warning shows up twice in the recorded output. Pass
tidesdb_unified_memtable_sync_mode NONE in the test opt file so the
warning never fires, and drop the now stale lines from the result
file.
@guycipher guycipher merged commit a694654 into master Jun 2, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant