Conversation
Clamp the fts query term to FTS_MAX_TERM_BYTES at the prefix and seek buffer boundaries so a 512 character multibyte token that packs more than 514 bytes cannot overrun the stack prefix that fts_build_key already truncates to on the write side. Check the return of encryption_key_get in both encrypt and decrypt paths and fail closed when the keyring cannot satisfy the request. The local key buffer was otherwise fed uninitialised stack bytes into encryption_crypt, silently mis encrypting or returning empty rows. Reject ADD FULLTEXT and ADD SPATIAL via ALGORITHM INPLACE in check_if_supported_inplace_alter. The inplace builder skips both key types during the row scan, leaving an empty CF that silently returns no rows for pre existing data. COPY fallback routes every row through write_row and populates the new CF correctly. Abort the inplace ADD INDEX scan on per row put failure rather than logging once and continuing. The old behaviour shipped a silently incomplete index, matching the failure mode the batch commit guard already refused. Wrap tidesdb_iter_new in tdb_iter_new_blocking so reader fd starvation backs off and retries via the existing backpressure helper instead of immediately surfacing HA_ERR_LOCK_WAIT_TIMEOUT. Thirteen call sites converted. Stop pessimistic locking from X locking every row a secondary index scan walks past under UPDATE and DELETE. The lock is now taken on the actual mutation target inside update_row and delete_row. SELECT FOR UPDATE keeps the per row lock per the SQL contract. Remove the semi consistent read overrides. The engine never implemented the optimistic skip path, so the previous code advertised a capability MariaDB then used to reorder UPDATE and DELETE in ways the engine could not honour. Delete the trx_registered_ field. It was initialised false and never set true, so the gate it appeared to model encoded nothing. Release plugin level row locks in maybe_bulk_commit on both the success and failure paths. The failure branch additionally freed the underlying txn while leaving request structs that referenced it on the held list. Rewrite the deadlock walker as a fixed capacity DFS across every conflicting holder per hop, with a visited set sized to four times the depth cap. The old walker followed only the first conflicting holder and silently missed cycles that ran through later ones. Frontier overflow and depth cap both return true so a false positive triggers a cheap retry rather than a fifty second lock wait timeout. Make tdb_lock_partition_t alignas sixty four so unrelated partitions never share a cache line. Size the partition count at init from hardware concurrency times eight, clamped between one twenty eight and sixty five thousand five hundred thirty six. On a sixteen thread box this drops from sixty five thousand five hundred thirty six partitions to one twenty eight while eliminating the false sharing that was making the larger array slower than the smaller one. Anchor the backpressure deadline in external_lock F_WRLCK and consult it from tdb_with_backpressure_wait so a multi call statement cannot multiply the per call timeout by row count. A five thousand row INSERT with N secondary indexes previously had a worst case wall clock of one plus N times five thousand times sixty seconds and ignored MAX_EXECUTION_TIME. Call tidesdb_cancel_background_work in tidesdb_deinit_func before tidesdb_close so shutdown with a multi gigabyte compaction backlog no longer blocks for minutes. Route schema cf discover paths through tdb_with_backpressure_wait for transient BUSY and through tdb_rc_to_ha for hard IO. The previous code returned HA_ERR_NO_SUCH_TABLE for any tidesdb_txn_get failure, triggering the delete frm rediscover loop the inline comment warned about. Return the actual rc from fts_load_meta and refuse the write back in fts_update_meta unless the load returned TDB_SUCCESS or TDB_ERR_NOT_FOUND. A transient read failure no longer clobbers BM25 totals to zero for the life of the index. Map TDB_ERR_UNKNOWN to HA_ERR_LOCK_DEADLOCK in tdb_rc_to_ha. Unified mode commit returns it when the active memtable try_ref retry budget exhausts under heavy rotation contention, which is the same family as TDB_ERR_CONFLICT from the caller perspective and benefits from MariaDB triggering its retry path rather than surfacing an opaque ER_GET_ERRNO 1030. Distinguish TDB_ERR_LOCKED from real failures in compact_range. Post DELETE auto compact absorbs it silently because another compaction will subsume the work. OPTIMIZE TABLE returns HA_ADMIN_TRY_ALTER with a NOTE so the operator knows to retry if they need the post optimize state. Switch the S3 connector from the eight argument positional tidesdb_objstore_s3_create to the modern struct entry tidesdb_objstore_s3_create_config. Surface tls_ca_path, tls_insecure_skip_verify, multipart_threshold and multipart_part_size as sysvars. The insecure skip verify path also logs a warning at start so it cannot be flipped on silently. Surface ten objstore tunables that were previously hardcoded or left at the library default with no way to override. cache_on_read, cache_on_write, max_concurrent_uploads and downloads, multipart_threshold and part_size, sync_manifest_to_object, wal_upload_sync, replicate_wal and replica_replay_wal. The last two were silently true before. Replace the inline numeric literals in the per CF default sysvars with TIDESQL_DEFAULT mirror constants in ha_tidesdb.h. The mirror is necessary because the library tidesdb.h leaks a realloc macro that conflicts with MariaDB String realloc. A library default drift now shows up in one place. Surface unified_memtable_skip_list_max_level and probability as sysvars, both defaulting to zero so the prior library default behaviour is preserved. Open one fts iterator before the query term loop in ft_init_ext and reuse it across every term. A MATCH AGAINST with N terms now pays one merge heap construction instead of N. Pre fold per term BM25 constants idf times k1 plus one, k1 times one minus b, and k1 times b times inv_avgdl so the per posting inner loop drops from four multiplies and one divide to one multiply add, one divide and one multiply. Reserve doc_scores and doc_required_hits to suppress hash growth across all the term matches. Coalesce the always read both sites in the fts posting loops and the inplace ADD INDEX row scan to tidesdb_iter_key_value. Sites that take a key based early break before reading the value are left alone because eager fetching would regress them. Cache the records_in_range full range normaliser per CF on the same TIDESDB_STATS_REFRESH_US window as scan_time. A plan probing N range alternatives now recomputes the normaliser once rather than N times. Per index storage uses unique_ptr atomic array because std atomic is not move constructible. Cap spatial decomposition at sixteen times the grid axis and fall back to a single full range entry for wider query boxes. The MBR post filter still rejects non overlapping rows on the read side so correctness is unaffected, but a wide query no longer allocates and sorts sixty five thousand uint64_t. Add a static_assert so a future SPATIAL_DECOMP_BITS bump cannot underflow the shift. Switch index_last from tidesdb_iter_seek_to_last to seek_for_prev against a 0xFF padded sentinel for both the PK and the secondary branches. seek_to_last walks every source maximum first while the sentinel approach lands in one seek. Validate the prefix in the index_next PK branch when the prior index_read_map landed on a partial PK exact prefix. Gate via a pk_partial_exact_active_ flag set in index_read_map and cleared on index_end or any other index_read_map. The old code walked off the prefix and returned unrelated rows if a plan called index_next rather than index_next_same after a partial PK exact seek. Guard move_field_offset by ptrdiff in fts_extract_and_tokenize and all four spatial maintenance sites. The autocommit hot path where buf equals record zero no longer pays a virtual dispatch pair for nothing. make_comparable_key was already guarded. Use tidesdb_free at every site that releases a buffer from tidesdb_txn_get or tidesdb_list_column_families. Plain free works today because the library wrapper is itself a free, but a future allocator change would otherwise break every site silently. Replace rec_buff_length with the explicit table record one minus record zero subtraction in recover_counters val_int_offset. rec_buff_length is not contractually equal to the gap. Delete the stmt_was_dirty field, its three hot path stores and the F_UNLCK reset. It was never read. Drop two unused inline helpers, tdb_is_blend_char and mbr_contains. The tokenizer reads tdb_blend_char_map directly under the rwlock and spatial CONTAIN routes through mbr_within deliberately. Extend tidesdb_fulltext with a 1024 byte oversize query term case in BOOLEAN and BOOLEAN wildcard modes, both required to complete without overrun and return no rows. Extend tidesdb_online_ddl with ADD FULLTEXT and ADD SPATIAL sections asserting that ALGORITHM INPLACE is rejected with the expected reason and that the default algorithm fallback back fills pre existing rows so MATCH AGAINST and MBRWithin see them. Bump version 4.5.3 to 4.5.4 and the matching hex from 0x40503 to 0x40504. In install.sh, fall back to sudo for the /etc/ld.so.conf.d write and ldconfig invocation in register_mariadb_client_lib. The prior implementation gated on the install prefix needing sudo, which is wrong because /etc always does regardless of where the install lives.
Member
Author
|
Somethin seems up with library for the k8s tests reviewing the logs; I'm writing a probe locally with rust-fs to see if its something in v9.3.2 comparing to v9.3.1, something is up there, could be configuration related. |
While in the headers add a short note above the handler.h block in ha_tidesdb.h explaining the include order requirement that surfaces on MariaDB 11.4 and newer. handler.h pulls in server headers that depend on macros and typedefs defined in my_global.h, and the order matters once 11.4 tightens its include hygiene. Future readers chasing a missing declaration error on the next bump will know where to look. Also fix a spurious warning in the write_pressure MTR test. With tidesdb_unified_memtable on the default sync mode FULL the engine fires a one shot informational warning when a table is created with SYNC_MODE NONE, since the shared WAL is what governs durability and the table option is ignored. The test creates two such tables so the warning shows up twice in the recorded output. Pass tidesdb_unified_memtable_sync_mode NONE in the test opt file so the warning never fires, and drop the now stale lines from the result file.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Clamp the fts query term to FTS_MAX_TERM_BYTES at the prefix and seek buffer boundaries so a 512 character multibyte token that packs more than 514 bytes cannot overrun the stack prefix that fts_build_key already truncates to on the write side.
Check the return of encryption_key_get in both encrypt and decrypt paths and fail closed when the keyring cannot satisfy the request. The local key buffer was otherwise fed uninitialised stack bytes into encryption_crypt, silently mis encrypting or returning empty rows.
Reject ADD FULLTEXT and ADD SPATIAL via ALGORITHM INPLACE in check_if_supported_inplace_alter. The inplace builder skips both key types during the row scan, leaving an empty CF that silently returns no rows for pre existing data. COPY fallback routes every row through write_row and populates the new CF correctly.
Abort the inplace ADD INDEX scan on per row put failure rather than logging once and continuing. The old behaviour shipped a silently incomplete index, matching the failure mode the batch commit guard already refused.
Wrap tidesdb_iter_new in tdb_iter_new_blocking so reader fd starvation backs off and retries via the existing backpressure helper instead of immediately surfacing HA_ERR_LOCK_WAIT_TIMEOUT. Thirteen call sites converted.
Stop pessimistic locking from X locking every row a secondary index scan walks past under UPDATE and DELETE. The lock is now taken on the actual mutation target inside update_row and delete_row. SELECT FOR UPDATE keeps the per row lock per the SQL contract.
Remove the semi consistent read overrides. The engine never implemented the optimistic skip path, so the previous code advertised a capability MariaDB then used to reorder UPDATE and DELETE in ways the engine could not honour.
Delete the trx_registered_ field. It was initialised false and never set true, so the gate it appeared to model encoded nothing.
Release plugin level row locks in maybe_bulk_commit on both the success and failure paths. The failure branch additionally freed the underlying txn while leaving request structs that referenced it on the held list.
Rewrite the deadlock walker as a fixed capacity DFS across every conflicting holder per hop, with a visited set sized to four times the depth cap. The old walker followed only the first conflicting holder and silently missed cycles that ran through later ones. Frontier overflow and depth cap both return true so a false positive triggers a cheap retry rather than a fifty second lock wait timeout.
Make tdb_lock_partition_t alignas sixty four so unrelated partitions never share a cache line. Size the partition count at init from hardware concurrency times eight, clamped between one twenty eight and sixty five thousand five hundred thirty six. On a sixteen thread box this drops from sixty five thousand five hundred thirty six partitions to one twenty eight while eliminating the false sharing that was making the larger array slower than the smaller one.
Anchor the backpressure deadline in external_lock F_WRLCK and consult it from tdb_with_backpressure_wait so a multi call statement cannot multiply the per call timeout by row count. A five thousand row INSERT with N secondary indexes previously had a worst case wall clock of one plus N times five thousand times sixty seconds and ignored MAX_EXECUTION_TIME.
Call tidesdb_cancel_background_work in tidesdb_deinit_func before tidesdb_close so shutdown with a multi gigabyte compaction backlog no longer blocks for minutes.
Route schema cf discover paths through tdb_with_backpressure_wait for transient BUSY and through tdb_rc_to_ha for hard IO. The previous code returned HA_ERR_NO_SUCH_TABLE for any tidesdb_txn_get failure, triggering the delete frm rediscover loop the inline comment warned about.
Return the actual rc from fts_load_meta and refuse the write back in fts_update_meta unless the load returned TDB_SUCCESS or TDB_ERR_NOT_FOUND. A transient read failure no longer clobbers BM25 totals to zero for the life of the index.
Map TDB_ERR_UNKNOWN to HA_ERR_LOCK_DEADLOCK in tdb_rc_to_ha. Unified mode commit returns it when the active memtable try_ref retry budget exhausts under heavy rotation contention, which is the same family as TDB_ERR_CONFLICT from the caller perspective and benefits from MariaDB triggering its retry path rather than surfacing an opaque ER_GET_ERRNO 1030.
Distinguish TDB_ERR_LOCKED from real failures in compact_range. Post DELETE auto compact absorbs it silently because another compaction will subsume the work. OPTIMIZE TABLE returns HA_ADMIN_TRY_ALTER with a NOTE so the operator knows to retry if they need the post optimize state.
Switch the S3 connector from the eight argument positional tidesdb_objstore_s3_create to the modern struct entry tidesdb_objstore_s3_create_config. Surface tls_ca_path, tls_insecure_skip_verify, multipart_threshold and multipart_part_size as sysvars. The insecure skip verify path also logs a warning at start so it cannot be flipped on silently.
Surface ten objstore tunables that were previously hardcoded or left at the library default with no way to override. cache_on_read, cache_on_write, max_concurrent_uploads and downloads, multipart_threshold and part_size, sync_manifest_to_object, wal_upload_sync, replicate_wal and replica_replay_wal. The last two were silently true before.
Replace the inline numeric literals in the per CF default sysvars with TIDESQL_DEFAULT mirror constants in ha_tidesdb.h. The mirror is necessary because the library tidesdb.h leaks a realloc macro that conflicts with MariaDB String realloc. A library default drift now shows up in one place.
Surface unified_memtable_skip_list_max_level and probability as sysvars, both defaulting to zero so the prior library default behaviour is preserved.
Open one fts iterator before the query term loop in ft_init_ext and reuse it across every term. A MATCH AGAINST with N terms now pays one merge heap construction instead of N.
Pre fold per term BM25 constants idf times k1 plus one, k1 times one minus b, and k1 times b times inv_avgdl so the per posting inner loop drops from four multiplies and one divide to one multiply add, one divide and one multiply. Reserve doc_scores and doc_required_hits to suppress hash growth across all the term matches.
Coalesce the always read both sites in the fts posting loops and the inplace ADD INDEX row scan to tidesdb_iter_key_value. Sites that take a key based early break before reading the value are left alone because eager fetching would regress them.
Cache the records_in_range full range normaliser per CF on the same TIDESDB_STATS_REFRESH_US window as scan_time. A plan probing N range alternatives now recomputes the normaliser once rather than N times. Per index storage uses unique_ptr atomic array because std atomic is not move constructible.
Cap spatial decomposition at sixteen times the grid axis and fall back to a single full range entry for wider query boxes. The MBR post filter still rejects non overlapping rows on the read side so correctness is unaffected, but a wide query no longer allocates and sorts sixty five thousand uint64_t. Add a static_assert so a future SPATIAL_DECOMP_BITS bump cannot underflow the shift.
Switch index_last from tidesdb_iter_seek_to_last to seek_for_prev against a 0xFF padded sentinel for both the PK and the secondary branches. seek_to_last walks every source maximum first while the sentinel approach lands in one seek.
Validate the prefix in the index_next PK branch when the prior index_read_map landed on a partial PK exact prefix. Gate via a pk_partial_exact_active_ flag set in index_read_map and cleared on index_end or any other index_read_map. The old code walked off the prefix and returned unrelated rows if a plan called index_next rather than index_next_same after a partial PK exact seek.
Guard move_field_offset by ptrdiff in fts_extract_and_tokenize and all four spatial maintenance sites. The autocommit hot path where buf equals record zero no longer pays a virtual dispatch pair for nothing. make_comparable_key was already guarded.
Use tidesdb_free at every site that releases a buffer from tidesdb_txn_get or tidesdb_list_column_families. Plain free works today because the library wrapper is itself a free, but a future allocator change would otherwise break every site silently.
Replace rec_buff_length with the explicit table record one minus record zero subtraction in recover_counters val_int_offset. rec_buff_length is not contractually equal to the gap.
Delete the stmt_was_dirty field, its three hot path stores and the F_UNLCK reset. It was never read.
Drop two unused inline helpers, tdb_is_blend_char and mbr_contains. The tokenizer reads tdb_blend_char_map directly under the rwlock and spatial CONTAIN routes through mbr_within deliberately.
Extend tidesdb_fulltext with a 1024 byte oversize query term case in BOOLEAN and BOOLEAN wildcard modes, both required to complete without overrun and return no rows. Extend tidesdb_online_ddl with ADD FULLTEXT and ADD SPATIAL sections asserting that ALGORITHM INPLACE is rejected with the expected reason and that the default algorithm fallback back fills pre existing rows so MATCH AGAINST and MBRWithin see them.
Bump version 4.5.3 to 4.5.4 and the matching hex from 0x40503 to 0x40504.
In install.sh, fall back to sudo for the /etc/ld.so.conf.d write and ldconfig invocation in register_mariadb_client_lib. The prior implementation gated on the install prefix needing sudo, which is wrong because /etc always does regardless of where the install lives.