[DO NOT MERGE, ONLY FOR REVIEW] Feature/first value implementation#2
[DO NOT MERGE, ONLY FOR REVIEW] Feature/first value implementation#2rileydes-improving wants to merge 20 commits intofeature/vector-reducer-functionsfrom
Conversation
|
I think writing compatibility tests will be most helpful to get the desired behaviour. Write them in |
Sounds good, I'll do that |
Resolves: valkey-io#703 Encountered the following series of crashes during engine startup. 1. SIGSEGV in `string2ll_resolver` — IFUNC resolver called before TSAN runtime initializes ``` Program received signal SIGSEGV, Segmentation fault. 0x00007ffff6de0fb0 in ?? () #0 0x00007ffff6de0fb0 in ?? () #1 0x00000000004b89a7 in string2ll_resolver.lto_priv () #2 0x00007ffff7de4d94 in _dl_relocate_object () from /lib64/ld-linux-x86-64.so.2 #3 0x00007ffff7ddceaa in dl_main () from /lib64/ld-linux-x86-64.so.2 #4 0x00007ffff7df0b5f in _dl_sysdep_start () from /lib64/ld-linux-x86-64.so.2 #5 0x00007ffff7ddad2e in _dl_start () from /lib64/ld-linux-x86-64.so.2 #6 0x00007ffff7dd9ef8 in _start () from /lib64/ld-linux-x86-64.so.2 valkey-io#7 0x0000000000000003 in ?? () valkey-io#8 0x00007fffffffe04c in ?? () valkey-io#9 0x00007fffffffe0b6 in ?? () valkey-io#10 0x00007fffffffe0bd in ?? () valkey-io#11 0x0000000000000000 in ?? () ``` 2. TSAN abort on `RTLD_DEEPBIND` — module loader uses `RTLD_DEEPBIND` which is incompatible with TSAN ``` cat /workplace/baswanth/tsan/valkey-search/testing/integration/.build-release-tsan/output/6379_stdout.txt 2>/dev/null | tail -20 45443:M 25 Feb 2026 01:57:58.222 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see jemalloc/jemalloc#1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect. 45443:M 25 Feb 2026 01:57:58.222 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo 45443:M 25 Feb 2026 01:57:58.222 * Valkey version=9.0.1, bits=64, commit=ab3c953b, modified=1, pid=45443, just started 45443:M 25 Feb 2026 01:57:58.222 * Configuration loaded 45443:M 25 Feb 2026 01:57:58.224 * monotonic clock: POSIX clock_gettime 45443:M 25 Feb 2026 01:57:58.226 * Running mode=cluster, port=6379. 45443:M 25 Feb 2026 01:57:58.226 * No cluster configuration found, I'm a13c403e8a798f9ad051c654ff540ce1b3833b9d ==45443==You are trying to dlopen a /workplace/baswanth/tsan/valkey-search/.build-release-tsan/valkey-json/build/src/libjson.so shared library with RTLD_DEEPBIND flag which is incompatible with sanitizer runtime (see google/sanitizers#611 for details). If you want to run /workplace/baswanth/tsan/valkey-search/.build-release-tsan/valkey-json/build/src/libjson.so library under sanitizers please remove RTLD_DEEPBIND from dlopen flags. ``` 3. `CHECK failed: sanitizer_deadlock_detector.h:67` — TSAN deadlock detector overflows during module load ``` 7826:M 25 Feb 2026 02:39:45.284 * Valkey version=9.0.1, bits=64, commit=ab3c953b, modified=1, pid=17826, just started 17826:M 25 Feb 2026 02:39:45.284 * Configuration loaded 17826:M 25 Feb 2026 02:39:45.286 * monotonic clock: POSIX clock_gettime 17826:M 25 Feb 2026 02:39:45.288 * Running mode=cluster, port=6379. 17826:M 25 Feb 2026 02:39:45.289 * No cluster configuration found, I'm 134edf06efe1a2e084ce9c94a43dab67e8376dfc 17826:M 25 Feb 2026 02:39:45.298 . <module> Setting shards to 524287 ThreadSanitizer: CHECK failed: sanitizer_deadlock_detector.h:67 "((n_all_locks_)) < (((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]))))" (0x40, 0x40) (tid=17826) ``` core changes that needs to be added. ``` cd /workplace/baswanth/tsan/valkey-search/.build-release-tsan/valkey-server && git diff src/util.c src/module.c src/config.h diff --git a/src/config.h b/src/config.h index 0c6b111a9..f80fa8cbc 100644 --- a/src/config.h +++ b/src/config.h @@ -172,6 +172,10 @@ #define VALKEY_NO_SANITIZE(sanitizer) #endif +#if defined(__SANITIZE_THREAD__) +#define VALKEY_THREAD_SANITIZER 1 +#endif + #if defined(__SANITIZE_ADDRESS__) /* GCC */ #define VALKEY_ADDRESS_SANITIZER 1 diff --git a/src/module.c b/src/module.c index 8d1bad4fe..0435499a8 100644 --- a/src/module.c +++ b/src/module.c @@ -12545,7 +12545,7 @@ int moduleLoad(const char *path, void **module_argv, int module_argc, int is_loa } int dlopen_flags = RTLD_NOW | RTLD_LOCAL; -#if (defined(__GLIBC__) || defined(__FreeBSD__)) && !defined(VALKEY_ADDRESS_SANITIZER) && __has_include(<dlfcn.h>) +#if (defined(__GLIBC__) || defined(__FreeBSD__)) && !defined(VALKEY_ADDRESS_SANITIZER) && !defined(VALKEY_THREAD_SANITIZER) && __has_include(<dlfcn.h>) /* RTLD_DEEPBIND, which is required for loading modules that contains the * same symbols, does not work with ASAN. Therefore, we exclude * RTLD_DEEPBIND when doing test builds with ASAN. diff --git a/src/util.c b/src/util.c index 0631736c8..cf5a073b1 100644 --- a/src/util.c +++ b/src/util.c @@ -622,7 +622,7 @@ static int string2llScalar(const char *s, size_t slen, long long *value) { } #if HAVE_IFUNC && HAVE_X86_SIMD -__attribute__((no_sanitize_address, used)) static int (*string2ll_resolver(void))(const char *, size_t, long long *) { +__attribute__((no_sanitize_address, no_sanitize("thread"), used)) static int (*string2ll_resolver(void))(const char *, size_t, long long *) { /* Ifunc resolvers run before ASan initialization and before CPU detection * is initialized, so disable ASan and init CPU detection here. */ __builtin_cpu_init(); ``` --------- Signed-off-by: Baswanth Vegunta <baswanth@amazon.com> Co-authored-by: Baswanth Vegunta <baswanth@amazon.com>
Resolves: valkey-io#703 Encountered the following series of crashes during engine startup. 1. SIGSEGV in `string2ll_resolver` — IFUNC resolver called before TSAN runtime initializes ``` Program received signal SIGSEGV, Segmentation fault. 0x00007ffff6de0fb0 in ?? () #0 0x00007ffff6de0fb0 in ?? () #1 0x00000000004b89a7 in string2ll_resolver.lto_priv () #2 0x00007ffff7de4d94 in _dl_relocate_object () from /lib64/ld-linux-x86-64.so.2 #3 0x00007ffff7ddceaa in dl_main () from /lib64/ld-linux-x86-64.so.2 #4 0x00007ffff7df0b5f in _dl_sysdep_start () from /lib64/ld-linux-x86-64.so.2 #5 0x00007ffff7ddad2e in _dl_start () from /lib64/ld-linux-x86-64.so.2 #6 0x00007ffff7dd9ef8 in _start () from /lib64/ld-linux-x86-64.so.2 valkey-io#7 0x0000000000000003 in ?? () valkey-io#8 0x00007fffffffe04c in ?? () valkey-io#9 0x00007fffffffe0b6 in ?? () valkey-io#10 0x00007fffffffe0bd in ?? () valkey-io#11 0x0000000000000000 in ?? () ``` 2. TSAN abort on `RTLD_DEEPBIND` — module loader uses `RTLD_DEEPBIND` which is incompatible with TSAN ``` cat /workplace/baswanth/tsan/valkey-search/testing/integration/.build-release-tsan/output/6379_stdout.txt 2>/dev/null | tail -20 45443:M 25 Feb 2026 01:57:58.222 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see jemalloc/jemalloc#1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect. 45443:M 25 Feb 2026 01:57:58.222 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo 45443:M 25 Feb 2026 01:57:58.222 * Valkey version=9.0.1, bits=64, commit=ab3c953b, modified=1, pid=45443, just started 45443:M 25 Feb 2026 01:57:58.222 * Configuration loaded 45443:M 25 Feb 2026 01:57:58.224 * monotonic clock: POSIX clock_gettime 45443:M 25 Feb 2026 01:57:58.226 * Running mode=cluster, port=6379. 45443:M 25 Feb 2026 01:57:58.226 * No cluster configuration found, I'm a13c403e8a798f9ad051c654ff540ce1b3833b9d ==45443==You are trying to dlopen a /workplace/baswanth/tsan/valkey-search/.build-release-tsan/valkey-json/build/src/libjson.so shared library with RTLD_DEEPBIND flag which is incompatible with sanitizer runtime (see google/sanitizers#611 for details). If you want to run /workplace/baswanth/tsan/valkey-search/.build-release-tsan/valkey-json/build/src/libjson.so library under sanitizers please remove RTLD_DEEPBIND from dlopen flags. ``` 3. `CHECK failed: sanitizer_deadlock_detector.h:67` — TSAN deadlock detector overflows during module load ``` 7826:M 25 Feb 2026 02:39:45.284 * Valkey version=9.0.1, bits=64, commit=ab3c953b, modified=1, pid=17826, just started 17826:M 25 Feb 2026 02:39:45.284 * Configuration loaded 17826:M 25 Feb 2026 02:39:45.286 * monotonic clock: POSIX clock_gettime 17826:M 25 Feb 2026 02:39:45.288 * Running mode=cluster, port=6379. 17826:M 25 Feb 2026 02:39:45.289 * No cluster configuration found, I'm 134edf06efe1a2e084ce9c94a43dab67e8376dfc 17826:M 25 Feb 2026 02:39:45.298 . <module> Setting shards to 524287 ThreadSanitizer: CHECK failed: sanitizer_deadlock_detector.h:67 "((n_all_locks_)) < (((sizeof(all_locks_with_contexts_)/sizeof((all_locks_with_contexts_)[0]))))" (0x40, 0x40) (tid=17826) ``` core changes that needs to be added. ``` cd /workplace/baswanth/tsan/valkey-search/.build-release-tsan/valkey-server && git diff src/util.c src/module.c src/config.h diff --git a/src/config.h b/src/config.h index 0c6b111a9..f80fa8cbc 100644 --- a/src/config.h +++ b/src/config.h @@ -172,6 +172,10 @@ #define VALKEY_NO_SANITIZE(sanitizer) #endif +#if defined(__SANITIZE_THREAD__) +#define VALKEY_THREAD_SANITIZER 1 +#endif + #if defined(__SANITIZE_ADDRESS__) /* GCC */ #define VALKEY_ADDRESS_SANITIZER 1 diff --git a/src/module.c b/src/module.c index 8d1bad4fe..0435499a8 100644 --- a/src/module.c +++ b/src/module.c @@ -12545,7 +12545,7 @@ int moduleLoad(const char *path, void **module_argv, int module_argc, int is_loa } int dlopen_flags = RTLD_NOW | RTLD_LOCAL; -#if (defined(__GLIBC__) || defined(__FreeBSD__)) && !defined(VALKEY_ADDRESS_SANITIZER) && __has_include(<dlfcn.h>) +#if (defined(__GLIBC__) || defined(__FreeBSD__)) && !defined(VALKEY_ADDRESS_SANITIZER) && !defined(VALKEY_THREAD_SANITIZER) && __has_include(<dlfcn.h>) /* RTLD_DEEPBIND, which is required for loading modules that contains the * same symbols, does not work with ASAN. Therefore, we exclude * RTLD_DEEPBIND when doing test builds with ASAN. diff --git a/src/util.c b/src/util.c index 0631736c8..cf5a073b1 100644 --- a/src/util.c +++ b/src/util.c @@ -622,7 +622,7 @@ static int string2llScalar(const char *s, size_t slen, long long *value) { } #if HAVE_IFUNC && HAVE_X86_SIMD -__attribute__((no_sanitize_address, used)) static int (*string2ll_resolver(void))(const char *, size_t, long long *) { +__attribute__((no_sanitize_address, no_sanitize("thread"), used)) static int (*string2ll_resolver(void))(const char *, size_t, long long *) { /* Ifunc resolvers run before ASan initialization and before CPU detection * is initialized, so disable ASan and init CPU detection here. */ __builtin_cpu_init(); ``` --------- Signed-off-by: Baswanth Vegunta <baswanth@amazon.com> Co-authored-by: Baswanth Vegunta <baswanth@amazon.com>
74eb443 to
ed98d17
Compare
2819935 to
0117d89
Compare
Changed the ProcessRecords interface to accept all records at once instead of processing them one at a time. This enables more efficient implementations, particularly for COUNT which now simply returns the input size rather than incrementing a counter for each record. Changes: - Updated ReducerInstance::ProcessRecords to accept const std::vector<absl::InlinedVector<expr::Value, 4>>& all_values - Modified GroupBy::Execute to collect all values per group before calling ProcessRecords once - Updated all reducer implementations (Count, Min, Max, Sum, Avg, Stddev, CountDistinct) to process batched records Benefits: - Count reducer is now O(1) instead of O(n) function calls - Better separation between data collection and processing phases - Enables future optimizations for batch-aware reducers Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Signed-off-by: Allen Samuels <allenss@amazon.com>
Signed-off-by: Allen Samuels <allenss@amazon.com>
Signed-off-by: Allen Samuels <allenss@amazon.com>
Signed-off-by: Allen Samuels <allenss@amazon.com>
Signed-off-by: Alexandru Filip <alexandru.filip@improving.com>
Moving serialization of position map outside of the text index lock context --------- Signed-off-by: Aksha Thakkar <thaakb@amazon.com> Co-authored-by: Aksha Thakkar <thaakb@amazon.com>
Signed-off-by: Brennan Cathcart <brennancathcart@gmail.com>
Signed-off-by: Allen Samuels <allenss@amazon.com>
Signed-off-by: Allen Samuels <allenss@amazon.com>
- Add FIRST_VALUE reducer with three argument variants: simple mode (1 arg), BY clause with default ASC order (3 args), and BY clause with explicit ASC/DESC direction (4 args) - Implement parser logic in ft_aggregate_parser.cc to handle reducer special parsing requirements - Add parser header definitions for FIRST_VALUE token and parsing functions - Adding FIRST_VALUE reducer in ft_aggregate_exec.cc - Update ft.aggregate.json command schema with FIRST_VALUE reducer documentation - Add compatibility tests - Update user-facing documentation with FIRST_VALUE syntax and behavior descriptions Signed-off-by: Riley Des <riley.desserre@improving.com>
Signed-off-by: Riley Des <riley.desserre@improving.com>
Signed-off-by: Riley Des <riley.desserre@improving.com>
4eb8fd4 to
10bc54f
Compare
Signed-off-by: Riley Des <riley.desserre@improving.com>
…ey-io#904) Signed-off-by: Elias Tamraz <tamrazelias@gmail.com>
Removing some scenarios due to benchamrk workflow taking too long. Removed 12b , 12c, 12e -> PositionMap extreme test already covered in 12a. 12b, 12c, 12e test the PositionMap differently but not needed now. Removed 14a -> failing on CME Removed 14c, 14e -> Removing Baseline case with SUFFIXTRIE enabled and STOPWORDS 0 since same performance as the baseline case, so not adding any value. Changed 13b to have 500 clients from 1000. --------- Signed-off-by: Aksha Thakkar <thaakb@amazon.com> Signed-off-by: AkshaThakkar1812 <akshathakkar@gmail.com> Co-authored-by: Aksha Thakkar <thaakb@amazon.com>
…y-io#896) kDefaultMinPrefixLength is annotated as “configurable”, but it is currently hardcoded and cannot be changed at runtime. This PR adds a module config (search.tag-min-prefix-length) to control the minimum prefix length for TAG wildcard queries (e.g., @field:{prefix*}). issue : valkey-io#895 Signed-off-by: Su Ko <rhtn1128@gmail.com>
An implementation of the First Value Reducer towards adding full FT.Aggregate support