BE crashes with SIGSEGV (null pointer dereference at 0x0) when querying Parquet-based external tables (Paimon/Hive/Iceberg) with nested type columns (Struct/Array/Map), if a predicate filters out all rows in a RowGroup.
*** SIGSEGV address not mapped to object (@0x0) received by PID 72584 (TID 88200 OR 0x7f319ec10700) from PID 0; stack trace: ***
0# 0x000055EC0721DC35 in /doris/be/lib/doris_be
1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/local/jdk-17.0.10/lib/server/libjvm.so
2# JVM_handle_linux_signal in /usr/local/jdk-17.0.10/lib/server/libjvm.so
3# 0x00007F783FC78630 in /lib64/libpthread.so.0
4# doris::ScalarColumnReader<false, true>::gen_filter_map(doris::FilterMap&, unsigned long, unsigned long, unsigned long, std::vector<unsigned char, std::allocator<unsigned char> >&, std::unique_ptr<doris::FilterMap, std::default_delete<doris::FilterMap> >*) in /doris/be/lib/doris_be
5# doris::ScalarColumnReader<false, true>::_read_nested_column(doris::COW<doris::IColumn>::immutable_ptr<doris::IColumn>&, std::shared_ptr<doris::IDataType const>&, doris::FilterMap&, unsigned long, unsigned long*, bool*, bool)::{lambda(unsigned long, unsigned long)#1}::operator()(unsigned long, unsigned long) const in /doris/be/lib/doris_be
6# doris::ScalarColumnReader<false, true>::_read_nested_column(doris::COW<doris::IColumn>::immutable_ptr<doris::IColumn>&, std::shared_ptr<doris::IDataType const>&, doris::FilterMap&, unsigned long, unsigned long*, bool*, bool) in /doris/be/lib/doris_be
7# doris::ScalarColumnReader<false, true>::read_column_data(doris::COW<doris::IColumn>::immutable_ptr<doris::IColumn>&, std::shared_ptr<doris::IDataType const>&, std::shared_ptr<doris::TableSchemaChangeHelper::Node> const&, doris::FilterMap&, unsigned long, unsigned long*, bool*, bool, long) in /doris/be/lib/doris_be
8# doris::StructColumnReader::read_column_data(doris::COW<doris::IColumn>::immutable_ptr<doris::IColumn>&, std::shared_ptr<doris::IDataType const>&, std::shared_ptr<doris::TableSchemaChangeHelper::Node> const&, doris::FilterMap&, unsigned long, unsigned long*, bool*, bool, long) in /doris/be/lib/doris_be
9# doris::RowGroupReader::_read_column_data(doris::Block*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, unsigned long, unsigned long*, bool*, doris::FilterMap&) in /doris/be/lib/doris_be
10# doris::RowGroupReader::_do_lazy_read(doris::Block*, unsigned long, unsigned long*, bool*) in /doris/be/lib/doris_be
11# doris::RowGroupReader::next_batch(doris::Block*, unsigned long, unsigned long*, bool*) in /doris/be/lib/doris_be
12# doris::ParquetReader::get_next_block(doris::Block*, unsigned long*, bool*) in /doris/be/lib/doris_be
13# doris::PaimonReader::get_next_block_inner(doris::Block*, unsigned long*, bool*) in /doris/be/lib/doris_be
14# doris::TableFormatReader::get_next_block(doris::Block*, unsigned long*, bool*) in /doris/be/lib/doris_be
15# doris::FileScanner::_get_block_wrapped(doris::RuntimeState*, doris::Block*, bool*) in /doris/be/lib/doris_be
16# doris::FileScanner::_get_block_impl(doris::RuntimeState*, doris::Block*, bool*) in /doris/be/lib/doris_be
17# doris::Scanner::get_block(doris::RuntimeState*, doris::Block*, bool*) in /doris/be/lib/doris_be
18# doris::Scanner::get_block_after_projects(doris::RuntimeState*, doris::Block*, bool*) in /doris/be/lib/doris_be
19# doris::ScannerScheduler::_scanner_scan(std::shared_ptr<doris::ScannerContext>, std::shared_ptr<doris::ScanTask>) in /doris/be/lib/doris_be
20# 0x000055EC0CCBBB55 in /doris/be/lib/doris_be
21# doris::ScannerSplitRunner::process_for(std::chrono::duration<long, std::ratio<1l, 1000000000l> >) in /doris/be/lib/doris_be
22# doris::PrioritizedSplitRunner::process() in /doris/be/lib/doris_be
23# doris::TimeSharingTaskExecutor::_dispatch_thread() in /doris/be/lib/doris_be
24# doris::Thread::supervise_thread(void*) in /doris/be/lib/doris_be
25# start_thread in /lib64/libpthread.so.0
26# __clone in /lib64/libc.so.6
### What You Expected?
The query should return results without crashing. When `filter_all=true`, nested columns should be correctly skipped without dereferencing `nullptr`.
### How to Reproduce?
The following standalone program reproduces the core logic of `gen_filter_map`. Commenting out the `if (filter_all)` guard and running the `else` branch causes SIGSEGV:
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    // Simulate filter_all=true: filter_map_data is nullptr
    const uint8_t* filter_map_data = nullptr;
    bool has_filter = true;
    bool filter_all = true;
    // rep_levels for a nested column: 3 rows with varying element counts
    std::vector<uint16_t> rep_levels = {0, 1, 1, 0, 1, 0};
    std::vector<uint8_t> nested_filter_map_data;
    if (has_filter) {
        if (filter_all) {
            // FIX: skip gen_filter_map, produce all-zero nested filter
            nested_filter_map_data.assign(rep_levels.size(), 0);
            printf("PASS: filter_all path correctly produces all-zero nested filter\n");
        } else {
            // BUG: dereferences nullptr → SIGSEGV
            size_t filter_loc = 0;
            nested_filter_map_data.resize(rep_levels.size());
            for (size_t i = 0; i < rep_levels.size(); i++) {
                // rep_level == 0 marks the start of a new top-level row
                if (i != 0 && rep_levels[i] == 0) filter_loc++;
                nested_filter_map_data[i] = filter_map_data[filter_loc]; // CRASH HERE
            }
        }
    }
    for (auto v : nested_filter_map_data) assert(v == 0);
    printf("All elements filtered: correct behavior\n");
    return 0;
}
```
### Version
4.x/master
### What's Wrong?
The crash occurs in `ScalarColumnReader::gen_filter_map`, which dereferences `filter_map.filter_map_data()`; this pointer is `nullptr` when `filter_all=true`.

Root Cause: `_read_nested_column` only checks `has_filter()` but not `filter_all()`. When all rows are filtered out, `FilterMap` is initialized via `init(nullptr, total_rows, true)`, setting `_has_filter=true` but `_filter_map_data=nullptr`. The newer `FilterMap::generate_nested_filter_map` already has the correct guard (`if (!has_filter() || filter_all()) return error`), but the inline `gen_filter_map` lacks this check.
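
The guard described in the root cause can be sketched as follows. This is a minimal illustration under stated assumptions, not the Doris implementation: `MiniFilterMap`, `expand_to_nested`, and `example_usage` are hypothetical names, and the struct models only the state mentioned in this report (`has_filter`, `filter_all`, and a possibly null data pointer).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for doris::FilterMap (assumption:
// only the fields discussed in this report are modeled).
struct MiniFilterMap {
    const uint8_t* data = nullptr;  // row-level filter, may be nullptr
    size_t size = 0;                // number of top-level rows
    bool has = false;
    bool all = false;

    // Mirrors the init(nullptr, total_rows, true) call pattern from the report.
    void init(const uint8_t* d, size_t n, bool filter_all) {
        data = d;
        size = n;
        has = true;
        all = filter_all;
    }
    bool has_filter() const { return has; }
    bool filter_all() const { return all; }
};

// Expands a row-level filter to one entry per nested value using repetition
// levels (rep_level == 0 starts a new top-level row). Returns false without
// touching `data` when there is no usable row-level filter.
bool expand_to_nested(const MiniFilterMap& fm,
                      const std::vector<uint16_t>& rep_levels,
                      std::vector<uint8_t>& out) {
    out.clear();
    if (!fm.has_filter() || fm.filter_all()) {
        return false;  // guard: `data` may be nullptr on this path
    }
    out.resize(rep_levels.size());
    size_t row = 0;
    for (size_t i = 0; i < rep_levels.size(); ++i) {
        if (i != 0 && rep_levels[i] == 0) ++row;
        if (row >= fm.size) {  // defensive bounds check (an addition of this sketch)
            out.clear();
            return false;
        }
        out[i] = fm.data[row];
    }
    return true;
}

// Exercises both paths: filter_all (guarded) and a normal row-level filter.
void example_usage() {
    const std::vector<uint16_t> rep_levels = {0, 1, 1, 0, 1, 0};
    std::vector<uint8_t> nested;

    MiniFilterMap all_filtered;
    all_filtered.init(nullptr, 3, true);
    assert(!expand_to_nested(all_filtered, rep_levels, nested));

    const uint8_t rows[] = {1, 0, 1};
    MiniFilterMap partial;
    partial.init(rows, 3, false);
    assert(expand_to_nested(partial, rep_levels, nested));
    assert(nested.size() == rep_levels.size());
}
```

With this shape, a caller would check `filter_all()` first and skip the nested values for the batch, calling the expansion only when a real row-level filter exists.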