[fix](be) Revert row id read batching #63544
Open
BiteTheDDDDt wants to merge 1 commit into
### What problem does this PR solve?

Issue Number: None

Related PR: apache#63436

Problem Summary:

Revert the row id fetch batching change from apache#63436 because recent profiling shows the row id read path spending significant CPU in nullable null-map RLE decoding and bit readers after the batching change. The reverted change groups sparse row ids by segment and reads them in larger sorted batches, which can make `FileColumnIterator::read_by_rowids` advance through large null-map ranges for sparse nullable columns. Restore the previous per-row and adjacent-file batching behavior while the sparse nullable row id access pattern is investigated.

### Release note

None

### Check List (For Author)

- Test: Manual test
    - `build-support/check-format.sh be/src/exec/rowid_fetcher.cpp be/src/service/point_query_executor.cpp be/src/storage/segment/segment.cpp be/src/storage/segment/segment.h`
    - `ninja -C be/build_Release src/exec/CMakeFiles/Exec.dir/rowid_fetcher.cpp.o src/service/CMakeFiles/Service.dir/point_query_executor.cpp.o src/storage/CMakeFiles/Storage.dir/segment/segment.cpp.o`
- Behavior changed: Yes. Restore the row id fetch behavior before apache#63436.
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TPC-H: Total hot run time: 31320 ms
TPC-H: Total hot run time: 30647 ms
TPC-DS: Total hot run time: 169473 ms
TPC-DS: Total hot run time: 169051 ms
BE Regression && UT Coverage Report: Increment line coverage, Increment coverage report
### What problem does this PR solve?

Issue Number: None

Related PR: #63436

Problem Summary: Revert #63436 because recent profiling shows the row id read path spending significant CPU in `RleDecoder<bool>::GetNextRun`, `BitReader::GetValue`/`GetVlqInt`, `FileColumnIterator::read_by_rowids`, `BitShufflePageDecoder::read_by_rowids`, and LZ4 decompression. The reverted change groups sparse row ids by segment and reads them in larger sorted batches. For sparse nullable columns this can make `FileColumnIterator::read_by_rowids` advance through large null-map ranges and spend more CPU in RLE/bit decoding. Restore the previous per-row and adjacent-file batching behavior while the sparse nullable row id access pattern is investigated.

### Release note

None

### Check List (For Author)

- `build-support/check-format.sh be/src/exec/rowid_fetcher.cpp be/src/service/point_query_executor.cpp be/src/storage/segment/segment.cpp be/src/storage/segment/segment.h`
- `ninja -C be/build_Release src/exec/CMakeFiles/Exec.dir/rowid_fetcher.cpp.o src/service/CMakeFiles/Service.dir/point_query_executor.cpp.o src/storage/CMakeFiles/Storage.dir/segment/segment.cpp.o`
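To illustrate the null-map cost described in the Problem Summary, here is a minimal, self-contained sketch. It is not Doris code: `Run`, `ReadStats`, `read_null_flags`, and `make_alternating_null_map` are hypothetical names, and the null map is modeled as a plain list of runs rather than the bit-packed stream that `RleDecoder<bool>` actually reads. It assumes a forward-only cursor that must decode every run between two requested row ids, and counts those decode steps.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of an RLE-encoded null map: one (is_null, length) pair per run.
struct Run {
    bool is_null;
    uint32_t length;
};

struct ReadStats {
    std::vector<bool> null_flags; // null flag for each requested row id
    size_t runs_decoded = 0;      // runs the forward-only cursor had to decode
};

// Reads null flags for ascending row ids using a cursor that can only move
// forward, so every run between two requested rows is decoded and discarded.
ReadStats read_null_flags(const std::vector<Run>& runs,
                          const std::vector<uint32_t>& sorted_rowids) {
    assert(std::is_sorted(sorted_rowids.begin(), sorted_rowids.end()));
    ReadStats stats;
    size_t next_run = 0;  // index of the next run to decode
    uint64_t run_end = 0; // exclusive end ordinal of the last decoded run
    for (uint32_t rowid : sorted_rowids) {
        while (rowid >= run_end) {
            if (next_run == runs.size()) {
                return stats; // row id lies past the end of the null map
            }
            run_end += runs[next_run].length;
            ++next_run;
            ++stats.runs_decoded;
        }
        stats.null_flags.push_back(runs[next_run - 1].is_null);
    }
    return stats;
}

// Builds a null map whose runs alternate non-null/null with a fixed run length.
// Short runs stand in for a nullable column whose nulls are finely interleaved.
std::vector<Run> make_alternating_null_map(uint32_t num_rows, uint32_t run_length) {
    assert(run_length > 0);
    std::vector<Run> runs;
    bool is_null = false;
    for (uint32_t start = 0; start < num_rows; start += run_length) {
        runs.push_back({is_null, std::min(run_length, num_rows - start)});
        is_null = !is_null;
    }
    return runs;
}
```

In this model, calling `read_null_flags(make_alternating_null_map(1000, 1), {0, 999})` returns two flags but decodes all 1000 runs, while requesting the adjacent rows `{0, 1}` decodes only two. The decode work follows the span between the requested row ids rather than their count, which is the shape of the cost the profile attributes to `RleDecoder<bool>::GetNextRun`. How the real iterator seeks between pages is outside this sketch.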