Open
Description
The current backend implementation converts the entire table to a Python list before paging, which makes it useful only for small or demo tables.
Here's a proposal to fix the pagination performance issue by implementing native LanceDB pagination:
Proposed Solution: Native LanceDB Pagination
Replace the current full-table scan approach with LanceDB's built-in pagination methods to read only the required rows from disk.
Implementation Changes
Current problematic code in backend/app.py:
# Lines 301-307: Loads entire table then paginates in memory
data_list = table.to_arrow().to_pylist()
total_count = len(data_list)
start_idx = offset
end_idx = min(offset + limit, total_count)
paginated_data = data_list[start_idx:end_idx]
Proposed replacement:
# Use LanceDB's native take() method for pagination
try:
    # Get total count efficiently without loading data
    total_count = table.count_rows()
    # Apply pagination at the LanceDB level
    if offset >= total_count:
        # Empty page that still carries the table's schema
        result_table = table.schema.empty_table()
    else:
        # Use take() with a contiguous index range for efficient pagination
        indices = list(range(offset, min(offset + limit, total_count)))
        result_table = table.take(indices).to_arrow()
except Exception as e:
    logger.error(f"Native pagination failed for {dataset_name}: {e}")
    # Fallback to current method if native pagination fails
    data_list = table.to_arrow().to_pylist()
    total_count = len(data_list)
    start_idx = offset
    end_idx = min(offset + limit, total_count)
    paginated_data = data_list[start_idx:end_idx]
    # Convert back to Arrow table...
Key Benefits
- Memory Efficiency: Only loads requested rows into RAM instead of entire dataset
- Disk I/O Reduction: Reads only necessary data pages from storage
- Faster Response Times: Eliminates full table scan for each pagination request
- Scalability: Works efficiently with datasets containing millions of rows
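These benefits follow from the size of the index window: a page request names at most `limit` row positions, whatever the table size. A minimal sketch of that window computation, assuming a hypothetical helper name `page_bounds` (it does not exist in backend/app.py) and assuming negative inputs should be rejected:

```python
def page_bounds(offset: int, limit: int, total_count: int) -> range:
    """Return the row positions for one page, clamped to the table size."""
    if offset < 0 or limit < 0:
        raise ValueError("offset and limit must be non-negative")
    start = min(offset, total_count)
    stop = min(offset + limit, total_count)
    return range(start, stop)


# The last page of a one-million-row table is clamped to the table end.
window = page_bounds(offset=999_980, limit=50, total_count=1_000_000)
print(len(window))  # 20
```

An offset past the end yields an empty range, so the caller can return an empty page without touching the table.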
Implementation Strategy
- Primary Method: Use table.count_rows() for the total count and table.take(indices) for the paginated data
- Fallback: Keep the current implementation as a backup for compatibility with older Lance versions
- Column Filtering: Apply column selection after pagination to minimize data transfer
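The three points above can be sketched as one function. This is a sketch under stated assumptions, not the backend's code: `fetch_page` and `FakeTable` are hypothetical names, the table object is assumed to expose `count_rows()` and a `take(indices)` that returns row dicts, and `read_all()` stands in for the current `table.to_arrow().to_pylist()` path. The real LanceDB call and its return type may differ by version.

```python
from typing import Any, Optional, Sequence

Row = dict[str, Any]


def fetch_page(table: Any, offset: int, limit: int,
               columns: Optional[Sequence[str]] = None) -> tuple[list[Row], int]:
    """Return (rows for one page, total row count)."""
    try:
        # Primary method: count first, then read only the requested positions.
        total = table.count_rows()
        stop = min(offset + limit, total)
        rows = table.take(list(range(offset, stop))) if offset < total else []
    except (AttributeError, NotImplementedError):
        # Fallback: the table lacks native methods, so scan and slice in memory.
        all_rows = table.read_all()
        total = len(all_rows)
        rows = all_rows[offset:offset + limit]
    if columns is not None:
        # Column filtering happens after pagination, on the page only.
        rows = [{name: row[name] for name in columns} for row in rows]
    return rows, total


class FakeTable:
    """Hypothetical in-memory stand-in for a table with native pagination."""

    def __init__(self, rows: list[Row]) -> None:
        self._rows = rows

    def count_rows(self) -> int:
        return len(self._rows)

    def take(self, indices: list[int]) -> list[Row]:
        return [self._rows[i] for i in indices]

    def read_all(self) -> list[Row]:
        return list(self._rows)


table = FakeTable([{"id": i, "name": f"row{i}"} for i in range(10)])
rows, total = fetch_page(table, offset=8, limit=5, columns=["id"])
print(rows, total)  # [{'id': 8}, {'id': 9}] 10
```

The sketch catches only the exceptions that signal a missing method, so that unrelated errors are not silently turned into a full-table scan. Whether the fallback should be that narrow in the real backend is a judgment call.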
Frontend Compatibility
No changes needed - the frontend already sends correct pagination parameters and will benefit from faster response times.