Releases: microsoft/LakeBench
v1.0.1
What's Changed
- Bump pypa/gh-action-pypi-publish from 1.4.2 to 1.13.0 in /.github/workflows by @dependabot[bot] in #72
- chore: polish for ms oss by @mwc360 in #73
- chore: note python support and add 3.13 by @mwc360 in #74
New Contributors
- @dependabot[bot] made their first contribution in #72
Full Changelog: v1.0.0...v1.0.1
v1.0.0
What's Changed
- feat: test suites, integration tests, resolve python 3.8 support by @mwc360 in #70
- Migrated build system to hatchling and UV; added uv.lock and .python-version
- Pinned fsspec==2025.2.0 to restore Python 3.8 support
- Added from __future__ import annotations across engine and benchmark files to fix TypeError on lowercase generic hints in Python 3.8/3.9
- Added .github/copilot-instructions.md with an LLM-friendly codebase overview
- New integration test suite covering all 5 engines × 4 benchmarks (TPC-H, TPC-DS, ClickBench, ELTBench) at SF 0.1
- Added Daft and Polars support for ClickBench
- Fixed DaftELTBench: Windows path handling, when().otherwise() replacing deprecated if_else, .year() API, .collect() over .to_pandas()
- Fixed DuckDB ELTBench: .df() → .arrow() (no pandas dependency)
- Fixed Spark ELTBench: CREATE OR REPLACE TABLE → DROP IF EXISTS + CREATE TABLE to avoid the OverwriteByExpression truncation error
- Fixed Spark ELTBench MERGE: SQL MERGE INTO → DeltaTable.forName().merge() Python API to work around DELTA_MERGE_RESOLVED_ATTRIBUTE_MISSING_FROM_INPUT on local delta-spark 3.2
- Fixed TPC-DS q90 divide-by-zero: added NULLIF(pmc, 0) in denominator for SF 0.1 runs
- Fixed Polars TPC-DS decimal overflow (PanicException): pre-collect cast of Decimal columns to Float64(strict=False)
- Added delta-spark==3.2.0 to the spark extra
- Auto-generated per-engine coverage reports written to reports/coverage/.md after each test session
- New baseline unit tests for path utils, query utils, and engine — matrix runs across Python 3.8–3.12
- CI: integration tests run once on Python 3.11 per engine in isolated jobs
- chore: bump version by @mwc360 in #71
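The Python 3.8/3.9 fix in this release relies on postponed evaluation of annotations (PEP 563, enabled by from __future__ import annotations). The sketch below is not LakeBench code; count_rows is a hypothetical helper used only to show that lowercase generic hints such as list[str] are stored as strings and never evaluated, so the function definition does not raise TypeError on 3.8.

```python
# Postponed evaluation of annotations (PEP 563): with this import, annotations
# are stored as strings instead of being evaluated at definition time, so
# builtin generics like list[str] in hints do not raise TypeError on 3.8/3.9.
from __future__ import annotations


def count_rows(tables: list[str], row_counts: dict[str, int]) -> dict[str, int]:
    """Hypothetical helper: return the row count for each requested table."""
    return {name: row_counts.get(name, 0) for name in tables}


# The annotations are kept as plain strings rather than evaluated objects.
print(count_rows.__annotations__["tables"])  # list[str]
print(count_rows(["lineitem", "orders"], {"lineitem": 600}))  # {'lineitem': 600, 'orders': 0}
```

The import only defers annotations. Runtime expressions are still evaluated immediately, so a module-level alias such as Rows = list[str] would still raise TypeError on Python 3.8.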
Full Changelog: v0.13.3...v1.0.0
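The TPC-DS q90 fix guards the ratio's denominator with NULLIF(pmc, 0), so a zero evening count yields NULL instead of a division-by-zero error. The sketch below mirrors that SQL pattern in plain Python using Decimal (standing in for the query's decimal arithmetic). It is an illustration under that assumption, not the actual query text; nullif and am_pm_ratio are hypothetical names.

```python
from decimal import Decimal
from typing import Optional


def nullif(value: Decimal, sentinel: Decimal) -> Optional[Decimal]:
    """Python analogue of SQL NULLIF(value, sentinel): None (NULL) if equal."""
    return None if value == sentinel else value


def am_pm_ratio(amc: int, pmc: int) -> Optional[Decimal]:
    """Hypothetical helper mirroring amc / NULLIF(pmc, 0).

    A zero denominator yields None (SQL NULL) instead of raising.
    """
    denominator = nullif(Decimal(pmc), Decimal(0))
    if denominator is None:
        return None  # in SQL, x / NULL is NULL, so the row gets a NULL ratio
    return Decimal(amc) / denominator


print(am_pm_ratio(30, 12))  # 2.5
print(am_pm_ratio(30, 0))   # None

# Without the guard, decimal division by zero raises (DivisionByZero is a
# subclass of ZeroDivisionError in the decimal module).
try:
    Decimal(30) / Decimal(0)
except ZeroDivisionError as exc:
    print(type(exc).__name__)  # DivisionByZero
```

Returning NULL rather than dropping the row keeps the result shape unchanged, which matters when comparing query outputs across engines at small scale factors.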
v0.13.3
v0.13.2
v0.13.1
v0.13.0
v0.12.2
v0.12.1
v0.12.0
What's Changed
- feat: make LakeBench runtime and storage backend agnostic by @keen85 and @mwc360 in #45
- feat: use tpchgen-rs for faster data gen by @mwc360 in #51
- Add README pypi metrics and status tags by @mwc360 in #41
- chore: remove unnecessary sail query variants since 0.3.4 fixes by @keen85 in #43
- bugfix: refactor generation of spark_history_url by @mwc360 in #44
- chore: upgrade sail to 0.3.7 and note full TPC-DS query support by @shehabgamin in #49
- chore: bump engine versions (DuckDB, Polars, Deltalake) by @mwc360 in #50
- feat: support Synapse Spark and HDInsight by @mwc360 in #54
- chore: bump version for release by @mwc360 in #55
New Contributors
- @shehabgamin made their first contribution in #49
⚠️ Breaking Changes
- All path-related input variables have been renamed to standardize on storage-backend naming (_uri). See https://github.com/mwc360/LakeBench/tree/main/examples for examples of current usage. Now that these have been unified to be object-store agnostic, no further major changes of this kind are expected across versions.
Full Changelog: v0.9.1...v0.12.0
v0.9.1
Bug fix for spark_history_url generation
Full Changelog: v0.9.0...v0.9.1