Merged
41 commits
f925af4
feat(web): Add browser-based web UI — FastAPI backend + Next.js 16 fr…
jamesbconner Feb 22, 2026
6eb0baa
chore(web): Add package-lock.json and fix ci.yml branch triggers
jamesbconner Feb 22, 2026
78a63a3
chore(web): Add dev-web-install-npm Makefile target
jamesbconner Feb 22, 2026
21f2574
chore(web): Restore web placeholder after make clean
jamesbconner Feb 22, 2026
e654245
chore(makefile): Replace POSIX-only commands with cross-platform Pyth…
jamesbconner Feb 22, 2026
989510e
fix(web): Resolve minimatch ReDoS vulnerabilities and add autoprefixer
jamesbconner Feb 22, 2026
5dc5c5e
chore(gitignore): Ignore auto-generated web-ui/next-env.d.ts
jamesbconner Feb 22, 2026
cc9c0ad
fix(api): Remove SPA fallback route that intercepted _next/static/* a…
jamesbconner Feb 22, 2026
4c2c19d
fix(api/iceberg): Fix Windows path and catalog serialization errors
jamesbconner Feb 22, 2026
2a93fd6
fix(services/iceberg): Monkey-patch PyArrowFileIO.parse_location for …
jamesbconner Feb 22, 2026
7089d32
fix(web,services): Fix pyiceberg Windows path patch, add .pyiceberg.y…
jamesbconner Feb 22, 2026
d7d38a3
fix(api/iceberg,web): Serialize int64 snapshot IDs as strings to prev…
jamesbconner Feb 22, 2026
84412d4
fix(api/iceberg): Include @property values in _to_dict() serialization
jamesbconner Feb 22, 2026
0e7d10d
feat(web/delta): Add version detail panel for Delta Lake Analyzer
jamesbconner Feb 23, 2026
f0af078
feat(api/iceberg,web): Catalog dropdown and table listing for Iceberg…
jamesbconner Feb 23, 2026
314f2c8
feat(web-ui): align Iceberg and Delta views with tabs, schema, and da…
jamesbconner Feb 23, 2026
f7fe131
feat(web-ui): add optional snapshot ID to Iceberg loader (mirrors Del…
jamesbconner Feb 23, 2026
255dccc
feat(web,services): Add GizmoSQL snapshot/version comparison with Ice…
jamesbconner Feb 23, 2026
38d6760
feat(web): Rebuild Next.js production bundle with updated dependencies
jamesbconner Feb 23, 2026
e07f873
chore(deps): Upgrade core dependencies to latest versions
jamesbconner Feb 23, 2026
de2083a
feat(api,services,web): Add Iceberg snapshot scan statistics and impr…
jamesbconner Feb 23, 2026
aa2225d
fix(iceberg,profiling): Correct type ignore comment and Windows path …
jamesbconner Feb 23, 2026
1e594f4
fix(profiling,web): Improve file URI parsing with fallback and rebuil…
jamesbconner Feb 23, 2026
cfc5af6
refactor(api,cli,utils): Extract web directory resolution to shared u…
jamesbconner Feb 23, 2026
802e3bf
docs: Update documentation for v0.6.0 web UI release
jamesbconner Feb 23, 2026
628396d
style(api,web,tests): Fix formatting and rebuild web bundle
jamesbconner Feb 23, 2026
488ca35
chore(aws-cdk): Update GizmoSQL CLI version to v1.18.4
jamesbconner Feb 23, 2026
8edeb43
refactor(api): Extract serialization logic to shared utility module
jamesbconner Feb 23, 2026
4dbaa9f
Merge branch 'v0.6.0' of https://github.com/jamesbconner/TableSleuth …
jamesbconner Feb 23, 2026
8c9e89d
style(api,tests): Fix whitespace formatting in docstrings
jamesbconner Feb 23, 2026
ce9dca3
refactor(profiling): Extract table reference replacement logic to reu…
jamesbconner Feb 24, 2026
38ebb8a
style(profiling): Fix formatting in table replacement tests
jamesbconner Feb 24, 2026
460536f
feat(api,profiling): Add storage options support and improve error ha…
jamesbconner Feb 24, 2026
67215ac
feat(api,profiling): Improve Parquet serialization and row count dete…
jamesbconner Feb 24, 2026
8344fea
style(tests): Fix formatting in parquet serialization test
jamesbconner Feb 24, 2026
7ba99a8
feat(profiling): Add Iceberg table detection in column profiling
jamesbconner Feb 24, 2026
6f37325
feat(profiling): Remove 1B row count limit for data warehouse tables
jamesbconner Feb 24, 2026
ac3e15d
feat(profiling): Enhance Iceberg integration in profiling logic
jamesbconner Feb 24, 2026
f20306a
feat(profiling): Enhance GizmoDuckDbProfiler with additional table tr…
jamesbconner Feb 24, 2026
1026d90
refactor(tests): Improve assertions for iceberg_scan queries in profi…
jamesbconner Feb 24, 2026
a3fb1cb
refactor(tests): Simplify assertion formatting for iceberg_scan query…
jamesbconner Feb 24, 2026
22 changes: 20 additions & 2 deletions .github/workflows/ci.yml
@@ -26,7 +26,7 @@ jobs:
run: pip install uv

- name: Install dependencies
run: uv sync --extra dev
run: uv sync --extra dev --extra web

- name: Run linter
run: uv run ruff check .
@@ -62,10 +62,28 @@ jobs:
with:
python-version: '3.13'

- name: Set up Node.js
uses: actions/setup-node@v4
with:
node-version: '20'
cache: 'npm'
cache-dependency-path: web-ui/package-lock.json

- name: Install uv
run: pip install uv

- name: Build package
- name: Install Node.js dependencies
run: cd web-ui && npm ci

- name: Build frontend
run: cd web-ui && npm run build

- name: Bundle frontend into package
run: |
rm -rf src/tablesleuth/web
cp -r web-ui/out src/tablesleuth/web

- name: Build Python package
run: uv build

- name: Check package
18 changes: 18 additions & 0 deletions .github/workflows/publish.yml
@@ -24,9 +24,27 @@ jobs:
with:
python-version: '3.13'

- name: Set up Node.js
uses: actions/setup-node@v4
with:
node-version: '20'
cache: 'npm'
cache-dependency-path: web-ui/package-lock.json

- name: Install uv
run: pip install uv

- name: Install Node.js dependencies
run: cd web-ui && npm ci

- name: Build frontend
run: cd web-ui && npm run build

- name: Bundle frontend into package
run: |
rm -rf src/tablesleuth/web
cp -r web-ui/out src/tablesleuth/web

- name: Build package
run: uv build

15 changes: 13 additions & 2 deletions .gitignore
@@ -14,8 +14,8 @@ dist/
downloads/
eggs/
.eggs/
lib/
lib64/
/lib/
/lib64/
parts/
sdist/
var/
@@ -88,3 +88,14 @@ docker-compose.override.yml
*.tar.gz
backups/
ssl/

# Web UI built artifacts (built by make build-release; placeholder index.html is committed)
# Ignore everything in web/ EXCEPT the placeholder index.html
src/tablesleuth/web/*
!src/tablesleuth/web/index.html

# Next.js dev artifacts (web-ui/ source is committed, build output is not)
web-ui/.next/
web-ui/out/
web-ui/node_modules/
web-ui/next-env.d.ts
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -31,6 +31,7 @@ repos:
- botocore>=1.34.0
- fsspec>=2023.0.0
- s3fs>=2023.0.0
- fastavro>=1.9.0
args: [--config-file=pyproject.toml, src/]
pass_filenames: false

74 changes: 74 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,80 @@

All notable changes to this project will be documented in this file.

## [0.6.0] - 2026-02-23

### Added

- **Browser-Based Web UI** - New `tablesleuth web` command launches a FastAPI + Next.js interface
- Full Parquet, Iceberg, Delta Lake, and GizmoSQL analysis in the browser
- Optional install: `pip install tablesleuth[web]`; requires `fastapi`, `uvicorn[standard]`, `python-multipart`, `fastavro`
- Pre-built Next.js static export bundled in the wheel via Hatchling force-include (no Node.js needed for end users)
- Configurable host/port and optional auto-browser-open; CORS origins configurable via `TABLESLEUTH_CORS_ORIGINS`
- `TABLESLEUTH_WEB_UI_DIR` env var overrides static file path for custom deployments

- **FastAPI REST Backend** (`src/tablesleuth/api/`)
- `main.py` — FastAPI app with CORS, exception handlers, health endpoint (`/api/health`), and static file mount
- `routers/parquet.py` — Parquet file inspection endpoints
- `routers/iceberg.py` — Iceberg snapshot browsing and comparison endpoints; snapshot IDs serialized as strings to prevent JavaScript integer precision loss
- `routers/delta.py` — Delta Lake version history and forensics endpoints
- `routers/config.py` — Configuration read and validation endpoints; includes `.pyiceberg.yaml` upload support
- `routers/gizmosql.py` — Column profiling and `/gizmosql/compare` snapshot comparison endpoint

- **GizmoSQL Snapshot Comparison** (`/gizmosql/compare`)
- Side-by-side Iceberg snapshot or Delta version query performance comparison
- Metadata-based scan stats sourced directly from Iceberg snapshot summary fields (`total-data-files`, `total-delete-files`, `total-records`, `total-position-deletes`, `total-equality-deletes`, `total-files-size`) via `_iceberg_snapshot_scan_stats()` — DuckDB's `EXPLAIN ANALYZE` is not reliably parseable over Arrow Flight SQL
- MOR breakdown fields on `QueryPerformanceMetrics`: `data_files_scanned`, `delete_files_scanned`, `data_rows_scanned`, `delete_rows_scanned`
- `rows_scanned` definition: `total-records + total-position-deletes + total-equality-deletes` (physical reads before merge apply)
- Web UI shows MOR sub-rows only when at least one snapshot has delete files
- Scan stats registered with `profiler.register_iceberg_scan_stats()` before query execution
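
The `rows_scanned` definition above can be sketched as a sum over snapshot summary fields. This is an illustration only: `rows_scanned_from_summary` is a hypothetical name (not `_iceberg_snapshot_scan_stats()` itself), and treating a missing field as zero is an assumption.

```python
def rows_scanned_from_summary(summary: dict[str, str]) -> int:
    """Physical rows read before merge-on-read deletes are applied."""
    keys = ("total-records", "total-position-deletes", "total-equality-deletes")
    # Iceberg snapshot summaries are string-to-string maps, so convert each count.
    # Assumption: a field absent from the summary contributes 0.
    return sum(int(summary.get(key, "0")) for key in keys)


summary = {
    "total-records": "1000",
    "total-position-deletes": "40",
    "total-equality-deletes": "10",
}
print(rows_scanned_from_summary(summary))  # 1050
```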

- **Iceberg Metadata Patching** (`src/tablesleuth/services/iceberg_manifest_patch.py`)
- `patched_iceberg_metadata(native_table, snapshot_id)` context manager — always writes a temporary local `metadata.json` before passing a table to DuckDB
- Fixes DuckDB `current-snapshot-id` delete-file bleed: DuckDB's `iceberg_scan()` applies delete files based on the metadata's `current-snapshot-id` field, not the `version =>` argument; the patch overwrites it with the target snapshot ID
- Fixes DuckDB rejection of uppercase `"PARQUET"` format strings in delete manifest entries; re-encodes affected manifests via fastavro with lowercased value
- Handles local, S3, and `file://` URIs transparently; never yields the original path
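
A rough sketch of the `current-snapshot-id` overwrite, assuming the table metadata is already available as a plain dict. The `patched_metadata_copy` name is hypothetical; the real `patched_iceberg_metadata()` takes a native table object and also handles remote URIs and manifest re-encoding, none of which is shown here.

```python
import json
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def patched_metadata_copy(metadata: dict, snapshot_id: int) -> Iterator[Path]:
    """Yield a temporary metadata.json whose current-snapshot-id is snapshot_id."""
    # Shallow copy with one key replaced; the caller's dict is left untouched.
    patched = {**metadata, "current-snapshot-id": snapshot_id}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "metadata.json"
        path.write_text(json.dumps(patched), encoding="utf-8")
        yield path  # the temporary directory is removed when the block exits
```

Because the context manager always yields the temporary path, a reader such as DuckDB would only ever see the patched file, never the original metadata.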

- **API Test Suite** (`tests/api/`)
- Smoke tests for all five routers using `fastapi.testclient.TestClient`
- `test_main.py`, `test_parquet_router.py`, `test_iceberg_router.py`, `test_delta_router.py`, `test_config_router.py`, `test_gizmosql_router.py`
- Requires `--extra web` / `uv sync --extra web`

- **New Makefile Targets**
- `dev-web-install-npm` — installs Node.js dependencies in `web-ui/` (run once after checkout)
- `dev-api` — starts FastAPI with hot-reload at `localhost:8000`
- `dev-web` — starts Next.js dev server at `localhost:3000`
- `build-web` — runs `npm run build` in `web-ui/`
- `build-release` — runs `build-web` then copies `web-ui/out/` into `src/tablesleuth/web/`
- `start-web` — runs `build-release` then launches `tablesleuth web`
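
The copy step of `build-release` uses the standard library's `shutil` so that it behaves the same on Windows and POSIX shells. A readable sketch of that step follows; the `bundle_frontend` function name is hypothetical, while the two `shutil` calls mirror the Makefile one-liner.

```python
import shutil
from pathlib import Path


def bundle_frontend(src: Path, dest: Path) -> None:
    """Replace dest with a fresh copy of the static export found in src."""
    shutil.rmtree(dest, ignore_errors=True)  # a missing destination is not an error
    shutil.copytree(src, dest)  # dest must not exist at this point


# Intended call, run from the repository root after `npm run build`:
# bundle_frontend(Path("web-ui/out"), Path("src/tablesleuth/web"))
```

Removing the destination first keeps stale assets from a previous build out of the wheel.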

### Changed

- **Dependency Upgrades** — all core libraries updated to latest versions
- `pyiceberg` → 0.11.0+
- `deltalake` → 1.4.2+
- `textual` → 0.86.2+
- `pyarrow` → 23.0.0+
- `pandas` → 3.0.1+
- `ruff` → 0.14.4+
- `mypy` → 1.18.2+
- `pytest` → 8.4.2+
- **Python version range** — now `>=3.13,<3.15` (added 3.14 upper bound)
- **Default catalog** — `tablesleuth.toml` default changed from `"local"` to `"glue"` to reflect typical production usage
- **Makefile** — replaced POSIX-only shell commands with Python equivalents for cross-platform (Windows) compatibility

### Fixed

- **Windows path handling** — `iceberg_manifest_patch.py` and `api/routers/iceberg.py` now correctly normalize Windows paths before passing to PyIceberg and DuckDB
- **Iceberg catalog serialization** — fixed `api/routers/iceberg.py` errors when serializing catalog objects with non-JSON-serializable fields
- **SPA fallback route** — removed FastAPI catch-all route that was intercepting `_next/static/*` asset requests and returning 404 for frontend JS/CSS bundles
- **Snapshot ID JavaScript precision** — Iceberg snapshot IDs (int64) are now serialized as strings in API responses to prevent silent precision loss in JavaScript (`Number.MAX_SAFE_INTEGER` is 2⁵³−1)
- **PyIceberg Windows path patch** — fixed path normalization for Windows-style absolute paths in the metadata patch context manager
- **npm ReDoS vulnerability** — resolved `minimatch` ReDoS security advisory in `web-ui/` dependencies; added `autoprefixer` for CSS compatibility
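
The snapshot ID precision fix can be illustrated in a few lines. This sketch uses Python's `float`, which is an IEEE 754 double like a JavaScript `Number`, to show the rounding; the `snapshot_id` payload key is an assumption, not the API's confirmed field name.

```python
import json

snapshot_id = 2**53 + 1  # 9007199254740993, a valid int64 above Number.MAX_SAFE_INTEGER

# Converting to a double rounds the value, just as a JavaScript Number would.
as_js_number = int(float(snapshot_id))
print(as_js_number == snapshot_id)  # False: the value rounds to 9007199254740992

# Serializing the ID as a string keeps every digit intact.
payload = json.dumps({"snapshot_id": str(snapshot_id)})
print(json.loads(payload)["snapshot_id"])  # 9007199254740993
```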

### Dependencies

- **New optional group `[web]`**: `fastapi>=0.131.0`, `uvicorn[standard]>=0.32.0`, `python-multipart>=0.0.12`, `fastavro>=1.9.0`

## [0.5.3] - 2026-01-25

### Changed
87 changes: 75 additions & 12 deletions DEVELOPMENT_SETUP.md
@@ -1,13 +1,13 @@
# Table Sleuth Development Setup
# TableSleuth Development Setup

This guide covers setting up Table Sleuth for development, testing, and contributing.
This guide covers setting up TableSleuth for development, testing, and contributing.

## Prerequisites

- Python 3.13+
- `uv` package manager
- Git
- Docker (for integration tests)
- Node.js 20+ and npm (required to rebuild the web UI frontend)
- AWS CLI (for AWS-related development)

## Quick Start
@@ -17,7 +17,7 @@ This guide covers setting up Table Sleuth for development, testing, and contribu
git clone https://github.com/jamesbconner/TableSleuth.git
cd TableSleuth

# Install dependencies with dev tools
# Install dependencies with dev tools (includes web extras)
make install-dev

# Install pre-commit hooks
@@ -54,10 +54,20 @@ make check # Run all quality checks

### Build & Run
```bash
make build # Build distribution packages
make build # Build wheel + sdist (for releases)
make run # Run tablesleuth CLI
```

### Web UI Development (v0.6.0+)
```bash
make dev-web-install-npm # Install Node.js dependencies (run once after checkout)
make dev-api # Start FastAPI server at localhost:8000 (hot-reload)
make dev-web # Start Next.js dev server at localhost:3000 (hot-reload)
make build-web # Build Next.js static export only
make build-release # Build frontend and copy into src/tablesleuth/web/
make start-web # build-release then launch tablesleuth web
```

### Cleanup
```bash
make clean # Remove build artifacts and cache
@@ -149,16 +159,58 @@ gizmosql_server -U test_user -P test_password -Q \
### Running from Source

```bash
# Run from source (development mode)
python -m tablesleuth.cli inspect data/sample.parquet

# Or use the installed command
tablesleuth inspect data/sample.parquet
# Run TUI from source
uv run tablesleuth parquet data/sample.parquet
uv run tablesleuth iceberg --catalog local --table db.table
uv run tablesleuth delta path/to/table

# Run with verbose logging for debugging
tablesleuth inspect data/sample.parquet -v
uv run tablesleuth parquet data/sample.parquet -v
```

## Web UI Development

The web UI consists of a FastAPI backend and a Next.js frontend. During development you run them separately for hot-reload.

### First-Time Setup

```bash
# Install Node.js dependencies (once after checkout)
make dev-web-install-npm

# Install Python web extras
uv sync --extra web
```

### Hot-Reload Development (two terminals)

**Terminal 1 — FastAPI backend:**
```bash
make dev-api # http://localhost:8000/api/...
```

**Terminal 2 — Next.js frontend:**
```bash
make dev-web # http://localhost:3000
```

The Next.js dev server proxies API calls to `localhost:8000`. Edit files in `web-ui/src/` and Python source normally; both servers reload automatically.

### Building the Frontend for Inclusion in the Wheel

```bash
make build-release # npm run build → copies web-ui/out/ to src/tablesleuth/web/
```

This is required before `uv build` so the compiled static export is bundled in the wheel. The GitHub Actions publish workflow runs this step automatically; you only need it locally when verifying the built package or committing an updated `src/tablesleuth/web/index.html`.

### Environment Variables

| Variable | Default | Description |
|---|---|---|
| `TABLESLEUTH_WEB_UI_DIR` | package `web/` dir | Override path to static Next.js export |
| `TABLESLEUTH_CORS_ORIGINS` | `http://localhost:3000` | Comma-separated allowed CORS origins |

## Testing

### Unit Tests
@@ -174,6 +226,16 @@ pytest tests/test_parquet_service.py::test_inspect_file -v
pytest --cov=src/tablesleuth --cov-report=html --cov-report=term-missing
```

### API Tests (v0.6.0+)

```bash
# Requires web extras installed
uv sync --extra web --extra dev

# Run API smoke tests
pytest tests/api/ -v
```

### Integration Tests

```bash
@@ -183,7 +245,7 @@ export TEST_GIZMOSQL_USERNAME="test_user"
export TEST_GIZMOSQL_PASSWORD="test_password"

# Run integration tests
pytest tests/integration/ -v
pytest -m integration -v

# Run end-to-end tests
pytest tests/test_end_to_end.py -v
@@ -333,3 +395,4 @@ After development setup:
2. Check [ARCHITECTURE.md](docs/ARCHITECTURE.md) for system design
3. Read [CONTRIBUTING.md](CONTRIBUTING.md) for contribution guidelines
4. See [QUICKSTART.md](QUICKSTART.md) for usage examples
5. See [DEVELOPER_GUIDE.md](docs/DEVELOPER_GUIDE.md) for API reference and component interfaces
48 changes: 31 additions & 17 deletions Makefile
@@ -1,4 +1,4 @@
.PHONY: help install install-dev sync clean test test-cov lint format type-check security pre-commit check build run zip
.PHONY: help install install-dev sync clean test test-cov lint format type-check security pre-commit check build run zip dev-api dev-web dev-web-install-npm build-web build-release start-web

# Default target
help:
@@ -21,6 +21,14 @@ help:
@echo "Quality (runs all checks):"
@echo " make check Run all quality checks"
@echo ""
@echo "Web UI Development:"
@echo " make dev-api Start FastAPI dev server (localhost:8000, hot-reload)"
@echo " make dev-web-install-npm Install Node.js dependencies (run once after checkout)"
@echo " make dev-web Start Next.js dev server (localhost:3000)"
@echo " make build-web Build Next.js static export"
@echo " make build-release Build frontend and bundle into Python package"
@echo " make start-web Build release and launch web UI"
@echo ""
@echo "Build & Run:"
@echo " make build Build distribution packages"
@echo " make run Run tablesleuth CLI"
@@ -77,22 +85,28 @@ run:

# Create source archive (excludes .gitignore files and untracked files)
zip:
@echo "Creating source code archive..."
@VERSION=$$(grep '^version = ' pyproject.toml | cut -d'"' -f2); \
ARCHIVE_NAME="tablesleuth-$$VERSION-src.zip"; \
git archive --format=zip --prefix=tablesleuth/ -o $$ARCHIVE_NAME HEAD; \
echo "Created $$ARCHIVE_NAME (excludes .gitignore patterns)"
uv run python -c "import subprocess, tomllib, pathlib; v=tomllib.loads(pathlib.Path('pyproject.toml').read_text())['project']['version']; n=f'tablesleuth-{v}-src.zip'; subprocess.run(['git','archive','--format=zip','--prefix=tablesleuth/','-o',n,'HEAD'],check=True); print(f'Created {n} (excludes .gitignore patterns)')"

# Web UI development
dev-api:
uv run uvicorn tablesleuth.api.main:app --host localhost --port 8000 --reload

dev-web-install-npm:
cd web-ui && npm install

dev-web:
cd web-ui && npm run dev

build-web:
cd web-ui && npm run build

build-release: build-web
uv run python -c "import shutil; shutil.rmtree('src/tablesleuth/web', ignore_errors=True); shutil.copytree('web-ui/out', 'src/tablesleuth/web')"

start-web: build-release
uv run tablesleuth web

# Cleanup
clean:
rm -rf build/
rm -rf dist/
rm -rf *.egg-info
rm -rf .pytest_cache/
rm -rf .mypy_cache/
rm -rf .ruff_cache/
rm -rf htmlcov/
rm -rf .coverage
rm -rf *.zip
find . -type f -name "*.pyc" -delete
find . -type d -name __pycache__ -delete
uv run python -c "import shutil, pathlib; [shutil.rmtree(d, ignore_errors=True) for d in ['build', 'dist', '.pytest_cache', '.mypy_cache', '.ruff_cache', 'htmlcov', 'src/tablesleuth/web', 'web-ui/out', 'web-ui/.next']]; [p.unlink(missing_ok=True) for p in [*pathlib.Path('.').glob('.coverage'), *pathlib.Path('.').glob('*.zip')]]; [shutil.rmtree(p, ignore_errors=True) for p in pathlib.Path('src').glob('**/*.egg-info')]; [p.unlink() for p in pathlib.Path('.').rglob('*.pyc')]; [shutil.rmtree(p, ignore_errors=True) for p in pathlib.Path('.').rglob('__pycache__')]"
-git restore src/tablesleuth/web/index.html