fix(notebook): auto-detect Kaggle dataset mount path #17

Merged
nvandessel merged 2 commits into main from fix/sonnet-schema-v2 on Apr 3, 2026

@nvandessel
Owner

Summary

  • Auto-detect Kaggle dataset mount path (tries both /kaggle/input/datasets/{owner}/{slug}/ and /kaggle/input/{slug}/)
  • Add kernel-metadata.json for CLI-based notebook push via kaggle kernels push
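A minimal sketch of the two mount conventions from the first bullet, assuming the newer `datasets/{owner}/{slug}` layout should be tried before the older `{slug}` layout. `candidate_dirs` and `first_existing` are hypothetical helper names, not functions from the notebook, and "exists" is interpreted here as "is an existing directory".

```python
from __future__ import annotations

from pathlib import Path

KAGGLE_INPUT = Path("/kaggle/input")


def candidate_dirs(owner: str, slug: str) -> list[Path]:
    """Possible mount points for a Kaggle dataset, newest convention first."""
    return [
        KAGGLE_INPUT / "datasets" / owner / slug,  # /kaggle/input/datasets/{owner}/{slug}/
        KAGGLE_INPUT / slug,                       # /kaggle/input/{slug}/ (older layout)
    ]


def first_existing(paths: list[Path]) -> Path | None:
    """Return the first path that is an existing directory, or None if none are."""
    for path in paths:
        if path.is_dir():
            return path
    return None


if __name__ == "__main__":
    candidates = candidate_dirs("nvandessel", "floop-decisions")
    for path in candidates:
        print(path, "exists" if path.is_dir() else "missing")
    print("DATA_DIR:", first_existing(candidates))
```

Keeping path construction separate from the filesystem check makes the candidate list easy to test off-Kaggle.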

The first run failed because Kaggle changed its dataset mount convention.

Test plan

  • 78 tests passing
  • Kaggle notebook finds data and runs end-to-end

🤖 Generated with Claude Code

Kaggle mounts datasets at /kaggle/input/datasets/{owner}/{slug}/
not /kaggle/input/{slug}/. Try both paths. Also add kernel-metadata.json
for CLI-based notebook push.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@greptile-apps
Contributor

greptile-apps Bot commented Apr 3, 2026

Greptile Summary

This PR fixes a Kaggle dataset mount path regression by replacing the single hardcoded DATA_DIR with a two-candidate auto-detection block that tries /kaggle/input/datasets/nvandessel/floop-decisions first, then falls back to /kaggle/input/floop-decisions. It also adds kernel-metadata.json to enable CLI-based notebook publishing via kaggle kernels push.

Key changes:

  • DATA_DIR detection now raises a FileNotFoundError with a clear, actionable message (listing both checked paths) when neither candidate exists — the previous silent-fallback concern is fully addressed.
  • An additional guard raises FileNotFoundError when the resolved directory contains no .jsonl files, preventing a confusing downstream IndexError.
  • kernel-metadata.json correctly declares the dataset source (nvandessel/floop-decisions), enables GPU and internet (needed for pip install git+https://...), and points to the notebook via a relative code_file path that works when kaggle kernels push is run from notebooks/.
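For reference, a sketch of a `kernel-metadata.json` with the properties described above, generated from Python to stay in the notebook's language. The field names follow the Kaggle API kernel metadata format; the kernel `id`, `title`, and `is_private` values are hypothetical placeholders because the PR does not show them, so check the result against the Kaggle kernel metadata documentation before relying on it.

```python
import json

# dataset_sources, code_file, and the GPU/internet flags mirror the review;
# "id", "title", and "is_private" are hypothetical placeholders.
KERNEL_METADATA = {
    "id": "nvandessel/train-hippofloop",        # hypothetical kernel slug
    "title": "train-hippofloop",                # hypothetical title
    "code_file": "train-hippofloop.ipynb",      # relative: push from notebooks/
    "language": "python",
    "kernel_type": "notebook",
    "is_private": True,                         # hypothetical
    "enable_gpu": True,                         # GPU enabled per the review
    "enable_internet": True,                    # needed for pip install git+https://...
    "dataset_sources": ["nvandessel/floop-decisions"],
    "competition_sources": [],
    "kernel_sources": [],
}


def render_metadata(metadata: dict = KERNEL_METADATA) -> str:
    """Serialize the metadata as pretty-printed JSON for kernel-metadata.json."""
    return json.dumps(metadata, indent=2) + "\n"


if __name__ == "__main__":
    # Prints to stdout; redirect into notebooks/kernel-metadata.json if desired.
    print(render_metadata(), end="")
```

With the file in `notebooks/`, running `kaggle kernels push -p .` from that directory resolves `code_file` relative to it, which is why the relative path works.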

Confidence Score: 5/5

Safe to merge — changes are well-scoped, both failure modes are explicitly guarded, and no regressions are introduced.

All changes are targeted and correct: the auto-detection logic covers both known Kaggle mount conventions, errors are surfaced immediately with clear messages, and the metadata file is consistent with the notebook's dataset reference. No prior concerns remain unaddressed.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| notebooks/train-hippofloop.ipynb | Replaces hardcoded DATA_DIR with a two-candidate auto-detection block; raises FileNotFoundError (with a clear message listing both checked paths) when neither exists, and again when no JSONL files are found. Both previous silent-failure paths are now explicitly guarded. |
| notebooks/kernel-metadata.json | New Kaggle kernel metadata file; id, code_file, dataset_sources, and GPU/internet flags all look correct for CLI push via kaggle kernels push from the notebooks/ directory. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Cell 5: Auto-detect DATA_DIR] --> B{Check /kaggle/input/datasets/nvandessel/floop-decisions}
    B -- exists --> D[DATA_DIR = candidate 1]
    B -- not found --> C{Check /kaggle/input/floop-decisions}
    C -- exists --> E[DATA_DIR = candidate 2]
    C -- not found --> F[raise FileNotFoundError\nlist both checked paths]
    D --> G[glob *.jsonl]
    E --> G
    G -- files found --> H[Continue pipeline]
    G -- empty --> I[raise FileNotFoundError\nno .jsonl files in DATA_DIR]
```
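The flowchart above corresponds roughly to the following sketch. It is not the notebook's actual Cell 5: the function name `resolve_data_dir` and the exact error wording are assumptions, but the control flow (two candidates in order, then FileNotFoundError if neither exists or if the chosen directory has no `.jsonl` files) follows the diagram.

```python
from __future__ import annotations

from collections.abc import Sequence
from pathlib import Path

# Candidate mount points, checked in the order shown in the flowchart.
CANDIDATES = (
    Path("/kaggle/input/datasets/nvandessel/floop-decisions"),
    Path("/kaggle/input/floop-decisions"),
)


def resolve_data_dir(candidates: Sequence[Path] = CANDIDATES) -> tuple[Path, list[Path]]:
    """Return (DATA_DIR, jsonl_files), raising FileNotFoundError on either failure."""
    # Pick the first candidate that exists as a directory.
    data_dir = next((p for p in candidates if p.is_dir()), None)
    if data_dir is None:
        checked = "\n".join(f"  - {p}" for p in candidates)
        raise FileNotFoundError(
            "Kaggle dataset not found. Checked:\n"
            f"{checked}\n"
            "Is the dataset attached to this notebook?"
        )
    # Fail fast instead of hitting an IndexError on an empty file list later.
    jsonl_files = sorted(data_dir.glob("*.jsonl"))
    if not jsonl_files:
        raise FileNotFoundError(f"No .jsonl files found in {data_dir}")
    return data_dir, jsonl_files


if __name__ == "__main__":
    DATA_DIR, files = resolve_data_dir()
    print(f"DATA_DIR = {DATA_DIR} ({len(files)} .jsonl files)")
```

Sorting the glob result is an added assumption that a deterministic file order is wanted; drop it if order does not matter.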

Reviews (2): Last reviewed commit: "fix(notebook): fail fast when dataset pa..."

Comment thread notebooks/train-hippofloop.ipynb
Raise FileNotFoundError with actionable message instead of silently
falling back to a non-existent path. Addresses Greptile review.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@nvandessel
Owner Author

Addressed Greptile's P1 (silent fallback) in e02a5c8: now raises FileNotFoundError with an actionable message when neither candidate path exists, and also fails fast if the directory has no .jsonl files.

@nvandessel merged commit 9acbe19 into main on Apr 3, 2026
5 checks passed
@nvandessel nvandessel deleted the fix/sonnet-schema-v2 branch April 3, 2026 05:18


1 participant