Add design goals and north stars to Copilot instructions#656
Add design goals and north stars to Copilot instructions#656
Conversation
Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>
| - **Minimize user-facing APIs**: Hyrax prioritizes configuration-driven workflows over complex programmatic APIs | ||
| - **Avoid API proliferation**: Don't create new user-facing APIs that we'll need to maintain indefinitely | ||
| - **Favor declarative over imperative**: Users should configure what they want, not how to get it | ||
| - **CLI-first approach**: The `hyrax` CLI tool with verb-based commands is the primary user interface |
There was a problem hiding this comment.
No, primary interface is a Jupyter notebook; CLI is secondary.
There was a problem hiding this comment.
And essentially with the exception of notebook workflows that require the data returned from a notebook, the CLI should be able to do everything a notebook can do (provided the CLI user can read and understand files in the results directory and write small scripts)
The purpose of this is to give notebook users moving to a slurm/HPC context a CLI tool that can inherently do the same things that they did in their notebook.
There was a problem hiding this comment.
In general when we as programmers start to hit the gnarly differences between CLI and notebook execution, we seek a lot of feedback from scientists and try to enumerate every way the two could possibly work in harmony, in order to find the best fit for scientific workflows.
| ### 2. Make Easy Things Easy, Hard Things Possible | ||
| - **Default workflows should "just work"**: Common use cases should require minimal configuration | ||
| - **Progressive complexity**: Simple tasks should be simple; advanced features available when needed | ||
| - **Sensible defaults**: Default configurations in `hyrax_default_config.toml` should handle common scenarios |
There was a problem hiding this comment.
A little stronger: Every config must have a default defined (even if that default is false which is our way of saying None and that the user must define for hyrax to work) in toml.
There was a problem hiding this comment.
Pull request overview
This PR adds a "Design Goals and North Stars" section to the Copilot instructions file to document core architectural principles for contributors and automated tools.
Changes:
- Added 34-line section documenting four design principles: Low Code Interface, Make Easy Things Easy/Hard Things Possible, Support Reproducibility, and Smooth API Migration
- Positioned the new section immediately after the introduction to establish foundational context
| - **Backward compatibility when possible**: Maintain compatibility or provide clear upgrade paths | ||
| - **Version pinning guidance**: Help users understand which versions work together | ||
| - **Config schema validation**: Use Pydantic schemas to validate configurations and provide helpful error messages | ||
| - **Changelog discipline**: Maintain comprehensive changelog with breaking change notifications |
There was a problem hiding this comment.
The documentation claims "Changelog discipline: Maintain comprehensive changelog with breaking change notifications" as a design goal, but there is no CHANGELOG file in the repository. This creates a discrepancy between the documented design goal and current practice. Consider either removing this bullet point or creating a CHANGELOG file to match this stated design goal.
| - **Avoid API proliferation**: Don't create new user-facing APIs that we'll need to maintain indefinitely | ||
| - **Favor declarative over imperative**: Users should configure what they want, not how to get it | ||
| - **CLI-first approach**: The `hyrax` CLI tool with verb-based commands is the primary user interface | ||
| - **Configuration over code**: Use TOML configuration files extensively to control behavior |
There was a problem hiding this comment.
Its more "Configuration" or "Code" We're pretty opinionated as a framework and try to fit the user into categories:
0) User shouldn't think about this feature at all
- User knows enough to set a config value
- User wants to write code that defines behavior
We use config to handle 0 and 1, and then are very judicious about opportunities for 2, bearing in mind that many of our users can write code, but may not be familiar with multiple file projects, classes/OOP, or any of the more complex aspects of python programming that are built upon those concepts
| - **Configuration over code**: Use TOML configuration files extensively to control behavior | ||
|
|
||
| ### 2. Make Easy Things Easy, Hard Things Possible | ||
| - **Default workflows should "just work"**: Common use cases should require minimal configuration |
There was a problem hiding this comment.
True, but also we should not omit key parameters. (e.g. user should have to define what ML model they are using and what data they are using). There is such a thing as too much magic here.
| - **Progressive complexity**: Simple tasks should be simple; advanced features available when needed | ||
| - **Sensible defaults**: Default configurations in `hyrax_default_config.toml` should handle common scenarios | ||
| - **Extensibility without complexity**: Advanced users can extend with custom models, datasets, and verbs | ||
| - **Clear extension points**: Well-documented base classes (`Verb`, model base classes, dataset classes) |
There was a problem hiding this comment.
Today "Verb" is only an internal extension point, but yes in generally we have this structure
| ### 3. Support Reproducibility | ||
| - **Configuration as documentation**: Config files serve as complete records of how experiments were run | ||
| - **Version everything**: Track model versions, data versions, and configuration versions | ||
| - **Manifest files**: Maintain manifests of downloaded data and processed results |
There was a problem hiding this comment.
This is actually an anti-pattern.
The real pattern is "Store datasets in a manner that lead to performant operations at 10M-100M scale assuming the user only really has a unix filesystem as an underlying storage layer on their HPC system"
We have fallen into manifest files because we have been writing the minimal working version, but it is arguable that we ought be using standard middleware for data storage rather than reinventing the wheel.
| - **Version everything**: Track model versions, data versions, and configuration versions | ||
| - **Manifest files**: Maintain manifests of downloaded data and processed results | ||
| - **Deterministic defaults**: Random seeds and other sources of variability should be configurable | ||
| - **MLflow integration**: Log experiments systematically for comparison and reproduction |
There was a problem hiding this comment.
Experiments should be logged systematically for comparison and reproduction. MLFlow logic is part of this system, but the results dir is the backbone upon which all other pieces rest.
| - **Backward compatibility when possible**: Maintain compatibility or provide clear upgrade paths | ||
| - **Version pinning guidance**: Help users understand which versions work together | ||
| - **Config schema validation**: Use Pydantic schemas to validate configurations and provide helpful error messages | ||
| - **Changelog discipline**: Maintain comprehensive changelog with breaking change notifications |
There was a problem hiding this comment.
We don't do changelogs. We use Git.
| - **Clear deprecation warnings**: When changing APIs, provide helpful deprecation messages | ||
| - **Migration guides in documentation**: Document breaking changes with before/after examples | ||
| - **Backward compatibility when possible**: Maintain compatibility or provide clear upgrade paths | ||
| - **Version pinning guidance**: Help users understand which versions work together |
There was a problem hiding this comment.
This isn't a thing afaict
|
|
||
| ### 4. Smooth and Legible Migration When APIs Change | ||
| - **Clear deprecation warnings**: When changing APIs, provide helpful deprecation messages | ||
| - **Migration guides in documentation**: Document breaking changes with before/after examples |
There was a problem hiding this comment.
We tend to prefer documenting the current thing, and then informing users of the old thing with error messages and warnings pushing them to the docs of the current thing.
| - **Migration guides in documentation**: Document breaking changes with before/after examples | ||
| - **Backward compatibility when possible**: Maintain compatibility or provide clear upgrade paths | ||
| - **Version pinning guidance**: Help users understand which versions work together | ||
| - **Config schema validation**: Use Pydantic schemas to validate configurations and provide helpful error messages |
There was a problem hiding this comment.
I'm not sure this is a pillar so much as an experiment we are trying for DataProvider config specifically, because its structurally more complex than any of our other config.
Create HYRAX_GUIDE.md as the canonical shared reference, CLAUDE.md for Claude Code, and rewrite .github/copilot-instructions.md for Copilot. This deduplicates content and fixes inaccuracies identified in PRs #635, #656, and #657: Python version (>=3.11), ConfigDict (Pydantic's, not custom), verbs (internal only), primary interface (notebooks), config philosophy (three-tier "Configuration OR Code"), manifest files (compromise, not design goal), changelogs (none), Pydantic scope (data_request only), and HyraxCifarDataset spelling. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change Description
Adds "Design Goals and North Stars" section to
.github/copilot-instructions.mddocumenting core architectural principles for the project.Solution Description
Added 34-line section covering four design principles:
1. Low Code Interface
2. Make Easy Things Easy, Hard Things Possible
hyrax_default_config.toml3. Support Reproducibility
4. Smooth API Migration
Positioned section immediately after introduction to establish foundational context for contributors.
Code Quality
Warning
Firewall rules blocked me from connecting to one or more addresses (expand for details)
I tried to connect to the following addresses, but was blocked by firewall rules:
gh.io/home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js(dns block)If you need me to access, download, or install something from one of these locations, you can either:
Original prompt
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.