
Add design goals and north stars to Copilot instructions#656

Open
Copilot wants to merge 2 commits into main from copilot/set-up-copilot-instructions

Contributor

Copilot AI commented Jan 30, 2026

Change Description

Adds "Design Goals and North Stars" section to .github/copilot-instructions.md documenting core architectural principles for the project.

Solution Description

Added 34-line section covering four design principles:

1. Low Code Interface

  • Configuration-driven workflows over programmatic APIs
  • CLI-first (verb-based commands), minimal API surface
  • TOML configs control behavior

2. Make Easy Things Easy, Hard Things Possible

  • Sensible defaults in hyrax_default_config.toml
  • Progressive complexity via extensible base classes (Verb, models, datasets)

3. Support Reproducibility

  • Config files as experiment documentation
  • Version tracking (models, data, configs)
  • MLflow integration, ONNX export, manifest files

4. Smooth API Migration

  • Deprecation warnings with migration guides
  • Pydantic schema validation with clear errors
  • Changelog discipline for breaking changes

Positioned section immediately after introduction to establish foundational context for contributors.

Code Quality

  • I have read the Contribution Guide and agree to the Code of Conduct
  • My code follows the code style of this project
  • My code builds (or compiles) cleanly without any errors or warnings
  • My code contains relevant comments and necessary documentation

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • gh.io
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

This section details on the original issue you should resolve

<issue_title>✨ Set up Copilot instructions</issue_title>
<issue_description>Configure instructions for this repository as documented in Best practices for Copilot coding agent in your repository.

</issue_description>

<agent_instructions>Be sure it includes a section on design goals or "north stars". For this project, these include: an emphasis on low code interface, and therefore avoiding the creation of user-facing APIs that we have to keep defending; making the easy things easy and the hard things possible; supporting reproducibility (users can return to previous runs and have a good chance of rerunning it and seeing the same answers); smooth and legible migration when APIs change.</agent_instructions>

Comments on the Issue (you are @copilot in this section)



Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Configure Copilot instructions for repository" to "Add design goals and north stars to Copilot instructions" Jan 30, 2026
Copilot AI requested a review from gitosaurus January 30, 2026 18:37
@gitosaurus gitosaurus marked this pull request as ready for review January 30, 2026 19:17
- **Minimize user-facing APIs**: Hyrax prioritizes configuration-driven workflows over complex programmatic APIs
- **Avoid API proliferation**: Don't create new user-facing APIs that we'll need to maintain indefinitely
- **Favor declarative over imperative**: Users should configure what they want, not how to get it
- **CLI-first approach**: The `hyrax` CLI tool with verb-based commands is the primary user interface

Contributor

No, the primary interface is a Jupyter notebook; the CLI is secondary.

Collaborator

Essentially, with the exception of notebook workflows that require the data returned from a notebook, the CLI should be able to do everything a notebook can do (provided the CLI user can read and understand files in the results directory and write small scripts).

The purpose of this is to give notebook users moving to a Slurm/HPC context a CLI tool that can inherently do the same things that they did in their notebook.

Collaborator

In general, when we as programmers start to hit the gnarly differences between CLI and notebook execution, we seek a lot of feedback from scientists and try to enumerate every way the two could possibly work in harmony, in order to find the best fit for scientific workflows.

### 2. Make Easy Things Easy, Hard Things Possible
- **Default workflows should "just work"**: Common use cases should require minimal configuration
- **Progressive complexity**: Simple tasks should be simple; advanced features available when needed
- **Sensible defaults**: Default configurations in `hyrax_default_config.toml` should handle common scenarios

Collaborator

A little stronger: every config must have a default defined in TOML, even if that default is `false`, which is our way of saying None and means the user must define the value for Hyrax to work.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds a "Design Goals and North Stars" section to the Copilot instructions file to document core architectural principles for contributors and automated tools.

Changes:

  • Added 34-line section documenting four design principles: Low Code Interface, Make Easy Things Easy/Hard Things Possible, Support Reproducibility, and Smooth API Migration
  • Positioned the new section immediately after the introduction to establish foundational context

- **Backward compatibility when possible**: Maintain compatibility or provide clear upgrade paths
- **Version pinning guidance**: Help users understand which versions work together
- **Config schema validation**: Use Pydantic schemas to validate configurations and provide helpful error messages
- **Changelog discipline**: Maintain comprehensive changelog with breaking change notifications

Copilot AI Jan 30, 2026

The documentation claims "Changelog discipline: Maintain comprehensive changelog with breaking change notifications" as a design goal, but there is no CHANGELOG file in the repository. This creates a discrepancy between the documented design goal and current practice. Consider either removing this bullet point or creating a CHANGELOG file to match this stated design goal.

- **Avoid API proliferation**: Don't create new user-facing APIs that we'll need to maintain indefinitely
- **Favor declarative over imperative**: Users should configure what they want, not how to get it
- **CLI-first approach**: The `hyrax` CLI tool with verb-based commands is the primary user interface
- **Configuration over code**: Use TOML configuration files extensively to control behavior

Collaborator

It's more "Configuration" or "Code". We're pretty opinionated as a framework and try to fit the user into categories:

  0. User shouldn't think about this feature at all
  1. User knows enough to set a config value
  2. User wants to write code that defines behavior

We use config to handle 0 and 1, and then are very judicious about opportunities for 2, bearing in mind that many of our users can write code but may not be familiar with multi-file projects, classes/OOP, or any of the more complex aspects of Python programming that are built upon those concepts.
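
The three categories can be sketched as follows. This is an illustration only, not Hyrax code: the `scaling` and `batch_size` keys, the `BUILTIN_SCALERS` table, and `resolve_scaler` are all hypothetical names. The category 2 hook is deliberately a plain function, so the user needs neither classes nor a multi-file project.

```python
from typing import Callable, Optional

Scaler = Callable[[list[float]], list[float]]

# Category 0: defaults the user never has to think about (hypothetical keys).
DEFAULTS = {"batch_size": 32, "scaling": "minmax"}


def minmax(values: list[float]) -> list[float]:
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


# Built-in behaviours that can be selected with a config value alone.
BUILTIN_SCALERS: dict[str, Scaler] = {
    "none": lambda values: list(values),
    "minmax": minmax,
}


def resolve_scaler(config: dict, custom: Optional[Scaler] = None) -> Scaler:
    """Prefer user-written code (category 2), else use the configured built-in."""
    if custom is not None:
        return custom
    return BUILTIN_SCALERS[config["scaling"]]


# Category 1: the user knows enough to change one config value.
tier1_config = {**DEFAULTS, "scaling": "none"}


# Category 2: the user writes code that defines the behaviour.
def halve(values: list[float]) -> list[float]:
    return [v / 2 for v in values]


data = [0.0, 5.0, 10.0]
tier0_result = resolve_scaler(DEFAULTS)(data)
tier1_result = resolve_scaler(tier1_config)(data)
tier2_result = resolve_scaler(DEFAULTS, custom=halve)(data)
```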

- **Configuration over code**: Use TOML configuration files extensively to control behavior

### 2. Make Easy Things Easy, Hard Things Possible
- **Default workflows should "just work"**: Common use cases should require minimal configuration

Collaborator

True, but we also should not omit key parameters (e.g., the user should have to define what ML model they are using and what data they are using). There is such a thing as too much magic here.

- **Progressive complexity**: Simple tasks should be simple; advanced features available when needed
- **Sensible defaults**: Default configurations in `hyrax_default_config.toml` should handle common scenarios
- **Extensibility without complexity**: Advanced users can extend with custom models, datasets, and verbs
- **Clear extension points**: Well-documented base classes (`Verb`, model base classes, dataset classes)

Collaborator

Today "Verb" is only an internal extension point, but yes, in general we have this structure.

### 3. Support Reproducibility
- **Configuration as documentation**: Config files serve as complete records of how experiments were run
- **Version everything**: Track model versions, data versions, and configuration versions
- **Manifest files**: Maintain manifests of downloaded data and processed results

Collaborator

This is actually an anti-pattern.

The real pattern is "Store datasets in a manner that leads to performant operations at 10M-100M scale, assuming the user only really has a Unix filesystem as an underlying storage layer on their HPC system."

We have fallen into manifest files because we have been writing the minimal working version, but it is arguable that we ought to be using standard middleware for data storage rather than reinventing the wheel.

- **Version everything**: Track model versions, data versions, and configuration versions
- **Manifest files**: Maintain manifests of downloaded data and processed results
- **Deterministic defaults**: Random seeds and other sources of variability should be configurable
- **MLflow integration**: Log experiments systematically for comparison and reproduction

Collaborator

Experiments should be logged systematically for comparison and reproduction. MLflow logic is part of this system, but the results dir is the backbone upon which all other pieces rest.
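
As a rough sketch of what "the results dir is the backbone" could look like, the snippet below creates one directory per run and stores a snapshot of the config next to whatever the run produces. The directory naming scheme, the `config_snapshot.json` file name, and the `start_run` helper are assumptions made for illustration; they are not taken from Hyrax.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def start_run(results_root: Path, verb: str, config: dict) -> Path:
    """Create a fresh run directory and write a snapshot of the config into it."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    run_dir = results_root / f"{stamp}-{verb}"
    # exist_ok=False so that an earlier run is never silently overwritten.
    run_dir.mkdir(parents=True, exist_ok=False)
    snapshot = json.dumps(config, indent=2, sort_keys=True)
    (run_dir / "config_snapshot.json").write_text(snapshot)
    return run_dir


# A temporary directory stands in for the user's results root.
with tempfile.TemporaryDirectory() as tmp:
    config = {"model": {"name": "example_model"}, "train": {"epochs": 10, "seed": 42}}
    run_dir = start_run(Path(tmp), "train", config)
    restored = json.loads((run_dir / "config_snapshot.json").read_text())
    snapshot_matches = restored == config
```

With a layout like this, anything that needs the settings of an earlier run (a tracking server, a CLI invocation, or a small user script) only has to read files on a plain filesystem.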

- **Backward compatibility when possible**: Maintain compatibility or provide clear upgrade paths
- **Version pinning guidance**: Help users understand which versions work together
- **Config schema validation**: Use Pydantic schemas to validate configurations and provide helpful error messages
- **Changelog discipline**: Maintain comprehensive changelog with breaking change notifications

Collaborator

We don't do changelogs. We use Git.

- **Clear deprecation warnings**: When changing APIs, provide helpful deprecation messages
- **Migration guides in documentation**: Document breaking changes with before/after examples
- **Backward compatibility when possible**: Maintain compatibility or provide clear upgrade paths
- **Version pinning guidance**: Help users understand which versions work together

Collaborator

This isn't a thing afaict


### 4. Smooth and Legible Migration When APIs Change
- **Clear deprecation warnings**: When changing APIs, provide helpful deprecation messages
- **Migration guides in documentation**: Document breaking changes with before/after examples

Collaborator

We tend to prefer documenting the current thing, and then informing users of the old thing with error messages and warnings pushing them to the docs of the current thing.
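
One way to read this preference, sketched with Python's standard `warnings` module: the docs describe only the current config key, and an old key still works but emits a warning that names the replacement and points at the docs. The key names, the `RENAMED_KEYS` table, the docs pointer, and `upgrade_config` are hypothetical.

```python
import warnings

# Hypothetical rename table: old key -> (current key, where the docs describe it).
RENAMED_KEYS = {
    "num_epochs": ("epochs", "the configuration reference"),
}


def upgrade_config(section: dict) -> dict:
    """Map deprecated keys onto current ones, warning the user about each."""
    upgraded = {}
    for key, value in section.items():
        if key in RENAMED_KEYS:
            new_key, docs = RENAMED_KEYS[key]
            warnings.warn(
                f"Config key '{key}' is deprecated; use '{new_key}' instead. See {docs}.",
                DeprecationWarning,
                stacklevel=2,
            )
            key = new_key
        upgraded[key] = value
    return upgraded


# Record warnings so they can be inspected; a normal run would simply display them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = upgrade_config({"num_epochs": 5, "batch_size": 32})

messages = [str(w.message) for w in caught]
```

When the old key is finally removed, the same message can be raised as an error instead, so users still land on the documentation of the current key.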

- **Migration guides in documentation**: Document breaking changes with before/after examples
- **Backward compatibility when possible**: Maintain compatibility or provide clear upgrade paths
- **Version pinning guidance**: Help users understand which versions work together
- **Config schema validation**: Use Pydantic schemas to validate configurations and provide helpful error messages

Collaborator

I'm not sure this is a pillar so much as an experiment we are trying for DataProvider config specifically, because it's structurally more complex than any of our other config.

mtauraso added a commit that referenced this pull request Feb 5, 2026
Create HYRAX_GUIDE.md as the canonical shared reference, CLAUDE.md for
Claude Code, and rewrite .github/copilot-instructions.md for Copilot.
This deduplicates content and fixes inaccuracies identified in PRs #635,
#656, and #657: Python version (>=3.11), ConfigDict (Pydantic's, not
custom), verbs (internal only), primary interface (notebooks), config
philosophy (three-tier "Configuration OR Code"), manifest files
(compromise, not design goal), changelogs (none), Pydantic scope
(data_request only), and HyraxCifarDataset spelling.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

✨ Set up Copilot instructions

3 participants
