Skip to content

Fix miscategorized remotes#871

Open
jdesrosiers wants to merge 2 commits intojson-schema-org:mainfrom
jdesrosiers:fix-remotes
Open

Fix miscategorized remotes#871
jdesrosiers wants to merge 2 commits intojson-schema-org:mainfrom
jdesrosiers:fix-remotes

Conversation

@jdesrosiers
Copy link
Copy Markdown
Member

Some of the optional tests are categorized in a way that makes it difficult to register remotes for a specific dialect. These remotes are used for testing references to a schema using a different dialect than the originating schema. The remotes are categorized according to the dialect of the schema. This changes it to categorized them according to the dialect of the test that uses the schema. This change allows a test harness to not have to load dialect-specific remotes used by other dialects.

@jdesrosiers jdesrosiers requested a review from a team as a code owner March 10, 2026 18:38
@jdesrosiers
Copy link
Copy Markdown
Member Author

@karenetheridge I recall that you've created a couple other remote cleanup PRs in the past, so I especially want to make sure this works with your setup.

Copy link
Copy Markdown
Member

@karenetheridge karenetheridge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two files have the same $id, therefore they conflict:

  • remotes/draft2019-09/ignore-dependentRequired.json
  • tests/draft2019-09/refRemote.json

this is incorrect.. delving deeper to find the true problem...

Aha, the two conflicting files are:

This can be resolved by changing the $id of ignore-dependentRequired.json.

@karenetheridge
Copy link
Copy Markdown
Member

karenetheridge commented Mar 10, 2026

The remotes are categorized according to the dialect of the schema. This changes it to categorized them according to the dialect of the test that uses the schema. This change allows a test harness to not have to load dialect-specific remotes used by other dialects.

I understand the desire to improve efficiency by not loading schemas that aren't needed, but how can an implementation then make sure to select all the remotes that it needs for specific tests, without missing anything, while also avoiding errors from loading things that it does not understand? e.g. if an implementation supports draft2020-12 but not draft7, if a draft7-versioned remote file is placed in the 2020-12 remotes directory it will cause an error on load.

By categorizing the remotes files by their own specification version, an implementation can be sure that it is only loading files it can understand, and can avoid all the versions that it doesn't understand.

@karenetheridge
Copy link
Copy Markdown
Member

conflicting files found (see above comment)

@jdesrosiers
Copy link
Copy Markdown
Member Author

@karenetheridge

how can an implementation then make sure to select all the remotes that it needs for specific tests, without missing anything, while also avoiding errors from loading things that it does not understand?

That's exactly the problem I'm trying to solve. Right now, an implementation has to try to load every remote schema because there isn't a way to know which remotes apply to which test suite. Because we have to load all the remotes, we have to handle the case where the implementation doesn't support the dialect of the schema by either checking $schema or catching and ignoring the error. We'd still have to do that check with this change, but there would be a lot less mismatches we have to handle because the cross-draft tests are the only ones where the check might fail.

@karenetheridge
Copy link
Copy Markdown
Member

That's exactly the problem I'm trying to solve.

I'm confused, because it was already solved before. All you needed to do before is load all the remotes directories corresponding to drafts that were supported, and skip those that aren't. Now with the way you've moved things around, there's no way to skip a remotes/ schema that is using an unsupported version, because it's not categorized by its own version, but rather by the version that uses it. How is this better?

@jdesrosiers
Copy link
Copy Markdown
Member Author

Now with the way you've moved things around, there's no way to skip a remotes/ schema that is using an unsupported version

Just check $schema and skip it if it's a version you don't support. It's just as easy as checking the directory name, but with these changes it's now also possible to skip schemas that don't apply to the test suite you're loading remotes for.

@karenetheridge
Copy link
Copy Markdown
Member

That's a lot of changes to do correctly. You can't just naively look at the json data and do a match against spec metaschema $ids, because the $schema could be something else, e.g. testing a new dialect. So now the test harness needs to have a lot more insight into how the application that it's wrapping works, so as to check if it's a known metaschema or not. It breaks all encapsulation in the test harness.

Also, even if the change is less arduous, it still breaks everything. There are a non-zero number of changes required to make this work, which you are imposing on all consumers, everywhere.

You haven't answered my earlier question: what is the benefit of this? It seems to me that it simply saves the amount of time required to load a schema file that in the end isn't needed.

@jdesrosiers
Copy link
Copy Markdown
Member Author

You can't just naively look at the json data and do a match against spec metaschema $ids, because the $schema could be something else, e.g. testing a new dialect

I don't think that's an issue. For example, we test here's an example of a custom meta-schema we use for vocabulary tests.

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "http://localhost:1234/draft2020-12/metaschema-no-validation.json",
    "$vocabulary": {
        "https://json-schema.org/draft/2020-12/vocab/applicator": true,
        "https://json-schema.org/draft/2020-12/vocab/core": true
    },
    "$dynamicAnchor": "meta",
    "allOf": [
        { "$ref": "https://json-schema.org/draft/2020-12/meta/applicator" },
        { "$ref": "https://json-schema.org/draft/2020-12/meta/core" }
    ]
}

It uses the draft-2020-12 URI in $schema because it's a dialect of draft-2020-12 and used by the draft2020-12 test suite. In the case where we might be testing a completely distinct dialect, that would be its own test suite and have it's own directory representing that dialect.

Also, even if the change is less arduous, it still breaks everything. There are a non-zero number of changes required to make this work, which you are imposing on all consumers, everywhere.

It is a good point that this is a breaking change. However, it's an exaggeration to say it "breaks everything". The change affects one optional test and no required tests. There's a pretty small set of circumstances where someone might be affected. The implementation would have to be using the optional tests and not support multiple drafts. Both of which I believe are relatively rare. If someone is affected, it's trivial to update. I think that's worth it for a cleaner experience working with remotes.

You haven't answered my earlier question: what is the benefit of this? It seems to me that it simply saves the amount of time required to load a schema file that in the end isn't needed.

I have answered that question in the description and in every comment since. I don't know how else to say it. It makes it possible to load only the schemas you need for a given test suite instead of having to load every remote your implementation supports even though most of them won't end up being used. Is anything broken without this change? No. But, it's annoying and I think it's worth fixing.

@karenetheridge
Copy link
Copy Markdown
Member

karenetheridge commented Mar 16, 2026

instead of having to load every remote your implementation supports even though most of them won't end up being used

Why wouldn't they be used? Wouldn't it be expected that an implementation would test every version it supports?

The change affects one optional test and no required tests.

It doesn't really matter how many tests are affected if the change still needs to be made, does it?

I have answered that question in the description and in every comment since. I don't know how else to say it.

I should have been more specific: could you update the README file, which should clearly instruct the consumer when they should and shouldn't want to load from each directory and how to make that decision?

@jdesrosiers
Copy link
Copy Markdown
Member Author

Why wouldn't they be used? Wouldn't it be expected that an implementation would test every version it supports?

Ok, I think I'm starting to see our disconnect. The way I see it, you can't run every test suite against the same registry of remotes. The schemas that aren't dialect-specific need to be loaded for the dialect under test. When I'm testing draft-04, I need to load https://localhost:1234/integer.json as draft-04 schema. When I'm testing draft-07, I need to load it as a draft-07 schema. I can't have both at the same time and it's not good enough to use a draft-07 schema when I'm testing draft-04. Even if the results are always expected to be the same, I'm not actually testing the draft-04 if I'm using a draft-07 schema.

So, when I run the tests for a dialect, I load a fresh registry with all the dialect-agnostic remotes loaded specifically for that dialect. When I run the tests for another dialect, I start with a fresh registry so I can load all the dialect-agnostic remotes with the other dialect. From that perspective, loading every remote at the same time never makes sense. It only makes sense to load remotes that are relevant to the test suite of the dialect under test, because I'm going to have to rebuild the remote registry when I move to the next dialect test suite anyway.

could you update the README file

Yes! I didn't think about that.

@karenetheridge
Copy link
Copy Markdown
Member

When I'm testing draft-04, I need to load https://localhost:1234/integer.json as draft-04 schema. When I'm testing draft-07, I need to load it as a draft-07 schema. I can't have both at the same time and it's not good enough to use a draft-07 schema when I'm testing draft-04.

This is easy to work around by running each dialect's tests in its own process. I've always done that, because that's how my test harness does it since before I adopted it (thanks again @Relequestual ;) ). But indeed the README doesn't mention anything needing to do that.

...loading every remote at the same time never makes sense. It only makes sense to load remotes that are relevant to the test suite of the dialect under test

But we were already doing that before anyway, by virtue of having most of the remotes separated into draft-specific directories -- so you only need to load the remotes for the dialect you're testing, and the handful that are still remaining at the top level that get loaded every time.

@jdesrosiers
Copy link
Copy Markdown
Member Author

This is easy to work around by running each dialect's tests in its own process. I've always done that

I've always done that too. The point is that when you do that, you shouldn't need to load all the tests for the dialect test suite you aren't running in that process.

But we were already doing that before anyway, by virtue of having most of the remotes separated into draft-specific directories -- so you only need to load the remotes for the dialect you're testing, and the handful that are still remaining at the top level that get loaded every time.

Right. That's what I want to be able to do, but that's not good enough with how things are currently categorized. If I'm testing draft-2020-12 and I load the top level remotes and the remotes in draft2020-12, there's still one remote that didn't get loaded because it's in the draft2019-09 folder. What I'm proposing is moving that one test into draft2020-12 so I can skip the other dialect folders. Otherwise, there's no good way to know that I need to load that one schema in draft2019-09 to run the draft2020-12 test suite. The only thing I can do is load all the schemas for all the dialects in case they're needed. I don't want to have to do that because one remote is in a different folder.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants