Skip to content

Conversation

@wilfreddenton
Copy link
Contributor

#2418

There are two strategies I've explored for giving consumers access to source locations:

  1. Make spans available on resolved items so consumers can inspect them after resolution. There are two ways I found to do this

    1. Spans on structs (wit-parser-spans-in-structs) - Add span: Option<Span> directly to TypeDef, Function, Interface, World
      • Pros: Direct access pattern; spans travel with the data automatically during merges
      • Cons: Breaking change to public struct layouts
    2. Spans in HashMaps (wit-parser-spans-in-resolve) - Store spans in HashMap<Id, Span> fields on Resolve
      • Pros: No changes to existing struct layouts (Resolve derives Default so not likely to break consumers)
      • Cons: Indirect lookup; requires explicit remapping during merges
  2. Hook into internal validation (this PR)

    Keep spans private but let consumers hook into the resolution process via a Validator trait with callbacks.

    • Pros: No breaking changes; opt-in complexity; zero cost for users who don't need spans
    • Cons: Validation must happen during push_* calls, not after-the-fact inspection

This PR implements Strategy 2 because it avoids breaking changes and has zero cost for existing. However, I'm happy to go with one of the other strategies or something totally new that I haven't considered.

Here's an example of how a consumer would use this new features:

use wit_parser::{Error, Resolve, Span, Validator};

struct NoEmptyRecords;

impl Validator for NoEmptyRecords {
    fn validate_type(&mut self, resolve: &Resolve, id: TypeId, span: Span) -> Result<(), Error> {
        let ty = &resolve.types[id];
        if let TypeDefKind::Record(r) = &ty.kind {
            if r.fields.is_empty() {
                return Err(Error::new(span, "empty records are not allowed"));
            }
        }
        Ok(())
    }
}

let mut resolve = Resolve::default();
let mut validator = NoEmptyRecords;
resolve.push_source_with_validator("test.wit", source, &mut validator)?;

@wilfreddenton wilfreddenton requested a review from a team as a code owner January 16, 2026 16:27
@wilfreddenton wilfreddenton requested review from fitzgen and removed request for a team January 16, 2026 16:27
@wilfreddenton wilfreddenton changed the title Wit parser validation wit-parser: Add validation hooks for custom linting Jan 16, 2026
@wilfreddenton wilfreddenton marked this pull request as draft January 16, 2026 16:31
@fitzgen fitzgen requested review from pchickey and removed request for fitzgen January 16, 2026 19:29
@fitzgen
Copy link
Member

fitzgen commented Jan 16, 2026

Redirecting review because I'm not familiar with wit-parser

@wilfreddenton wilfreddenton changed the title wit-parser: Add validation hooks for custom linting wit-parser: Add validation hooks for custom linting Jan 20, 2026
@wilfreddenton wilfreddenton marked this pull request as ready for review January 23, 2026 14:08
@alexcrichton alexcrichton requested review from alexcrichton and removed request for pchickey January 23, 2026 17:32
@alexcrichton
Copy link
Member

Apologies for the delay in reviewing this, it's been a busy week!

This is a conceptually large enough change that the design in this PR isn't a slam-dunk to me just yet, and it might still be worth considering some of the other options that you laid out. Could you describe your end goal with this a bit more to help shape the thinking there? For example what are you looking to do with spans after a Resolve is created?

@wilfreddenton
Copy link
Contributor Author

wilfreddenton commented Jan 23, 2026

No problem! Thanks for the response. I totally understand wanting to consider other options that's why I tried to explore and present a few. Not sure if you clicked the links but they lead to diffs with all the changes made for that strategy.

The wit-parser itself performs "validation" already on a wit file. Essentially just confirming that the file is written correctly in the writ "language". However a project may want to assert additional validation rules i.e. exports with a specific name must have this type signature, exports can only accept and return a certain subset of types, records and enums cannot be recursive, etc. This is currently possible to do by iterating through the various properties of the Resolve and testing rules agianst them. However, because the Span type is not public and the instances do not exist on the Resolve or its items, line-based location information cannot be included in invalidated rule error messages. The best thing I've been able to do is "... in the my-type record definition" which is OK but not ideal.

So the first strategy is about making Spans available to consuming code. It can be done by adding Span maps to the Resolve or by adding Span properties to all the Resolve items. They are both technically breaking changes but adding Span properties to all the items breaks more things because they don't derive Default; however, it is the more ergonomic of the two and does not create complications in the merge process.

The second strategy, the PR, is allowing consuming code to hook into the resolution process to add additional validation rules at a point in the lifecycle where the Spans exist. So only the Span type needs to be made public and the instances can continue to be consumed by resolution. This is purely additive and has no breaking changes but could be perceived as a more "invasive" change.

@tschneidereit
Copy link
Member

FWIW, I'm working on something for which I'd like to have source span information available as well. Not for validation, but to enable jumping to the line an item is defined in.

More generally, making spans available seems like the strictly more expressive change, which might be worth considering.

@alexcrichton
Copy link
Member

Ok thanks for the info @wilfreddenton, that helps me at least!

Given your use case, what @tschneidereit is describing, and my own experience, I might actually lean more towards including spans on items themselves rather than having side tables or just one pass where the spans are available. I think that'll be the most flexible in terms of what consumers can do with the information.

I agree though it's not an easy change, so I think it'd be worth to game it out first before fully committing to ensure it's reasonable. For the breaking change aspect that's ok, this crate has a new major version on all releases intentionally to enable this sort of development. For Default, going this route would I believe basically require that impl to exist. I think the most appropriate default value would be some sort of sentinel for "not present".

@wilfreddenton with those two problems in theory resolved, how would you feel about the span-in-item route?

@wilfreddenton
Copy link
Contributor Author

Yes happy to go with the span-in-item route. My main concern with it was breaking changes. The strategy is pretty well explored here: main...wilfreddenton:wasm-tools:wit-parser-spans-in-structs

For Default, going this route would I believe basically require that impl to exist.

Are you saying that you want Defaults on all the item structs or just on the Span. Currently I have the new span property as Option<Span>. It seems like you would rather have span: Span and then have some default like Span { start: 0, end: 0, source_map: u32::MAX }? Would you mind explaining your reasoning here?

If we want Defaults on all the items then there's a few property types where the defaults are unclear:

  1. TypeOwner could default toTypeOwner::None
  2. TypeDefKind could default to TypeDefKind::Unknown (or maybe Type(Type::Bool)?)
  3. FunctionKind coulde default to FunctionKind::Freestanding

Would you please provide a quick initial pass of feedback on that diff above? Then I can make the updates and either force push those commits into this PR or close this one and make a new one, whichever you prefer.

@alexcrichton
Copy link
Member

Apologies been pretty busy the past week...

I think I misinterpreted what you were saying w.r.t. Default, sorry about that. I thought that meant we had a lot of Default impls which would all break because Span didn't implement Default, so I was thinking that it'd be ok if Span implemented Default with some sort of placeholder. Looking at the diff though that's not the case, however, and having Option<Span> looks reasonable enough.

Feel free to push those commits over to this branch here, that's ok to update the PR title/description/etc in-place. Reading over it my main thought is that ideally the new source_map field would be avoided in Span. It's a pretty pervasive data structure and inflating its size can have performance implications, so it'd ideally be kept as small as is reasonably possible. This naively to me (without having dug in too deeply so feel free to push back) solvable by having a single SourceMap for a Resolve rather than a list of source maps. As packages are pushed in the source map would get mutated to have more and more source files and the span start/end would get updated but otherwise could keep the same idea of start/end being offsets into one large SourceMap. This is sort of how multi-file-package-parses already work, so there might be some helper to extract and share with the AST parsing too.

@wilfreddenton
Copy link
Contributor Author

Ok this PR has been updated to the spans in items approach. I've adjusted it to use a single source map instead of a vec of them which removes the source_map property from Span. I also added spans to Field, Flag , Case, and EnumCase.

Next, I'd like to add spans to function parameters. Right now they're modeled as Vec<(String, Type)> so we could add another item to the tuple like Vec<(String, Type, Option<Span>)> or make a new record type:

pub struct Param {
    pub name: String,
    pub ty: Type,
    pub span: Option<Span>,
}

pub struct Function {
    // ...
    pub params: Vec<Param>,
    // ...
}

Thinking the Param record approach is better but want to get your input first.

There are some other public types that I think could have a span on them i.e. Package but wasn't sure how to do it and if it's even useful. Let me know if you think spans are missing anywhere.

@alexcrichton
Copy link
Member

Agreed yeah Param is the way to go instead of a triple, but if you're ok with it I'd prefer to defer that to a subsequent PR rather than throwing it in here as well. I also think it's fine to add more spans in more places as needed, it's sort of demand-driven currently and I haven't ever done a pass historically at trying to ensure that everything has a span. If you don't have a desire/need for a span in any particular AST construct feel free to leave as-is, should be pretty easy to add spans retroactively in the future.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor thoughts, but otherwise looks great. Thanks again!

Comment on lines 361 to 366
fn push_with_span_offset(
&mut self,
unresolved: UnresolvedPackage,
source_map: &SourceMap,
span_offset: u32,
) -> Result<PackageId> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Historically I've tried to make sure that all the methods of Resolve trickle down into each other without branching out into private helpers because there's so many ways to push I find it easy to get lost if there's branches to where things end up (vs "everything goes through here and it's public too"). Given that I'd ideally prefer to avoid this new method which modifies push effectively.

How about a refactoring which more officially supports external SourceMaps? There could be a push_source_map method which takes SourceMap and returns u32, and then the push method takes a span_offset: u32 instead of &SourceMap. That would more easily fit into sort_unresolved_packages I think and also expose this infrastructural change to API users because otherwise push becomes basically buggy to use if anything else was pushed previously.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I believe I've addressed this but let me know if I misinterpreted you. A consequence of this change is rewrite_error also needs to take the span_offset and adjust the spans of the errors.

@wilfreddenton
Copy link
Contributor Author

Yes happy to do the Param spans in a follow up PR.

Not sure if you noticed but I just put the tests related to these changes here: https://github.com/bytecodealliance/wasm-tools/pull/2419/changes#diff-736cf01f18df117b373723a19f9594dce78b8cdaefe6192630fb893151c61244R4502

Not sure if this is the right place for them and/or if you prefer they be written in a different style. I'm just inlining the WIT test cases. I know there's a bunch of WIT fixtures in the tests folder. Let me know if you'd like any changes here and/or if the added coverage is insufficient please.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests are perfect as-is yeah, thanks for adding them! They're dealing with some pretty specific details so it's fine to not try to shove everything under tests/cli/*

self.assert_valid();
}
ret
self.source_map.rewrite_error(span_offset, || ret)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something that I think is a bit tricky because this means that errors only get properly rewritten if the pre-adjusted Span is used instead of the post-adjusted Span. Would it be possible to do a quick pass over the UnresolvedPackage to update all spans immediately (maybe even a method on that? maybe here as the first part of push? either way). That way this wouldn't have to make its way to rewrite_error and the only spans that's dealing with are post-processed Spans that have already been adjusted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok after making this change I can see now that we have duplicate spans. We have the "parallel vec spans" i.e. interface_spans and type_spans and then Interface.span and TypeDef.span as well as others. I think maybe in a followup PR it makes sense to remove some of this duplication?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants