Skip to content

Add name_dependents reverse index for incremental invalidation#646

Merged
st0012 merged 1 commit intomainfrom
add-name-dependents-index
Mar 6, 2026
Merged

Add name_dependents reverse index for incremental invalidation#646
st0012 merged 1 commit intomainfrom
add-name-dependents-index

Conversation

@st0012
Copy link
Copy Markdown
Member

@st0012 st0012 commented Mar 5, 2026

Build a reverse index during indexing that tracks which definitions, references, and names depend on each NameId. Name-to-name edges encode the dependency type at registration time:

  • ChildName: parent_scope relationship (structural dependency)
  • NestedName: nesting relationship (reference-only dependency)
  • Definition/Reference: direct dependents of the name

This index will be consumed by the incremental invalidation engine to efficiently cascade changes without scanning the full graph.

@st0012 st0012 self-assigned this Mar 5, 2026
@st0012 st0012 force-pushed the add-name-dependents-index branch from 0993425 to 4fd9a14 Compare March 5, 2026 22:03
@st0012 st0012 marked this pull request as ready for review March 5, 2026 22:03
@st0012 st0012 requested a review from a team as a code owner March 5, 2026 22:03
Comment on lines +23 to +26
/// This name's `parent_scope` is the key name — structural dependency.
ChildName(NameId),
/// This name's `nesting` is the key name — reference-only dependency.
NestedName(NameId),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a logic need for this distinction? Or could we just merge them into Name for dependent names?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the next PR (#641), we will have something like:

NameDependent::ChildName(id) => queue.push(InvalidationItem::UnresolveName(*id)),
NameDependent::NestedName(id) => queue.push(InvalidationItem::UnresolveReferences(*id)),

The main difference is that UnresolveReferences will only unresolve constant references:

class Foo; end

class Bar
  Foo
  class Baz; end
  # Bar has [NestedName(Foo), ChildName(Baz)]
end

When Bar's ancestors changes:

  • ChildName(Baz) will trigger a total invalidation on Baz
  • NestedName(Foo) will only invalide the Foo reference

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the ancestors of Bar change, all constant references inside of the namespace have to be invalidated. I suspect that we can merge these two because the information of what needs to be invalidated is already encoded in the hashmap.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's ok, I'd like to merge this as it is, and see if the invalidation algo looks better/worse without the 2nd enum after some rounds of reviews.

}

#[cfg(test)]
mod name_dependent_tests {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not opposed to it, but is it common in Rust to split tests groups in different modules?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. ZJIT does this as well. IMO it's a nice way to scope test helpers.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's too bad we didn't start out like that. We should've created modules for indexing each individual type of thing. Same for resolution, all ancestors tests could be separate.

Anyway, not worth the investment to refactor immediately, but something we may want later.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #649 in case anyway wants to give it a try.

@st0012 st0012 force-pushed the add-name-dependents-index branch 4 times, most recently from 19b1837 to eac46bb Compare March 5, 2026 23:48
Copy link
Copy Markdown
Member

@vinistock vinistock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not 100% sure we need the two variants for names, but it looks good

@st0012 st0012 force-pushed the add-name-dependents-index branch 2 times, most recently from 3fc7d48 to 21785c9 Compare March 6, 2026 17:06
Build a reverse index during indexing that tracks which definitions,
references, and names depend on each NameId. Name-to-name edges encode
the dependency type at registration time:

- ChildName: parent_scope relationship (structural dependency)
- NestedName: nesting relationship (reference-only dependency)
- Definition/Reference: direct dependents of the name

This index will be consumed by the incremental invalidation engine to
efficiently cascade changes without scanning the full graph.
@st0012 st0012 force-pushed the add-name-dependents-index branch from 21785c9 to 7ff0dbd Compare March 6, 2026 17:16
@st0012 st0012 merged commit 74ea870 into main Mar 6, 2026
30 checks passed
@st0012 st0012 deleted the add-name-dependents-index branch March 6, 2026 18:09
for (name_id, deps) in name_dependents {
let global_deps = self.name_dependents.entry(name_id).or_default();
for dep in deps {
if !global_deps.contains(&dep) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dedup check here is O(n) per dep, making the whole merge loop O(n²) for any name with many dependents. For the initial indexing pass where we're merging many local graphs, this could be costly on large codebases.

I wonder if we should use a HashSet during the merge and then convert back to Vec, or just rely on the fact that duplicates shouldn't occur at all if the local graph was built correctly (given that a local graph only covers one file)?

If duplicates are theoretically impossible, a debug_assert here would be more appropriate.

///
/// Panics if no names match the given path.
#[must_use]
pub fn find_name_ids(&self, path: &str) -> Vec<NameId> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both GraphTest and LocalGraphTest have identical implementations of find_name_ids, name_dependents_for, name_str, and dependent_name_str. Any way we could avoid the duplication here? A shared trait or a helper module would keep these in sync automatically.

}
match parent {
None => name_ref.parent_scope().as_ref().is_none(),
Some(p) => name_ref.parent_scope().as_ref().is_some_and(|ps_id| {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two thoughts:

  1. find_name_ids with a path like "Bar::Baz" only checks one level up — it verifies the immediate parent's string is "Bar", but not that Bar is itself a top-level name. So Foo::Bar::Baz would also match "Bar::Baz" since the parent's str is "Bar" regardless of where Bar lives. Is that intentional?

  2. Same question for LocalGraphTest::find_name_ids (same code).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants