
Explain: add annotation support#108

Open
AIM289 wants to merge 3 commits into master from explain-annotations

Conversation


@AIM289 AIM289 commented Mar 2, 2026

These changes add annotation support to explain.

There were a few things I had to work around here:

There may be huge numbers of annotations in the recording. The agent/LLM likely can't do anything useful with thousands of annotations returned by `info annotations`, and explain also can't page through `info annotations` output the way a user can, so I have added `limit` and `offset` parameters to the tool instead. I set a default limit of 200, but this is completely arbitrary and I'm open to suggestions for another number.
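For illustration, the limit/offset behaviour is just a slice over the full annotation list. This is a hypothetical sketch of the semantics, not the actual explain implementation (the helper name and list shape are made up):

```python
DEFAULT_LIMIT = 200  # arbitrary default chosen in this PR


def page_annotations(annotations: list, limit: int = DEFAULT_LIMIT, offset: int = 0) -> list:
    """Return one page of annotations so the agent never sees thousands at once.

    Hypothetical helper illustrating the limit/offset parameters added to the
    annotations-listing tool in this PR.
    """
    return annotations[offset : offset + limit]
```

For example, with 1000 annotations, `page_annotations(anns, limit=200, offset=200)` returns items 200 through 399, and the agent can walk forward by increasing `offset`.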

`ugo annotation` does not yet support options to disambiguate between annotations with duplicate names/details. This is being worked on as part of productising annotations and should be added soon. For now I have worked around it by allowing the tool to take a bbcount instead.

The `udb.annotations.get()` interface has changed slightly in UDB 9.2 vs older versions. Previously the `name` parameter had to be a `str`, whereas now it can be `str | None`. There doesn't seem to be an easy way to check the UDB version from explain, so I converted `None` names to `""`, which works with both versions. I think this and the above issue highlight that in future we may need some kind of multiple-version handling in explain, but until now things have been stable enough.
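The workaround can be as small as a normalising wrapper. A sketch, assuming only what's stated above about the `name` parameter (the wrapper name is made up for this comment, not code from the PR):

```python
def annotations_get_compat(udb, name, **kwargs):
    """Call udb.annotations.get() in a way that works on UDB 9.2 and older.

    Older UDB versions require `name` to be a str; 9.2 also accepts None.
    Passing "" instead of None is accepted by both, so normalise first.
    Hypothetical wrapper written for illustration.
    """
    return udb.annotations.get("" if name is None else name, **kwargs)
```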

@AIM289 AIM289 requested a review from mark-undoio March 2, 2026 10:59
@AIM289 AIM289 requested a review from barisione March 2, 2026 11:55
Contributor

@mark-undoio mark-undoio left a comment


I have some suggestions, nothing major.

@AIM289 how have you tested this so far?

Comment on lines +833 to +835

```python
if bbcount is not None:
    self.udb.time.goto(bbcount)
    return
```
Contributor


I'd be inclined to require a name + detail that match the bbcount passed, rather than just accepting it.

This is because, in the past, agents have often got fixated on wasting time guessing bbcounts that might be interesting and jumping to them. I wouldn't be completely surprised to see an agent figure out it can do that here.

Author

@AIM289 AIM289 Mar 4, 2026


The downside to this is that it requires us to check a potentially very long list of annotations for the correct bbcount, which in Python can be slow. I think you're probably right that we need to prevent this being used as a general goto_time tool though, which is more important.

Unless we can just say not to do that in the tool description?

Contributor


> The downside to this is that it requires us to check a potentially very long list of annotations for the correct bbcount, which in Python can be slow.

Can we bisect on the list sensibly? (I.e., is the list already all available in memory and easy to jump around in?)

> I think you're probably right that we need to prevent this being used as a general goto_time tool though, which is more important.
>
> Unless we can just say not to do that in the tool description?

Prompting things like MUST NOT etc. can help, but I wouldn't trust an LLM not to start trying to cheat in its enthusiasm.

Author


Yeah, it should be bisectable, actually. I'll give that a go.


You can go to an annotation with `annotation_goto`. If there are multiple annotations with the
required name and detail, provide the bbcount of the annotation instead. The bbcount for each
annotation is also returned within `annotations_list`.
Contributor


Might be worth mentioning that, having gone to an annotation, we'll normally be inside the code that creates it (unless we've got a mechanism for automatically getting us back out of that and into user code?).

Contributor


Not necessary for this PR but, depending on how successful the agent is at using annotations, you might also need to mention them in the top-level system prompt we pass to the agent as part of its initial approach.

Author


We do have a mechanism to reverse-finish back out of the annotation-creating code, so that's not a problem.

I'll find out how likely it is to check annotations when not specifically prompted, to decide whether we need something in the top-level prompt.


AIM289 commented Mar 4, 2026

> @AIM289 how have you tested this so far?

I've been manually testing it just by asking explain to use each of the tools on a recording with a very large number of annotations, and another recording with just a few annotations. Asking it to use a tool with specific parameters works pretty well for testing.

I also asked it to come up with a test plan and test itself which was pretty good:

[screenshot: agent-generated test plan]

Do you have any other tips for how you've tested explain tools in the past?

@mark-undoio
Contributor

> > @AIM289 how have you tested this so far?
>
> I've been manually testing it just by asking explain to use each of the tools on a recording with a very large number of annotations, and another recording with just a few annotations. Asking it to use a tool with specific parameters works pretty well for testing.
>
> I also asked it to come up with a test plan and test itself which was pretty good:

I like how good agents are at coming up with stuff like that! Clearly you've comprehensively tested what the individual tools do, which is good.

The other side of things is problem solving ability. Unfortunately, outside of the bug library (which is a bit tied to Gen 2 at the moment) we don't have a very good example of this.

> Do you have any other tips for how you've tested explain tools in the past?

It's a bit subjective: checking that things like the Doom / cache calculate demos work without the agent getting unreasonably distracted is probably good practice, though I can't imagine it'll be a problem.

The other thing that would be ideal is a demo where there are lots of annotations and either one or none of them (both cases are interesting) is needed to fully resolve an issue. Then we'd be able to check whether the agent uses the tools appropriately as part of the reasoning process, rather than just individually.

If that's not practical then as long as we're happy the main demos / use case don't regress we can just push this out to users and get their feedback on how well the reasoning works.


AIM289 commented Mar 5, 2026

Will push changes soon. I have now tested with both the cache and Doom examples, which still worked as usual.

I also tried modifying the cache example to add annotations within the cache population loop, and explain did use them in a useful way! I included the value being cached in the annotation, and it investigated far enough to find the bad cache value, then jumped straight to the corresponding annotation instead of the usual repeated `last` approach.

It also already seems very keen to run `tool_annotations_count` as a first step regardless of the recording used, so there's no need to insert anything into the main prompt.

@AIM289 AIM289 requested a review from mark-undoio March 6, 2026 13:56
