
Explain: add annotation support#108

Open
AIM289 wants to merge 3 commits into master from explain-annotations

Conversation


@AIM289 AIM289 commented Mar 2, 2026

These changes add annotation support to explain.

There were a few things I had to work around here:

There may be huge numbers of annotations in the recording. The agent/LLM likely can't do anything useful with thousands of annotations returned by `info annotations`, and explain also can't page through `info annotations` output the way a user can, so I have added `limit` and `offset` parameters to the tool instead. I set a default limit of 200, but this is completely arbitrary and I'm open to suggestions for another number.
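For illustration, the limit/offset behaviour is just a slice over the full annotation list. This is a hypothetical sketch of the semantics, not the actual explain implementation (the helper name and list shape are made up):

```python
DEFAULT_LIMIT = 200  # arbitrary default chosen in this PR


def page_annotations(annotations: list, limit: int = DEFAULT_LIMIT, offset: int = 0) -> list:
    """Return one page of annotations so the agent never sees thousands at once.

    Hypothetical helper illustrating the limit/offset parameters added to the
    annotations-listing tool in this PR.
    """
    return annotations[offset : offset + limit]
```

For example, with 1000 annotations, `page_annotations(anns, limit=200, offset=200)` returns items 200 through 399, and the agent can walk forward by increasing `offset`.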

`ugo annotation` does not yet support options to disambiguate between annotations with duplicate names/details. This is being worked on as part of productising annotations and should be added soon. For now I have worked around it by allowing the tool to take a bbcount instead.

The `udb.annotations.get()` interface has changed slightly in UDB 9.2 vs older versions. Previously the `name` parameter had to be a `str`, whereas now it can be `str | None`. There doesn't seem to be an easy way to check the UDB version from explain, so I converted `None` names to `""`, which works with both versions. I think this and the above issue highlight that in future we may need some kind of multiple-version handling in explain, but until now things have been stable enough.
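The workaround can be as small as a normalising wrapper. A sketch, assuming only what's stated above about the `name` parameter (the wrapper name is made up for this comment, not code from the PR):

```python
def annotations_get_compat(udb, name, **kwargs):
    """Call udb.annotations.get() in a way that works on UDB 9.2 and older.

    Older UDB versions require `name` to be a str; 9.2 also accepts None.
    Passing "" instead of None is accepted by both, so normalise first.
    Hypothetical wrapper written for illustration.
    """
    return udb.annotations.get("" if name is None else name, **kwargs)
```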

@AIM289 AIM289 requested a review from mark-undoio March 2, 2026 10:59
@AIM289 AIM289 requested a review from barisione March 2, 2026 11:55
Contributor

@mark-undoio mark-undoio left a comment


I have some suggestions, nothing major.

@AIM289 how have you tested this so far?

Comment on lines +833 to +835

```python
if bbcount is not None:
    self.udb.time.goto(bbcount)
    return
```
Contributor


I'd be inclined to require a name + detail that match the bbcount passed, rather than just accepting it.

This is because, in the past, agents have often got fixated on wasting time guessing bbcounts that might be interesting and jumping to them. I wouldn't be completely surprised to see an agent figure out it can do that here.

Author

@AIM289 AIM289 Mar 4, 2026


The downside to this is that it requires us to check a potentially very long list of annotations for the correct bbcount, which in Python can be slow. I think you're probably right that we need to prevent this being used as a general goto_time tool though, which is more important.

Unless we can just say not to do that in the tool description?

Contributor


> The downside to this is that it requires us to check a potentially very long list of annotations for the correct bbcount, which in Python can be slow.

Can we bisect on the list sensibly? (I.e., is the list already all available in memory and easy to jump around in?)

> I think you're probably right that we need to prevent this being used as a general goto_time tool though, which is more important.
>
> Unless we can just say not to do that in the tool description?

Prompting things like MUST NOT etc. can help, but I wouldn't trust an LLM not to start trying to cheat in its enthusiasm.

Author


Yeah, it should be bisectable, actually. I'll give that a go.


You can go to an annotation with `annotation_goto`. If there are multiple annotations with the
required name and detail, provide the bbcount of the annotation instead. The bbcount for each
annotation is also returned within `annotations_list`.
Contributor


Might be worth mentioning that, having gone to an annotation, we'll normally be inside the code that creates it (unless we've got a mechanism for automatically getting us back out of that and into user code?).

Contributor


Not necessary for this PR but, depending on how successful the agent is at using annotations, you might also need to mention them in the top-level system prompt we pass to the agent as part of its initial approach.

Author


We do have a mechanism to reverse-finish back out of the annotation-creating code, so that's not a problem.

I'll find out how likely it is to check annotations when not specifically prompted, to decide whether we need something in the top-level prompt.


AIM289 commented Mar 4, 2026

> @AIM289 how have you tested this so far?

I've been manually testing it just by asking explain to use each of the tools on a recording with a very large number of annotations, and another recording with just a few annotations. Asking it to use a tool with specific parameters works pretty well for testing.

I also asked it to come up with a test plan and test itself which was pretty good:

[screenshot: agent-generated test plan]

Do you have any other tips for how you've tested explain tools in the past?

@mark-undoio
Contributor

> > @AIM289 how have you tested this so far?
>
> I've been manually testing it just by asking explain to use each of the tools on a recording with a very large number of annotations, and another recording with just a few annotations. Asking it to use a tool with specific parameters works pretty well for testing.
>
> I also asked it to come up with a test plan and test itself which was pretty good:

I like how good agents are at coming up with stuff like that! Clearly you've comprehensively tested what the individual tools do, which is good.

The other side of things is problem solving ability. Unfortunately, outside of the bug library (which is a bit tied to Gen 2 at the moment) we don't have a very good example of this.

> Do you have any other tips for how you've tested explain tools in the past?

It's a bit subjective: checking that things like the Doom / cache calculate demos work without the agent getting unreasonably distracted is probably good practice, though I can't imagine it'll be a problem.

The other thing that would be ideal is a demo where there are lots of annotations and either one or none of them (both cases are interesting) is needed to fully resolve an issue. Then we'd be able to check whether the agent uses the tools appropriately as part of the reasoning process, rather than just individually.

If that's not practical then as long as we're happy the main demos / use case don't regress we can just push this out to users and get their feedback on how well the reasoning works.


AIM289 commented Mar 5, 2026

Will push changes soon. I have now tested with both the cache and Doom examples, which still worked as usual.

I also tried modifying the cache example to add annotations within the cache population loop, and explain did use them in a useful way! I included the value being cached in the annotation, and it investigated far enough to find the bad cache value, then jumped straight to the corresponding annotation instead of the usual repeated `last` approach.

It also already seems very keen to run `tool_annotations_count` as a first step regardless of the recording used, so there's no need to insert anything into the main prompt.

@AIM289 AIM289 requested a review from mark-undoio March 6, 2026 13:56
