Conversation
mark-undoio left a comment:
I have some suggestions, nothing major.
@AIM289 how have you tested this so far?
`explain/explain.py` (outdated)
```python
if bbcount is not None:
    self.udb.time.goto(bbcount)
    return
```
I'd be inclined to require a name + detail that match the bbcount passed, rather than just accepting it.
This is because, in the past, agents have often become fixated on wasting time guessing bbcounts that might be interesting and jumping to them. I wouldn't be completely surprised to see an agent figure out it can do that here.
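A rough sketch of that stricter check (the `Annotation` shape and the function name here are hypothetical, not the real `explain` code):

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    name: str
    detail: str
    bbcount: int

def goto_annotation(annotations, name, detail, bbcount):
    """Only jump if the bbcount really belongs to a matching annotation."""
    for ann in annotations:
        if ann.bbcount == bbcount and ann.name == name and ann.detail == detail:
            return bbcount  # safe point to call self.udb.time.goto(bbcount)
    raise ValueError(
        f"bbcount {bbcount} does not match any annotation named {name!r}"
    )
```

This refuses the jump unless the agent supplies a name and detail that actually belong to that bbcount, which blocks the "guess a bbcount and jump there" failure mode.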
The downside to this is that it requires us to check a potentially very long list of annotations for the correct bbcount, which in Python can be slow. I think you're probably right that we need to prevent this being used as a general goto_time tool though, which is more important.
Unless we can just say not to do that in the tool description?
> The downside to this is that it requires us to check a potentially very long list of annotations for the correct bbcount, which in Python can be slow.
Can we bisect on the list sensibly? (i.e. I guess I mean is the list already all available in memory and easy to jump around in)?
> I think you're probably right that we need to prevent this being used as a general `goto_time` tool though, which is more important. Unless we can just say not to do that in the tool description?
Prompting things like MUST NOT etc. can help, but I wouldn't trust an LLM not to start trying to cheat in its enthusiasm.
Yeah, it should be bisectable actually. I'll give that a go.
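If the annotation list is already in memory and sorted by bbcount, the lookup can be a binary search with the standard-library `bisect` module instead of a linear scan. A sketch, assuming a hypothetical list of `(bbcount, name, detail)` tuples:

```python
import bisect

def find_by_bbcount(sorted_anns, bbcount):
    """sorted_anns: list of (bbcount, name, detail) tuples, sorted by bbcount.

    Returns the matching tuple, or None if no annotation has that bbcount.
    """
    keys = [a[0] for a in sorted_anns]  # in practice, keep this key list around
    i = bisect.bisect_left(keys, bbcount)
    if i < len(keys) and keys[i] == bbcount:
        return sorted_anns[i]
    return None
```

This makes the validity check O(log n) even for very long annotation lists.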
> You can go to an annotation with `annotation_goto`. If there are multiple annotations with the required name and detail, provide the bbcount of the annotation instead. The bbcount for each annotation is also returned within `annotations_list`.
Might be worth mentioning that, having gone to an annotation, we'll normally be inside the code that creates it (unless we've got a mechanism for automatically getting us back out of that and into user code?)
Not necessary for this PR but, depending on how successful the agent is at using annotations, you might also need to mention it in the top-level system prompt we pass to the agent as part of its initial approach.
We do have a mechanism to reverse-finish back out of the annotation-creating code, so that's not a problem.
I'll find out how likely it is to check annotations when not specifically prompted, to decide whether we need something in the top-level prompt.
I've been manually testing it just by asking. I also asked it to come up with a test plan and test itself, which was pretty good. Do you have any other tips for how you've tested?
I like how good agents are at coming up with stuff like that! Clearly you've comprehensively tested what the individual tools do, which is good. The other side of things is problem solving ability. Unfortunately, outside of the bug library (which is a bit tied to Gen 2 at the moment) we don't have a very good example of this.
It's a bit subjective - checking that things like the Doom / cache calculate demos work without getting unreasonably distracted is probably good practice, though I can't imagine it'll be a problem.

The other thing that would be ideal is if we had a demo where there are lots of annotations and one or none of them (both interesting cases) is needed to fully resolve an issue. Then we'd be able to check whether it uses the tools appropriately as part of the reasoning process, rather than individually. If that's not practical then, as long as we're happy the main demos / use case don't regress, we can just push this out to users and get their feedback on how well the reasoning works.
Will push changes soon. I have now tested with both cache and doom examples, which still worked as usual. I also tried modifying the cache example to add annotations within the cache population loop, and it also already seems very keen to run

These changes add annotation support to `explain`. There were a few things I had to work around here:

- There may be huge numbers of annotations in the recording. The agent/LLM likely can't do anything useful with thousands of returned annotations from `info annotations`. `explain` also can't page through `info annotations` output like a user can, so I have added `limit` and `offset` parameters to the tool instead. I set a default limit of 200, but this is completely arbitrary and I'm open to suggestions on another number.
- `ugo annotation` does not yet support options to disambiguate between annotations with duplicate names/details. This is being worked on as part of productising annotations, and should be added soon. For now I have had to work around it by allowing the tool to take a `bbcount` instead.
- The `udb.annotations.get()` interface has changed slightly in UDB 9.2 vs older versions. Previously the `name` parameter had to be a `str`, whereas now it can be `str | None`. There doesn't seem to be an easy way to check the UDB version from `explain`, so I converted `None` names to `""`, which works with both versions. I think this and the above issue highlight that in future we may need some kind of multiple-version handling in `explain`, but until now things have been stable enough.
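The `limit`/`offset` paging described in the first point might look roughly like this (only the parameter names and the default of 200 come from the PR; the rest is an illustrative sketch):

```python
def annotations_list(annotations, limit=200, offset=0):
    """Return one page of annotations plus the total count, so the
    agent can tell whether further pages exist and request them."""
    page = annotations[offset:offset + limit]
    return {
        "total": len(annotations),
        "offset": offset,
        "annotations": page,
    }
```

Returning the total alongside each page gives the agent enough information to page through a large recording deliberately rather than being handed thousands of entries at once.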
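The version-compatibility workaround from the last point amounts to normalising `None` before calling `udb.annotations.get()` (a sketch; the helper name is hypothetical):

```python
def normalise_name(name):
    """Older udb.annotations.get() only accepts str for `name`; UDB 9.2
    also accepts None. Passing "" works with both, so coerce None to ""."""
    return "" if name is None else name
```

Coercing at the call site avoids needing any runtime UDB version check.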