Implement base auto resolver #221

Open
lbeuk wants to merge 16 commits into Metaculus:main from lbeuk:feat/auto-resolver

lbeuk commented Mar 10, 2026

Summary

Initial implementation of the auto-resolver, along with a basic TUI for interacting with the resolution output. The resolver uses the agents SDK and follows this process:

  • Checks whether the question is presently resolvable, first by comparing the current date with the question's resolution date, and then by checking for implicit dates in the question text (e.g. "will event X happen before May 1st")
  • Runs an orchestration agent that has access to a researcher subagent and a resolver subagent
  • The resolver subagent has its own subagent dedicated to cancelled questions; as noted under "What needs work", this part still needs work
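
The resolvability pre-check in the first step can be sketched as below. This is a minimal illustration, not the code in this PR: the `Question` fields, the `Resolvability` enum, and `check_resolvability` are hypothetical names, and the sketch assumes that implicit dates have already been extracted from the question text and that both date checks must pass. The description does not state how the two checks are combined.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Resolvability(Enum):
    """Hypothetical outcome labels for the pre-check."""
    RESOLVABLE = "resolvable"
    RESOLUTION_DATE_PENDING = "resolution date not passed"
    IMPLICIT_DATE_PENDING = "implicit date not passed"


@dataclass
class Question:
    """Hypothetical question record; field names are assumptions."""
    title: str
    resolution_date: date
    # Dates implied by the question text (e.g. "before May 1st"),
    # assumed to be extracted upstream.
    implicit_dates: list[date] = field(default_factory=list)


def check_resolvability(question: Question, today: date) -> Resolvability:
    # Gate 1: the explicit resolution date must have been reached.
    if today < question.resolution_date:
        return Resolvability.RESOLUTION_DATE_PENDING
    # Gate 2 (assumed): every implicit date must also have been reached.
    if any(today < d for d in question.implicit_dates):
        return Resolvability.IMPLICIT_DATE_PENDING
    return Resolvability.RESOLVABLE
```

Passing `today` explicitly rather than reading the clock inside the function keeps the check deterministic, which makes it easier to replay a past tournament.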

What works well:

  • Low rate of false positives/negatives
  • TUI shows overall results and allows exploring the agent output on a per-question basis
  • TUI allows exporting to a shortened markdown report

What needs work:

  • The resolver struggles with cancelled resolutions in both directions: it cancels questions that are not cancelled on Metaculus, and it fails to cancel questions that are cancelled on Metaculus.
  • It similarly struggles with questions that are not yet resolvable; see the second image.
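
The two cancellation failure modes listed above can be counted separately when comparing auto-resolver output against Metaculus resolutions. The sketch below is illustrative only: the label strings, `classify`, and `tally` are hypothetical and are not taken from this PR.

```python
from collections import Counter
from typing import Iterable, Tuple

CANCELLED = "cancelled"  # hypothetical label for a cancelled resolution


def classify(auto: str, metaculus: str) -> str:
    """Bucket one (auto-resolver, Metaculus) resolution pair."""
    if auto == metaculus:
        return "match"
    if auto == CANCELLED:
        # Auto-resolver cancelled a question that Metaculus resolved.
        return "wrongly_cancelled"
    if metaculus == CANCELLED:
        # Auto-resolver resolved a question that Metaculus cancelled.
        return "missed_cancellation"
    return "wrong_outcome"


def tally(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many pairs fall into each bucket."""
    return Counter(classify(auto, actual) for auto, actual in pairs)
```

For example, `tally([("yes", "yes"), ("cancelled", "no")])` counts one `match` and one `wrongly_cancelled`.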

Supporting evidence

The following image shows the results of running on 60 random questions from the fall AIB tournament.

image

The following image shows the results of running on all questions currently in the spring AIB tournament. Note that ~67 questions were automatically marked as not yet resolvable because their resolution date had not passed.

image

The following images depict two instances where the auto-resolver picked up on an event that is not yet reflected on Metaculus in the spring AIB tournament.

image image

(Backing validation)

image

lbeuk changed the title from "Feat/auto resolver" to "Implement base auto resolver" Mar 10, 2026