Add Line 5 Llama results for Bolivia and Copa America #29
Open
BrunoDC-dev wants to merge 1 commit into
S2 Line 5 Llama Extension - Bolivia and Copa America
This extends the Issue #18 / PR #22 Line 5 design to the Bolivia runoff and
Copa America cases already prepared in this branch.
Source Pattern
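The imported design is the reduced S2 matrix from PR #22, which crosses rounds (R) and density (D) in five conditions: R10-D2, R40-D1, R40-D2, R40-D3 and R80-D2.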
The model policy follows PR #22:
Model: Llama 3.3 70B Instruct
Model ID: meta-llama/Llama-3.3-70B-Instruct
Base URL: https://api.deepinfra.com/v1/openai
Cases
backtesting/case-b-s2-bolivia-2025-runoff/: config_line5_llama.yaml, seed_T3_clean.md
backtesting/case-d-s2-copa-america-line5-gemma/: config_line5_llama.yaml, seed_T3.md
For Bolivia, seed_T3_clean.md keeps the full pre-cutoff electoral evidence
package and removes the football-noise block from seed_T3.md.
Slim Mode
The slim mode mirrors the practical setup used in PR #22 more closely: it keeps
the same R/D matrix, but uses a short fixed evidence packet and reuses one graph
project across all variants in a single runner execution.
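The graph reuse described above can be pictured as a build-once cache. The following is a minimal sketch, not the runner's actual code: `load_or_build_graph` and the `build` callback are hypothetical names, and the graph is assumed to be JSON-serializable. The only detail taken from this PR is the cache filename `_shared_graph_T3_slim.json`.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_build_graph(cache_path: Path, build: Callable[[], dict]) -> dict:
    """Return the cached graph if present; otherwise build it once and cache it.

    Hypothetical helper: `build` stands in for whatever the runner does to
    construct the shared graph project.
    """
    if cache_path.exists():
        # A previous variant already built the graph; reuse it as-is.
        return json.loads(cache_path.read_text(encoding="utf-8"))
    graph = build()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(graph), encoding="utf-8")
    return graph
```

Calling this helper once per variant with the same `cache_path` would trigger the expensive build only for the first variant; the remaining variants read the JSON file.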
Each of the two cases uses the same slim file names:
Config: config_line5_llama_slim.yaml
Seed: seed_T3_line5_slim.md
Output: output_llama_line5_slim/
Backend Requirement
The runner cannot hot-swap the model of an already-running backend. Start the
backend with the Llama/DeepInfra configuration first, for example:
Dry Run
Slim dry runs:
Execute One Smoke Variant
Slim smoke variants:
Execute Full Matrices
Slim full matrices:
Outputs are written under each case's output_llama_line5/ directory. Each run
keeps worldbuilding_trace.json, worldbuilding_artifacts/llm_calls,
simulation_config.json, run_state.json, report artifacts and eval_result.json.
Slim outputs use output_llama_line5_slim/ and additionally write a
_shared_graph_T3_slim.json cache when the shared graph build completes.
Note: as in PR #22, density is recorded as an experimental condition. The
current backend enforces the round count through max_rounds; density is not yet
a separate first-class runtime control.
Slim Results
The completed slim matrices use the same five Line 5 conditions as PR #22 and
reuse one graph build per case. All rows below include
worldbuilding_trace.json, simulation_config.json, run_state.json, state.json,
report artifacts and eval_result.json.
Bolivia Runoff
Ground truth: paz_gana, with Paz 54.53%, Quiroga 45.47%, margin +9.06.
Predicted outcome per variant:
llama_T3_slim_R10_D2: quiroga_gana
llama_T3_slim_R40_D1: quiroga_gana
llama_T3_slim_R40_D2: quiroga_gana
llama_T3_slim_R40_D3: quiroga_gana
llama_T3_slim_R80_D2: quiroga_gana
In this case, increasing rounds or density did not reverse the dominant polling
signal in the evidence packet. The model remained anchored on a Quiroga win even
though the post-event ground truth was a Paz win.
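As a worked check of the Bolivia numbers, the short sketch below recomputes the ground-truth margin and scores the five predicted outcomes against paz_gana. The variable names are illustrative; the values are the ones reported in this section.

```python
GROUND_TRUTH = "paz_gana"
PAZ_SHARE = 54.53      # percent, from the ground truth above
QUIROGA_SHARE = 45.47  # percent, from the ground truth above

# Predicted outcome per slim variant, as reported in this section.
PREDICTIONS = {
    "llama_T3_slim_R10_D2": "quiroga_gana",
    "llama_T3_slim_R40_D1": "quiroga_gana",
    "llama_T3_slim_R40_D2": "quiroga_gana",
    "llama_T3_slim_R40_D3": "quiroga_gana",
    "llama_T3_slim_R80_D2": "quiroga_gana",
}

# Margin in percentage points, rounded to avoid floating-point noise.
margin = round(PAZ_SHARE - QUIROGA_SHARE, 2)

# Variants whose predicted outcome matches the ground truth.
hits = [name for name, outcome in PREDICTIONS.items() if outcome == GROUND_TRUTH]
hit_rate = len(hits) / len(PREDICTIONS)

print(f"margin: +{margin:.2f} points, hit rate: {len(hits)}/{len(PREDICTIONS)}")
```

The margin matches the reported +9.06, and no variant matches the ground truth, so the hit rate is 0 out of 5 regardless of rounds or density.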
Copa America Final
Ground truth: Argentina won the 2024 Copa America final.
Variants: llama_T3_slim_R10_D2, llama_T3_slim_R40_D1, llama_T3_slim_R40_D2, llama_T3_slim_R40_D3, llama_T3_slim_R80_D2.
The Copa America matrix is almost invariant across conditions. All variants
predict Argentina with the same probability and margin, which suggests that
additional rounds do not add much when the evidence packet already contains a
strong and consistent pre-event favorite.
Cross-Case Interpretation
The IPC results from PR #22 show one meaningful depth effect: R80-D2 moved to a
more explicit disinflation path while most other conditions stayed close to each
other. Bolivia and Copa America show the opposite pattern: when evidence includes
a strong prior signal, such as polls or market/model odds, the simulation tends
to preserve that signal instead of exploring a contrary outcome.
This suggests a useful follow-up experiment for the multi-agent setting: split
evidence across agents instead of giving all agents the same packet. Some agents
could receive poll/market signals, others qualitative context, institutional
risks or counterevidence. Under that setup, extra rounds may matter more because
agents would need to negotiate between partially different evidence views instead
of converging immediately on the same dominant prior.
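One possible shape for that follow-up is sketched below: evidence items are tagged by category and each agent receives only one category, assigned round-robin. Everything here is hypothetical (the `EvidenceItem` type, the `split_evidence` function, the category labels and the agent IDs); it illustrates the partitioning idea only and is not an existing feature of the runner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceItem:
    category: str  # e.g. "polls", "context", "risks" (illustrative labels)
    text: str


def split_evidence(
    items: list[EvidenceItem], agent_ids: list[str]
) -> dict[str, list[EvidenceItem]]:
    """Give each agent the items of exactly one category, round-robin.

    With more agents than categories, categories repeat, so several agents
    share a view; with fewer agents, some categories go unassigned.
    """
    if not agent_ids:
        raise ValueError("at least one agent is required")
    packets: dict[str, list[EvidenceItem]] = {agent: [] for agent in agent_ids}
    categories = sorted({item.category for item in items})
    if not categories:
        return packets
    for index, agent in enumerate(agent_ids):
        category = categories[index % len(categories)]
        packets[agent] = [item for item in items if item.category == category]
    return packets


# Placeholder texts only; no real evidence is reproduced here.
example_items = [
    EvidenceItem("polls", "<poll summary placeholder>"),
    EvidenceItem("context", "<qualitative context placeholder>"),
    EvidenceItem("risks", "<institutional risk placeholder>"),
]
example_packets = split_evidence(example_items, ["agent_1", "agent_2", "agent_3"])
```

A variant of this design could keep a small shared core of evidence for all agents and split only the rest, which would make it easier to compare against the current same-packet setup.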