[Benchmark Output Submission]: Juris

### Agent Name

Juris

### Maintainer

Animesh Mishra / amethystani

### Model(s) Used

Qwen/Qwen3.6-35B-A3B (served locally via Ollama as `qwen3.6:35b-a3b`)

### Agent Description

Juris is a custom capability-2 VAKRA agent built around Qwen3.6-35B-A3B, with a deterministic routing layer for dashboard-API tool selection. The agent is optimized for the released capability-2 workload and focuses on selecting the single best API for each query, normalizing tool arguments, and shaping final answers into the expected benchmark format without free-form reasoning drift.

The run used a hybrid setup: lexical and semantic tool shortlisting, deterministic overrides for failure-prone query classes, and a constrained one-tool execution loop for direct-answer dashboard tasks. The goal was to reduce avoidable tool-selection errors and time lost in generic ReAct-style loops on large tool inventories.
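To make the routing idea concrete, here is a minimal sketch of lexical shortlisting combined with a deterministic override. It is an illustration only and not the code used in the run: the `Tool` type, the tool names, the `OVERRIDES` table, and the token-overlap scoring are all hypothetical, and the semantic shortlisting stage is omitted.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool record: a name plus a one-line description."""
    name: str
    description: str


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def shortlist(query, tools, k=3):
    """Rank tools by token overlap with the query and keep the top k."""
    query_tokens = tokenize(query)
    scored = []
    for tool in tools:
        tool_tokens = tokenize(tool.name + " " + tool.description)
        scored.append((len(query_tokens & tool_tokens), tool.name))
    # Highest overlap first; ties broken by name so the order is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for overlap, name in scored[:k] if overlap > 0]


# Hypothetical override table: a query pattern mapped to a fixed tool name.
OVERRIDES = [
    (re.compile(r"\bhow many\b", re.IGNORECASE), "count_records"),
]


def route(query, tools):
    """Return one tool name: an override if one matches, else the top lexical hit."""
    available = {tool.name for tool in tools}
    for pattern, tool_name in OVERRIDES:
        if pattern.search(query) and tool_name in available:
            return tool_name
    candidates = shortlist(query, tools)
    return candidates[0] if candidates else None


if __name__ == "__main__":
    demo_tools = [
        Tool("get_revenue_by_region", "Total revenue grouped by sales region"),
        Tool("count_records", "Count rows in a table"),
        Tool("list_customers", "List customer names and ids"),
    ]
    print(route("What is the total revenue per region?", demo_tools))
    print(route("How many customers are there?", demo_tools))
```

In this sketch the override runs before any scoring, so a matching query class always resolves to the same tool regardless of how the shortlist would have ranked it.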

### Metadata (JSON)

```json
{
  "submitter": "Animesh Mishra / amethystani",
  "agent_name": "Juris",
  "benchmark": "VAKRA",
  "capability": "capability_2_dashboard_apis",
  "coverage": {
    "domains_covered": 17,
    "domains_total": 17,
    "queries_total": 1597
  },
  "model": {
    "hf_model": "Qwen/Qwen3.6-35B-A3B",
    "provider": "ollama",
    "local_tag": "qwen3.6:35b-a3b"
  },
  "hardware": {
    "gpu": "NVIDIA RTX 4500 Ada Generation",
    "gpu_memory_mib": 24570
  },
  "run_metrics": {
    "successful_executions": 1517,
    "failed_executions": 80,
    "completion_rate_percent": 94.99,
    "overall_avg_duration_s": 62.89,
    "overall_max_duration_s": 300.01
  },
  "failure_breakdown": {
    "Agent timed out after 300 seconds": 75,
    "Recursion limit of 3 reached without hitting a stop condition": 5
  },
  "validation": {
    "status": "passed",
    "command": ".venv/bin/python validate_output.py output/capability_2_full_qwen36_v2",
    "files_validated": 17
  },
  "code_or_system_link": "https://github.com/amethystani/juris-vakra-cap2-submission",
  "notes": "This submission covers the full released domain set for capability 2, in line with the current maintainer guidance for capability-level scoring."
}
```
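The run metrics above can be cross-checked with a few lines of arithmetic. The sketch below copies only the relevant fields from the metadata; the function name `check_run_metrics` is illustrative and not part of the VAKRA tooling.

```python
import json

# Subset of the submission metadata, copied from the JSON above.
metadata = json.loads("""
{
  "coverage": {"domains_covered": 17, "domains_total": 17, "queries_total": 1597},
  "run_metrics": {
    "successful_executions": 1517,
    "failed_executions": 80,
    "completion_rate_percent": 94.99
  },
  "failure_breakdown": {
    "Agent timed out after 300 seconds": 75,
    "Recursion limit of 3 reached without hitting a stop condition": 5
  }
}
""")


def check_run_metrics(meta):
    """Return a list of inconsistencies; an empty list means the numbers agree."""
    coverage = meta["coverage"]
    run = meta["run_metrics"]
    failures = meta["failure_breakdown"]
    problems = []

    # Successes plus failures should account for every query.
    executed = run["successful_executions"] + run["failed_executions"]
    if executed != coverage["queries_total"]:
        problems.append(f"executions {executed} != queries {coverage['queries_total']}")

    # The per-reason failure counts should add up to the failure total.
    if sum(failures.values()) != run["failed_executions"]:
        problems.append("failure breakdown does not sum to failed_executions")

    # The completion rate should match successes / total, to two decimals.
    rate = 100 * run["successful_executions"] / coverage["queries_total"]
    if abs(rate - run["completion_rate_percent"]) >= 0.005:
        problems.append(f"completion rate {rate:.2f} != reported value")

    return problems


if __name__ == "__main__":
    print(check_run_metrics(metadata) or "run metrics are internally consistent")
```

With the values reported here, 1517 + 80 = 1597, 75 + 5 = 80, and 1517 / 1597 rounds to 94.99%, so the check returns no problems.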

### ZIP File Link

https://github.com/amethystani/juris-vakra-cap2-submission/releases/download/v1/juris_vakra_capability2_submission_20260504.zip

### ZIP Contents Description

- `capability_2_dashboard_apis/prediction/*.json` for all 17 capability-2 domains
- `SUBMISSION_MANIFEST.json` with run metadata and per-domain counts
- `VALIDATION_SUMMARY.txt` with the official output-format validation result
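For a quick sanity check of the archive before running the official validator, something like the following could be used. It is a sketch under assumptions: the function name `inspect_submission` is hypothetical, and the exact placement of files inside the ZIP (for example, whether there is a top-level folder) is not stated here, so the check matches on path suffixes.

```python
import json
import zipfile
from pathlib import PurePosixPath

# Directory named in the contents list above.
PREDICTION_DIR = "capability_2_dashboard_apis/prediction"


def inspect_submission(zip_file):
    """Summarize a submission archive given a path or a binary file object."""
    with zipfile.ZipFile(zip_file) as archive:
        names = archive.namelist()
        predictions = [
            name for name in names
            if name.endswith(".json")
            and PurePosixPath(name).parent.as_posix().endswith(PREDICTION_DIR)
        ]
        # json.loads raises ValueError if a prediction file is malformed.
        for name in predictions:
            json.loads(archive.read(name))
    basenames = {PurePosixPath(name).name for name in names}
    return {
        "prediction_files": len(predictions),
        "has_manifest": "SUBMISSION_MANIFEST.json" in basenames,
        "has_validation_summary": "VALIDATION_SUMMARY.txt" in basenames,
    }
```

Calling `inspect_submission` on the downloaded ZIP should report 17 prediction files if the archive matches the description above. This only checks that files are present and parse as JSON; it does not replace `validate_output.py`.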

### Validation Checklist

- [x] JSON files are valid and well-formed
- [x] ZIP file is accessible via the provided link
- [x] No sensitive or PII data included
- [x] Agent has been tested locally

### Additional Notes

This is the formal benchmark submission for the approach I referenced earlier in issue #15. I am submitting capability 2 across all of its released domains, rather than a single-domain sample, so the maintainers can evaluate it at the capability level. If you need any additional run details or a slightly different artifact layout for ingestion, I can provide that quickly.

