Benchmark suite for testing local LLMs as AI agents via OpenClaw + Ollama. 7 models, 22 tasks, interactive dashboard...
Copy the install, test the workflow, then decide if it earns a permanent slot.
Still active enough to matter. Good candidate for a fast stack test instead of a long evaluation loop.
Reasonable to try, but it will take more than a quick skim to get real signal.
GitHub health is unknown and there is no security policy. With 1 open issue, this is testable, but not something to trust blindly.
AI Agent
OpenClaw
Model
Llama
Fastest way to find out if jake-benchmark belongs in your setup.
Copy the install command, run a real test, and back it out cleanly if it slows you down.
git clone https://github.com/frankhli843/jake-benchmark ~/.claude/agents/jake-benchmark
Run this first. You will know quickly if the workflow earns a permanent slot.
rm -rf ~/.claude/agents/jake-benchmark
No messy cleanup loop. If it misses, remove it and keep moving.
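If you would rather not type the install and removal commands by hand, they can be wrapped in two small shell functions. This is only a sketch under stated assumptions: the function names install_benchmark and remove_benchmark, the AGENT_DIR override, and the safety checks are hypothetical and are not part of jake-benchmark. Only the repository URL and the default path come from the commands above.

```shell
#!/bin/sh
# Hypothetical helpers; only the URL and default path come from the listing.
REPO_URL="https://github.com/frankhli843/jake-benchmark"
# AGENT_DIR may be overridden to try the workflow in a scratch location (assumption).
AGENT_DIR="${AGENT_DIR:-$HOME/.claude/agents/jake-benchmark}"

install_benchmark() {
    # Refuse to clone over something that already exists.
    if [ -e "$AGENT_DIR" ]; then
        echo "already present: $AGENT_DIR" >&2
        return 1
    fi
    # Create the parent directory, then clone into the target path.
    mkdir -p "$(dirname "$AGENT_DIR")" && git clone "$REPO_URL" "$AGENT_DIR"
}

remove_benchmark() {
    # Only delete a path that ends in the expected directory name.
    case "$AGENT_DIR" in
        */jake-benchmark) rm -rf -- "$AGENT_DIR" ;;
        *) echo "refusing to remove unexpected path: $AGENT_DIR" >&2
           return 1 ;;
    esac
}
```

Source the file, call install_benchmark to set up the test, and call remove_benchmark to back it out. The name check in remove_benchmark is there so that a mistyped AGENT_DIR cannot turn the cleanup into an accidental rm -rf of an unrelated directory.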
Install Location
~/
└─ .claude/
   ├─ commands/
   ├─ agents/
   │  └─ jake-benchmark/   ← installs here
   └─ settings.json
Benchmark suite for testing local LLMs as AI agents via OpenClaw + Ollama. 7 models, 22 tasks, interactive dashboard with full conversation transcripts.