jake-benchmark

Agent

Benchmark suite for testing local LLMs as AI agents via OpenClaw + Ollama. 7 models, 22 tasks, interactive dashboard...

Why now: Still in play

Still active enough to matter. Good candidate for a fast stack test instead of a long evaluation loop.

Decision: Keep on the radar

Copy the install, test the workflow, then decide if it earns a permanent slot.

Trial cost: Medium lift

Reasonable to try, but it will take more than a quick skim to get real signal.

Risk: 38/100

GitHub health is unknown, there is no security policy, and 1 issue is open. That makes this testable, but not something to trust blind.

What You Are Adopting

AI Agent: OpenClaw

Model: Llama

Test This In Your Stack

One command in. Clean rollback. Low commitment.

Sandboxed: Installs to ~/.claude, isolated from your projects. One command to remove.

Fastest way to find out if jake-benchmark belongs in your setup.

Copy the install command, run a real test, and back it out cleanly if it slows you down.

Try now

git clone https://github.com/frankhli843/jake-benchmark ~/.claude/agents/jake-benchmark

Run this first. You will know quickly if the workflow earns a permanent slot.

Back out

rm -rf ~/.claude/agents/jake-benchmark

No messy cleanup loop. If it misses, remove it and keep moving.
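The install and back-out commands above can be wrapped in two small shell functions so a trial never overwrites an existing checkout and the removal never runs against an empty path. This is only a sketch: the function names try_install and back_out and the INSTALL_DIR override are hypothetical conveniences, not part of jake-benchmark; only the clone URL and the install path come from this listing.

```shell
#!/bin/sh
# Sketch only: try_install, back_out, and INSTALL_DIR are hypothetical names.
# The URL and default path are the ones shown in the listing.
REPO_URL="https://github.com/frankhli843/jake-benchmark"
INSTALL_DIR="${INSTALL_DIR:-$HOME/.claude/agents/jake-benchmark}"

try_install() {
    # Refuse to clone over something that is already there.
    if [ -e "$INSTALL_DIR" ]; then
        echo "already present: $INSTALL_DIR" >&2
        return 1
    fi
    # Create ~/.claude/agents if it does not exist yet, then clone.
    mkdir -p "$(dirname "$INSTALL_DIR")" && git clone "$REPO_URL" "$INSTALL_DIR"
}

back_out() {
    # Guard against an empty variable before a recursive delete.
    [ -n "$INSTALL_DIR" ] || return 1
    rm -rf -- "$INSTALL_DIR"
}
```

After sourcing the file, run try_install to start the trial and back_out to remove it. Setting INSTALL_DIR beforehand points both functions at a scratch location instead of ~/.claude.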

Install Location

~/
└─ .claude/
    ├─ commands/
    ├─ agents/
    │   └─ jake-benchmark/ ← installs here
    └─ settings.json

About

Benchmark suite for testing local LLMs as AI agents via OpenClaw + Ollama. 7 models, 22 tasks, interactive dashboard with full conversation transcripts.

Reviews: 0

Aging: Last commit 1mo ago

1 open issue

Submitted May 3, 2026
