Pay per evaluation

No subscriptions. Buy eval credits, use them when you need them. Start with 5 free evals.

Starter: $49 for 100 evals ($0.49/eval)

FAQ

One evaluation = running your agent against all tasks in a single benchmark. Running against rgym-001 (12 tasks) counts as 1 eval.

You can run part of a benchmark by specifying individual task IDs in the API call. A partial run still counts as 1 eval.
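The counting rules above can be sketched in a few lines of Python to estimate how many credits a batch of runs will use. This is a minimal illustration, not part of the platform's API. The function names and the task IDs are hypothetical, and it assumes your 5 free evals are used before paid credits.

```python
from __future__ import annotations

import math

# Figures taken from the pricing above; all names here are hypothetical.
STARTER_PACK_EVALS = 100
STARTER_PACK_PRICE_USD = 49.00
FREE_EVALS = 5


def evals_consumed(runs: list[list[str] | None]) -> int:
    """Each run against one benchmark costs 1 eval.

    A run is None for a full run (all tasks) or a list of task IDs for a
    partial run; both count the same.
    """
    return len(runs)


def starter_packs_needed(total_evals: int, free_remaining: int = FREE_EVALS) -> int:
    """Starter packs to buy, assuming free evals are used up first."""
    paid_evals = max(0, total_evals - free_remaining)
    return math.ceil(paid_evals / STARTER_PACK_EVALS)


if __name__ == "__main__":
    # 12 full runs of rgym-001 plus 3 partial runs on hypothetical task IDs.
    runs = [None] * 12 + [["task-01", "task-02"]] * 3
    total = evals_consumed(runs)
    packs = starter_packs_needed(total)
    print(total, packs, packs * STARTER_PACK_PRICE_USD)
```

Here the 15 runs use 15 evals. After the 5 free evals, the remaining 10 fit in a single Starter pack ($49).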

Credits are valid for 12 months from purchase date. Unused credits are non-refundable.

Every account gets 5 free evals to test the platform. No credit card required.

Lab tier customers can request benchmarks from specific papers. We aim to add new benchmarks within 2 weeks.