Pay per evaluation

No subscriptions. Buy eval credits and use them when you need them. Every account starts with 5 free evals.

Starter
$49
100 evals
$0.49/eval
  • All 5 ResearchGym benchmarks
  • JSON result export
  • 2 concurrent evals
  • Email notifications
  • 7-day result retention
Pro (Most Popular)
$199
500 evals
$0.40/eval
  • Everything in Starter
  • Webhook notifications
  • 10 concurrent evals
  • Reasoning trace export
  • 30-day result retention
  • Priority queue
Lab
$349
1,000 evals
$0.35/eval
  • Everything in Pro
  • Custom benchmark requests
  • Unlimited concurrent evals
  • 90-day result retention
  • Dedicated Slack support
  • Team API keys
Contact Us
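
The per-eval prices above follow from dividing each pack price by its eval count and rounding to the nearest cent. A minimal Python sketch of that arithmetic, assuming half-up rounding; the `PACKS` table and `per_eval_price` helper are illustrative names, not part of any official SDK:

```python
from decimal import Decimal, ROUND_HALF_UP

# Pack prices (USD) and eval counts copied from the tiers above.
PACKS = {
    "Starter": (Decimal("49"), 100),
    "Pro": (Decimal("199"), 500),
    "Lab": (Decimal("349"), 1000),
}


def per_eval_price(tier: str) -> Decimal:
    """Return the pack price divided by its eval count, rounded to the cent."""
    price, evals = PACKS[tier]
    return (price / evals).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for tier in PACKS:
    print(f"{tier}: ${per_eval_price(tier)}/eval")
```

The unrounded figures for Pro and Lab are $0.398 and $0.349, so the listed $0.40 and $0.35 are rounded values.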

FAQ

What counts as one evaluation?

One evaluation is one run of your agent against a single benchmark, covering all of its tasks by default. For example, running against rgym-001 (12 tasks) counts as 1 eval.

Can I run a subset of tasks?

Yes. You can specify individual task IDs in the API call. A partial run still counts as 1 eval.
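
The counting rule from the two answers above can be sketched as follows: each run against a single benchmark costs 1 credit, whether it covers every task or only a subset. The `Run` class, the `credits_used` helper, and the task ID strings are hypothetical and only illustrate the rule; they do not describe the actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    benchmark: str  # e.g. "rgym-001"
    # Hypothetical convention for this sketch: an empty list means "all tasks".
    task_ids: list[str] = field(default_factory=list)


def credits_used(runs: list[Run]) -> int:
    """One credit per run, regardless of how many tasks the run covers."""
    return len(runs)


runs = [
    Run("rgym-001"),                          # full benchmark (12 tasks)
    Run("rgym-001", ["task-03", "task-07"]),  # partial run, made-up task IDs
]
print(credits_used(runs))  # 2
```

Under this rule, grouping the task IDs you care about into one run costs fewer credits than submitting them as separate runs.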

Do credits expire?

Credits are valid for 12 months from the purchase date. Unused credits are non-refundable.
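
To estimate when a pack's credits lapse, you can add 12 months to the purchase date. The sketch below assumes the expiry falls on the same calendar day one year later, and that a February 29 purchase maps to February 28; the page does not specify either detail, so treat `credit_expiry` as an illustration rather than the billing system's exact rule.

```python
from datetime import date


def credit_expiry(purchased: date) -> date:
    """Return the date 12 months after purchase (assumed expiry date)."""
    try:
        return purchased.replace(year=purchased.year + 1)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; this sketch uses Feb 28.
        return purchased.replace(year=purchased.year + 1, day=28)


print(credit_expiry(date(2025, 3, 15)))  # 2026-03-15
```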

Is there a free tier?

Yes. Every account gets 5 free evals to test the platform. No credit card required.

Can I request custom benchmarks?

Lab tier customers can request benchmarks from specific papers. We aim to add requested benchmarks within 2 weeks.