No subscriptions. Buy eval credits, use them when you need them. Start with 5 free evals.
One evaluation = running your agent against all tasks in a single benchmark. Running against rgym-001 (12 tasks) counts as 1 eval.
Yes. You can run a subset of a benchmark by specifying individual task IDs in the API call. A partial run still counts as 1 eval.
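The counting rule can be sketched in a few lines of Python. This is only an illustration of the billing arithmetic described above, not the platform's API: the `EvalRun` class, the `credits_used` function, and the task ID `task-03` are hypothetical names made up for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class EvalRun:
    """One run of an agent against a single benchmark (hypothetical model)."""
    benchmark_id: str
    # None means "all tasks in the benchmark"; a list means a partial run.
    task_ids: Optional[Sequence[str]] = None


def credits_used(runs: Sequence[EvalRun]) -> int:
    """Each run costs 1 eval credit, regardless of how many tasks it covers."""
    return len(runs)


runs = [
    EvalRun("rgym-001"),                        # full run, all 12 tasks
    EvalRun("rgym-001", task_ids=["task-03"]),  # partial run, hypothetical task ID
]
print(credits_used(runs))  # 2
```

Because a partial run costs the same as a full one, it is usually cheaper to batch the tasks you care about into one call than to run them one at a time.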
Credits are valid for 12 months from purchase date. Unused credits are non-refundable.
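For planning purposes, an expiry date can be estimated from the purchase date. The sketch below assumes "12 months" means the same calendar day one year later, and that a February 29 purchase expires on February 28 of the following year. Both are assumptions, as is the `credit_expiry` name. Whether credits remain usable on the expiry day itself is not stated here.

```python
import calendar
from datetime import date


def credit_expiry(purchased: date) -> date:
    """Return the date 12 months after purchase (assumed policy, see above)."""
    year = purchased.year + 1
    # Clamp the day so Feb 29 maps to Feb 28 in a non-leap year.
    last_day = calendar.monthrange(year, purchased.month)[1]
    return purchased.replace(year=year, day=min(purchased.day, last_day))


print(credit_expiry(date(2024, 3, 15)))  # 2025-03-15
```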
Yes. Every account gets 5 free evals to test the platform. No credit card required.
Lab tier customers can request benchmarks from specific papers. We aim to add requested benchmarks within 2 weeks.