| Eval ID | Benchmark | Status | Score | Pass/Fail | Duration |
|---|---|---|---|---|---|
| eval_7xk9m2 | Literature Survey Synthesis rgym-001 | completed | 0.847 | 10✓ 2✗ | 2m 14s |
| eval_3bR5nT | Result Interpretation rgym-003 | completed | 0.933 | 14✓ 1✗ | 1m 48s |
| eval_9pL2mK | Experiment Design rgym-002 | completed | 0.625 | 5✓ 3✗ | 3m 02s |
| eval_1qW8jF | Code-to-Paper Alignment rgym-004 | running | — | — | — |
| eval_5tY3vH | Novelty Detection rgym-005 | failed | — | 0✓ 6✗ | 0m 32s |