Environments for frontier models.

Reinforcement learning environments for post-training. Every step is graded against ground truth, and the reward returns to training.

The corpus as a value field.   A subject under evaluation · each square sized by capability density.
modeltask gradereward
A rollout is a loop.   Model attempts the task · graded against ground truth · reward returns to training.
Claude Opus 4.8frontier
1.00
Claude Sonnet 4.6
0.88
Claude Opus 4.7
0.84
Gemini 3.1 Pro
0.79
GPT-5.4
0.58
Nemotron 3 Nanopost-trained +11.8pp
0.42
pass@k by model · ShelfLife-41 · a bead's position is the score, 0 to 1
02Method

One process, from a capability to a graded environment.

Five steps, strung on one wire. From a loose capability to a graded reward.

Perceive

Map a capability and its failure modes until the reward is well defined.

Represent

Formalize it into a task distribution with a verifiable rubric.

Build

Stand up environments that separate cleanly from eval and resist contamination.

Scale

Mass-produce variants across the distribution. Early environments become training data.

Choose

Score pass@k by model. Point the next environment at what they fail.

03Domains

Where the method is pointed.

In priority order, by stakes.

Safety

Alignment and oversight. The first call on everything.

Defense

High-stakes capability and red-team work.

Science

Bio, pharma, research automation.

Commerce

Agentic work on real company operations. Live today.

05 · Contact

Name the capability your models miss. We build the environment, graded against ground truth.