Idler / 2024 - graded against ground truth

Reinforcement learning environments for training frontier models, graded against ground truth.

Environments: what each one is
Verifiable tasks
Each environment poses tasks that can be checked against ground truth.
Dense reward
Scored step by step, so the training signal stays informative.
Real rollouts
Grounded in real production work, not invented.
Method: from a problem space to evaluating environments
Domains where we collaborate, in priority order
Safety
Alignment and oversight. Defense evaluations also fit here. Safety gets first call on everything.
Defense
High-stakes capability evaluation and red-teaming, including weapons-capability red-teaming.
Science
Bio, pharma, clinical-trials automation, and fundamental research.
Commerce
Indexing workflows from real companies. This is our current focus.
Why Idler: the neutral record
Grounded
Environments from real production work, not invented.
Neutral
A record measured the same way for every lab.
Broad
Coverage across the problem space and its subspaces.
About: mission and the neutral record
Mission
Train frontier models on environments built from real problem spaces, graded against ground truth.
The neutral record
A corpus measured the same way for every lab.
Team
A small team, working quietly with frontier labs.
Blog: research notes and method write-ups
Note
Shelf Life
Representing a problem space in thirty pages.
Study
Environments under RL
What our environments do to models when used for RL training.
Note
Dense reward
Why step-by-step grading beats pass/fail grading.
Careers: open roles
Collaborators
We are looking to run this process with new collaborators. Priority order: Safety, Defense, Science, Commerce.
Environment engineering
Build and scale environments across problem spaces.
Contact: request access and partnerships
Request access
See the environments and what they measure.
Partnerships
Run the process together on a problem space.
Reach us
Idler Inc. / San Francisco / idler.ai / hi@idler.ai