Apply here. Applications are rolling, and there’s no set deadline.

About ARC Evals (now METR)

METR does empirical research to determine whether frontier AI models pose a significant threat to humanity. It’s robustly good for civilization to have a clear understanding of what types of danger AI systems pose, and to know how high the risk is. You can learn more about our goals from Beth’s talk.

Some highlights of our work so far:

We’ve been mentioned by the UK government, Obama, and others. We’re sufficiently connected to relevant parties (labs, governments, and academia) that any good work we do or insights we uncover can quickly be leveraged.

About the role

The engineering lead at METR is in charge of our internal platform for evaluating model capabilities (concretely: infrastructure to run a hundred agents in parallel against different tasks inside isolated virtual machines) and manages the engineers who expand this tooling.

This platform is critical to our success — as increasingly powerful models are created, we’ll need to keep pace by constructing tooling that allows us to evaluate these new models. As models gain new modalities and capabilities, the tooling necessary to test out their capabilities will shift as well.

The work is technically fascinating, and you get to be on the cutting edge of what models can do. If you’re up for it, you may also liaise with our partners (labs, the US and UK governments, etc.) as they embark on their own evaluation efforts. There’s room here to help set the standards for tooling that enables evaluations overall.

Compensation is about $250k–$400k, depending on the candidate.

What we’re looking for

This role is best suited for a generalist who enjoys wearing many hats. Former founders could be a good fit, as could engineering managers who enjoy talking to users, or strong ICs or tech leads with at least a bit of management experience.