Epoch AI

Researcher, Evaluations

Remote / Full Time

Role brief

What this role is asking for.

Epoch AI is looking for a researcher to evaluate frontier AI models on hard-to-grade tasks drawn from real-world scenarios.

About the role

We’re seeking a Researcher to lead a new effort evaluating how well frontier models perform on the kinds of open-ended tasks that make up real office work. You will curate a suite of realistic tasks to serve as a benchmark, design the grading rubrics for AI performance, and run newly released models through the suite, assessing their performance both quantitatively and qualitatively.

The focus is on how models handle messy, real-world work rather than on scientific knowledge or programming ability. The role makes heavy use of AI tools, but strong software engineering experience is not required; comfort setting up AI-assisted automated workflows is a plus.

If this role sounds interesting, note that we are also hiring researchers on several other teams. Applications are reviewed on a rolling basis.

Company role signals

Epoch AI role signals.

Repeated tags across 8 active roles show the current hiring pattern.

ML / AI · 7
APIs · 1
Support · 1