Epoch AI
Researcher, Evaluations
Remote / Full Time
Role brief
What this role is asking for.
Epoch AI is looking for a researcher to evaluate frontier AI models on hard-to-grade tasks drawn from real-world scenarios.
About the role
We're seeking a Researcher to lead a new effort evaluating how well frontier models perform on the kinds of open-ended tasks that make up real office work. You will curate a suite of realistic tasks to serve as a benchmark, design the grading rubrics for AI performance, and run newly-released models through the suite, assessing their performance both quantitatively and qualitatively. The focus is on how models handle messy, real-world work rather than on scientific knowledge or programming ability. The role makes heavy use of AI tools, but strong software engineering experience is not required. Comfort setting up AI-assisted automated workflows is a plus.
If this role sounds interesting, we are also looking for researchers on multiple other teams. Applications are rolling.
Company role signals