Nebius

Senior Site Reliability Engineer — Token Factory (Inference Platform)

Amsterdam, Netherlands; Berlin, Germany; London, United Kingdom; Prague, Czech Republic; Remote - Europe

Role brief

What this role is asking for.

About Nebius: Nebius is leading a new era in cloud infrastructure for the global AI economy. We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure. Built by engineers, for engineers. From large-scale GPU orchestration to inference optimization, we own the hard problems across compute, storage, networking and applied AI.

Listed on Nasdaq (NBIS) and headquartered in Amsterdam, we have a global footprint with R&D hubs across Europe, the UK, North America and Israel. Our team of 1,500+ includes hundreds of engineers with deep expertise across hardware, software and AI R&D.

Token Factory is a part of Nebius Cloud, one of the world’s largest GPU clouds, running tens of thousands of GPUs. We are building an inference platform that makes every kind of foundation model (text, vision, audio, and emerging multimodal architectures) fast, reliable, and effortless to deploy at massive scale. To deliver on that promise, we need an engineer who can make the platform behave flawlessly under extreme load and recover gracefully when the unexpected happens.

In this role you will own the reliability, performance, and observability of the entire inference stack. Your day starts with designing and refining telemetry pipelin

Company role signals

Nebius role signals.

Repeated tags across 320 active roles show the current hiring pattern.