Nebius

Staff Network Site Reliability Engineer

United States

Role brief

What this role is asking for.

About Nebius

Nebius is leading a new era in cloud infrastructure for the global AI economy. We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure.

Built by engineers, for engineers. From large-scale GPU orchestration to inference optimization, we own the hard problems across compute, storage, networking and applied AI.

Listed on Nasdaq (NBIS) and headquartered in Amsterdam, we have a global footprint with R&D hubs across Europe, the UK, North America and Israel. Our team of 1,500+ includes hundreds of engineers with deep expertise across hardware, software and AI R&D.

The Role

We’re looking for a Network Site Reliability Engineer (NetSRE) to help build and run the fundamental part of Nebius - the Network - the infrastructure everything else depends on. This is an engineering-first SRE role: you’ll set clear reliability targets, build the tooling and automation to meet them, and make the network safer to operate as we scale quickly.

Your responsibilities will include:

Define and own reliability goals for network services and critical paths (SLIs/SLOs, availability targets, error budgets where it makes sense)
Drive reliability improvements across the whole network: not only services, but also site readine

Company role signals

Nebius role signals.

Repeated tags across 317 active roles show the current hiring pattern.

ML / AI · 317
Support · 202
Observability · 133
Sales · 90
Python · 87
Go · 72
Kubernetes · 59
Security · 59
APIs · 51
Azure · 42