Nebius

Site Reliability Engineer

Remote - United States

Role brief

What this role is asking for.

About Nebius

Nebius is leading a new era in cloud infrastructure for the global AI economy. We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure. Built by engineers, for engineers. From large-scale GPU orchestration to inference optimization, we own the hard problems across compute, storage, networking and applied AI.

Listed on Nasdaq (NBIS) and headquartered in Amsterdam, we have a global footprint with R&D hubs across Europe, the UK, North America and Israel. Our team of 1,500+ includes hundreds of engineers with deep expertise across hardware, software and AI R&D.

The role

Nebius is looking for a Site Reliability Engineer to join its Hardware Infrastructure team. You're welcome to work in our office in Amsterdam.

The Hardware Infrastructure team designs, develops and supports systems involved in the data-center lifecycle:

Serving functional and load testing system.
Monitoring of engineering equipment located in our data centers (power supply, air and water cooling, etc.).
Monitoring of IT equipment: racks, servers, JBODs, JBOGs, power shelves, network devices, etc.
Asset tracking.
Hardware repair task tracking.
Server production.

In this position, your responsibility will be to: Ensure fault-tolerance, scale

Company role signals

Nebius role signals.

Repeated tags across 320 active roles show the current hiring pattern.

ML / AI · 320, Support · 212, Observability · 136, Python · 87, Sales · 84, Go · 67, Kubernetes · 58, Security · 56, Azure · 45, AWS · 42