Remote Jobs Ace

PandaDoc

Senior Site Reliability Engineer

Remote (Poland)

Role brief

What this role is asking for.

Site Reliability Engineers (SREs) are essential to PandaDoc's success, ensuring customers receive a reliable service with minimal downtime. The SRE team achieves this by:

- Owning the incident management processes and tools.
- Managing the observability stack and alerting systems to enable timely investigation and mitigation.
- Actively contributing to service codebases to proactively prevent incidents and resolve performance bottlenecks.

In essence, SREs are the cornerstone of production service resiliency, driving efforts in observability, incident management, capacity planning, and maintaining reliable operations.

In this role, you will:

- Own and influence the incident management process end-to-end
- Maintain and evolve on-prem observability stack
- Keep production applications running smoothly by participating in the on-call rotation
- Develop automations and tools to support platform reliability
- Contribute to production services with performance and resiliency in mind
- Collaborate with product engineers to foster SRE principles within the R&D organization
- Be a mentor for the SRE team or product engineers

About you:

- Solid programming experience, namely Python (Django and AsyncIO) and/or Java (Spring Boot)
- Experience in maintaining an observability tools suite (specifically, LGTM - Loki, Grafana, Tempo, Mimir)
- Experience in development and maintenance of Python services in production
- Stro

Company role signals

PandaDoc role signals.

Repeated tags across 50 active roles show the current hiring pattern.