Senior Site Reliability Engineer, DGX Cloud job at NVIDIA

Remote, India
This role focuses on operating and scaling high-performance DGX Cloud platforms for AI workloads across major cloud providers. Responsibilities include building and supporting large-scale Kubernetes clusters, defining and monitoring SLOs and error budgets, operating GPU workloads, improving observability, and leading incident response and root-cause analysis. The position emphasizes automation, reliability engineering best practices, and collaboration to ensure highly available, secure, and performant cloud services for enterprise and research customers.
This page shows only a summarized version of the job details. For complete and accurate information, check the official NVIDIA job posting.