Software Engineering

Senior Site Reliability Engineer, DGX Cloud ⚡ Urgent

NVIDIA India
Full Time | 8–12 years experience | Remote
About the Role

This role focuses on operating and scaling high-performance DGX Cloud platforms for AI workloads across major cloud providers. Responsibilities include building and supporting large-scale Kubernetes clusters, defining and monitoring SLOs and error budgets, operating GPU workloads, improving observability, and leading incident response and root-cause analysis. The position emphasizes automation, reliability engineering best practices, and collaboration to ensure highly available, secure, and performant cloud services for enterprise and research customers.

Similar Jobs You Might Like

QA - Consultant

KPMG

India
Quality Assurance, Test Planning, Test Automation, Manual Testing, Defect Tracking, +5 more

As a QA Consultant at KPMG, you will be responsible for ensuring the quality and reliability of software solutions and deliverables across projects. Y...

Quality Engineering | Full Time | 3-7 years experience

Senior Analyst - Bid Writer/Proposal Writer (Bid Management)

KPMG

India
Proposal Writing, Bid Management, RFP/RFI Response, Technical Writing, Content Development, +5 more

As a Senior Analyst in Bid Management at KPMG, you will be responsible for writing and managing bids and proposals for consulting engagements. You wil...

Sales & Marketing | Full Time | 2-5 years experience

Saviynt/SailPoint L3 Consultant

KPMG

India
Saviynt, SailPoint, Identity Governance (IGA), Active Directory, Azure, +15 more

This role requires an experienced Identity Governance (IGA) specialist to provide Level 3 support for Saviynt and SailPoint platforms in a managed ser...

Cybersecurity | Full Time | 6-10 years experience

CyberArk L3 Assistant Manager

KPMG

India
CyberArk, Privileged Access Management, CyberArk Vault, PSM, CPM, +10 more

This role involves providing advanced Level 3 support and administration of CyberArk Privileged Access Management (PAM) solutions. You will be respons...

Cybersecurity | Full Time | 6-10 years experience