Senior Site Reliability Engineer - HPC — Nvidia | cvGO!
Nvidia · 3 Locations · Office
### About the Role Senior SRE for HPC at NVIDIA. Ensure reliability, scalability, and performance of infrastructure supporting AI and HPC workloads. ### Responsibilities - Design and implement fault-tolerant systems for HPC clusters and GPU infrastructure. - Automate deployment, monitoring, and reco