HPC Infrastructure Engineer
San Francisco, CA
We are a small team with a shared belief in the future of Intelligence. At Verterra, you won't just be working on Intelligence — you'll be helping define its future.
About the role
We are looking for an HPC Infrastructure Engineer to join our team. You will design and engineer the compute, network and storage infrastructure behind our AI models.
What you'll work on
- Orchestrate 4,000-GPU-scale training jobs across national-lab supercomputers, our on-prem racks and burst-to-cloud fleets, keeping the 10 PB/month data firehose flowing to the Foundational World Model
- Design, deploy and tune heterogeneous clusters: NVIDIA & AMD GPU nodes, high-bandwidth NVLink/InfiniBand fabrics, petabyte NVMe tiers and S3-compatible object stores
- Harden and automate the Slurm / Kubernetes / Ray scheduler stack so researchers go from commit to millions of GPU-hours with one command — no matter the substrate
- Build observability (Prometheus + Grafana) and self-healing runbooks that keep our 10x faster, 40x cheaper training pipeline humming 24/7
- Benchmark new silicon, flash-deploy experimental drivers and collaborate with Kernel, Data and Research teams to push patches upstream
What we're looking for
- 2+ years operating large-scale HPC or AI clusters (>1,000 GPUs); deep Linux, networking, storage and scheduler (Slurm/PBS/K8s/Ray) expertise
- Cloud-HPC chops with AWS ParallelCluster, GCP GKE + GPU/TPU, or Azure CycleCloud; fluent in Terraform/Ansible IaC
- Performance-tuning wizardry across CUDA/HIP, RDMA, NCCL and parallel filesystems; you read nvidia-smi dmon for fun
- Proven record automating CI/CD, monitoring and cost controls for multi-tenant research workloads
- Comfort with ambiguity and a bias for moving fast; degrees optional
- We hire for ability, not credentials: if you're obsessed with pushing hardware to its limits and shaping the future of intelligence, we want to hear from you
Compensation, benefits, and perks
- Annual salary: $245K - $295K
- 401(k) plan with 6% salary matching
- Generous health, dental and vision insurance for you and your dependents
- Unlimited paid time off
- Visa sponsorship and a relocation stipend to bring you to SF, where possible
- A small, fast-paced, highly focused team