
Senior Technical Account Manager - GPU
- Causeway Bay, Hong Kong
- Permanent
- Full-time
- Visa sponsorship provided
- Build and maintain long-term technical relationships with enterprise customers, focusing on GPU performance optimization and efficient resource allocation on AWS or similar cloud platforms.
- Analyze customers’ current architecture, models, data pipelines, and deployment patterns; create a GPU bottleneck map and measurable KPIs (e.g., GPU utilization, throughput, P95/P99 latency, cost per unit).
- Design and optimize GPU resource usage on EC2/EKS/SageMaker or equivalent cloud compute, container, and ML services; implement node pool tiering, Karpenter/Cluster Autoscaler tuning, auto scaling, and cost governance (Savings Plans/RI/Spot/ODCR or equivalent).
- Drive GPU partitioning and multi-tenant resource sharing strategies to reduce idle resources and increase overall cluster utilization.
- Guide customers in PyTorch/TensorFlow performance tuning (DataLoader optimization, mixed precision, gradient accumulation, operator fusion, torch.compile) and inference acceleration (ONNX, TensorRT, CUDA Graphs, model compression).
- Build GPU observability and monitoring systems (nvidia-smi, CloudWatch or equivalent monitoring tools, profilers, distributed communication metrics) to align capacity planning with SLOs.
- Ensure compatibility across GPU drivers, CUDA, container runtimes, and frameworks; standardize change management and rollback processes.
- Collaborate with cloud provider internal teams and external partners (NVIDIA, ISVs) to resolve cross-domain complex issues and deliver repeatable optimization solutions.
- 5+ years in cloud technical support, solutions architecture, or customer success management, with at least 3 years of hands-on experience in GPU/accelerated computing platforms.
- In-depth understanding of GPU instance families (e.g., AWS G/P/H series) or similar offerings from other cloud providers, AMI/driver/CUDA/container compatibility management, and cloud storage/network performance tuning (e.g., S3 I/O, EBS/Instance Store or their equivalents, preprocessing pipelines).
- Proficient in scheduling GPU workloads with EKS or equivalent Kubernetes-based orchestration services, including node pool tiering, resource quotas, elastic scaling, and auto-recovery strategies.
- Experienced in multi-GPU/multi-node distributed computing (NCCL, topology awareness, tensor parallelism, pipeline parallelism), with expertise in communication optimization for large-scale AI training and inference.
- Skilled in PyTorch/TensorFlow performance analysis and optimization, including DataLoader tuning, mixed precision, operator fusion, and inference acceleration toolchains (ONNX, TensorRT, CUDA Graphs).
- Experienced in cost and capacity governance, familiar with Savings Plans, RI, ODCR, Spot, Capacity Blocks, and right-sizing strategies or their equivalents in other cloud platforms.
- Demonstrated cross-functional communication and influencing skills, able to drive technical solutions grounded in data and business objectives.
- AWS Solutions Architect Professional, Machine Learning Specialty, or DevOps Professional certification or equivalent credentials from other cloud providers.
- Hands-on experience with NVIDIA ecosystem software and toolchains (CUDA/cuDNN/NCCL, TensorRT, CUDA Graphs) and proven ability to maintain performance consistency across versions and platforms.
- Track record of delivering quantifiable performance improvements (higher GPU throughput, lower latency, cost savings) backed by a demonstrated benchmarking and regression testing methodology.
- Proven repeatable optimization results in LLM inference, batch AI training, real-time video processing, or high-performance computing (HPC).
- Contributions to open source projects (Run:ai, Ray, vLLM, DeepSpeed, Kubeflow, etc.) or published technical articles, whitepapers, or performance benchmarks.
- Experience with Infrastructure as Code (Terraform, AWS CDK, or equivalent cloud development frameworks), Helm Charts, baseline container image management, and DevOps automation.
- Able to present performance-business tradeoffs and results to senior stakeholders using PR/FAQ documents, architecture diagrams, and capacity/cost reports.