Note: This is a remote position open to candidates in the USA. Baseten is an innovative company powering AI solutions for leading firms like Notion and OpenEvidence, and it is seeking a GPU Kernel Engineer to enhance AI model performance. This role focuses on designing high-performance GPU kernels and optimizing computation for machine learning operations, directly impacting production systems for millions of users.
Responsibilities
- Design and implement high-performance GPU kernels for key ML operations, including matrix multiplications, attention mechanisms, and mixture-of-experts routing
- Write and optimize code using CUDA, PTX assembly, and architecture-specific techniques
- Apply advanced performance optimization methods such as memory coalescing, warp-level programming, tensor core acceleration, and compute/memory overlap
- Implement cutting-edge features like quantization (FP8/FP4), sparsity, and compute/communication overlap
- Identify and resolve performance bottlenecks using tools like Nsight Systems, Nsight Compute, and Torch Profiler
- Collaborate with research teams to productionize theoretical advancements
- Contribute to internal and open-source GPU libraries
- Present technical contributions at industry conferences (e.g., NVIDIA GTC, AWS re:Invent)
Skills
- Strong understanding of GPU architecture and programming paradigms: memory hierarchy (global, shared, registers, L1/L2 cache), thread/block/grid organization, and synchronization techniques and race condition mitigation
- Proficient in C++ and GPU performance profiling tools
- Knowledge of: the CUDA C++ API, memory access patterns and bandwidth optimization, numerical precision and quantization strategies, and modern GPU features (e.g., tensor cores, async operations)
- Experience with Transformer models and attention optimization (e.g., FlashAttention)
- Familiarity with GPU kernel libraries: CUTLASS, Triton, Thrust, CUB
- Background in GEMM tuning and distributed/multi-GPU compute
- Contributions to open-source GPU projects
- Research publications or conference presentations on GPU performance
Benefits
- Competitive compensation, including meaningful equity
- 100% coverage of medical, dental, and vision insurance for employees and dependents
- Flexible PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!)
- Paid parental leave
- Fertility and family-building stipend through Carrot
- Company-facilitated 401(k)
- Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities
Company Overview
Baseten is an AI infrastructure company that integrates machine learning into business operations, production, and processes. It was founded in 2019, and is headquartered in San Francisco, California, USA, with a workforce of 201-500 employees. Its website is https://www.baseten.co.
Company H1B Sponsorship
Baseten has a track record of offering H1B sponsorships, with 1 in 2026, 6 in 2025, 8 in 2024, 1 in 2023, 1 in 2020. Please note that this does not guarantee sponsorship for this specific role.