Oct 19, 2024

ML Compiler Job Position (Oct 19, 2024)

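This post uses job postings to work out what degree, knowledge, and experience an ML compiler role requires. The postings were picked from Google and Glassdoor search results for the keyword "Machine Learning Compiler" on 2024/10/19, so the survey is not exhaustive. The Job Description, Responsibilities, and Qualifications of each posting are copied at the end.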

Job Openings

I found 76 positions, including ones that have already closed. The categories and companies are listed below.

Automakers
- Tesla
- Rivian Automotive
- Waymo
Blockchain
- Gensyn
Semiconductor and AI chip vendors
- AMD
- Ampere Computing
- dMatrix
- Groq
- NextSilicon
- Qualcomm
- Renesas
- Samsung Semiconductor
- SiMa.ai
- Synopsys
- Tenstorrent
- Untether AI
SaaS
- Lightning AI
- OpenAI
Big tech
- Amazon
- Google
- Meta
- Microsoft
- NVIDIA

Job positions by category

Automakers: 7 (Tesla 2, Rivian Automotive 1, Waymo 4)
Blockchain: 1 (Gensyn 1)
Semiconductor and AI chip vendors: 36 (AMD 17, Ampere Computing 1, dMatrix 1, Groq 1, NextSilicon 1, Qualcomm 3, Renesas 2, Samsung 1, SiMa.ai 1, Synopsys 2, Tenstorrent 4, Untether AI 2)
SaaS: 2 (Lightning AI 1, OpenAI 1)
Big tech: 30 (Amazon 9, Google 5, Meta 9, Microsoft 6, NVIDIA 1)

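As expected, semiconductor and AI chip vendors have the most openings. AMD alone accounts for 47% (17/36) of that category, though, and has the most openings of any single company across all categories. Big tech also has quite a few. All of Amazon's openings come from Annapurna Labs, the Israeli semiconductor company it acquired in 2015.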
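The arithmetic behind these figures can be rechecked with a short tally. This is only a sketch: the counts are the ones given in this post, the Gensyn count of 1 comes from the single Gensyn posting in the Appendix, and the variable names are made up for illustration.

```python
# Openings per company, grouped by category, using the counts from this post.
openings = {
    "Automakers": {"Tesla": 2, "Rivian Automotive": 1, "Waymo": 4},
    # Gensyn: one posting, per the Appendix.
    "Blockchain": {"Gensyn": 1},
    "Semiconductor and AI chip vendors": {
        "AMD": 17, "Ampere Computing": 1, "dMatrix": 1, "Groq": 1,
        "NextSilicon": 1, "Qualcomm": 3, "Renesas": 2, "Samsung": 1,
        "SiMa.ai": 1, "Synopsys": 2, "Tenstorrent": 4, "Untether AI": 2,
    },
    "SaaS": {"Lightning AI": 1, "OpenAI": 1},
    "Big tech": {"Amazon": 9, "Google": 5, "Meta": 9, "Microsoft": 6, "NVIDIA": 1},
}

# Total openings per category and overall.
category_totals = {cat: sum(counts.values()) for cat, counts in openings.items()}
grand_total = sum(category_totals.values())

# Flatten to one dict to find the single company with the most openings.
per_company = {name: n for counts in openings.values() for name, n in counts.items()}
top_company = max(per_company, key=per_company.get)

# AMD's share of the semiconductor / AI chip vendor category.
semis = "Semiconductor and AI chip vendors"
amd_share = openings[semis]["AMD"] / category_totals[semis]

print(category_totals)
print(grand_total, top_company, f"{amd_share:.0%}")
```

Keeping the raw counts in one structure makes it easy to spot a category or company that is missing from either list.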

Qualifications

Degree
- A Bachelor's degree in Computer Science, Computer Engineering, or a related field is the minimum
- Many positions also list a Master's or Ph.D. as preferred
Programming languages
- Python and C++ are required
- Some positions also mention assembly
ML models
- CNN, LSTM, Diffusion, Transformers, image models, recommendation systems
- LLMs
  - Llama, Mixtral, Gemma
Hardware Architecture
- GPUs, TPUs, NPUs, and VLIW processors
Deep learning framework
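- PyTorch is mentioned most often
- ONNX also comes up fairly often
- This probably depends on whether the role also covers the training phase or is inference-only, and on whether the users are external or internal only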
Deep learning compiler
- Number of positions that mention each
  - MLIR: 38
  - TVM: 12
  - OpenXLA: 11
  - Triton: 11
  - IREE: 6
- MLIR does look like the current trend
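One way to produce per-keyword position counts like these is to check each posting once per keyword. The sketch below is an assumption about how such a count could be automated, not necessarily how the numbers above were obtained. The regex patterns, the choice to count a bare "XLA" under OpenXLA, and the helper name `count_positions` are all illustrative.

```python
import re

# Illustrative patterns, one per compiler stack. Counting a bare "XLA"
# under OpenXLA is an assumption, not something the post specifies.
KEYWORDS = {
    "MLIR": r"\bMLIR\b",
    "TVM": r"\bTVM\b",
    "OpenXLA": r"\b(?:Open)?XLA\b",
    "Triton": r"\bTriton\b",
    "IREE": r"\bIREE\b",
}

def count_positions(postings):
    """Return {keyword: number of postings mentioning it at least once}."""
    counts = {name: 0 for name in KEYWORDS}
    for text in postings:
        for name, pattern in KEYWORDS.items():
            if re.search(pattern, text):
                counts[name] += 1
    return counts

# Toy input: three phrases quoted from postings in the Appendix.
sample = [
    "Experience with MLIR and LLVM is a plus",
    "Experience working with ML compilers (e.g., MLIR, TVM).",
    "ML Compiler and Runtime technologies: OneDNN, MLIR, XLA, OpenXLA, IREE, OpenAI Triton compiler",
]
result = count_positions(sample)
print(result)
```

Each posting counts at most once per keyword, which matches "number of positions" rather than "number of mentions". A plain match on "Triton" can also pick up unrelated products with the same name, so real postings would still need a manual check.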
Algorithm
- Model partitioning (pipelined, tensor, model and data parallelism), tiling, resource allocation, memory management, scheduling and optimization (for latency, bandwidth and throughput)
- instruction scheduling, memory allocation, data transfer optimization, graph partitioning, parallel programming, code generation
- graph theory
Compiler
- compiler theory
- compiler design
  - front/middle/back-end
  - optimization pass
- analysis, debug
  - profiling, instrumentation, bug isolation, data race detection, out-of-bounds access detection, performance estimation, and tracking source-level information
- Polyhedral compiler optimization
- Many positions count experience with the LLVM compiler as a plus

Reddit posts

How do I prepare for AI compiler engineer role : r/Compilers
Got a ML compiler engineer position interview at Apple : r/Compilers
Tensor compiler career : r/Compilers
- "LHS dilation" probably refers to this:
  - jax.lax.conv_general_dilated
    - lhs_dilation (Sequence[int] | None) – None, or a sequence of n integers, giving the dilation factor to apply in each spatial dimension of lhs. LHS dilation is also known as transposed convolution.
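To make the idea concrete, here is a minimal pure-Python sketch of LHS dilation in one dimension. It assumes the XLA convention that a dilation factor d places d - 1 zeros between neighbouring input elements before the ordinary sliding-window step runs. The helpers `dilate` and `correlate_valid` are hypothetical names for illustration and are not JAX APIs. The real `jax.lax.conv_general_dilated` also handles batch and feature dimensions, strides, and padding.

```python
def dilate(xs, factor):
    """Insert (factor - 1) zeros between neighbouring elements of xs."""
    out = []
    for i, x in enumerate(xs):
        if i > 0:
            out.extend([0] * (factor - 1))
        out.append(x)
    return out

def correlate_valid(xs, kernel):
    """1-D sliding dot product with no padding ("valid" mode)."""
    n = len(xs) - len(kernel) + 1
    return [
        sum(xs[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(n)
    ]

lhs = [1, 2, 3]
kernel = [1, 1]

# Dilating the input (the "left-hand side") by 2 stretches it from 3 to 5 elements.
dilated = dilate(lhs, 2)
out = correlate_valid(dilated, kernel)

print(dilated)  # [1, 0, 2, 0, 3]
print(out)      # [1, 2, 2, 3]
```

The output is longer than the original input, which is why LHS dilation is associated with the upsampling behaviour of transposed convolution.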

Appendix

Job posting (as of Oct 19, 2024)

Tesla

Machine Learning Compiler Engineer, Self-Driving Hardware

**What to Expect**

As a member of the Self-Driving hardware team, you will be responsible for enabling Tesla's neural networks on our upcoming in-house custom Machine Learning Silicon. Join a small team of experienced developers in automating the compilation of PyTorch-derived neural network graphs into programs that run on next-generation Tesla's custom FSD computer. The ideal candidate is an experienced compiler engineer comfortable working rapidly in a small-team environment.

**What You’ll Do**

- Take ownership of a few areas of the compiler (flexible, based on skills/interests/needs)
- Bring up new hardware silicon and add support in the compiler for these hardware features
- Collaborate with the design team to understand current architecture and propose future improvements
- Develop algorithms to improve performance, while decreasing power

**What You’ll Bring**

- Degree in Engineering, Computer Science, or equivalent in experience and evidence of exceptional ability
- Prior industry or research experience within compiler development
- Comfortable with Python and C++
- Experience with MLIR and LLVM is a plus
- Good understanding of GPU architecture and AI accelerators
- Excellent problem solving and debugging skills

Pay range: N/A
Location: Austin, Texas
Work site: N/A

Software Engineer, ML Compiler, Dojo

**What to Expect**

As a member of the Dojo compiler team, you will be responsible for enabling Tesla's neural networks to train efficiently on our upcoming in-house custom-silicon supercomputer systems. Join a small team of experienced developers in automating the compilation of PyTorch-derived neural network graphs into programs that run on Tesla's custom FSD computer. The ideal candidate is an experienced compiler engineer comfortable working rapidly in a small-team environment.

**What You’ll Do**

- Take ownership of a few areas of the compiler (flexible, based on skills/interests/needs)
- Develop algorithms to improve performance and reduce compiler overhead
- Debug functional and performance issues on massively parallel systems
- Collaborate with Dojo HW team to understand current HW architecture and propose future improvements
- Work with Autopilot SW team to assure smooth transition of training from GPU to Dojo

**What You’ll Bring**

- Degree in Engineering, Computer Science, or equivalent in experience and evidence of exceptional ability
- Prior industry or research experience within compiler development
- Comfortable with C++ and assembly code

Pay range: $104,000 - $360,000/annual salary + cash and stock awards + benefits
Location: PALO ALTO, California
Work site: N/A

Rivian Automotive

Staff Machine Learning Compiler Engineer in Palo Alto, California | Rivian

**Role Summary**

In this position you will be a key member of the ML Compiler team working on software tools to enable inference of deep learning networks hardware on Rivian Hardware Platforms. You will work closely with the Rivian ADAS and Hardware teams and evaluate various implementation targeting for performance. You will help bring up new hardware and add support in the compiler for these hardware features.

**Responsibilities**

- Fast and accurate simulation models for Rivian Hardware Platform
- Performance analysis and optimization of ML workloads
- Work closely ADAS and software teams and facilitate migration of software models in hardware

**Qualifications**

- Degree in Computer Engineering or a related field
- Excellent C/C++ and Python programming skills
- Experience with various SOC platforms used for machine learning and ADAS
- Strong understanding of deep learning software models
- Experience in compiler pipeline development preferred
- Experience working in aggressive design environments preferred

Pay range: $206,500.00 - $258,100.00 for Northern California Based Candidates
Location: Palo Alto, California
Work site: N/A

Waymo

Machine Learning Compiler Engineer, Compute

Waymo's Compute Team is tasked with a critical and exciting mission: We deliver the compute platform responsible for running the autonomous vehicle's software stack. To achieve our mission, we architect and create high-performance custom silicon; we develop system-level compute architectures that push the boundaries of performance, power, and latency; and we collaborate with many other teammates to ensure we design and improve hardware and software for maximum performance. We are a diverse team looking for curious and talented teammates to work on one of the world's highest performance automotive compute platforms.
In this hybrid role, you will report to a Software Engineering Manager.

**You will:**

- Maximize performance of our neural networks by enhancing and extending our production grade compiler
- Work with hardware architects and model developers to develop understanding of our unique neural network inference platform and neural networks
- Implement compiler support for novel features of our high-performance architecture

**You have:**

- BS degree in Computer Science/Electrical Engineering or equivalent practical experience and 3+ years of industry experience OR
- MS degree in Computer Science/Electrical Engineering and 1+ years of industry experience OR
- PhD Degree in Computer Science/Electrical Engineering or equivalent years of experience
- 1+ years of industry and/or academic experience with compilers and parallel computing
- 1+ years of industry and/or academic experience working with ML inference or linear algebra computations
- C++ programming skills

**We prefer:**

- Python programming experience
- Experience with compilers for neural networks
- Knowledge of computer architectures used for neural network inference, and neural network performance characteristics
- Knowledge of the principles behind popular machine learning and neural network algorithms and applications

Pay range: $158,000—$200,000 USD, across US locations
Location: Mountain View, California, United States. New York City, New York, United States
Work site: N/A

Senior Machine Learning Compiler Engineer, Compute

Waymo's Compute Team is tasked with a critical and exciting mission: We deliver the compute platform responsible for running the autonomous vehicle's software stack. To achieve our mission, we architect and create high-performance custom silicon; we develop system-level compute architectures that push the boundaries of performance, power, and latency; and we collaborate with many other teammates to ensure the optimization of hardware and software for maximum performance.
In this hybrid role, you will report to a Software Engineering Manager.

**You will:**

- Analyze the performance characteristics of code generated by our production grade compiler, and design and implement optimizations to improve that performance
- Design and implement compiler support for novel features of our high-performance architecture
- Work with hardware architects to understand and influence the development of our unique neural network inference platform through hardware/software codesign
- Work with model developers to tune their neural networks for better inference efficiency and accuracy

**You have:**

- BS degree in Computer Science/Electrical Engineering or equivalent experience and 5+ years of industry experience OR
- MS degree in Computer Science/Electrical Engineering and 3+ years of industry experience
- PhD degree in Computer Science/Electrical Engineering and 1+ years of industry experience
- 3+ years experience working on compilers for parallel architectures
- 1+ years experience working with ML inference or linear algebra computation
- C++ programming skills

**We prefer:**

- Python programming experience
- Experience with compilers for neural networks
- Knowledge of computer architectures used for neural network inference, and neural network performance characteristics
- Knowledge of the principles behind popular machine learning and neural network algorithms and applications

Pay range: $192,000—$243,000 USD, across US locations
Location: New York City, New York, United States - Mountain View, California, United States
Work site: N/A

Staff Machine Learning Compiler Engineer, Compute

Waymo's Compute Team is tasked with a critical and exciting mission: We deliver the compute platform responsible for running the autonomous vehicle's software stack. To achieve our mission, we architect and create high-performance custom silicon; we develop system-level compute architectures that push the boundaries of performance, power, and latency; and we collaborate with many other teammates to ensure the optimization of hardware and software for maximum performance.
In this hybrid role, you will report to an Engineering Manager.

**You will:**

- Analyze the performance characteristics of code generated by our production grade compiler and develop and implement engineering roadmaps for its improvement
- Architect, and implement compiler support for novel features of our unique neural network inference platform
- Guide model developers and hardware architects towards improving the efficiency and achieved performance of inference hardware through software/hardware codesign

**You have:**

- BS degree in Computer Science/Electrical Engineering or equivalent experience and 7+ years of industry experience OR
- MS degree in Computer Science/Electrical Engineering and 5+ years of industry experience OR
- PhD degree in Computer Science/Electrical Engineering and 3+ years of industry experience
- 3+ years of industry and/or academic experience working on compilers for neural networks or linear algebra computation targeting parallel architectures
- 1+ years of experience in techniques used to generate code optimized for performance on a parallel architecture
- C++ programming skills

**We prefer:**

- Python programming experience
- Knowledge of computer architectures used for neural network inference, and neural network performance characteristics
- Knowledge of the principles behind popular machine learning and neural network algorithms and applications

Pay range: $226,000—$286,000 USD
Location: Mountain View, California, United States
Work site: N/A

Senior Machine Learning Engineer, Runtime & Optimization

The ML Platform team at Waymo provides a set of tools to support and automate the lifecycle of the machine learning workflow, including feature and experiment management, model development, debugging & evaluation, optimization, deployment, and monitoring. These efforts have resulted in making machine learning more accessible to teams at Waymo, including Perception, Planner, Research and Simulation, ensuring greater degrees of consistency and repeatability, and addressing the "last mile" of getting models into production and managing them once they are in place.
We are looking for a Senior engineer with model optimization expertise to help us improve compute performance on our car. In this hybrid role, you will report to our Head of Model Optimization.

**You will:**

- Work across the entire ML framework/compiler stack (e.g. JAX, XLA, Triton, and CUDA), and system-efficient deep learning models.
- Deep dive into the whole stack of ML software stack, from custom ops, framework/ ML compiler, to low-level libraries.
- Apply model optimization and efficient deep learning techniques to models and optimized ML operator libraries.

**You have:**

- M.S. in CS, EE, Deep Learning or a related field
- 3+ years of experience on model optimization or efficient deep learning techniques
- Strong Python or C++ programming skills
- Solid experience with designing, training and debugging deep learning models to achieve the highest scores/accuracies.

**We prefer:**

- PhD in CS, EE, Deep Learning or a related field.
- Deep knowledge on system performance, GPU optimization or ML compiler.
- 5+ years of experience on model optimization or efficient deep learning techniques

Pay range: $192,000—$243,000 USD
Location: Mountain View, California, United States
Work site: N/A

Gensyn

Job Application for Compiler Engineer - Distributed ML Training at Gensyn

**Responsibilities:**

- Lower deep learning graphs - from common frameworks (PyTorch, Tensorflow, Keras, etc) down to an IR representation for training - with particular focus on ensuring reproducibility
- Write novel algorithms - for transforming intermediate representations of compute graphs between different operator representations.
- Ownership - of two of the following compiler areas:
  - Front-end - deal with the handshaking of common Deep Learning Frameworks with Gensyn's IR for internal IR usage. Write Transformation passes in ONNX to alter IR for middle-end consumption
  - Middle-end - write compiler passes for training-based compute graphs, integrate reproducible Deep Learning kernels into the code generation stage, and debug compilation passes and transformations as you go
  - Back-end: lower IR from middle-end to GPU target machine code

**Minimum Requirements:**

- Compiler knowledge - base-level understanding of a traditional compiler (LLVM, GCC) and graph traversals required for writing code for such a compiler
- Solid software engineering skills - practicing software engineer, having significantly contributed to/shipped production code
- Understanding of parallel programming - specifically as it pertains to GPUs
- Strong willingness to learn Rust - as a Rust by default company, we require that everyone learns Rust so that they have context/can work across the entire codebase
- Ability to operate on:
  - High-Level IR/Clang/LLVM up to middle-end optimisation; and/or
  - Low Level IR/LLVM targets/target-specific optimisations - particularly GPU specific optimisations
  - Highly self-motivated with excellent verbal and written communication skills
  - Comfortable working in an applied research environment with extremely high autonomy

**Nice to haves:**

- Architecture understanding - full understanding of a computer architecture specialised for training NN graphs (Intel Xeon CPU, GPUs, TPUs, custom accelerators)
- Rust experience - systems level programming experience in Rust
- Open-source contributions to Compiler Stacks
- Compilation understanding - strong understanding of compilation in regards to one or more High-Performance Computer architectures (CPU, GPU, custom accelerator, or a heterogenous system of all such components)
- Proven technical foundation - in CPU and GPU architectures, numeric libraries, and modular software design
- Deep Learning understanding - both in terms of recent architecture trends + fundamentals of how training works, and experience with machine learning frameworks and their internals (e.g. PyTorch, TensorFlow, scikit-learn, etc.)
- Exposure to a Deep Learning Compiler frameworks - e.g. TVM, MLIR, TensorComprehensions, Triton, JAX
- Kernel Experience - Experience writing and optimizing highly-performant GPU kernels

Pay range: N/A
Location: N/A
Work site: Remote

AMD

Principal Machine Learning Compiler Engineer in San Jose, California

**Job Description**

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

**THE ROLE:**

AMD is looking for a skilled and experienced engineer to join a core team of incredibly talented industry specialists working on developing a cutting-edge machine learning model compiler targeting AMD Inference Accelerator AIE hardware devices.  The compiler needs to take a model written in PyTorch, TensorFlow, ONNX or JAX and produce optimized control and executable code for AIE VLIW processor array. Additionally, the compiler needs to handle partitioning between x86 and hardware accelerator, generate optimal code for AMD x86 CPU using AMD ZenDNN, and generate code to interface with AMD AIE specific runtime and driver.

**THE PERSON:**

The ideal candidate should be passionate about implementing and improving effective algorithms and techniques to create and enhance compiler and runtime for machine learning and possess leadership skills to drive sophisticated issues to resolution. The candidate should be able to communicate effectively and work optimally with different teams across AMD.

**KEY RESPONSIBILITIES:**

- Implement and improve passes in the compiler
- Integrate compiler and compiled model with ML Frameworks (such as PyTorch and Tensorflow)
- Implement model partitioning in ML Frameworks and/or MLIR
- Implement runtime to distribute work to and collect results from x86 cores and the array of AIE cores
- Mentor and provide guidance to others
- Learn latest industry trends and bring new ideas to the team
- Design and develop new groundbreaking AMD technologies
- Debugging/fix existing issues and research alternative, more efficient ways to accomplish the same work
- Develop technical relationships with peers and partners

**PREFERRED EXPERIENCE:**

- Strong object-oriented programming background, C/C++ and Python
- ML Compiler and Runtime technologies: OneDNN, MLIR, XLA, OpenXLA, IREE, OpenAI Triton compiler
- Compiler building skills
- Code generation for a ML hardware accelerator
- GPU code generation
- Machine Learning concepts and model development experience
- Understanding of PyTorch, Tensorflow, ONNX, JAX etc.
- Ability to write high quality code with a keen attention to detail
- Experience with modern concurrent programming and threading APIs
- Experience with software development processes and tools such as debuggers, source code control systems (GitHub) and profilers is a plus
- Effective communication and problem-solving skills
- Motivating leader with good interpersonal skills

**PREFERRED ACADEMIC CREDENTIALS:**

- Bachelor’s, Master's or PhD degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent

Pay range: USD $226,400.00/Yr up to $339,600.00/Yr
Location: San Jose, California
Work site: N/A

Deep Learning Compiler Engineer for Ryzen AI NPU in San Jose, California

**Job Description**

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

**THE ROLE:**

We are looking for a talented Machine Learning (ML) Compiler SW Engineer to join our growing team in the AI group and play a crucial role in developing SW toolset to deploy cutting-edge ML models on AMD’s XDNA Neural Processing Units (NPU). You will be responsible for designing, implementing, and optimizing compilers, that translate Gen-AI ML inference models like SDXL-Turbo, Llama2, Mistral, etc. into low-level code for specialized hardware architectures. Your work will directly impact the efficiency, scalability, and reliability of our ML applications.

**THE PERSON:**

If you thrive in a fast-paced environment and love working on cutting edge machine learning inference, this role is for you.

**RESPONSIBILITIES:**

- Design and develop novel algorithms for tiling and mapping quantized ML workloads on Ryzen AI NPU.
- Analyze and transform intermediate representations of ML models (computational graphs) for efficient execution.
- Collaborate with architects and runtime software engineers to understand performance requirements of different operators and translate them into effective compiler strategies.
- Collaborate with kernel developers to understand kernel tiling requirements and strategize the dataflow and L1/L2 buffer allocation schemes.
- Develop back-end optimization passes to convert high-level representation into driver calls.
- Implement compiler optimizations for performance, resource usage, and compute efficiency.
- Develop and maintain unit tests and integration tests for the compiler to support different generations of NPU architecture.
- Enable detailed profiling and debugging tools for analyzing performance bottlenecks and deadlocks in dataflow schemes.
- Stay up-to-date on the latest advancements in ML compiler technology and hardware architectures.

**PREFERRED EXPERIENCES:**

- Strong understanding of the dataflow scheduling and memory hierarchy in a multi-core processor architecture.
- Knowledge of compiler design principles (front-end, middle-end, back-end).
- Experience with machine learning frameworks (e.g., TensorFlow, PyTorch).
- Experience working with ML compilers (e.g., MLIR, TVM).
- Experience with ML models such as CNN, LSTM, LLMs, Diffusion is a must.
- Excellent programming skills in Python, C++, or similar languages.
- Experience with machine learning hardware architectures (e.g., GPUs, TPUs, VLIW) is a plus.
- A passion for innovation and a strong desire to push the boundaries of machine learning performance.

**ACADEMIC CREDENTIALS:**

- Master's degree or PhD. in Computer Science, Engineering, or a related field (or Bachelor's degree with significant experience).

Pay range: USD $182,800.00/Yr up to $274,200.00/Yr
Location: San Jose, California
Work site: N/A

Senior Deep Learning Compiler Engineer (GPU) in San Jose, California

**Job Description**

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

**The Role**

IREE is an open source, MLIR based, compilation stack that supports compilation of ML models on multiple target architectures. For many of these architectures, like x86, ARM, RISC-V, as well as some APUs, LLVM compilation is the last layer of the compilation stack. In this role, the candidate will be expected to enhance the LLVM compilation on current and future AMD GPUs. The role is expected to be central to be able to achieve good performance on various LLVM based GPU backends using IREE and will have a direct impact on effective deployment of ML models on AMD devices.

**The person**

This role is ideal for someone who has experience with LLVM; knows/is interested in learning the best way to achieve good performance on given architecture. This person must be able to understand the current MLIR/LLVM based compilation flow, to effectively identify opportunities for optimization at various levels of the stack. They must be able to design and implement these optimizations either in LLVM or in MLIR, to optimize the binary generated by the compiler. The person must also enjoy working in open-source projects like MLIR, LLVM and IREE and be able to engage with the community effectively. This role is ideal for someone who might be new to MLIR but is interested in contributing to it.

**Key responsibilities**

- Support and contribute to AMD GPU backend compilation in LLVM.
- Understand current and upcoming architecture features on AMD GPUs and help design the compiler strategy to target these features effectively within IREE.
- Plan for and design compiler transformations in MLIR or LLVM that are needed to generate efficient code.
- Contribute to and engage with open-source communities in LLVM, MLIR and IREE.
- Maintain high level of code quality and testing.

**Preferred experience**

- Bachelor’s, Master’s or PhD in computer science or related field.
- Multiple years of experience working with an LLVM based compiler, MLIR experience optional
- Known history of contribution to open-source projects is preferred
- Prior experience in ML compilers is optional but preferred.
- Experience with fuzzers and reducers is a plus.

Pay range: USD $192,000.00/Yr up to $288,000.00/Yr
Location: San Jose, California
Work site: N/A

Deep Learning - Compiler Engineer in Austin, Texas

**Job Description**

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

**THE ROLE:**

We are looking for a dynamic, energetic Lead Compiler Engineer to join our growing team in AI group. As a part of this role, you will be responsible for designing, developing, and optimizing frontend compiler for latest neural networks on AMD’s XDNA Neural Processing Units that power cutting edge generative AI models like Stable diffusion, SDXL-Turbo, Llama2, etc. Your work will directly impact the efficiency, scalability, and reliability of our ML applications. If you thrive in a fast-paced environment and love working on cutting edge machine learning inference applications, this role is for you.  

**THE PERSON:**

This AMD (Advanced Micro Devices) team is looking for a senior level person that can help guide the team, mentor upcoming developers, provide long range strategy, and is willing to jump in to help resolve issues quickly.  You will be involved in all areas that impact the team including performance, automation, and development.  The right candidate will be informed on the latest trends and become prepared to give consultative direction to senior management.

**KEY RESPONSIBILITIES:**

- Design and implement NPU compiler framework for neural networks.
- Develop hardware aware graph optimizations for high level ML frameworks like ONNX.
- Research new algorithms for operator scheduling for efficient inference of latest NN models.
- Interface with ONNX / Pytorch runtime and lower level HW implementation.
- Contribute to high performance inference for GenAI workloads such as Llama2-7B, Stable diffusion, SDXL-Turbo etc.
- Work closely with kernel developers, performance architects, and AI researchers
- Manage CPU, and memory resources effectively during model execution.
- Handle resource allocation for ML deployments across different tenants.
- Research heterogenous mapping of ML operators for maximum efficiency.
- Build tools to track resource utilization, bottlenecks, and anomalies.
- Enable detailed profiling and debugging tools for analyzing ML workload latency.
- Implement rigorous code review practices for superior code quality assurance.

**PREFERRED EXPERIENCE:**

- Strong programming skills in C++, Python.
- Experience with proprietary/open source compiler stack: TVM, MLIR.
- Experience with ML frameworks (e.g., ONNX, PyTorch) is required.
- Experience with ML models such as CNN, LSTM, LLMs, Diffusion is a must.
- Experience with ONNX, Pytorch runtime integration is a bonus.
- Excellent problem-solving abilities and a passion for performance optimization.

**ACADEMIC CREDENTIALS:**

- Master’s, or PhD degree in Computer Science, Electrical Engineering, or related fields.

Pay range: USD $90,240.00/Yr up to $135,360.00/Yr
Location: Austin, Texas
Work site: N/A

Senior AI GPU Compiler Engineer in Austin, Texas

**Job Description**

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

**THE ROLE:**

In this exciting opportunity, you will be immersed in the world of open-source AI compiler and runtime stack. Your responsibilities will encompass gaining a comprehensive understanding of compiling Machine Learning networks specifically tailored for AMD graphics cards. This hands-on experience will not only involve delving into the fundamentals of GPU hardware capabilities but also actively engaging with MLIR dialects, execution models, and code generation. As part of your role, you will have the chance to contribute to the development of various components within a sophisticated software tool chain, all crafted in C++. Get ready to explore and contribute to the cutting edge of AI compilation technology.

**THE PERSON:**

The ideal candidate is someone who embodies a genuine passion for software engineering and demonstrates strong leadership skills, capable of steering complex issues to successful resolutions. They should excel in effective communication and be adept at collaborating seamlessly with diverse teams across AMD.

**KEY RESPONSIBILITIES:**

- Implement and optimize GPU kernels using cutting-edge compiler technology
- Conduct benchmarking on prevalent AI model workloads, including tasks in vision, speech recognition, language processing and generative models
- Collaborate with AMD’s architecture specialists to enhance future products
- Stay abreast of software and hardware trends and innovations, particularly in algorithms and architecture, ensuring the integration of the latest advancements into our projects
- Participate in new GPU bring-ups for AI software
- Debug and resolve existing issues and research alternative, more efficient ways to accomplish the same work
- Cultivate technical relationships with peers and partners, fostering collaborative efforts and knowledge exchange

**KEY QUALIFICATIONS:**

- Strong object-oriented programming background, C/C++ preferred
- Proficient in crafting high performance GPU kernels
- Prior experience with compilers, dataflow graphs
- Familiarity with modern source control systems such as GitHub
- Excellent verbal communication and writing skills

**WHAT MAKES YOU STAND OUT:**

- Proven expertise in optimizing high performance GPU kernels
- Prior hands-on experience using AI accelerators
- Contributions in LLVM or MLIR projects
- Previous exposure to machine learning models
- A collaborative team player with experience working in geographically dispersed teams

**ACADEMIC CREDENTIALS:**

- Advanced degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent

Pay range: USD $144,000.00/Yr up to $216,000.00/Yr
Location: Austin, Texas
Work site: N/A

Senior Triton Compiler Engineer in San Jose, California

**Job Description**

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

**THE ROLE:**

Triton (https://github.com/triton-lang/triton) is a language and compiler for writing highly efficient custom Deep-Learning primitives. AMD GPU is a supported backend in Triton and we are fully committed to it. If you are interested in making GPUs run fast by developing the Triton compiler and kernels, please come join us!

**THE PERSON:**

An ideal candidate should be familiar with compilers, GPU architectures, parallel programming, and/or high-performance kernels. He/she should be comfortable performing quantitative analysis of workloads and driving improvements across different compiler layers. Most importantly, the candidate is willing to learn and work across boundaries.

**KEY RESPONSIBILITIES:**

- Develop and maintain Triton compiler's AMD backend
- Improve various compilation patterns and passes in Triton
- Research and author high performance matmul and attention kernels in Triton
- Profile kernel performance on AMD GPUs and improve bottlenecks
- Fix issues in AMDGPU backend in LLVM

**PREFERRED EXPERIENCE:**

- Familiarity or existing experience with Triton is a strong plus
- Familiarity with MLIR, LLVM, IREE
- Deep understanding of GPU architectures and programming models
- Deep experience with writing high performance GPU kernels and GPU performance tuning
- Experience debugging cross-stack issues and reducing user problems to actionable execution.
- Open-source development ethos

**PREFERRED ACADEMIC CREDENTIALS:**

- Bachelor’s or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent

Pay range: USD $192,000.00/Yr up to $288,000.00/Yr
Location: San Jose, California
Work site: N/A

ML Frameworks Software Development Engineer - vLLM in Austin, Texas

**Job Description**

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

**THE ROLE:**

We are looking for an AI Software development engineer who is passionate about building and optimizing deep learning applications and machine learning frameworks.
Be a part of an AMD development team and the open-source community to design, develop, test and deploy improvements to make us the best platform for generative artificial intelligence and deep neural networks. Incredibly talented industry specialists.

**THE PERSON:**

The ideal candidate has strong technical and analytical skills in high performance C++. Able to communicate effectively and work optimally with different teams across AMD. Experience in tensor libraries or machine learning execution engines is a plus.

**KEY RESPONSIBILITIES:**

- Optimize key deep learning models on AMD GPUs
- Develop and finetune inference and serving engines for large language models
- Collaborate with AMD’s architecture specialists to improve future products
- Collaborate with operator libraries team to analyze and optimize training and inference for deep learning
- Work in a distributed computing setting to optimize for both scale-up (multi-accelerator) and scale-out (multi-node) systems
- Optimize the entire deep learning pipeline including graph compiler integration
- Participate in new ASIC and hardware bring ups
- Apply your knowledge of software engineering best practices

**PREFERRED EXPERIENCE:**

- Ability to work independently, define project goals and scope, and lead your own development effort
- Deep Learning experience or knowledge - Natural Language Processing, Vision, Audio, Recommendation systems
- Programming in High-Performance Domain-Specific Languages for GPU or ML ASIC Computing (HIP, CUDA, OpenCL, Triton, KNYFE, Mojo)
- Using debuggers (LLDB, GDB) and profilers on heterogeneous hardware
- PyTorch or TensorFlow
- Familiarity with ML Graph Compilers (Glow, FX, XLA) is a plus
- Knowledge of the lower levels of typical ML workflows from MLIR to LLVM all the way down to assembly language(s) is a plus

**ACADEMIC CREDENTIALS:**

- Master’s or PhD in Computer Science, Computer Engineering or related fields

Pay range: USD $138,400.00/Yr up to $207,600.00/Yr
Location: Austin, Texas
Work site: N/A

ML compiler development Engineer in Hyderabad, India

**Job Description**

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

**The Person**

We are seeking a highly skilled and experienced professional with a strong background in C++, Python, and machine learning frameworks such as PyTorch and ONNX. The ideal candidate will also be proficient in open-source code version control, particularly Git.

**Key Responsibilities**

- Enable support for various operations from different ML frameworks to LLVM.
- Develop models to validate functionality and performance.
- Monitor and maintain end-to-end ML models for the compiler in production, addressing issues as they arise.
- Contribute to open-source projects, sharing your developments with the community.
- Influence the direction of the AMD AI platform.

**Preferred Experience**

- A minimum of 5 years of experience in relevant fields.
- Proficiency with AI/DL frameworks such as PyTorch, ONNX, or TensorFlow.
- Exceptional programming skills in Python and C++, including debugging, profiling, and performance analysis.
- Experience with machine learning pipelines and CI/CD pipelines.
- Knowledge of MLIR is a significant advantage.
- Strong communication and problem-solving abilities.

**Academic Credentials**

- A master’s degree in Computer Science, Artificial Intelligence, Machine Learning, or a related field.

Pay range: N/A
Location: Hyderabad, India
Work site: N/A

Deep Learning Library GPU Software Development Architect - Performance AI Libraries in Austin, Texas

**Job Description**

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

**THE ROLE:**

AMD is looking for an individual to join a hardworking team developing Deep Learning and High-Performance Computing GPU kernels on the AMD Radeon Open Compute (ROCm) platform (https://github.com/ROCmSoftwarePlatform) and MIOpen (https://github.com/ROCmSoftwarePlatform/MIOpen), AMD's Deep Learning primitives library which provides highly-optimized implementations of a variety of operators.

**THE PERSON:**

The successful person will be an experienced programmer with experience in performance optimization of convolution and other critical AI performance operators on GPUs and other massively parallel processing environments.

**KEY RESPONSIBILITIES:**

- The ideal candidate will be responsible for architecting and supporting AMD’s Machine Learning and Deep Learning Library: MIOpen (https://github.com/ROCmSoftwarePlatform/MIOpen)
- The candidate should have experience with optimizing deep learning training and inference workloads on massively parallel hardware and software platforms.
- They will be responsible for guiding performance-optimizing algorithms for new GPU hardware.
- Perform code reviews, build unit tests, author detailed documentation related to their work, and work with on-site and offshore teams to deliver the software solutions on schedule.
- They will play a key role in all phases of the software development including system requirements analysis, coordinating feature design and development across functional and organizational boundaries.
- They will be driving new hardware and software co-design to optimize state-of-the-art deep learning workloads, and leading the collaboration from hardware and compiler to a variety of deep learning frameworks

**PREFERRED EXPERIENCE:**

- Strong programming skills in software architecture in C/C++, and background in machine learning and deep learning, particularly in training.
- Experience with analyzing and tuning performance optimization for GPU computing is preferred.
- Experience or knowledge about deep learning primitives, fusion, and inference optimization is preferred.
- Strong knowledge of software development lifecycle, SW practices including debugging, test, revision control, documentation, and bug tracking.
- Good teamwork and interpersonal skills required.
- Ability to work independently and within complementary teams.
- Demonstrate flexibility, strong motivation, and a proven track record of meeting results-oriented deadlines.
- Knowledge of deep neural network machine learning technologies and modern machine learning programming frameworks.
- Experience working with and developing virtualization containers and package managers for code deployment.

**ACADEMIC CREDENTIALS:**

- PhD or Master’s in Computer Science, Computer Engineering, or related subjects, or strong and equivalent experience

Pay range: USD $193,600.00/Yr up to $290,400.00/Yr
Location: Austin, Texas
Work site: N/A

AI Framework Software Development Engineer in Cambridge, United Kingdom

**Job Description**

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

**THE ROLE:**

AI Software development engineer on teams building and optimizing Deep Learning applications and AI frameworks for AMD GPU compute platforms. 
Work as part of an AMD development team and open-source community to analyze, develop, test and deploy improvements to make AMD the best platform for machine learning applications.

**THE PERSON:**

Strong technical and analytical skills in C++ development in a Linux environment. Ability to work as part of a team, while also being able to work independently, define goals and scope and lead your own development effort.

**KEY RESPONSIBILITIES:**

- Optimize deep learning frameworks like TensorFlow, PyTorch, etc. on AMD GPUs in upstream open-source repositories
- Develop and optimize key Deep Learning models on AMD GPUs
- Collaborate and interact with internal GPU library teams to analyze and optimize training and inference for deep learning
- Work with open-source framework maintainers to understand their requirements – and have your code changes integrated upstream
- Work in a distributed computing setting to optimize for both scale-up (multi-GPU) and scale-out (multi-node) systems
- Work with cutting-edge compiler technologies
- Optimize the entire deep learning pipeline including graph compiler integration
- Apply your knowledge of software engineering best practices

**PREFERRED EXPERIENCE:**

- Ability to work independently, define project goals and scope, and lead your own development effort.
- Excellent C/C++ programming and software design skills, including debugging, performance analysis, and test design.
- Experience running workloads on large-scale heterogeneous clusters is a plus
- Knowledge of compilers is a plus
- Knowledge of GPU computing (HIP, CUDA, OpenCL) and basic understanding of Deep Learning is a plus

**ACADEMIC CREDENTIALS:**

- Masters or PhD or equivalent experience in Computer Science, Computer Engineering, or related field.

Pay range: N/A
Location: Cambridge, United Kingdom
Work site: N/A

SMTS Software Development Eng. in Shanghai, China

**Job Description**

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

**THE ROLE:**

AI Software development engineer on teams building and optimizing Deep Learning applications and AI frameworks for AMD GPU compute platforms. 
Work as part of an AMD development team and open-source community to analyze, develop, test and deploy improvements to make AMD the best platform for machine learning applications.

**THE PERSON:**

Strong technical and analytical skills in C++ development in a Linux environment. Ability to work as part of a team, while also being able to work independently, define goals and scope and lead your own development effort.

**KEY RESPONSIBILITIES:**

- Optimize deep learning frameworks like TensorFlow, PyTorch, etc. on AMD GPUs in upstream open-source repositories
- Develop and optimize key Deep Learning models on AMD GPUs
- Collaborate and interact with internal GPU library teams to analyze and optimize training and inference for deep learning
- Work with open-source framework maintainers to understand their requirements – and have your code changes integrated upstream
- Work in a distributed computing setting to optimize for both scale-up (multi-GPU) and scale-out (multi-node) systems
- Work with cutting-edge compiler technologies
- Optimize the entire deep learning pipeline including graph compiler integration
- Apply your knowledge of software engineering best practices

**PREFERRED EXPERIENCE:**

- Ability to work independently, define project goals and scope, and lead your own development effort.
- Excellent C/C++ programming and software design skills, including debugging, performance analysis, and test design.
- Experience running workloads on large-scale heterogeneous clusters is a plus
- Knowledge of compilers is a plus
- Knowledge of GPU computing (HIP, CUDA, OpenCL) and basic understanding of Deep Learning is a plus

**ACADEMIC CREDENTIALS:**

- Masters or PhD or equivalent experience in Computer Science, Computer Engineering, or related field.

Pay range: N/A
Location: Shanghai, China
Work site: N/A

Deep Learning Library GPU Software Development Engineer - Performance Kernel and AI Libraries in Santa Clara, California

**Job Description**

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

**THE ROLE:**

AMD is looking for an individual to join a hardworking team developing Deep Learning and High-Performance Computing GPU kernels on the AMD Radeon Open Compute (ROCm) platform (https://github.com/ROCm) and MIOpen (https://github.com/ROCm/MIOpen), AMD's Deep Learning primitives library which provides highly-optimized implementations of a variety of operators, and collaborating with AI kernel development in Composable Kernel (ROCm/composable_kernel on github.com), a performance-portable programming model for machine learning tensor operators.

**THE PERSON:**

The successful person will be an experienced programmer with experience in performance optimization of convolution and other critical AI performance operators on GPUs and other massively parallel processing environments.

**KEY RESPONSIBILITIES:**

- The ideal candidate will be responsible for architecting and supporting AMD’s Machine Learning and Deep Learning Library: MIOpen (https://github.com/ROCm/MIOpen)
- The candidate should have experience with optimizing deep learning training and inference workloads on massively parallel hardware and software platforms.
- They will be responsible for guiding performance-optimizing algorithms for new GPU hardware.
- Perform code reviews, build unit tests, author detailed documentation related to their work, and work with on-site and offshore teams to deliver the software solutions on schedule.
- They will play a key role in all phases of the software development including system requirements analysis, coordinating feature design and development across functional and organizational boundaries.
- They will be driving new hardware and software co-design to optimize state-of-the-art deep learning workloads, and leading the collaboration from hardware and compiler to a variety of deep learning frameworks

**PREFERRED EXPERIENCE:**

- Strong programming skills in software architecture in C/C++, and background in machine learning and deep learning, particularly in training.
- Experience with analyzing and tuning performance optimization for GPU computing is preferred.
- Experience or knowledge about deep learning primitives, fusion, and inference optimization is preferred.
- Strong knowledge of software development lifecycle, SW practices including debugging, test, revision control, documentation, and bug tracking.
- Good teamwork and interpersonal skills required.
- Ability to work independently and within complementary teams.
- Demonstrate flexibility, strong motivation, and a proven track record of meeting results-oriented deadlines.
- Knowledge of deep neural network machine learning technologies and modern machine learning programming frameworks.
- Experience working with and developing virtualization containers and package managers for code deployment.

**ACADEMIC CREDENTIALS:**

- PhD or Master’s in Computer Science, Computer Engineering, or related subjects, or strong and equivalent experience

Pay range: USD $163,360.00/Yr up to $245,040.00/Yr
Location: Santa Clara, California
Work site: N/A

Senior Software Engineer - Triton Kernels in Austin, Texas

**Job Description**

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

**THE ROLE:**

AMD is looking for an AI software development engineer to develop ML kernels in the Triton kernel language. We are looking for an engineer who is passionate about optimizing Machine Learning GPU kernels and improving the performance of key applications and benchmarks. What you do directly impacts the performance of AMD GPUs and enables us to become a competitive solution for generative AI. Become a part of our high-impact and incredibly talented Triton kernels team.

**THE PERSON:**

The ideal candidate should have experience in parallel computer architecture and high performance GPU kernel development. Additional experience developing ML models and knowledge of ML frameworks like PyTorch or JAX is a plus.

**KEY RESPONSIBILITIES:**

- Develop ML kernels for matrix multiplication, Flash Attention and other ML operators
- Benchmark, perform competitive analysis and optimize your kernels to improve performance
- Propose novel optimizations to the Triton compiler
- Collaborate with the GPU architecture team to improve future generations
- Apply knowledge of software engineering best practices

**PREFERRED EXPERIENCE:**

- Programming experience on GPUs - HIP, CUDA, OpenCL or Triton
- ML experience or knowledge in one or more of the following areas - Transformers, image models, recommendation systems
- Ability to work independently, define project goals and scope, and lead your own development effort
- Experience using debuggers
- Familiarity with PyTorch or JAX
- Knowledge of MLIR, LLVM and GPU assembly and GPU architecture is a plus
- Familiarity with models like LLama, Mixtral and Gemma is a plus

**ACADEMIC CREDENTIALS:**

- Master's degree or PhD in Computer Science or Computer Engineering

Pay range: USD $160,800.00/Yr up to $241,200.00/Yr
Location: Austin, Texas
Work site: N/A

Deep Learning Compiler SW Engineer for Ryzen AI NPU in San Jose, California

**Job Description**

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

**THE ROLE:**

We are looking for a talented Machine Learning (ML) Compiler SW Engineer to join our growing team in the AI group and play a crucial role in developing SW toolset to deploy cutting-edge ML models on AMD’s Neural Processing Units (NPU). You will be responsible for designing, and implementing compiler optimization passes that translate Gen-AI ML inference models like SDXL-Turbo, Llama2, Mistral, etc. into low-level code for compute and dataflow on NPUs. Your work will directly impact the efficiency, scalability, and performance of our ML applications.

**THE PERSON:**

If you thrive in a fast-paced environment and love working on cutting edge machine learning inference, this role is for you.

**RESPONSIBILITIES:**

- Design and develop novel algorithms for tiling and mapping quantized ML workloads on application specific hardware platforms.
- Analyze and transform intermediate representations of ML models (computational graphs) for efficient execution.
- Collaborate with Architecture and runtime software teams to develop optimization strategies for the compiler.
- Implement loop tiling, and buffer allocation strategies for efficient compute and DRAM access.
- Develop back-end optimization passes to convert high-level representation into driver calls for different NPU generations.
- Testing and validation of optimized kernels and dataflow graphs using performance models.
- Develop and maintain unit tests and integration tests for the compiler to support different generations of HW architectures.
- Enable detailed profiling and debugging tools for analyzing performance bottlenecks and deadlocks in the dataflow schemes.
- Stay up-to-date on the latest advancements in ML compiler technology and hardware architectures.
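
As a side note to the posting, the loop tiling named in the responsibilities above can be sketched in a few lines. This is a minimal pure-Python illustration, not AMD's compiler or any NPU toolchain: the function name `tiled_matmul` and the `tile` parameter are hypothetical, and a real compiler would choose tile sizes from the buffer capacity of the target hardware.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply a (n x k) by b (k x m), both lists of lists, one tile at a time.

    Hypothetical illustration of loop tiling; `tile` stands in for a size
    that a compiler would derive from on-chip buffer capacity.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    # The three outer loops step over tiles of the iteration space.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # The inner loops stay inside one tile, so the same small
                # blocks of a, b and c are reused before moving on.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(tiled_matmul(a, b, tile=2))
```

The result does not depend on `tile`; only the order in which elements are visited changes, which is what makes tiling a pure scheduling decision for the compiler.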

**PREFERRED EXPERIENCES:**

- Strong understanding of ML compiler optimizations (front-end, middle-end, back-end).
- Experience with machine learning frameworks (e.g., TensorFlow, PyTorch).
- Experience working on GPUs, TPUs, NPUs, or vector processors is a must.
- Experience with ML models such as CNN, LSTM, LLMs.
- Excellent programming skills in Python, C++, or similar languages.
- A passion for innovation and a strong desire to push the boundaries of machine learning performance.

**ACADEMIC CREDENTIALS:**

- Master's degree or PhD in Computer Science, Engineering, or a related field (or Bachelor's degree with significant experience).

Pay range: USD $182,800.00/Yr up to $274,200.00/Yr
Location: San Jose, California
Work site: N/A

ML Frameworks Software Development Engineer - vLLM in Austin, Texas

**Job Description**

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

**THE ROLE:**

We are looking for an AI Software development engineer who is passionate about building and optimizing deep learning applications and machine learning frameworks.
Be a part of an AMD development team and the open-source community to design, develop, test and deploy improvements to make us the best platform for generative artificial intelligence and deep neural networks. Incredibly talented industry specialists.

**THE PERSON:**

The ideal candidate has strong technical and analytical skills in high performance C++. Able to communicate effectively and work optimally with different teams across AMD. Experience in tensor libraries or machine learning execution engines is a plus.

**KEY RESPONSIBILITIES:**

- Optimize key deep learning models on AMD GPUs
- Develop and finetune inference and serving engines for large language models
- Collaborate with AMD’s architecture specialists to improve future products
- Collaborate with operator libraries team to analyze and optimize training and inference for deep learning
- Work in a distributed computing setting to optimize for both scale-up (multi-accelerator) and scale-out (multi-node) systems
- Optimize the entire deep learning pipeline including graph compiler integration
- Participate in new ASIC and hardware bring ups
- Apply your knowledge of software engineering best practices

**PREFERRED EXPERIENCE:**

- Ability to work independently, define project goals and scope, and lead your own development effort
- Deep Learning experience or knowledge - Natural Language Processing, Vision, Audio, Recommendation systems
- Programming in High-Performance Domain-Specific Languages for GPU or ML ASIC Computing (HIP, CUDA, OpenCL, Triton, KNYFE, Mojo)
- Using debuggers (LLDB, GDB) and profilers on heterogeneous hardware
- PyTorch or TensorFlow
- Familiarity with ML Graph Compilers (Glow, FX, XLA) is a plus
- Knowledge of the lower levels of typical ML workflows from MLIR to LLVM all the way down to assembly language(s) is a plus

**ACADEMIC CREDENTIALS:**

- Master’s or PhD in Computer Science, Computer Engineering or related fields

Pay range: USD $160,800.00/Yr up to $241,200.00/Yr
Location: Austin, Texas
Work site: N/A

Deep Learning Library GPU Software Development Engineer - Kernel Development in Austin, Texas

**Job Description**

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

**THE ROLE:**

AMD is looking for an influential software engineer who is passionate about improving the performance of key applications and benchmarks. You will be a member of a core team of incredibly talented industry specialists and will work with the very latest hardware and software technology. 

**THE PERSON:**

The ideal candidate should be passionate about software engineering and possess leadership skills to drive sophisticated issues to resolution. Able to communicate effectively and work optimally with different teams across AMD.

**KEY RESPONSIBILITIES:**

- Work with AMD’s architecture specialists to improve future products.
- The ideal candidate will be responsible for writing high-performance GPU kernels for AMD’s Machine Learning and Deep Learning Library: MIOpen (https://github.com/ROCmSoftwarePlatform/MIOpen) and Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
- They will be porting and optimizing algorithms for new GPU hardware
- Perform code reviews, build unit tests, author detailed documentation related to their work, and work with on-site and offshore teams to deliver the software solutions on schedule
- They will play a key role in all phases of the software development including system requirements analysis, coordinating feature design and development across functional and organizational boundaries
- Stay informed of software and hardware trends and innovations, especially pertaining to algorithms and architecture
- Design and develop new groundbreaking AMD technologies
- Participate in new ASIC and hardware bring-ups
- Debug and fix existing issues and research alternative, more efficient ways to accomplish the same work
- Develop technical relationships with peers and partners

**PREFERRED EXPERIENCE:**

- Strong programming skills in C/C++; experience with CUDA programming and CUTLASS preferred
- Experience with LLVM Compiler, and compiler optimization techniques for GPU computing is preferred.
- Experience or knowledge about BLAS operators and GEMM optimization.
- Knowledge of computer architecture and GPU architecture.
- Good teamwork and interpersonal skills required
- Ability to work independently and within complementary teams
- Demonstrate flexibility, strong motivation, and a proven track record of meeting results-oriented deadlines
- Knowledge of deep neural network machine learning technologies and modern machine learning programming frameworks
- Experience working with and developing virtualization containers and package managers for code deployment
- Ability to write high quality code with a keen attention to detail
- Experience with modern concurrent programming and threading APIs
- Experience with Windows, Linux and/or Android operating system development 
- Experience with software development processes and tools such as debuggers, source code control systems (GitHub) and profilers is a plus
- Effective communication and problem-solving skills
- Motivating leader with good interpersonal skills

**ACADEMIC CREDENTIALS:**

- Bachelor’s or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent with experience in GPU programming
- PhD in Computer Science and related programs with experience in Parallel Computing and GPU architecture

Pay range: USD $96,800.00/Yr up to $145,200.00/Yr
Location: Austin, Texas
Work site: N/A

Deep Learning Library GPU Software Development Engineer in Calgary, Canada

**Job Description**

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

**THE ROLE:**

AMD is looking for an influential software engineer who is passionate about improving the performance of key applications and benchmarks. You will be a member of a core team of incredibly talented industry specialists and will work with the very latest hardware and software technology. 

**THE PERSON:**

The ideal candidate should be passionate about software engineering and possess leadership skills to drive sophisticated issues to resolution. Able to communicate effectively and work optimally with different teams across AMD.

**KEY RESPONSIBILITIES:**

- Work with AMD’s architecture specialists to improve future products.
- The ideal candidate will be responsible for writing high-performance GPU kernels for AMD’s Machine Learning and Deep Learning Library: MIOpen (https://github.com/ROCmSoftwarePlatform/MIOpen) and Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
- They will be porting and optimizing algorithms for new GPU hardware
- Perform code reviews, build unit tests, author detailed documentation related to their work, and work with on-site and offshore teams to deliver the software solutions on schedule
- They will play a key role in all phases of the software development including system requirements analysis, coordinating feature design and development across functional and organizational boundaries
- Stay informed of software and hardware trends and innovations, especially pertaining to algorithms and architecture
- Design and develop new groundbreaking AMD technologies
- Participate in new ASIC and hardware bring-ups
- Debug and fix existing issues and research alternative, more efficient ways to accomplish the same work
- Develop technical relationships with peers and partners

**PREFERRED EXPERIENCE:**

- Strong programming skills in C/C++; experience with CUDA programming and CUTLASS preferred
- Experience with LLVM Compiler, and compiler optimization techniques for GPU computing is preferred.
- Experience or knowledge about BLAS operators and GEMM optimization.
- Knowledge of computer architecture and GPU architecture.
- Good teamwork and interpersonal skills required
- Ability to work independently and within complementary teams
- Demonstrate flexibility, strong motivation, and a proven track record of meeting results-oriented deadlines
- Knowledge of deep neural network machine learning technologies and modern machine learning programming frameworks
- Experience working with and developing virtualization containers and package managers for code deployment
- Ability to write high quality code with a keen attention to detail
- Experience with modern concurrent programming and threading APIs
- Experience with Windows, Linux and/or Android operating system development 
- Experience with software development processes and tools such as debuggers, source code control systems (GitHub) and profilers is a plus
- Effective communication and problem-solving skills
- Motivating leader with good interpersonal skills

**ACADEMIC CREDENTIALS:**

- Bachelor’s or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent with experience in GPU programming
- PhD in Computer Science and related programs with experience in Parallel Computing and GPU architecture

Pay range: CAD $116,160.00/Yr up to $174,240.00/Yr
Location: Calgary, Canada
Work site: N/A

Ampere Computing

AI Compiler Engineering Lead - Distinguished Engineer Job in Portland, OR

**About the role:**

Ampere is looking for an enthusiastic and highly-skilled AI Compiler Engineering Lead to join our expanding AI/LLM Compiler Team. In this role, you will spearhead the ongoing design and development of advanced compilers for Ampere's upcoming AmpereOne Aurora products. Your responsibilities will include managing the compiler architecture, making substantial contributions to its backend development, and expanding the team. You will work closely with Ampere’s hardware engineering design teams to develop a “best in class” algorithm-compiler-hardware ecosystem.

**What you’ll achieve:**

- Successfully develop and deliver a functional AI compiler (either ahead-of-time or just-in-time) along with an associated runtime environment, ensuring robust performance and reliability.
- Lead the Ampere Computing AI compiler software efforts, driving innovation and excellence in Ampere’s compiler technology.
- Work hand-in-hand with the silicon design team to enhance the Ampere Aurora AI accelerator, contributing to cutting-edge hardware-software integration.
- Design and implement advanced solutions and enhancements for prominent machine learning libraries, including Llama.cpp, PyTorch, JAX, vLLM, and ONNX (C++), significantly boosting their performance and capabilities.
- Conduct in-depth analyses of neural network models, identifying and proposing optimizations at multiple levels including model architecture, framework efficiency, compilation process, and execution speed, leading to substantial performance improvements.

**About you:**

- Master's or PhD degree in Computer Science, Electrical Engineering, Mathematics, or a similar quantitative field and 15 years of overall software engineering experience.
- Strong Proficiency in Python and C/C++ Languages: Bring your expertise and passion for programming with at least 4 years of hands-on experience in Python and C/C++, driving innovative solutions through powerful code.
- Experience with Large-Scale Software Systems: Showcase your background working on substantial software projects, especially in the realm of compilers or domain-specific compilers, contributing to robust, high-performance systems.
- Knowledge of Essential Tools: Demonstrate your command of vital development tools such as Linux, Git, GCC, and LLVM, ensuring smooth, efficient workflows and high-quality results.
- Adaptability in a Startup Environment: Thrive in our fast-paced, dynamic startup culture where your ability to take initiative and achieve impactful results is highly valued and rewarded.
- Familiarity with Machine Learning and Deep Learning: Leverage your expert knowledge of machine learning and deep learning, with hands-on experience in popular frameworks like TensorFlow, PyTorch, Llama.cpp, and vLLM, to drive groundbreaking advancements in AI technologies.

Pay range: between $195,400 and $325,600, except in the San Francisco Bay Area where the range is between $201,400 and $335,600
Location: Portland, OR
Work site: N/A

dMatrix

ML Compiler Engineer, Staff

**What you will do:**

The d-Matrix compiler team is looking for exceptional candidates to help develop the compiler backend - specifically the problem of assigning hardware resources in a spatial architecture to execute low level instructions. The successful candidate will be motivated, capable of solving algorithmic compiler problems and interested in learning intricate details of the underlying hardware and software architectures. The successful candidate will join a team of experienced compiler developers, which will be guiding the candidate for a quick ramp up in the compiler infrastructure, in order to attack the important problem of mapping low level instructions to hardware resources. We have opportunities specifically in the following areas:
- Model partitioning (pipelined, tensor, model and data parallelism), tiling, resource allocation, memory management, scheduling and optimization (for latency, bandwidth and throughput).

**What you will bring:**

**Minimum:**

- Bachelor's degree in Computer Science with 7+ Yrs of relevant industry experience, MSCS Preferred with 5+ yrs of relevant industry experience.
- Ability to deliver production quality code in modern C++.
- Experience in modern compiler infrastructures, for example: LLVM, MLIR.
- Experience in machine learning frameworks and interfaces, for example: ONNX, TensorFlow and PyTorch.
- Experience in production compiler development.

**Preferred:**

- Algorithm design ability, from high level conceptual design to actual implementation.
- Experience with relevant Open Source ML projects like Torch-MLIR, ONNX-MLIR, Caffe, TVM.
- Passionate about thriving in a fast-paced and dynamic startup culture.

Pay range: N/A
Location: Hybrid, working onsite at our Toronto, Ontario headquarters 3 days per week.
Work site: Hybrid

Groq

Sr. Compiler Engineer

**Position Summary:**

Groq is a machine learning systems company building easy-to-use solutions for accelerating artificial intelligence workloads. Our work spans hardware, software, and machine learning technology. As Sr. Compiler Engineer, you will be responsible for defining and developing compiler optimizations for our state-of-the-art spatial compiler - targeting Groq's revolutionary Tensor Streaming Processor. You will be the technical lead for Groq's TSP compiler, and be in charge of architecting new passes, developing innovative scheduling techniques, and developing new front-end language dialects to support the rapidly evolving ML space. You will also be required to benchmark and monitor key performance metrics to ensure that the compiler is producing efficient mappings of neural network graphs to the Groq TSP. Experience with LLVM and MLIR is preferred, and knowledge of functional programming languages is an asset. Knowledge of ML frameworks such as TensorFlow and PyTorch, and of portable graph models such as ONNX, is also desired.

**ESSENTIAL DUTIES AND RESPONSIBILITIES:**

- Design, develop, and maintain optimizing compiler for Groq's TSP
- Expand Groq IR Dialect to reflect ever changing landscape of ML constructs and models
- Benchmark and analyze output produced by optimizing compiler, and drive enhancements to improve its quality-of-results when measured on the Groq TSP hardware.
- Manage large multi-person and multi-geo projects and interface with various leads across the company
- Mentor junior compiler engineers and collaborate with other senior compiler engineers on the team.
- Review and accept code updates to compiler passes and IR definitions.
- Work with HW teams and architects to drive improvements in architecture and SW compiler
- Publish novel compilation techniques to Groq's TSP at top-tier ML, Compiler, and Computer Architecture conferences.

**QUALIFICATIONS:**

- 10+ years of experience in the area of computer science/engineering or related
- 5+ years of direct experience with C/C++ and LLVM or compiler frameworks
- Knowledge of spatial architectures such as FPGA or CGRAs an asset
- Knowledge of functional programming an asset
- Experience with ML frameworks such as TensorFlow or PyTorch desired
- Knowledge of ML IR representations such as ONNX and Deep Learning

**PERSONAL ATTRIBUTES:**

- Strong initiative and personal drive, able to self-motivate and drive projects to closure
- Keen attention to detail
- Strong written and oral communication; ability to write clear and concise technical documentation
- Team first attitude
- Leadership skills and ability to motivate peers
- Coaching and mentoring ability

Pay range: $143,600 to $251,300
Location: Toronto, Canada
Work site: Remote

NextSilicon

Machine Learning Software Engineer, PyTorch Specialist

**Description**
The AI Infrastructure team is developing infrastructure and tools for automating the process of adapting our unique hardware architecture to run machine learning model training and inference, as well as AI applications.
We are seeking a talented machine learning (ML) PyTorch expert to join our AI Infrastructure team in Belgrade. In this high-visibility, hands-on role, you will be building the AI compiler backend for NextSilicon’s next generation platform.

**Requirements**

- Bachelor's degree in either computer science, computer engineering, or another relevant technical field, or equivalent practical experience.
- 6+ years of experience in software engineering or a relevant field, or 4+ years of experience if you have a PhD.
- 3+ years of specialization experience in at least one of the following ML or deep learning domains:
  - ML systems: AI infrastructure, ML accelerators, high performance computing, GPU and/or CPU architecture.
  - ML tools: ML compilers, ML frameworks.
  - ML theory: large scale ML, LLM, ML robustness experience.
- Hands-on experience applying techniques for splitting and parallelizing model training and inference onto a multiple GPU environment.
- Experience developing state-of-the-art neural network architectures: an advantage.
- Experience with open source software development: an advantage.
- Experience developing and upstreaming into the PyTorch open source community: an advantage.

**Responsibilities**

- Develop a state-of-the-art AI framework and compiler stack that will run seamlessly on our next generation hardware, and deliver superior acceleration of AI applications.
- Contribute to the development of an industry leading PyTorch AI framework core compiler.
- Analyze DL/ML networks and AI applications to understand how they may run more efficiently on the NextSilicon architecture, and develop and implement a compiler backend to support these optimizations.
- Help to define NextSilicon next generation AI hardware architecture by modeling hardware performance using software simulations.

Pay range: N/A
Location: N/A
Work site: N/A

Qualcomm

Machine Learning Compiler / Firmware Engineer

**Role Overview:**

As a PyTorch and C++ Development Engineer, you will play a crucial role in developing the machine learning compiler and runtime firmware for Qualcomm’s best-in-class accelerator. You’ll work closely with a geographically distributed team to optimize performance, enhance efficiency, and ensure seamless integration with our hardware.

**Responsibilities:**

- Collaborate with software architects and machine learning researchers to design and implement efficient PyTorch-based solutions.
- Develop and maintain the machine learning compiler, ensuring compatibility with the ATEN operator set.
- Optimize code for performance, memory usage, and power efficiency.
- Debug and troubleshoot issues related to the runtime firmware.
- Contribute to the development of a modern C++ project which uses extensive template metaprogramming techniques.
- Work independently and in a self-directed manner, while also collaborating effectively with remote team members.

**Qualifications:**

- Master’s degree with 3 or more years of experience, or a PhD in Computer Science, Electrical Engineering, or a related field.
- Strong proficiency in Python and experience with PyTorch.
- Solid understanding of the ATEN operator set.
- Proficiency in modern C++ (C++17 or later).
- Experience with machine learning frameworks and compiler development is a plus.
- Excellent problem-solving skills and ability to work in a fast-paced environment.
- Strong communication skills for collaborating with remote team members.

**Minimum Qualifications:**

- Bachelor's degree in Electrical Engineering, Computer Science, Computer Engineering, or related field and 2+ years of Software Engineering, Electrical Engineering, Systems Engineering, or related work experience.
- OR
- Master's degree in Electrical Engineering, Computer Science, Computer Engineering, or related field and 1+ year of Software Engineering, Electrical Engineering, Systems Engineering, or related work experience.
- OR
- PhD in Electrical Engineering, Computer Science, Computer Engineering, or related field.
- 2+ years of experience with high-performance microprocessor design.

Pay range: $127,200.00 - $190,800.00
Location: Austin, Texas, United States of America
Work site: N/A

Machine Learning Compiler Engineer

**General Summary:**

If you’re interested in advancing and applying mathematics, programming languages theory, and advanced algorithms to program optimization for cutting-edge machine learning accelerators, then you really want to be talking to us!
The Compiler Labs unit in Qualcomm AI Software department is looking for ML Compiler engineers to join our team. We work tactically on improving existing ML compilers and strategically on developing new and innovative ML compilers.
Our technical approach to compilers emphasizes powerful representations for precisely and compactly modeling programs and the optimization challenges and using advanced mathematics and algorithms for performing optimizations.
We are also solid in using “old school” compiler technologies as they apply to contemporary ML challenges, and in meticulous software engineering to produce beautiful compilers.  We are also keen about seeing our compilers used and having large impacts on Qualcomm’s business.
Mapping ML algorithms to ML accelerators is currently one of the most interesting and challenging problems for compilers.  Our compiler targets include the Qualcomm Neural Signal Processor, Adreno GPUs, low-power ML accelerators, and CPU accelerators.
This job description spans multiple levels, from entry to experienced.  Our team is a good home for compiler developers with advanced degrees, and we have solid mentoring and give substantial responsibility quickly for entry level engineers.

**Preferred Qualifications:**

- Master's degree in Computer Science, Engineering, Electrical Engineering, or related field.
- Experience with compiler development and computer architecture
- ML experience
- A degree in the field of computer science or applied mathematics
- Experience with software engineering
- Solid intellectual ability, motivation, and a strong history of achievement
- Excellent oral and written communication skills
- Experience with MLIR, MLIR Dialects (LinAlg, Affine), PyTorch 2.0, TVM, Triton, and/or LLVM
- SYCL experience
- ML applications and ML optimization experience
- ML architecture experience
- High performance computing experience
- Polyhedral compiler optimization experience
- Loop transformation and vectorization experience
- GPU programming, parallel programming experience
- General optimization experience

**Principal Duties and Responsibilities:**

- Work on a wide range of ML compilers
- Improve ML compiler optimization capabilities through benchmark analysis and profiling
- Innovate new ML compiler and optimization algorithms
- Upstream compiler algorithms to open-source compiler projects
- Author research publications and represent the company in conferences and industry forums
- For senior levels - Lead and manage projects while doing substantial technical work

**Minimum Qualifications:**

- Bachelor's degree in Computer Science, Engineering, Information Systems, or related field and 4+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.
- OR
- Master's degree in Computer Science, Engineering, Information Systems, or related field and 3+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.
- OR
- PhD in Computer Science, Engineering, Information Systems, or related field and 2+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.

Pay range: $133,500.00 - $200,500.00
Location: Raleigh, North Carolina, United States of America
Work site: N/A

Machine Learning Compiler Engineer

**General Summary:**

If you’re interested in advancing and applying mathematics, programming languages theory, and advanced algorithms to program optimization for cutting-edge machine learning accelerators, then you really want to be talking to us!
The Compiler Labs unit in Qualcomm AI Software department is looking for ML Compiler engineers to join our team. We work tactically on improving existing ML compilers and strategically on developing new and innovative ML compilers.
Our technical approach to compilers emphasizes powerful representations for precisely and compactly modeling programs and the optimization challenges and using advanced mathematics and algorithms for performing optimizations.
We are also solid in using “old school” compiler technologies as they apply to contemporary ML challenges, and in meticulous software engineering to produce beautiful compilers.  We are also keen about seeing our compilers used and having large impacts on Qualcomm’s business.
Mapping ML algorithms to ML accelerators is currently one of the most interesting and challenging problems for compilers.  Our compiler targets include the Qualcomm Neural Signal Processor, Adreno GPUs, low-power ML accelerators, and CPU accelerators.
This job description spans multiple levels, from entry to experienced.  Our team is a good home for compiler developers with advanced degrees, and we have solid mentoring and give substantial responsibility quickly for entry level engineers.

**Preferred Qualifications:**

- Master's degree in Computer Science, Engineering, Electrical Engineering, or related field.
- Experience with compiler development and computer architecture
- ML experience
- A degree in the field of computer science or applied mathematics
- Experience with software engineering
- Solid intellectual ability, motivation, and a strong history of achievement
- Excellent oral and written communication skills
- Experience with MLIR, MLIR Dialects (LinAlg, Affine), PyTorch 2.0, TVM, Triton, and/or LLVM
- SYCL experience
- ML applications and ML optimization experience
- ML architecture experience
- High performance computing experience
- Polyhedral compiler optimization experience
- Loop transformation and vectorization experience
- GPU programming, parallel programming experience
- General optimization experience

**Principal Duties and Responsibilities:**

- Work on a wide range of ML compilers
- Improve ML compiler optimization capabilities through benchmark analysis and profiling
- Innovate new ML compiler and optimization algorithms
- Upstream compiler algorithms to open-source compiler projects
- Author research publications and represent the company in conferences and industry forums
- For senior levels - Lead and manage projects while doing substantial technical work

**Minimum Qualifications:**

- Bachelor's degree in Computer Science, Engineering, Information Systems, or related field and 2+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.
- OR
- Master's degree in Computer Science, Engineering, Information Systems, or related field and 1+ year of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.
- OR
- PhD in Computer Science, Engineering, Information Systems, or related field.

Pay range: $133,000.00 - $200,000.00
Location: New York City, New York, United States of America
Work site: N/A

Renesas

AI Compiler Engineer job in REE(Bourne End) (511B)

**Job Description**

In this role, you will be part of the Machine Learning Core team. The team has been developing a comprehensive AI Compiler strategy that delivers a highly flexible platform to explore new DL/ML model architectures, combined with auto-tuned high performance for production environments across a wide range of hardware architectures. The compiler framework, ML graph optimizations and kernel authoring specific to the hardware impacts performance, developer efficiency & deployment velocity of both AI training and inference platforms. You will be developing AI compiler frameworks to accelerate machine learning workloads on the next generation of AI hardware. You will work closely with AI researchers to analyze deep learning models and how to lower them efficiently on AI platforms. You will also partner with hardware design teams to develop compiler optimizations for high performance. You will apply software development best practices to design features, optimization, and performance tuning techniques. You will gain valuable experience in developing machine learning compiler frameworks and will help in driving next generation hardware software co-design for AI domain specific problems.

Our mission is to use the latest machine learning and cloud technologies to develop the best AI inference for self-driving vehicle and advanced driver safety engineers. Renesas is the leading automotive electronics supplier globally, and this is a rare opportunity to deploy your AI software to the billions of devices we ship to customers every year. You will join our newly formed AI Solutions global research and development organization of around 100 software engineers. Due to strong demand for our AI-related products we are planning to triple in size in the next three years, so there is lots of room for you to help us grow the team together while remaining small. Our key locations are the Tokyo, London, Dusseldorf, and Ho Chi Minh City metro areas, but you can also join fully remotely from other locations globally or get our support to relocate to our key location hubs such as Tokyo.

**Responsibilities**

- Development of AI compiler framework, high performance kernel authoring and acceleration onto next generation of hardware architectures.
- Contribute to the development of the industry-leading machine learning framework core compilers to support new state of the art inference and training machine learning/AI accelerators and optimize their performance.
- Analyze deep learning networks, develop & implement compiler optimization algorithms.
- Collaborating with AI research scientists to accelerate the next generation of deep learning models such as recommendation systems, computer vision, or natural language processing.
- Performance tuning & optimizations of deep learning frameworks.

**Qualifications**

- Bachelor’s or Master’s degree in computer science, machine learning, mathematics, physics, electrical engineering or related field.
- Experience in C/C++, Python, or other related programming language
- Experience in accelerating deep learning models or libraries on hardware architectures.
- Experience working with machine learning frameworks such as PyTorch, TensorFlow, ONNX etc.
- Ability to speak and write in English at a business level.

Pay range: N/A
Location: N/A
Work site: (maybe) up to 100% remote

AI Compiler Engineer, Model Optimization, Quantization & Framework (f/m/d)_Japan [HPC AI] job in Tokyo, Tokyo, Japan

**Job Description**

- Development of AI compiler framework, high performance kernel authoring and acceleration onto next generation of hardware architectures.
- Contribute to the development of the industry-leading machine learning framework core compilers to support new state of the art inference and training machine learning/AI accelerators and optimize their performance.
- Collaborating with AI research scientists to accelerate the next generation of deep learning models such as recommendation systems, computer vision, or natural language processing.
- Performance tuning & optimizations of deep learning frameworks.
- Model optimization by developing the pruning & quantization algorithms and hardware neural architecture search technique.

**Qualifications**

- Bachelor’s or Master’s degree in computer science, machine learning, mathematics, physics, electrical engineering or related field.
- Experience in C/C++, Python, or other related programming language
- Experience in accelerating deep learning models or libraries on hardware architectures.
- Experience with Post Training Quantization (PTQ), Quantization Aware Training (QAT) and other quantization techniques and strategies
- Experience working with machine learning frameworks such as PyTorch, TensorFlow, ONNX etc.
- Ability to speak and write in English at a business level.
- Experience as Product Owner of a Scrum team is a plus.

Pay range: N/A
Location: Tokyo, Japan
Work site: Remote OK

Samsung Semiconductor

Staff Engineer, AI/ML Software Compiler Job in San Jose, CA

**What You'll Do**

The AGI (Artificial General Intelligence) Computing Lab is dedicated to solving the complex system-level challenges posed by the growing demands of future AI/ML workloads. Our team is committed to designing and developing scalable platforms that can effectively handle the computational and memory requirements of these workloads while minimizing energy consumption and maximizing performance. To achieve this goal, we collaborate closely with both hardware and software engineers to identify and address the unique challenges posed by AI/ML workloads and to explore new computing abstractions that can provide a better balance between the hardware and software components of our systems. Additionally, we continuously conduct research and development in emerging technologies and trends across memory, computing, interconnect, and AI/ML, ensuring that our platforms are always equipped to handle the most demanding workloads of the future. By working together as a dedicated and passionate team, we aim to revolutionize the way AI/ML applications are deployed and executed, ultimately contributing to the advancement of AGI in an affordable and sustainable manner. Join us in our passion to shape the future of computing!

Location: Hybrid, working onsite at our office 3 days per week with the flexibility to work remotely the remainder of your time

Reports to: VP

- Design and implement ML compilers for high-performance deep learning applications.
- Optimize compilers for efficient execution of deep learning models on various hardware platforms.
- Design a staged lowering infrastructure to meet rapidly evolving workload requirements effectively.
- Design an algorithm to optimize data locality to minimize energy consumption.
- Work closely with hardware architects and developers to integrate new ML techniques and algorithms into the compiler.
- Collaborate with cross-functional teams to define and deliver ML compiler features and improvements.
- Troubleshoot and debug compiler issues, and provide technical support to customers.
- Contribute to the development of ML compiler documentation and user guides.
- Stay up-to-date with the latest trends and advancements in the field of ML compilers and hardware.

**What You Bring**

- BS in Computer/Electrical Engineering or Computer Science with 10+ years of working experience in silicon development, or MS in Computer/Electrical Engineering or Computer Science with 8+ years of relevant working experience, or PhD and 5+ years of relevant working experience preferred.
- Strong background in compiler design and optimization techniques.
- Experience in developing and optimizing software for high-performance computing systems
- Experience in LLVM / MLIR (preferred)
- Familiarity with PyTorch, TensorFlow, or JAX.
- Familiarity with hardware architectures such as CPUs, GPUs, TPUs, and NPUs.
- Strong analytical and problem-solving skills
- Excellent communication and interpersonal skills
- Ability to work independently and as part of a team
- You're inclusive, adapting your style to the situation and diverse global norms of our people.
- An avid learner, you approach challenges with curiosity and resilience, seeking data to help build understanding.
- You're collaborative, building relationships, humbly offering support and openly welcoming approaches.
- Innovative and creative, you proactively explore new ideas and adapt quickly to change.

Pay range:
Location: San Jose, CA
Work site: N/A

SiMa.ai

Sr Principal Compiler Engineer (AI2312)

**Job Description:**

SiMa.ai is looking for a Sr. Principal Compiler Engineer to join our world class team and make significant contributions to our state-of-the-art Machine Learning Accelerator (MLA) compiler.  As a member of SiMa’s ML Compiler team, you will have the opportunity to work on an innovative compiler that generates highly performant and power-efficient assembly code for a wide range of machine learning models.
You will also have the opportunity to influence SiMa’s roadmap by proposing enhancements to future generations of the MLA.

**Responsibilities:**

- Design and implement algorithms within the compiler framework that generate highly performant and power-efficient assembly code for a wide range of machine learning models.
- Analyze the assembly code generated by the compiler for various machine learning models and identify opportunities for improvement.
- Work closely with the front-end team to identify ways in which the intermediate representation (IR) can be enhanced to better support the compiler back-end.
- Improve the quality of the code base by refactoring code and enhancing documentation.
- Contribute to SiMa’s roadmap by proposing new features for both our software products and future generations of the MLSoC.

**Requirements:**

- Must-have:
  - Ph.D. or M.S. in Computer Science or a related field with 5+ years of experience developing highly performant systems software.
  - Strong programming skills in C, C++ or Python.
  - Strong background in algorithms and data structures.
  - Strong understanding of common machine learning algorithms such as matrix multiplication and convolution.
  - Strong understanding of processor architecture.
  - Strong analytical background.
  - Strong debugging skills.
  - Experience in compiler design and implementation (preferably a production-quality compiler).
- Optional:
  - Experience with DSP programming.

Pay range: $240,000 - $305,000
Location: San Jose, CA
Work site:

Synopsys

AI Compiler Engineer, Staff

**What you’ll be doing:**

- Developing and enhancing various components of the Synopsys AI compiler which targets ARC Neural Network Processor IP
- Working closely with compiler architects by contributing to some aspects of specification and design creation
- Implementing compiler features which improve the performance, scalability, usability and testability of the AI compiler
- Developing compiler passes for lowering a high-level intermediate representation to a low-level hardware accelerator representation
- Integrating existing OSS compiler frameworks to enhance the capabilities of the Synopsys AI compiler
- Developing mapping and optimization algorithms which partition compute tasks across multiple hardware accelerators in optimal ways
- Contributing to compiler QA, including creation of test plans & test automations, execution of tests and creation of reports

**The impact you will have:**

- Enabling the development of new capabilities in our neural network compiler, making it more robust and efficient.
- Contributing to the development of cutting-edge AI technologies that drive innovation in various industries.
- Improving the performance and accuracy of neural network-based workloads using ARC Neural Network Processor IP
- Supporting third-party developers in utilizing our tools to create advanced AI applications
- Helping Synopsys maintain its leadership position in the semiconductor IP market
- Driving continuous improvement and innovation within the ARC Processor team

**What you will need:**

- 5+ years of proven experience in developing compilers for domain-specific processors or other similar resource-constrained hardware, as well as good understanding of compiler theory and compiler industry trends
- Hands-on experience working on compiler optimizations such as auto-parallelization, auto-scheduling and performance analysis, preferably with NN compilers
- Good understanding of state-of-the-art deep learning concepts, methods and models, including low-bit model quantization techniques
- Experience using open-source Neural Network technologies like LLVM / MLIR, TVM, Glow, xbyak, etc., and frameworks like ONNX, PyTorch, TensorFlow
- Ability to write clean, scalable, and maintainable production-level C++ code
- Working experience with embedded systems and hardware device control
- Solid troubleshooting and analytical skills, experience in testing production software, preferably compilers
- Familiarity with Agile development methodologies
- Familiarity with Git source control management
- Excellent problem solving and critical thinking skills
- Team player with good interpersonal skills

**Who you are**

- Detail-oriented with excellent problem-solving skills
- Strong communicator who can effectively convey technical information
- Collaborative team player who thrives in a dynamic environment
- Innovative thinker with a passion for continuous learning and improvement
- Adaptable and able to handle multiple tasks and projects simultaneously
- Committed to excellence and delivering high-quality results

Pay range: N/A
Location: Eindhoven, North Brabant, Netherlands
Work site: N/A

Software Engineer (AI/ NN Compiler Engineer)

**What You’ll Be Doing:**

- Developing and enhancing the compiler’s post-training-quantization process targeting ARC Neural Network Processor IP.
- Creating algorithms to support the transition from high-level intermediate representation to low-level hardware accelerator representation.
- Debugging, analyzing issues, and verifying the quantization-related implementation, including functionality, performance, and accuracy.
- Maintaining and updating technical documentation to ensure clarity and completeness.
- Collaborating with cross-functional teams to integrate new features and improvements.
- Conducting performance tuning and optimization to maximize efficiency and effectiveness.

**The Impact You Will Have:**

- Enhancing the capabilities of our neural network compiler, making it more robust and efficient.
- Contributing to the development of cutting-edge AI technologies that drive innovation in various industries.
- Improving the performance and accuracy of neural network-based workloads on SoCs with ARC Neural Network Processor IP.
- Supporting third-party developers in utilizing our tools to create advanced AI applications.
- Helping Synopsys maintain its leadership position in the semiconductor IP market.
- Driving continuous improvement and innovation within the ARC Processor team.

**What You’ll Need:**

- 3+ years of proven experience in developing AI software applications or tooling for NN frameworks like ONNX, PyTorch, TensorFlow, or domain-specific tools.
- Familiarity with quantization algorithms and NN operator implementation.
- Ability to write clean, scalable, and maintainable production-level C++ code.
- General understanding of state-of-the-art deep learning concepts, methods, and models, with a willingness to investigate new algorithms and trends.
- Familiarity with Git source control management.
- Excellent written and spoken English skills.

**Who You Are:**

- Detail-oriented with excellent problem-solving skills.
- Strong communicator who can effectively convey technical information.
- Collaborative team player who thrives in a dynamic environment.
- Innovative thinker with a passion for continuous learning and improvement.
- Adaptable and able to handle multiple tasks and projects simultaneously.
- Committed to excellence and delivering high-quality results.

Pay range: N/A
Location: Hsinchu, Taiwan, Taiwan
Work site: N/A

Tenstorrent

Job Application for Machine Learning Engineer, AI Compiler, Model Training

Tenstorrent is seeking a talented and motivated Machine Learning Engineer to join our team in Belgrade, Serbia. As a Machine Learning Engineer, you will play a crucial role in developing and optimizing our model training pipeline within our machine learning graph compiler.

**Responsibilities:**

- Develop and optimize the model training pipeline in our machine learning graph compiler.
- Benchmark, analyze, and optimize the performance of training multiple state-of-the-art model architectures across Tenstorrent's hardware and software stack.
- Scale out to various modes of distributed model training.
- Participate in the co-design of Tenstorrent's hardware and software stack.
- Develop performance analysis and estimation infrastructure that feeds into Tenstorrent compiler.
- Integrate Tenstorrent software into leading machine learning frameworks.

**Experience & Qualifications:**

- Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related field.
- Strong understanding of neural network fundamentals.
- Experience with model training - experience with distributed training is considered a plus.
- Familiarity with modern neural network architectures. In-depth understanding of one architecture.
- Proficiency in programming languages such as Python and C++.
- Excellent problem-solving and analytical skills.
- Ability to work effectively in a collaborative team environment.

Pay range: N/A
Location: Belgrade, Belgrade, Serbia
Work site: on-site

Job Application for Software Engineer, AI Compiler

We are seeking an experienced and highly skilled Software Engineer with expertise in compilers and semiconductor technology to join our team. As a Software Engineer, AI Compiler Specialist, you will play a critical role in designing, developing, and optimizing compilers for cutting-edge semiconductor products. You will work closely with hardware engineers, software engineers, and other stakeholders to ensure the efficient and effective execution of compiler-related tasks.

**Responsibilities:**

- Develop machine learning graph compiler
- Participate in the co-design of Tenstorrent's hardware and software stack
- Benchmark, analyze, and optimize performance of key machine learning applications across Tenstorrent's hardware and software stack
- Develop performance analysis and estimation infrastructure that feeds into Tenstorrent compiler
- Develop high-performance run-time engine
- Integrate the Tenstorrent software into leading machine learning frameworks
- Work closely with machine learning engineers to discover the hardware and software requirements of current and future machine learning applications

**Experience & Qualifications:**

- BSc, MSc or PhD in Electrical/Computer Engineering or Computer Science
- Experience with algorithms, data structures, and software development in C/C++. Python expertise is welcome as well
- Familiarity with and passion for any of the following -- machine learning, compilers, parallel programming, high-performance and massively parallel systems, processor and computer architecture -- is a plus

Pay range: N/A
Location: Belgrade, Belgrade, Serbia
Work site: on-site

Job Application for Sr. Software Engineer, AI Compiler

We are seeking an experienced and highly skilled Sr. Software Engineer with expertise in compilers and semiconductor technology to join our team. As a Sr. Software Engineer, AI Compiler Specialist, you will play a critical role in designing, developing, and optimizing compilers for cutting-edge semiconductor products. You will work closely with hardware engineers, software engineers, and other stakeholders to ensure the efficient and effective execution of compiler-related tasks.

**Responsibilities:**

- Develop machine learning graph compiler
- Benchmark, analyze, and optimize performance of key machine learning applications across Tenstorrent's hardware and software stack
- Develop performance analysis and estimation infrastructure that feeds into Tenstorrent compiler
- Integrate the Tenstorrent software into leading machine learning frameworks
- Work closely with machine learning engineers to discover the hardware and software requirements of current and future machine learning applications

**Experience & Qualifications:**

- BSc, MSc or PhD in Electrical/Computer Engineering or Computer Science
- Familiarity with common AI/ML models, and basics of machine learning operations (matrix multiplications, convolutions, etc.)
- Experience with algorithms, data structures, and software development in C/C++.
- Expertise in Python.
- Familiarity with and passion for any of the following -- machine learning, compilers, parallel programming, high-performance and massively parallel systems, processor and computer architecture -- is a plus

Pay range: N/A
Location: Toronto, Ontario, Canada
Work site: hybrid

Job Application for Staff Software Engineer, AI Compiler

We are seeking an experienced and highly skilled Software Engineer with expertise in compilers and semiconductor technology to join our team. As a Staff Software Engineer, AI Compiler Specialist, you will play a critical role in designing, developing, and optimizing compilers for cutting-edge semiconductor products. You will work closely with hardware engineers, software engineers, and other stakeholders to ensure the efficient and effective execution of compiler-related tasks.

**Responsibilities:**

- Develop machine learning graph compiler
- Participate in the co-design of Tenstorrent's hardware and software stack
- Benchmark, analyze, and optimize performance of key machine learning applications across Tenstorrent's hardware and software stack
- Develop performance analysis and estimation infrastructure that feeds into Tenstorrent compiler
- Develop high-performance run-time engine
- Integrate the Tenstorrent software into leading machine learning frameworks
- Work closely with machine learning engineers to discover the hardware and software requirements of current and future machine learning applications

**Experience & Qualifications:**

- BSc, MSc or PhD in Electrical/Computer Engineering or Computer Science
- Experience with algorithms, data structures, and software development in C/C++. Python expertise is welcome as well
- Familiarity with and passion for any of the following -- machine learning, compilers, parallel programming, high-performance and massively parallel systems, processor and computer architecture -- is a plus

Pay range: N/A
Location: Austin, Texas, United States
Work site: hybrid

Untether AI

Compiler Engineer

We’re looking for best in class engineers to join our existing top-notch team.  When you join Untether AI, you will be part of a team that designs, develops and verifies the software that interacts with our chip, collaborating with our hardware engineers and with fellow software engineers in the process.  By creating software that fully realizes the capabilities of the hardware, you will help get AI inference to the general populace.
As part of this exceptional team, you are able to - and get excited by - identifying functional/performance bottlenecks and how to alleviate them in order to achieve scalable and reliable software.  You excel in an environment with complex software and hardware designs.
We are looking for an experienced Compiler Engineer. Our Compiler Engineers write software that translates a wide variety of neural nets into efficient mappings and fast implementations on our accelerator hardware, from data-in to data-out.

**Responsibilities**

- Build software that maps a neural net onto our hardware
- Devise and implement multiple data layout strategies
- Build a tool that will solve a network layout for a set of constraints within the hardware given the available strategies
- Implement efficient mappings between data layouts
- Evaluate current and proposed hardware architecture for future products
- Work closely with algorithm design and architecture teams

**Preferred Skills & Experience**

- Computer Science, Engineering or related degree, preferably MS or PhD
- 2+ years of related experience
- Thorough understanding of deep neural nets
- Experience developing the internals of modern optimizing compilers
- Understanding of advanced optimization techniques

Pay range: N/A
Location: Toronto / Silicon Valley
Work site: hybrid

Staff Software Engineer - Kernel

We are looking for an experienced Senior Kernel Engineer who can help build and optimize our SDK. Our tools and libraries unlock industry-leading performance and power efficiency on our unique at-memory AI inference chips.  We enable customers to compile models directly to run on our architectures, and provide tools to analyze and optimize performance.
The kernel library is at the heart of our SDK, leveraging HW features for fast computations, dividing work flexibly amongst parallel computation elements, as well as providing highly configurable data-flow options for all of our kernels.
The successful candidate will build a deep understanding of the capabilities and limitations of the architecture, and of how features of the kernel library enable performant push-button compilations.

**Responsibilities**

- Efficient and flexible implementation of neural network compute kernels for our chip families
- Defining / Improving abstractions of our kernel library to accelerate kernel development
- Analysis and optimization of individual kernel performance and full-network implementation performance
- Work closely with our compiler and physical allocation teams to enable efficient implementations of networks through our push-button compile tool-flow

**Requirements**

- Computer Science, Engineering, Math, Physics or related degree
- Experienced in Python, C/C++ and SW design
- Demonstrated ability to work independently through challenging but tightly constrained problems
- Demonstrated ability to be a technical leader on projects with teammates or engineers from other teams
- Interest and ability to work with both high level architectural and very low-level technical details
- Experience with low-level and/or parallelization optimization, e.g. assembly language development, GPU shaders, SIMD, CUDA, AI inference accelerator kernels

**Preferred Skills**

- Experience with spatial architectures / at-memory compute
- Knowledge of AI algorithms
- Strong mathematical skills
- Enjoy solving very complex problems (like doing IQ tests, solving tricky math problems)

Pay range: N/A
Location: North America/Remote
Work site: Hybrid?

Lightning AI

Research/Compiler Engineer Job in Palo Alto, CA

**What we're looking for**

We are looking for a research engineer to work directly on the Lightning Thunder compiler and the rest of the PyTorch Lightning stack. This is an opportunity to create groundbreaking technology that will transform the machine learning ecosystem.

**What you'll do**

- Develop the Thunder compiler, an open-source project developed in collaboration with NVIDIA, using your deep experience in PyTorch, JAX, or other deep learning frameworks
- Engage in performance-oriented model optimizations, around distributed training as well as inference
- Develop optimized kernels in CUDA or Triton to target specific use-cases
- Integrate Thunder throughout the PyTorch Lightning ecosystem
- Engage with the community and champion its growth
- Support the adoption of Thunder across the industry
- Work closely within the Lightning team as a strategic partner

Pay range:
Location: Palo Alto, CA
Work site: N/A

OpenAI

Software Engineer, Triton Compiler

**About the Role**

As a Software Engineer, you will help build AI systems that can perform previously impossible tasks or achieve outstanding levels of performance. This requires good engineering (for example designing, implementing, and optimizing state-of-the-art AI models), writing bug-free machine learning code (surprisingly difficult!), and building the science behind the algorithms employed. In all the projects this role pursues, the ultimate goal is to push the field forward.
The Research Acceleration team builds high-quality research tools and frameworks to increase research productivity across OpenAI, with the goal of accelerating progress towards AGI. For example, we develop Triton, a language and compiler for writing custom GPU kernels. The aim of Triton is to provide an open-source environment to write fast code at higher productivity than CUDA.
We frequently collaborate with other teams to speed up the development of new state-of-the-art capabilities. For example, we recently collaborated with our Codegen research team on the Codex model, which can generate code in Python and many other languages.
Do you love research tools, compilers, and collaborating on cutting-edge AI models? If so, this role is for you! We are looking for people who are self-directed and enjoy determining the most meaningful problem to solve in order to accelerate our research.

**We're looking for a track record of:**

- 3+ years of relevant engineering experience
- Owning problems end-to-end, with a willingness to pick up whatever knowledge is missing to get the job done
- Bonus: contributions to an AI framework such as PyTorch or TensorFlow, or compilers such as GCC, LLVM, or MLIR

Pay range: $200K – $370K
Location: San Francisco
Work site: Remote

Amazon

ML Compiler Engineer, Annapurna Labs - Job ID: 2616772

The AWS Neuron Compiler team is actively seeking skilled compiler engineers to join our efforts in developing a state-of-the-art deep learning compiler stack. This stack is designed to optimize application models across diverse domains, including Large Language and Vision, originating from leading frameworks such as PyTorch, TensorFlow, and JAX. Your role will involve working closely with our custom-built Machine Learning accelerators, including Inferentia/Trainium, which represent the forefront of AWS innovation for advanced ML capabilities, powering solutions like Generative AI.
In this role as a ML Compiler engineer, you'll be instrumental in designing, developing, and optimizing features for our compiler. Your responsibilities will involve tackling crucial challenges alongside a talented engineering team, contributing to leading-edge design and research in compiler technology and deep-learning systems software. Additionally, you'll collaborate closely with cross-functional team members from the Runtime, Frameworks, and Hardware teams to ensure system-wide performance optimization.
As part of the Backend team, you'll play a significant role in designing and developing various aspects of our system. This includes but is not limited to instruction scheduling, memory allocation, data transfer optimization, graph partitioning, parallel programming, code generation, Instruction Set Architectures, new hardware bring-up, and hardware-software co-design.
AWS Utility Computing (UC) provides product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Additionally, this role may involve exposure to and experience with Amazon's growing suite of generative AI services and other cutting-edge cloud computing offerings across the AWS portfolio.
Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago—even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.

**Key job responsibilities**

- Solve challenging technical problems, often ones not solved before, at every layer of the stack.
- Design, implement, test, deploy and maintain innovative software solutions to transform service performance, durability, cost, and security.
- Research implementations that deliver the best possible experiences for customers.

**BASIC QUALIFICATIONS**

- B.S. or M.S. in computer science or related field
- Proficiency with 1 or more of the following programming languages: C++ (preferred), Python
- 3+ years of non-internship professional software development experience
- 2+ years of experience developing compiler optimization, graph-theory, hardware bring-up, FPGA placement and routing algorithms, or hardware resource management

**PREFERRED QUALIFICATIONS**

- M.S. or Ph.D. in computer science or related field
- Strong knowledge in one or more of the areas of: compiler design, instruction scheduling, memory allocation, data transfer optimization, graph partitioning, parallel programming, code generation, Instruction Set Architectures, new hardware bring-up, and hardware-software co-design
- Experience with LLVM and/or MLIR
- Experience with developing algorithms for simulation tools
- Experience in TensorFlow, PyTorch, and/or JAX
- Experience in LLM, Vision or other deep-learning models

Pay range: $129,300/year up to $223,600/year
Location: USA, CA, Cupertino
Work site: N/A

ML Compiler Engineer, Annapurna Labs - Job ID: 2617954

The AWS Neuron Compiler team is actively seeking skilled compiler engineers to join our efforts in developing a state-of-the-art deep learning compiler stack. This stack is designed to optimize application models across diverse domains, including Large Language and Vision, originating from leading frameworks such as PyTorch, TensorFlow, and JAX. Your role will involve working closely with our custom-built Machine Learning accelerators, including Inferentia/Trainium, which represent the forefront of AWS innovation for advanced ML capabilities, powering solutions like Generative AI.
In this role as a ML Compiler engineer, you'll be instrumental in designing, developing, and optimizing features for our compiler. Your responsibilities will involve tackling crucial challenges alongside a talented engineering team, contributing to leading-edge design and research in compiler technology and deep-learning systems software. Additionally, you'll collaborate closely with cross-functional team members from the Runtime, Frameworks, and Hardware teams to ensure system-wide performance optimization.
As part of the Backend team, you'll play a significant role in designing and developing various aspects of our system. This includes but is not limited to instruction scheduling, memory allocation, data transfer optimization, graph partitioning, parallel programming, code generation, Instruction Set Architectures, new hardware bring-up, and hardware-software co-design.
AWS Utility Computing (UC) provides product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Additionally, this role may involve exposure to and experience with Amazon's growing suite of generative AI services and other cutting-edge cloud computing offerings across the AWS portfolio.
Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago—even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.

**Key job responsibilities**

- Solve challenging technical problems, often ones not solved before, at every layer of the stack.
- Design, implement, test, deploy and maintain innovative software solutions to transform service performance, durability, cost, and security.
- Research implementations that deliver the best possible experiences for customers.

**BASIC QUALIFICATIONS**

- B.S. or M.S. in computer science or related field
- Proficiency with 1 or more of the following programming languages: C++ (preferred), Python
- 3+ years of non-internship professional software development experience
- 2+ years of experience developing compiler optimization, graph-theory, hardware bring-up, FPGA placement and routing algorithms, or hardware resource management

**PREFERRED QUALIFICATIONS**

- M.S. or Ph.D. in computer science or related field
- Strong knowledge in one or more of the areas of: compiler design, instruction scheduling, memory allocation, data transfer optimization, graph partitioning, parallel programming, code generation, Instruction Set Architectures, new hardware bring-up, and hardware-software co-design
- Experience with LLVM and/or MLIR
- Experience with developing algorithms for simulation tools
- Experience in TensorFlow, PyTorch, and/or JAX
- Experience in LLM, Vision or other deep-learning models

Pay range: $129,300/year up to $223,600/year
Location: USA, CA, Cupertino
Work site: N/A

Sr. ML Compiler Engineer, AWS Neuron, Annapurna Labs - Job ID: 2696028

Do you love decomposing problems to develop products that impact millions of people around the world? Would you enjoy identifying, defining, and building software solutions that revolutionize how businesses operate?
The Annapurna Labs team at Amazon Web Services (AWS) is looking for a Senior Software Development Engineer to build, deliver, and maintain complex products that delight our customers and raise our performance bar. You’ll design fault-tolerant systems that run at massive scale as we continue to innovate best-in-class services and applications in the AWS Cloud.
At Annapurna Labs our vision is to make deep learning pervasive for everyday developers and to democratize access to cutting edge infrastructure. In order to deliver on that vision, we’ve created innovative software and hardware solutions that make it possible.
AWS Neuron is the SDK that optimizes the performance of complex neural net models executed on AWS Inferentia and Trainium, our custom chips designed to accelerate deep-learning workloads.
The Neuron SDK consists of a compiler, run-time, and debugger, integrated with TensorFlow, PyTorch, and MXNet. It’s preinstalled in AWS Deep Learning AMIs and Deep Learning Containers for customers to quickly get started with running high performance and cost-effective inference.
The Neuron team is hiring senior compiler engineers in order to solve our customers' toughest problems.
As a senior deep learning compiler engineer on the Neuron team, you will be a thought leader supporting the development of a compiler targeting AWS Inferentia and Trainium. You will be developing and scaling the compiler to handle the world's largest ML workloads. You will need to be technically capable, credible and curious in your own right as a trusted AWS Neuron engineer, innovating on behalf of our customers. You will leverage your technical communications skill as a hands-on partner to AWS ML services teams and you will be involved in pre-silicon design, bringing new products/features to market, and many other exciting projects. A background in machine learning and AI accelerators is preferred, but not required.
AWS Utility Computing (UC) provides product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Additionally, this role may involve exposure to and experience with Amazon's growing suite of generative AI services and other cutting-edge cloud computing offerings across the AWS portfolio.
Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago—even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.

**Key job responsibilities**

- Solve challenging technical problems, often ones not solved before, at every layer of the stack.
- Design, implement, test, deploy and maintain innovative software solutions to transform service performance, durability, cost, and security.
- Build high-quality, highly available, always-on products.
- Research implementations that deliver the best possible experiences for customers.

**BASIC QUALIFICATIONS**

- 5+ years of non-internship professional software development experience
- 5+ years of programming with at least one software programming language experience
- 5+ years of leading design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- Experience as a mentor, tech lead or leading an engineering team

**PREFERRED QUALIFICATIONS**

- 5+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
- Bachelor's degree in computer science or equivalent

Pay range: N/A
Location: CAN, ON, Toronto
Work site: N/A

ML Compiler Engineer, AWS Neuron, Annapurna Labs - Job ID: 2774606

Do you love decomposing problems to develop products that impact millions of people around the world? Would you enjoy identifying, defining, and building software solutions that revolutionize how businesses operate?
The Annapurna Labs team at Amazon Web Services (AWS) is looking for a Software Development Engineer II to build, deliver, and maintain complex products that delight our customers and raise our performance bar. You’ll design fault-tolerant systems that run at massive scale as we continue to innovate best-in-class services and applications in the AWS Cloud.
Annapurna Labs was a startup company acquired by AWS in 2015, and is now fully integrated. If AWS is an infrastructure company, then think of Annapurna Labs as the infrastructure provider of AWS. Our org covers multiple disciplines including silicon engineering, hardware design and verification, software, and operations. AWS Nitro, ENA, EFA, Graviton and F1 EC2 Instances, AWS Neuron, Inferentia and Trainium ML Accelerators, and in storage with scalable NVMe, are some of the products we have delivered over the last few years.
At AWS our vision is to make deep learning pervasive for everyday developers and to democratize access to cutting edge infrastructure. In order to deliver on that vision, we’ve created innovative software and hardware solutions that make it possible.
AWS Neuron is the SDK that optimizes the performance of complex neural net models executed on AWS Inferentia and Trainium, our custom chips designed to accelerate deep-learning workloads.
The Neuron SDK consists of a compiler, run-time, and debugger, integrated with TensorFlow, PyTorch, and MXNet. It’s preinstalled in AWS Deep Learning AMIs and Deep Learning Containers for customers to quickly get started with running high performance and cost-effective inference.
The Neuron team is hiring compiler engineers in order to solve our customers' toughest problems.
As a deep learning compiler engineer on the Neuron team, you will be a thought leader supporting the development of a compiler targeting AWS Inferentia and Trainium. You will be developing and scaling the compiler to handle the world's largest ML workloads. You will need to be technically capable, credible and curious in your own right as a trusted AWS Neuron engineer, innovating on behalf of our customers. You will leverage your technical communications skill as a hands-on partner to AWS ML services teams and you will be involved in pre-silicon design, bringing new products/features to market, and many other exciting projects. A background in machine learning and AI accelerators is preferred, but not required.

**Key job responsibilities**

- Solve challenging technical problems, often ones not solved before, at every layer of the stack.
- Design, implement, test, deploy and maintain innovative software solutions to transform service performance, durability, cost, and security.
- Build high-quality, highly available, always-on products.
- Research implementations that deliver the best possible experiences for customers.

**BASIC QUALIFICATIONS**

- 3+ years of non-internship professional software development experience
- 2+ years of non-internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- Experience programming with at least one software programming language

**PREFERRED QUALIFICATIONS**

- 3+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
- Bachelor's degree in computer science or equivalent

Pay range: N/A
Location: CAN, ON, Toronto
Work site: N/A

ML Compiler Engineer, AWS Neuron, Annapurna Labs - Job ID: 2442971

At AWS our vision is to make deep learning pervasive for everyday developers and to democratize access to cutting edge infrastructure. In order to deliver on that vision, we’ve created innovative software and hardware solutions that make it possible.
AWS Neuron is the SDK that optimizes the performance of complex neural net models executed on AWS Inferentia and Trainium, our custom chips designed to accelerate deep-learning workloads.
The Neuron SDK consists of a compiler, run-time, and debugger, integrated with TensorFlow, PyTorch, and MXNet. It’s preinstalled in AWS Deep Learning AMIs and Deep Learning Containers for customers to quickly get started with running high performance and cost-effective inference.
The Neuron team is hiring senior compiler engineers in order to solve our customers' toughest problems.
As a senior deep learning compiler engineer on the Neuron team, you will be a thought leader supporting the development of a compiler targeting AWS Inferentia and Trainium. You will be developing and scaling the compiler to handle the world's largest ML workloads. You will need to be technically capable, credible and curious in your own right as a trusted AWS Neuron engineer, innovating on behalf of our customers. You will leverage your technical communications skill as a hands-on partner to AWS ML services teams and you will be involved in pre-silicon design, bringing new products/features to market, and many other exciting projects. A background in machine learning and AI accelerators is preferred, but not required.
AWS Utility Computing (UC) provides product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Additionally, this role may involve exposure to and experience with Amazon's growing suite of generative AI services and other cutting-edge cloud computing offerings across the AWS portfolio.
Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago—even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.

**BASIC QUALIFICATIONS**

- 3+ years of non-internship professional software development experience
- 2+ years of non-internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- Experience programming with at least one software programming language

**PREFERRED QUALIFICATIONS**

- 3+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
- Bachelor's degree in computer science or equivalent

Pay range: N/A
Location: CAN, ON, Toronto
Work site: N/A

Senior ML Compiler Engineer, AWS Neuron, Annapurna Labs - Job ID: 2693181

Do you love decomposing problems to develop products that impact millions of people around the world?
The AWS Neuron Compiler team is actively seeking a skilled Senior Software Development Engineer to build, deliver, and maintain a state-of-the-art deep learning compiler stack that delights our customers and raises our performance bar. This stack is designed to optimize application models across diverse domains, including Large Language and Vision, originating from leading frameworks such as PyTorch, TensorFlow, and JAX. Your role will involve working closely with our custom-built Machine Learning accelerators, Inferentia and Trainium, which represent the forefront of AWS innovation for advanced ML capabilities, powering solutions like Generative AI.
In this role as a senior ML Compiler Engineer, you'll be instrumental in designing, developing, and optimizing features for our compiler. You will develop and scale the compiler to handle the world's largest ML workloads. You will architect and implement business-critical features, publish cutting-edge research, and mentor a brilliant team of experienced engineers. You will need to be technically capable, credible, and curious in your own right as a trusted AWS Neuron engineer, innovating on behalf of our customers. Your responsibilities will involve tackling crucial challenges alongside a talented engineering team, contributing to leading-edge design and research in compiler technology and deep-learning systems software. Strong experience developing compiler optimization, graph-theory, hardware bring-up, FPGA placement and routing algorithms, or hardware resource management will be a benefit in this role.
Additionally, you'll collaborate closely with cross-functional team members from the Runtime, Frameworks, and Hardware teams to ensure system-wide performance optimization. You will leverage your technical communication skills as a hands-on partner to AWS ML services teams. You will be involved in pre-silicon design, bringing new products/features to market, and participating in many other exciting projects.
AWS Utility Computing (UC) provides product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Additionally, this role may involve exposure to and experience with Amazon's growing suite of generative AI services and other cutting-edge cloud computing offerings across the AWS portfolio.
Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago—even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.

**Key job responsibilities**

- Solve challenging technical problems, often ones not solved before, at every layer of the stack.
- Design, implement, test, deploy and maintain innovative software solutions to transform service performance, durability, cost, and security.
- Build high-quality, highly available, always-on products.
- Research implementations that deliver the best possible experiences for customers.

**BASIC QUALIFICATIONS**

- B.S. or M.S. in computer science or related field
- 5+ years of non-internship professional software development experience including full software development life cycle, encompassing coding standards, code reviews, source control management, build processes, testing, and operations experience
- 5+ years of leading design or architecture (design, reliability and scaling) of new and existing systems experience
- 5+ years of programming with C++
- 3+ years of experience developing compiler optimization, graph-theory, hardware bring-up, FPGA placement and routing algorithms, or hardware resource management
- Experience as a mentor, tech lead or leading an engineering team

**PREFERRED QUALIFICATIONS**

- M.S. or Ph.D. in computer science or related field
- Strong knowledge in one or more of the areas of: compiler design, instruction scheduling, memory allocation, data transfer optimization, graph partitioning, parallel programming, code generation, Instruction Set Architectures, new hardware bring-up, and hardware-software co-design
- Experience with LLVM and/or MLIR
- Experience with developing algorithms for simulation tools
- Experience with TensorFlow, PyTorch, and/or JAX
- Experience in LLM, Vision or other deep-learning models

Pay range: $151,300/year up to $261,500/year
Location: USA, WA, Seattle
Work site: N/A

Sr. ML Compiler Engineer - Automated Reasoning Science, Annapurna Labs - Job ID: 2720295

The AWS Neuron Compiler team is actively seeking skilled compiler engineers to join our efforts in developing a state-of-the-art deep learning compiler stack. This stack is designed to optimize application models across diverse domains, including Large Language and Vision, originating from leading frameworks such as PyTorch, TensorFlow, and JAX. Your role will involve working closely with our custom-built Machine Learning accelerators, including Inferentia/Trainium, which represent the forefront of AWS innovation for advanced ML capabilities, powering solutions like Generative AI.

**Key job responsibilities**

- As a Sr. ML Compiler Engineer III on the Neuron Compiler Automated Reasoning Group, you will develop and maintain tooling for best-in-class technology for raising the bar of the Neuron Compiler's accuracy and reliability. You will help lead the efforts building fuzzers and specification synthesis tooling for our LLVM-based compiler. You will work in a team with a science focus, and strive to push what we do to the edge of what is known, to best deliver for our customers.
- Strong software development skills using C++/Python are critical to this role.
- A science background in compiler development is strongly preferred. A background in Machine Learning and AI accelerators is preferred, but not required.
- In order to be considered for this role, candidates must be currently located or willing to relocate to Seattle (Preferred), Cupertino, Austin, or Toronto.

**BASIC QUALIFICATIONS**

- 6+ years of leading design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- 5+ years of experience in developing compiler features and optimizations
- Proficiency in C++ and Python programming, applied to compiler or verification projects
- Familiarity with LLVM, including knowledge of abstract interpretation and polyhedral domains
- Demonstrated scientific approach to software engineering problems

**PREFERRED QUALIFICATIONS**

- Master's degree or PhD in computer science or equivalent
- Experience with deep learning frameworks like TensorFlow or PyTorch
- Understanding of large language model (LLM) training processes
- Knowledge of CUDA programming for GPU acceleration

Pay range: $151,300/year up to $261,500/year
Location: USA, TX, Austin / USA, WA, Seattle / USA, CA, Cupertino
Work site: N/A

ML Compiler Engineer II - Automated Reasoning Science, Annapurna Labs - Job ID: 2720280

The AWS Neuron Compiler team is actively seeking skilled compiler engineers to join our efforts in developing a state-of-the-art deep learning compiler stack. This stack is designed to optimize application models across diverse domains, including Large Language and Vision, originating from leading frameworks such as PyTorch, TensorFlow, and JAX. Your role will involve working closely with our custom-built Machine Learning accelerators, including Inferentia/Trainium, which represent the forefront of AWS innovation for advanced ML capabilities, powering solutions like Generative AI.

**Key job responsibilities**

- As a Sr. ML Compiler Engineer III on the Neuron Compiler Automated Reasoning Group, you will develop and maintain tooling for best-in-class technology for raising the bar of the Neuron Compiler's accuracy and reliability. You will help lead the efforts building fuzzers and specification synthesis tooling for our LLVM-based compiler. You will work in a team with a science focus, and strive to push what we do to the edge of what is known, to best deliver for our customers.
- Strong software development skills using C++/Python are critical to this role.
- A science background in compiler development is strongly preferred. A background in Machine Learning and AI accelerators is preferred, but not required.
- In order to be considered for this role, candidates must be currently located or willing to relocate to Seattle (Preferred), Cupertino, Austin, or Toronto.

**BASIC QUALIFICATIONS**

- 3+ years of leading design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- 2+ years of experience in developing compiler features and optimizations
- Proficiency in C++ and Python programming, applied to compiler or verification projects
- Familiarity with LLVM, including knowledge of abstract interpretation and polyhedral domains
- Demonstrated scientific approach to software engineering problems

**PREFERRED QUALIFICATIONS**

- Master's degree or PhD in computer science or equivalent
- Experience with deep learning frameworks like TensorFlow or PyTorch
- Understanding of large language model (LLM) training processes
- Knowledge of CUDA programming for GPU acceleration

Pay range: $129,300/year up to $223,600/year
Location: USA, WA, Seattle / USA, CA, Cupertino
Work site: N/A

Sr. Machine Learning - Compiler Engineer III, AWS Neuron, Annapurna Labs - Job ID: 2573101

AWS Machine Learning accelerators are at the forefront of AWS innovation and one of several AWS tools used for building Generative AI on AWS. The Inferentia chip delivers best-in-class ML inference performance at the lowest cost in the cloud. Trainium delivers the best-in-class ML training performance with the most teraflops (TFLOPS) of compute power for ML in the cloud. This is all enabled by a cutting-edge software stack, the AWS Neuron Software Development Kit (SDK), which includes an ML compiler and runtime and natively integrates into popular ML frameworks, such as PyTorch, TensorFlow and JAX. AWS Neuron is used at scale with customers like Snap, Autodesk, Amazon Alexa, Amazon Rekognition and more customers in various other segments.
The Amazon Annapurna Labs team is responsible for silicon development at AWS. The team covers multiple disciplines including silicon engineering, hardware design and verification, software and operations.
The Neuron Compiler team is developing a deep learning compiler stack that takes neural network descriptions created in frameworks such as TensorFlow, PyTorch, and JAX, and converts them into code suitable for execution. The team is comprised of some of the brightest minds in the engineering, research, and product communities, focused on the ambitious goal of creating a toolchain that will provide a quantum leap in performance.
As a Machine Learning Compiler Engineer II in the AWS Neuron Compiler team, you will be supporting the ground-up development and scaling of a compiler to handle the world's largest ML workloads. Architecting and implementing business-critical features, publishing cutting-edge research, and contributing to a brilliant team of experienced engineers excites and challenges you. You will leverage your technical communications skill as a hands-on partner to AWS ML services teams and you will be involved in pre-silicon design, bringing new products/features to market, and many other exciting projects.
A background in compiler development is strongly preferred. A background in Machine Learning and AI accelerators is preferred, but not required.
In order to be considered for this role, candidates must be currently located or willing to relocate to Cupertino (preferred), Seattle, or Austin.

**BASIC QUALIFICATIONS**

- 5+ years of leading design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- 2+ years of experience in developing compiler features and optimizations
- Proficiency with 1 or more of the following programming languages: C++ (preferred), C, Python

**PREFERRED QUALIFICATIONS**

- Master's or PhD degree in computer science or equivalent
- Proficiency with resource management, scheduling, code generation, and compute graph optimization
- Experience optimizing TensorFlow, PyTorch or JAX deep learning models
- Experience with multiple toolchains and Instruction Set Architectures

Pay range: $151,300/year up to $261,500/year
Location: USA, TX, Austin / USA, WA, Seattle / USA, CA, Cupertino
Work site: N/A

Sr. Machine Learning - Compiler Engineer III, AWS Neuron, Annapurna Labs - Job ID: 2597858

The Product: AWS Machine Learning accelerators are at the forefront of AWS innovation and one of several AWS tools used for building Generative AI on AWS. The Inferentia chip delivers best-in-class ML inference performance at the lowest cost in the cloud. Trainium will deliver the best-in-class ML training performance with the most teraflops (TFLOPS) of compute power for ML in the cloud. This is all enabled by a cutting-edge software stack, the AWS Neuron Software Development Kit (SDK), which includes an ML compiler and runtime and natively integrates into popular ML frameworks, such as PyTorch, TensorFlow and MXNet. AWS Neuron and Inferentia are used at scale with customers like Snap, Autodesk, Amazon Alexa, Amazon Rekognition and more customers in various other segments.
The Team: As a whole, the Amazon Annapurna Labs team is responsible for silicon development at AWS. The team covers multiple disciplines including silicon engineering, hardware design and verification, software and operations.
The AWS Neuron team works to optimize the performance of complex neural net models on our custom-built AWS hardware. More specifically, the AWS Neuron team is developing a deep learning compiler stack that takes neural network descriptions created in frameworks such as TensorFlow, PyTorch, and MXNet, and converts them into code suitable for execution. As you might expect, the team is comprised of some of the brightest minds in the engineering, research, and product communities, focused on the ambitious goal of creating a toolchain that will provide a quantum leap in performance.
You: As a Sr. Machine Learning Compiler Engineer III on the AWS Neuron team, you will be a thought leader supporting the ground-up development and scaling of a compiler to handle the world's largest ML workloads. Architecting and implementing business-critical features, publishing cutting-edge research, and mentoring a brilliant team of experienced engineers excites and challenges you. You will leverage your technical communications skill as a hands-on partner to AWS ML services teams and you will be involved in pre-silicon design, bringing new products/features to market, and many other exciting projects. A background in Machine Learning and AI accelerators is preferred, but not required.
In order to be considered for this role, candidates must be currently located or willing to relocate to Cupertino (preferred), Seattle, Austin, or Toronto.

**BASIC QUALIFICATIONS**

- 5+ years of non-internship professional software development experience
- 5+ years of programming with at least one software programming language experience
- 5+ years of leading design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- 5+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
- Experience as a mentor, tech lead or leading an engineering team

**PREFERRED QUALIFICATIONS**

- Bachelor's degree in computer science or equivalent

Pay range: $151,300/year in our lowest geographic market up to $261,500/year
Location: USA, CA, Cupertino
Work site: N/A

Machine Learning - Compiler Engineer II, Annapurna Labs - Job ID: 2601095

The Product: AWS Machine Learning accelerators are at the forefront of AWS innovation and one of several AWS tools used for building Generative AI on AWS. The Inferentia chip delivers best-in-class ML inference performance at the lowest cost in the cloud. Trainium will deliver the best-in-class ML training performance with the most teraflops (TFLOPS) of compute power for ML in the cloud. This is all enabled by a cutting-edge software stack, the AWS Neuron Software Development Kit (SDK), which includes an ML compiler and runtime and natively integrates into popular ML frameworks, such as PyTorch, TensorFlow and MXNet. AWS Neuron and Inferentia are used at scale with customers like Snap, Autodesk, Amazon Alexa, Amazon Rekognition and more customers in various other segments.
The Team: As a whole, the Amazon Annapurna Labs team is responsible for silicon development at AWS. The team covers multiple disciplines including silicon engineering, hardware design and verification, software and operations.
The AWS Neuron team works to optimize the performance of complex neural net models on our custom-built AWS hardware. More specifically, the AWS Neuron team is developing a deep learning compiler stack that takes neural network descriptions created in frameworks such as TensorFlow, PyTorch, and MXNet, and converts them into code suitable for execution. As you might expect, the team is comprised of some of the brightest minds in the engineering, research, and product communities, focused on the ambitious goal of creating a toolchain that will provide a quantum leap in performance.
You: As a Machine Learning Compiler Engineer II on the AWS Neuron team, you will be supporting the ground-up development and scaling of a compiler to handle the world's largest ML workloads. Architecting and implementing business-critical features, publishing cutting-edge research, and contributing to a brilliant team of experienced engineers excites and challenges you. You will leverage your technical communications skill as a hands-on partner to AWS ML services teams and you will be involved in pre-silicon design, bringing new products/features to market, and many other exciting projects.

**BASIC QUALIFICATIONS**

- 3+ years of non-internship professional software development experience
- 2+ years of experience architecting and optimizing compilers
- Proficiency with 1 or more of the following programming languages: C++ (preferred), C, Python

**PREFERRED QUALIFICATIONS**

- M.S. or Ph.D. in Computer Science or related field
- Experience with multiple toolchains and Instruction Set Architectures
- Proficiency with resource management, scheduling, code generation, and compute graph optimization
- Experience optimizing TensorFlow, PyTorch or MXNet deep learning models

Pay range: $129,300/year up to $223,600/year
Location: USA, TX, Austin / USA, CA, Cupertino / USA, WA, Seattle
Work site: N/A

Compiler Engineer II - Machine Learning, Annapurna Labs - Job ID: 2714948

AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2), to consistently released new product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.
Annapurna Labs, considered the secret sauce behind the success of AWS, is responsible for silicon development.
The Product: AWS Machine Learning accelerators are at the forefront of AWS innovation and one of several AWS tools used for building Generative AI on AWS. The Inferentia chip delivers best-in-class ML inference performance at the lowest cost in the cloud. Trainium delivers the best-in-class ML training performance with the most teraflops (TFLOPS) of compute power for ML in the cloud. This is all enabled by a cutting-edge software stack, the AWS Neuron Software Development Kit (SDK), which includes an ML compiler and runtime and natively integrates into popular ML frameworks, such as PyTorch, TensorFlow and JAX. AWS Neuron is used at scale with customers both internal and external.
The Team: The Neuron Compiler team is developing a deep learning compiler stack that takes state of the art LLM and Vision models created in frameworks such as TensorFlow, PyTorch, and JAX, and makes them run performantly on our accelerators. The team is comprised of some of the brightest minds in the engineering, research, and product communities, focused on the ambitious goal of creating a toolchain that will provide a quantum leap in performance.
You: As a Machine Learning Compiler Engineer II in the AWS Neuron Compiler team, you will be supporting the ground-up development and scaling of a compiler to handle the world's largest ML workloads. Architecting and implementing business-critical features, publishing cutting-edge research, and contributing to a brilliant team of experienced engineers excites and challenges you. You will leverage your technical communications skill as a hands-on partner to AWS ML services teams and you will be involved in pre-silicon design, bringing new products/features to market, and many other exciting projects.
A background in compiler development is strongly preferred. A background in Machine Learning and AI accelerators is preferred, but not required.

**Key job responsibilities**

Our engineers collaborate across diverse teams, projects, and environments to have a firsthand impact on our global customer base. You’ll bring a passion for innovation, data, search, analytics, and distributed systems. You’ll also:
- Solve challenging technical problems, often ones not solved before, at every layer of the stack.
- Design, implement, test, deploy and maintain innovative software solutions to transform service performance, durability, cost, and security.
- Build high-quality, highly available, always-on products.
- Research implementations that deliver the best possible experiences for customers.

**BASIC QUALIFICATIONS**
- 3+ years of non-internship professional software development experience
- 2+ years of experience in developing compiler features and optimizations
- Proficiency with 1 or more of the following programming languages: C++ (preferred), C, Python

**PREFERRED QUALIFICATIONS**

- Masters or PhD degree in computer science or equivalent
- Experience optimizing TensorFlow, PyTorch or JAX deep learning models
- Experience with multiple toolchains like LLVM, XLA/OpenXLA, MLIR

Pay range: $129,300/year up to $223,600/year
Location: USA, WA, Seattle | USA, CA, Cupertino
Work site: N/A

Sr. Compiler Engineer III - Machine Learning, Annapurna Labs - Job ID: 2632254

Annapurna Labs builds custom Machine Learning accelerators that are at the forefront of AWS innovation and one of several AWS tools used for building Generative AI on AWS. The Neuron Compiler team is searching for compiler-skilled engineering talent to support the development and scaling of a compiler to enable the world's largest ML workloads to run performantly on these custom Annapurna systems.
The Product: The AWS Machine Learning accelerators represent a pinnacle of AWS technologies, specifically designed for advancing AI capabilities. The Inferentia/Trainium chips offer unparalleled ML inference and training performance. They are enabled through a state-of-the-art software stack, the AWS Neuron Software Development Kit (SDK). This SDK comprises an ML compiler, runtime, and application framework, which seamlessly integrate into popular ML frameworks like PyTorch. AWS Neuron, running on Inferentia and Trainium, is trusted and used by leading customers such as Snap, Autodesk, and Amazon Alexa.
The Team: Annapurna Labs was a startup company acquired by AWS in 2015, and is now fully integrated. If AWS is an infrastructure company, then think of Annapurna Labs as the infrastructure provider of AWS. Our org covers multiple disciplines including silicon engineering, hardware design and verification, software, and operations. AWS Nitro, ENA, EFA, Graviton and F1 EC2 Instances, AWS Neuron, Inferentia and Trainium ML Accelerators, and scalable NVMe storage are some of the products we have delivered over the last few years.
Within this ecosystem, the Neuron Compiler team is developing a deep learning compiler stack that takes state of the art LLM and Vision models created in frameworks such as TensorFlow, PyTorch, and JAX, and makes them run performantly on our accelerators. The team is comprised of some of the brightest minds in the engineering, research, and product communities, focused on the ambitious goal of creating a toolchain that will provide a quantum leap in performance.
You: As a Sr. Machine Learning Compiler Engineer III on the AWS Neuron Compiler team, you will be supporting the ground-up development and scaling of a compiler to handle the world's largest ML workloads. Architecting and implementing business-critical features, publishing cutting-edge research, and contributing to a brilliant team of experienced engineers excite and challenge you. You will leverage your technical communication skills as a hands-on partner to AWS ML services teams and you will be involved in pre-silicon design, bringing new products/features to market, and many other exciting projects.
A background in compiler development is strongly preferred. A background in Machine Learning and AI accelerators is preferred, but not required.

**Key job responsibilities**

- Solve challenging technical problems, often ones not solved before, at every layer of the stack.
- Design, implement, test, deploy and maintain innovative software solutions to transform service performance, durability, cost, and security.
- Build high-quality, highly available, always-on products.
- Research implementations that deliver the best possible experiences for customers.

**BASIC QUALIFICATIONS**

- 6+ years of leading design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- 5+ years of experience in developing compiler features and optimizations
- Proficiency with 1 or more of the following programming languages: C++, C, Python

**PREFERRED QUALIFICATIONS**

- Masters degree or PhD in computer science or equivalent
- Experience optimizing TensorFlow, PyTorch or JAX deep learning models
- Experience with multiple toolchains like LLVM, OpenXLA/XLA, MLIR, TVM

Pay range: $151,300/year up to $261,500/year
Location: USA, TX, Austin | USA, WA, Seattle | USA, CA, Cupertino
Work site: N/A

Google

Machine Learning Compiler Software Engineer, TPU Horizontal Scaling

**About the job**

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full-stack as we continue to push technology forward.
With your technical expertise you will manage project priorities, deadlines, and deliverables. You will design, develop, test, deploy, maintain, and enhance software solutions.
Our team develops the Accelerated Linear Algebra (XLA) TPU/GPU parallelizing compiler used to partition, optimize, and run large-scale machine learning models across multiple TPU/GPU accelerators for internal and external customers. The XLA Horizontal Scaling team’s software stack includes the XLA Single Program Multiple Data (SPMD) partitioner, collective and scheduling optimizations, and code generation.
Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

**Responsibilities**

- Write product or system development code.
- Participate in, or lead design reviews with peers and stakeholders to decide amongst available technologies.
- Contribute to a compiler that scales out machine learning models across accelerators such as Tensor Processing Units (TPUs) and Graphics Processing Units (GPUs) at Google and Cloud.
- Conduct static and runtime performance analysis of important large-scale production models.
- Design and implement performance optimizations and critical features, which increase the velocity of important production teams.

**Minimum qualifications:**

- Bachelor’s degree or equivalent practical experience.
- Candidates will typically have 2 years of experience with software development in one or more programming languages, or 1 year of experience with an advanced degree.
- Candidates will typically have 2 years of experience with data structures or algorithms.

**Preferred qualifications:**

- Master's degree or PhD in Computer Science, or a related technical field.
- Experience in Machine Learning and High Performance Computing (HPC).
- Experience optimizing programs at distributed scale.
- Experience in C++.
- Experience in compilers.
- Ability to debug and program concurrent/parallel computations.

Pay range: N/A
Location: London, UK
Work site: N/A

Software Engineer, Tensor Processing Units Compiler

**About the job**

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full-stack as we continue to push technology forward.
Our team builds the compiler which enables Tensor Processing Units (TPUs), Google's in-house custom designed processor, to accelerate machine learning and other scientific computing workloads for both internal Google customers and external Cloud customers. The team offers opportunities up and down the compiler stack, working on Low Level Virtual Machine (LLVM) as well as the Multi-Level Intermediate Representation (MLIR) middle-end.
In this role, you'll be working on the MLIR/LLVM-based compiler for TPUs. You will support new workloads, optimize for new models and new characteristics, as well as support new TPU hardware across multiple generations.
Google Cloud accelerates organizations’ ability to digitally transform their business with the best infrastructure, platform, industry solutions and expertise. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology – all on the cleanest cloud in the industry. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

**Responsibilities**

- Contribute to a compiler for a novel processor designed to accelerate machine learning workloads. Compile high-performance implementations of operations at a distributed scale.
- Work closely with users of TPUs to improve performance/efficiency and hardware designers to co-design future processors.
- Investigate high-level representations to effectively program large-scale, distributed, and heterogeneous systems.

**Minimum qualifications:**

- Bachelor’s degree or equivalent practical experience.
- 3 years of experience testing, maintaining, or launching software products, and 1 year of experience with software design and architecture.
- 2 years of experience working with CUDA C++ application development and 1 year of experience with Native Code, Just-In-Time (JIT), Cross, Source-to-Source or any other type of compilers.
- 2 years of experience with data structures or algorithms, with experience with machine learning algorithms and tools (e.g. TensorFlow), artificial intelligence, deep learning, or natural language processing.

**Preferred qualifications:**

- Master's degree or PhD in Computer Science or related technical fields.
- Experience with performance, large-scale systems data analysis, visualization tools, or debugging.
- Experience with debugging correctness and performance issues at all levels of the stack.
- Experience with optimizations in mid-level and low-level architecture.
- Experience with hardware/software co-design.
- Experience integrating low-level GPU (CUDA) work into higher-level frameworks (e.g., TF, JAX, PyTorch).

Pay range: N/A
Location: London, UK
Work site: N/A

Software Engineer III, OpenXLA

**About the job**

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full-stack as we continue to push technology forward.
In this role, you will aim to make OpenXLA the best place for Machine Learning (ML) acceleration innovation. OpenXLA is an open-source ML compiler that powers TensorFlow, JAX and PyTorch/XLA. It accelerates ML models on Central Processing Units (CPUs) and Graphics Processing Units (GPUs). You will work on the stack (the hardware-independent bits), which includes StableHLO, HLO, XLA components and application programming interfaces (APIs), and compiler tooling for faster model debugging. You will make OpenXLA the best place for ML accelerator vendors and compiler research, ensuring they can plug their hardware-specific optimizations, code generation, and contributions into the OpenXLA stack.
The Core team builds the technical foundation behind Google’s flagship products. We are owners and advocates for the underlying design elements, developer platforms, product components, and infrastructure at Google. These are the essential building blocks for excellent, safe, and coherent experiences for our users and drive the pace of innovation for every developer. We look across Google’s products to build central solutions, break down technical barriers and strengthen existing systems. As the Core team, we have a mandate and a unique opportunity to impact important technical decisions across the company.

**Responsibilities**

- Write product or system development code. Participate in, or lead design reviews with peers and stakeholders to decide among available technologies.
- Review code developed by other developers and provide feedback to ensure best practices (e.g., style guidelines, checking code in, accuracy, testability, and efficiency).
- Contribute to existing documentation or educational content and adapt content based on product/program updates and user feedback.
- Triage product or system issues and debug/track/resolve by analyzing the sources of issues and the impact on hardware, network, or service operations and quality.
- Contribute to the team's mission by learning and contributing to StableHLO and XLA, and help engage with stakeholders who integrate with OpenXLA.

**Minimum qualifications:**

- Bachelor’s degree or equivalent practical experience.
- 2 years of experience with software development in one or more programming languages, or 1 year of experience with an advanced degree in an industry setting.
- 2 years of experience with data structures or algorithms in either an academic or industry setting.
- 2 years of experience with Machine Learning (ML) algorithms and tools (e.g., TensorFlow), Artificial Intelligence (AI), deep learning or natural language processing.

**Preferred qualifications:**

- Master's degree or PhD in Computer Science, or a related technical field.
- 2 years of experience with performance, large scale systems data analysis, visualization tools, or debugging.
- Experience developing accessible technologies.
- Experience with compiler development and open source.
- Understanding of code and system health, diagnosis and resolution, and software test engineering.

Pay range: $136,000-$200,000 + bonus + equity + benefits for US base salary range
Location: Sunnyvale, CA, USA
Work site: N/A

Software Engineer, Edge TPU Developer Tools, Silicon

**About the job**

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full-stack as we continue to push technology forward.
We are the team that builds Google Tensor, Google’s custom System-on-Chip (SoC) that powers the latest Pixel phones. Tensor makes transformative user experiences possible with the help of cutting-edge Machine Learning (ML) running on Tensor TPU. Our team’s work enables Gemini Nano, our efficient AI model for on-device tasks to run on Pixel phones. Our goal is to productize the latest ML innovations and research by delivering computing hardware and software.
In this role, you will work as part of the EdgeTPU compiler team. You will design and implement tools to reason about the correctness and performance of ML programs at multiple levels of abstraction. Additionally, you will analyze and improve the compiler quality and performance on optimization decisions, correctness and compilation time.
Google's mission is to organize the world's information and make it universally accessible and useful. Our team combines the best of Google AI, Software, and Hardware to create radically helpful experiences. We research, design, and develop new technologies and hardware to make computing faster, seamless, and more powerful. We aim to make people's lives better through technology.

**Responsibilities**

- Design and implement tools for bug detection, isolation, reproducer generation, and correction.
- Build tools that use and complement the compiler infrastructure to efficiently map ML models to the hardware.
- Develop parallelization and scheduling algorithms to optimize compute and data movement costs to execute ML workloads on the EdgeTPU.
- Design and implement new ways to gather useful performance and debugging data and relate them to the ML graph.
- Collaborate with ML model developers, researchers, and EdgeTPU hardware/software teams to accelerate the transition from research ideas to user experiences running on the EdgeTPU.

**Minimum qualifications:**

- Bachelor’s degree or equivalent practical experience.
- 5 years of experience with software development in C++, and with data structures/algorithms.
- 3 years of experience testing, maintaining, or launching software products, and 1 year of experience with software design and architecture.
- 3 years of experience with machine learning algorithms and tools (e.g., TensorFlow), artificial intelligence, deep learning or natural language processing.
- Experience with stakeholder management, project alignment/management, or cross-functional collaboration.

**Preferred qualifications:**

- Master's degree or PhD in Computer Science or related technical field.
- Experience with domain-specific compilers for machine learning.
- Experience in power and performance optimizations.
- Experience with low intrusiveness tooling concepts such as profiling, instrumentation, bug isolation, data race detection, out-of-bounds access detection, performance estimation, and tracking source-level information through compiler transformations.
- Understanding of hardware, especially hardware that provides a high-degree of parallelism.

Pay range: $161,000-$239,000 + bonus + equity + benefits for US base salary range
Location: Mountain View, CA, USA; Bellevue, WA, USA
Work site: N/A

Senior Software Engineer, Compilers, PyTorch for Alphabet

**About the job**

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full-stack as we continue to push technology forward.
The Core ML organization is part of Google Cloud and drives ML excellence for Google. Core ML is responsible for creating a cohesive, well-lit path for machine learning at Google. The organization is also responsible for developing ML infrastructure and execution around key ML efforts within Google.
PyTorch for Alphabet is a new and exciting area of investment for Core ML, which aims to bring one of the world’s most successful open source ML technologies to Google customers. In this role, you’ll partner closely with machine learning teams across Google, work with amazing open source technologies, and be a part of the formation of a dynamic new organization.
Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

**Responsibilities**

- Collaborate with the ML Frameworks team to democratize top open source ML technology.
- Participate in coding and design of low-level ML frameworks and compiler technologies.
- Directly engage with ML research and production teams across Alphabet.
- Participate and interact with the external PyTorch and open source ecosystems and communities.

**Minimum qualifications:**

- Bachelor’s degree or equivalent practical experience.
- 5 years of experience with software development in one or more programming languages, and with data structures/algorithms.
- 3 years of experience testing, maintaining, or launching software products, and 1 year of experience with software design and architecture.
- Experience with compilers or ML compilers.
- Experience working on or maintaining PyTorch.

**Preferred qualifications:**
- Master’s degree or PhD in Engineering, Computer Science, a technical related field, or equivalent practical experience.
- 1 year of experience in a technical leadership role.
- Experience developing accessible technologies.

Pay range: $161,000-$239,000 + bonus + equity + benefits in U.S.
Location: Seattle, WA, USA; Sunnyvale, CA, USA
Work site: N/A

Microsoft

Senior Software Engineer AI Compilers

**Responsibilities**

- Design and develop AI software in C/C++, Python, and other languages.
- Implementing innovative new compiler features and optimization passes
- Developing code generation techniques for novel hardware platforms
- Optimizing AI workloads
- Designing new programming abstractions for AI
- Collaborating broadly across multiple disciplines from hardware architects to ML developers.
- Identify requirements, scope solutions, estimate work, schedule deliverables.
- Help establish and drive the adoption of outstanding coding standards and patterns and help enhance our inclusive engineering culture.

**Qualifications**
**Required Qualifications:**

- Bachelor’s degree in computer science, or related technical discipline AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience.
- 3+ years’ experience with C++.
- 2+ years’ experience building compilers, using compiler frameworks like LLVM/MLIR, or optimizing AI/numerical workloads.

**Other Requirements:**

- Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

**Preferred Qualifications:**

- 2+ years’ experience with Python.
- M.S. or Ph.D. in computer engineering or related fields, or equivalent industry experience.
- Experience using or developing Machine Learning training or inference software.
- A deep curiosity about and interest in exploring new technologies.
- Effective cross-team collaboration skills and communication skills.

Pay range: $117,200 - $229,200 in the U.S. depending on the work location
Location: Redmond, Washington, United States
Work site: Up to 100% work from home

Software Engineer II AI Compilers

**Responsibilities**

- Design and develop AI software in C/C++, Python, and other languages.
- Implementing innovative new compiler features and optimization passes
- Developing code generation techniques for novel hardware platforms
- Optimizing AI workloads
- Designing new programming abstractions for AI
- Collaborating broadly across multiple disciplines from hardware architects to ML developers.
- Identify requirements, scope solutions, estimate work, schedule deliverables.
- Help establish and drive the adoption of outstanding coding standards and patterns and help enhance our inclusive engineering culture.

**Qualifications**
**Required Qualifications:**

- Bachelor's Degree in Computer Science, or related technical discipline AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, or Python, OR equivalent experience.
- 2+ years experience with C++.

**Other Requirements:**

- Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

**Preferred Qualifications:**

- 1+ years experience with Python
- Experience or interest in building compilers, compiler optimizations, or using compiler frameworks like LLVM or MLIR
- Experience implementing and optimizing AI workloads or other compute-intensive workloads.
- Experience using or developing Machine Learning training or inference software
- Continued intellectual curiosity and an interest in learning new technologies.
- Effective cross-team collaboration skills and communication skills.

Pay range: $98,300 - $193,200 in the U.S. depending on the work location
Location: Redmond, Washington, United States
Work site: Up to 50% work from home

Software Engineer II AI Compilers

**Responsibilities**

- Design and develop AI software in C/C++, Python, and other languages.
- Implementing innovative new compiler features and optimization passes
- Developing code generation techniques for novel hardware platforms
- Optimizing AI workloads
- Designing new programming abstractions for AI
- Collaborating broadly across multiple disciplines from hardware architects to ML developers.
- Identify requirements, scope solutions, estimate work, schedule deliverables.
- Help establish and drive the adoption of outstanding coding standards and patterns and help enhance our inclusive engineering culture.

**Qualifications**
**Required Qualifications:**

- Bachelor's Degree in Computer Science, or related technical discipline AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, or Python, OR equivalent experience.
- 2+ years experience with C++.

**Other Requirements:**

- Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

**Preferred Qualifications:**

- 1+ years experience with Python
- Experience or interest in building compilers, compiler optimizations, or using compiler frameworks like LLVM or MLIR
- Experience implementing and optimizing AI workloads or other compute-intensive workloads.
- Experience using or developing Machine Learning training or inference software
- Continued intellectual curiosity and an interest in learning new technologies.
- Effective cross-team collaboration skills and communication skills.

Pay range: $98,300 - $193,200 in the U.S. depending on the work location
Location: Redmond, Washington, United States
Work site: Up to 100% work from home

Software Engineer II AI Compilers

**Responsibilities**

- Design and develop AI software in C/C++, Python, and other languages.
- Implementing innovative new compiler features and optimization passes
- Developing code generation techniques for novel hardware platforms
- Optimizing AI workloads
- Designing new programming abstractions for AI
- Collaborating broadly across multiple disciplines from hardware architects to ML developers.
- Identify requirements, scope solutions, estimate work, schedule deliverables.
- Help establish and drive the adoption of outstanding coding standards and patterns and help enhance our inclusive engineering culture.

**Qualifications**
**Required Qualifications:**

- Bachelor's Degree in Computer Science, or related technical discipline AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, or Python, OR equivalent experience.
- 2+ years experience with C++.

**Other Requirements:**

- Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

**Preferred Qualifications:**

- 1+ years experience with Python
- Experience or interest in building compilers, compiler optimizations, or using compiler frameworks like LLVM or MLIR
- Experience implementing and optimizing AI workloads or other compute-intensive workloads.
- Experience using or developing Machine Learning training or inference software
- Continued intellectual curiosity and an interest in learning new technologies.
- Effective cross-team collaboration skills and communication skills.

Pay range: CAD $83,600 - CAD $159,600 in Canada, depending on the work location
Location: Vancouver, British Columbia, Canada
Work site: Up to 100% work from home

Senior Software Engineer AI Compilers

**Responsibilities**

- Design and develop AI software in C/C++, Python, and other languages.
- Implementing innovative new compiler features and optimization passes
- Developing code generation techniques for novel hardware platforms
- Optimizing AI workloads
- Designing new programming abstractions for AI
- Collaborating broadly across multiple disciplines from hardware architects to ML developers.
- Identify requirements, scope solutions, estimate work, schedule deliverables.
- Help establish and drive the adoption of outstanding coding standards and patterns and help enhance our inclusive engineering culture.

**Qualifications**
**Required Qualifications:**

- Bachelor’s degree in computer science, or related technical discipline AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C#, Java, JavaScript, or Python OR equivalent experience.
- 3+ years’ experience with C++.
- 2+ years’ experience building compilers, using compiler frameworks like LLVM/MLIR, or optimizing AI/numerical workloads.

**Other Requirements:**

- Ability to meet Microsoft, customer and/or government security screening requirements are required for this role. These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

**Preferred Qualifications:**

- 2+ years’ experience with Python.
- M.S. or Ph.D. in computer engineering or related fields, or equivalent industry experience.
- Experience using or developing Machine Learning training or inference software.
- A deep curiosity and interest about exploring new technologies.
- Effective cross-team collaboration skills and communication skills.

Pay range: CAD $108,100 - CAD $253,000 in Canada, depending on the work location
Location: Vancouver, British Columbia, Canada
Work site: Up to 100% work from home

Principal Software Engineer AI Compilers

**Responsibilities**

- Leading design and development of AI software in C/C++, Python, and other languages.
- Leading teams to implement innovative new compiler features and optimization passes.
- Developing code generation techniques for novel hardware platforms.
- Optimizing AI workloads.
- Designing new programming abstractions for AI.
- Collaborating broadly across multiple disciplines from hardware architects to ML developers.
- Identify requirements, scope solutions, estimate work, schedule deliverables.
- Help establish and drive the adoption of outstanding coding standards and patterns and help enhance our inclusive engineering culture.

**Qualifications**
**Required Qualifications:**

- Bachelor’s degree in computer science, or related technical discipline AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python. OR equivalent experience.
- 6+ years’ experience with C++.
- 5+ years’ experience building compilers, using compiler frameworks like LLVM/MLIR, or optimizing AI/numerical workloads.

**Other Requirements:**

- Ability to meet Microsoft, customer and/or government security screening requirements are required for this role. These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

**Preferred Qualifications:**

- 10+ years’ experience with C++.
- 6+ years’ experience with Python.
- M.S. or Ph.D. in computer engineering or related fields, or equivalent industry experience.
- Experience using or developing Machine Learning training or inference software.
- A deep curiosity and interest about exploring new technologies.
- Effective cross-team collaboration skills and communication skills.

Pay range: $137,600 - $267,000 in the U.S. depending on the work location
Location: Redmond, Washington, United States
Work site: Up to 100% work from home

NVIDIA

Deep Learning Compiler Engineer - MLIR - Job posted on STEMCareers.com

We are hiring deep learning compiler engineers for our MLIR GPU compiler team. NVIDIA GPUs are at the center of the deep learning revolution and continue to enable breakthroughs in generative AI, large language models, recommendation systems, speech recognition, image classification and other areas. Join us to work with a top-notch team and have broad impact across the entire deep learning community.

**What you'll be doing:**

In this role, you will be responsible for analyzing deep learning networks and developing compiler optimization algorithms. You’ll collaborate with members of the deep learning software framework teams and the hardware architecture teams to accelerate the next generation of deep learning software. The scope of these efforts includes defining public APIs, crafting and implementing compiler and optimization techniques, performance optimization, and other general software engineering work.

**What we need to see:**

- Bachelors, Masters or Ph.D. in Computer Science, Computer Engineering or a related field (or equivalent experience)
- 3+ years of relevant work or research experience in compiler optimization, performance analysis and IR design.
- Ability to work independently, define project goals and scope, and lead your own development effort.
- Excellent C/C++ programming and software design skills, including debugging, performance analysis, and test design.
- Strong interpersonal skills are required along with the ability to work in a dynamic product-oriented team.

**Ways to stand out from the crowd:**

- Knowledge of CPU and/or GPU architecture. CUDA or OpenCL programming experience.
- Experience with the following technologies: MLIR, LLVM, XLA, TVM, deep learning models and algorithms, and deep learning framework design.

Pay range: 148,000 USD - 276,000 USD
Location: Santa Clara, CA
Work site: N/A
