Team: Advanced Engineering / High-Performance Computing (HPC)

LUXOFT ONLY OPERATES UNDER CLT CONTRACT
FULL TIME POSITION

About the Project

Join an elite engineering team working directly on GPU support for OpenAI/Triton, the cutting-edge language and compiler designed for writing highly efficient, custom Deep Learning primitives. In this role, you will collaborate closely with the global open-source community to analyze, develop, test, and deploy massive performance improvements for neural networks implemented with Triton on AMD GPUs utilizing the ROCm ecosystem. If you are passionate about low-level hardware-software co-design, deep learning acceleration, and open-source systems engineering, this is the place to make a massive industry impact!

Compiler Development & Optimization: Architect new features, implement optimization passes, and support the OpenAI/Triton project codebase targeting next-generation GPU hardware.

Community & Open-Source Collaboration: Engage as a core contributor within the open-source community, aligning engineering updates with project maintainers, internal stakeholders, and product managers.

Performance Engineering: Profile and analyze neural network primitives (like GEMM) to identify microarchitectural bottlenecks and squeeze out maximum hardware performance.

Quality & Architecture Rigor: Design and execute robust unit, component, and functional testing strategies, ensuring code verification and maintaining world-class technical documentation.

Required Qualifications (Mandatory)

Core Systems Languages: Exceptional programming mastery in C and C++, alongside basic script development proficiency in Python.

Compiler Engineering: Solid, hands-on experience working directly with compiler internals (such as LLVM, GCC, or proprietary compiler architectures).

Performance Diagnostics: Strong foundation in performance analysis, hardware profiling, and diagnosing microarchitectural bottlenecks.
Deep familiarity with modern compiler infrastructures like LLVM and MLIR (Multi-Level Intermediate Representation), including custom optimization pass implementation.

Solid experience with build systems like CMake, Make, or Ninja.

GPGPU Computing & Hardware Acceleration

Direct experience writing or optimizing code for GPGPU architectures via HIP, CUDA, or OpenCL.

Foundational understanding of GEMM (General Matrix Multiply) execution paradigms and performance tuning at the register/cache level.

Knowledge of the AMD ROCm infrastructure and runtime stack.

Basic conceptual understanding of Machine Learning/Deep Learning topologies.

Practical hands-on experience using or testing with PyTorch.

If you are ready to push the boundaries of AI hardware acceleration and contribute to the core tools shaping the future of deep learning compilers, apply today!

Senior Compiler Engineer

LUXOFT
