Team: Advanced Engineering / High-Performance Computing (HPC)

LUXOFT ONLY OPERATES UNDER CLT CONTRACT
FULL TIME POSITION

About the Project
Join an elite engineering team working directly on GPU support for OpenAI/Triton, the cutting-edge language and compiler designed for writing highly efficient, custom Deep Learning primitives. In this role, you will collaborate closely with the global open-source community to analyze, develop, test, and deploy performance improvements for neural networks implemented with Triton on AMD GPUs using the ROCm ecosystem. If you are passionate about low-level hardware-software co-design, deep learning acceleration, and open-source systems engineering, this is the place to make a real industry impact!

Compiler Development & Optimization: Architect new features, implement optimization passes, and support the OpenAI/Triton codebase targeting next-generation GPU hardware.
Community & Open-Source Collaboration: Engage as a core contributor within the open-source community, aligning engineering updates with project maintainers, internal stakeholders, and product managers.
Performance Engineering: Profile and analyze neural network primitives (such as GEMM) to identify microarchitectural bottlenecks and extract maximum hardware performance.
Quality & Architecture Rigor: Design and execute robust unit, component, and functional testing strategies, verifying code and maintaining high-quality technical documentation.

Required Qualifications (Mandatory)
Core Systems Languages: Excellent command of C and C++, plus basic scripting proficiency in Python.
Compiler Engineering: Solid, hands-on experience working directly with compiler internals (such as LLVM, GCC, or proprietary compiler architectures).
Performance Diagnostics: Strong foundation in performance analysis, hardware profiling, and diagnosing microarchitectural bottlenecks.
Deep familiarity with modern compiler infrastructures such as LLVM and MLIR (Multi-Level Intermediate Representation), including custom optimization pass implementation.
Solid experience with build systems such as CMake, Make, or Ninja.

GPGPU Computing & Hardware Acceleration
Direct experience writing or optimizing code for GPGPU architectures via HIP, CUDA, or OpenCL.
Foundational understanding of GEMM (General Matrix Multiply) execution paradigms and performance tuning at the register/cache level.
Knowledge of the AMD ROCm infrastructure and runtime stack.
Basic conceptual understanding of Machine Learning/Deep Learning topologies.
Practical hands-on experience using or testing with PyTorch.

If you are ready to push the boundaries of AI hardware acceleration and contribute to the core tools shaping the future of deep learning compilers, apply today!
Senior Compiler Engineer
LUXOFT
Región Centro, Región Centro
Posted 4 days ago