NTT DATA strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now.

We are currently seeking a Site Reliability Engineering (SRE) / Lead Engineer to join our team in Guadalajara, Jalisco (MX-JAL), Mexico (MX).

The Site Reliability Engineer (SRE) / Lead Engineer candidate will have deep expertise in Application Performance Monitoring (APM), Infrastructure as Code (IaC), automation, and distributed tracing using OpenTelemetry. As an SRE lead, you will guide the design, implementation, and continuous improvement of observability solutions, ensuring system reliability, performance, and scalability while fostering best practices in SRE and DevOps.

Responsibilities

- Lead the strategic development and management of observability and reliability frameworks across the organization, ensuring alignment with business goals and technical requirements.
- Design and implement monitoring and observability solutions, collaborating with engineering teams to define standards and best practices.
- Manage Infrastructure as Code (IaC) initiatives using Terraform, coordinating with cloud and infrastructure teams to ensure scalable and secure deployments.
- Drive automation strategies for monitoring, alerting, and logging pipelines, focusing on process improvements and operational efficiency.
- Develop and maintain comprehensive observability roadmaps, including distributed tracing, logging, and metrics collection strategies.
- Collaborate with product management, sales, and pre-sales teams to provide technical expertise and support during solution design and customer engagements.
- Lead cross-functional teams to enhance CI/CD pipelines and deployment reliability, ensuring smooth integration of observability tools and practices.
- Engage with vendors and strategic partners to evaluate, select, and integrate observability and monitoring solutions, ensuring alignment with organizational needs and fostering strong collaborative relationships.
- Mentor and develop junior engineers and analysts, fostering a culture of reliability, observability, and operational excellence.

Qualifications

- 8-10+ years of experience in SRE, Observability, or DevOps roles, with leadership responsibilities.
- Hands-on experience with OpenTelemetry for distributed tracing and observability instrumentation.
- Proven expertise with Application Performance Monitoring (APM) tools such as New Relic, Datadog, AppDynamics, or Dynatrace.
- Strong proficiency in Infrastructure as Code (IaC) using Terraform.
- Solid understanding of cloud platforms including AWS, GCP, or Azure.
- Experience with automation/configuration management tools like Ansible, Chef, or Puppet.
- Deep knowledge of CI/CD pipelines and tools such as GitHub Actions, Jenkins, or Azure DevOps.
- Experience managing Kubernetes and containerized environments (Docker, Helm).
- Familiarity with log aggregation and analysis platforms like ELK Stack or Splunk.
- Excellent leadership, communication, and collaboration skills.

NTT DATA endeavors to make its website accessible to any and all users. If you would like to contact us regarding the accessibility of our website or need assistance completing the application process, please contact us at This contact information is for accommodation requests only and cannot be used to inquire about the status of applications.

NTT DATA is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status. For our EEO Policy Statement, please click here. If you'd like more information on your EEO rights under the law, please click here. For Pay Transparency information, please click here.
Site Reliability Engineering (SRE) / Lead Engineer
NTT DATA NORTH AMERICA
Central region
Posted 17 days ago