AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards.

WHY JOIN US
If you're looking for a place to grow, make an impact, and work with people who care, we'd love to meet you!

ABOUT THE ROLE
We are looking for an SRE Operations Engineer to maintain reliability across a cloud-based SaaS platform. You'll handle live incidents, improve observability, and reduce toil through automation using Kubernetes, Terraform, Grafana, and AWS. The role is hands-on and execution-focused, with real ownership across CI/CD pipelines, GitOps workflows, and on-call rotations.

WHAT YOU WILL DO
Monitor and support production and staging environments to ensure availability, performance, and stability;
Respond to incidents, perform triage and root cause analysis, and contribute to remediation efforts;
Participate in on-call rotations with defined SLAs;
Handle operational requests from internal teams;
Maintain and improve monitoring, alerting, dashboards, logs, and metrics;
Support CI/CD pipelines, production releases, and GitOps workflows;
Contribute to automation initiatives to reduce operational overhead;
Maintain and improve Kubernetes-based infrastructure and containerized workloads;
Support Infrastructure as Code practices and environment improvements.

MUST HAVES
2+ years of experience in Site Reliability Engineering, DevOps, or Production Operations;
Experience with AWS supporting production environments;
Experience supporting production SaaS applications;
Strong understanding of CI/CD systems (GitHub Actions, Jenkins, CircleCI);
Experience with GitOps and Git fundamentals;
Experience using GitHub, Jira, and Confluence;
Experience with Kubernetes (EKS, kOps, or similar);
Experience with Docker and containerization;
Experience with observability tools (Grafana, Prometheus, Loki, PagerDuty);
Proficiency in scripting (Bash, Python, or Go);
Experience with Infrastructure as Code (Terraform, Helm);
Ability to work within structured operational processes and SLAs;
Strong written and verbal English communication skills;
Self-driven with a growth mindset.

NICE TO HAVES
AWS certifications such as Solutions Architect, DevOps Engineer, or SysOps Administrator;
Experience with multi-tenant SaaS environments;
Experience working in globally distributed teams;
Familiarity with ChatOps practices;
Experience improving monitoring quality and reducing alert fatigue.

PERKS AND BENEFITS
Professional growth: Mentorship, TechTalks, and personalized growth roadmaps.
Competitive compensation: USD-based pay with education, fitness, and team activity budgets.
Exciting projects: Modern solutions with Fortune 500 and top product companies.
Flextime: Flexible schedule with remote and office options.
Site Reliability Engineer Id53670
AGILEENGINE
Puebla de Zaragoza, Puebla de Zaragoza
Posted 16 days ago