Design, implement, and maintain scalable infrastructure using Linux and Kubernetes.
Monitor system performance using Prometheus and address potential issues proactively.
Automate operational processes to improve system reliability and efficiency.
Respond to incidents, perform root cause analysis, and implement improvements.
Collaborate with development teams to ensure smooth deployments and high availability.
Create and maintain documentation, runbooks, and operational guidelines.
Promote best practices in reliability, security, and system performance.

Requirements

Strong experience with Linux system administration and troubleshooting.
Strong expertise in Kubernetes cluster management and orchestration.
Strong experience using Prometheus for monitoring and alerting.
Proficiency in scripting languages such as Bash or Python.
Strong problem-solving and incident management skills.
Excellent written and verbal communication skills.
Ability to work independently in a remote, fast-paced environment.

Site Reliability Engineer | Remote

CROSSING HURDLES
