Senior Site Reliability Engineer - remote

UKG (Ultimate Kronos Group)

Posted 3 years ago

Continuous Integration, Python, Docker, Kubernetes, Puppet

Site Reliability Engineers at UKG Technology and Innovation are hybrid software/systems engineers with a breadth of knowledge encompassing all aspects of service delivery. They develop software solutions to enhance, harden, and support our service delivery processes. This can include building and managing CI/CD deployment pipelines, automated testing, capacity planning, performance analysis, monitoring, alerting, chaos engineering, and auto-remediation. Site Reliability Engineers must have a passion for learning and evolving with current technology trends. They strive to innovate and are relentless in their pursuit of a flawless customer experience. They have an “automate everything” mindset, helping our company deploy services with incredible speed, consistency, and availability.

Primary/Essential Duties and Key Responsibilities:

Engage in and improve the whole lifecycle of services, from conception to inception, including system design consulting and capacity planning
Define and implement standards and best practices for system architecture, deployment, metrics, and operational tasks
Support services through activities such as monitoring availability and system health, and responding to incidents
Improve system performance, application delivery, and efficiency through automation, process refinement, postmortem reviews, and in-depth configuration analysis
Communicate across all areas of the organization

Required Qualifications:

Degree in engineering or a related technical discipline, or equivalent work experience
Experience with cloud-based applications
Experience with containerization technologies
Experience with Microsoft and Linux technologies
Experience with VMware or other virtual server software
Experience coding in higher-level languages (e.g., Python, JavaScript, C++, or Java)
Experience configuring and maintaining applications such as web servers, load balancers, relational databases, storage systems, and messaging systems
Experience with MongoDB, MySQL, Elasticsearch, RabbitMQ, and similar technologies
Experience with operating systems and TCP/IP network fundamentals
Experience learning software, frameworks and APIs
Ability and willingness to work evenings or nights on occasion
Ability to lead and contribute to projects
Experience as a Site Reliability Engineer, Production Engineer, or equivalent
Experience with distributed system design and architecture
Experience building and managing CI/CD Pipelines
Experience with public or private cloud platforms (e.g., GCP, Kubernetes, or OpenStack)
Experience with production-level monitoring and alerting using tools such as Prometheus, Grafana, and Datadog

Travel Requirements:

0-5%
