Senior Site Reliability Engineer in FL (remote flexibility)

Ultimate Software
Posted 4 years ago
Stack Overflow

Site Reliability Engineers at UKG Technology and Innovation are hybrid software/system engineers who have a breadth of knowledge encompassing all aspects of service delivery. They develop software solutions to enhance, harden, and support our service delivery processes. This can include building and managing CI/CD deployment pipelines, automated testing, capacity planning, performance analysis, monitoring, alerting, chaos engineering, and auto-remediation. Site Reliability Engineers must have a passion for learning and evolving with current technology trends. They strive to innovate and are relentless in their pursuit of a flawless customer experience. They have an “automate everything” mindset, helping our company deploy services with incredible speed, consistency, and availability.

Primary/Essential Duties and Key Responsibilities:

  • Engage in and improve the whole lifecycle of services, from conception to inception, including system design consulting and capacity planning
  • Define and implement standards and best practices related to system architecture, deployment, metrics, and operational tasks
  • Support services through activities such as monitoring availability and system health, and responding to incidents
  • Improve system performance, application delivery, and efficiency through automation, process refinement, postmortem reviews, and in-depth configuration analysis
  • Engage in communications across all areas of the organization

Required Qualifications:

  • Degree in engineering or a related technical discipline, or equivalent work experience
  • Experience with cloud-based applications
  • Experience with containerization technologies
  • Experience with Microsoft and Linux technologies
  • Experience with VMware or other virtual server software
  • Experience coding in higher-level languages (e.g., Python, JavaScript, C++, or Java)
  • Experience in configuration and maintenance of applications such as web servers, load balancers, relational databases, storage systems and messaging systems
  • Experience with MongoDB, MySQL, Elasticsearch, RabbitMQ, and others
  • Experience with operating systems and TCP/IP network fundamentals
  • Experience learning software, frameworks and APIs
  • Ability and willingness to work evenings or nights on occasion
  • Ability to lead and work on projects
  • Experience as a Site Reliability Engineer, Production Engineer, or equivalent
  • Experience with distributed system design and architecture
  • Experience building and managing CI/CD pipelines
  • Experience with public or private cloud platforms (e.g., GCP, Kubernetes, or OpenStack)
  • Experience with production-level monitoring and alerting using tools such as Prometheus, Grafana, and Datadog

Travel Requirements:

  • 0-5%