Site Reliability Engineer - remote

Flexera
Posted 4 years ago
Companies spend millions on IT, year after year, without a clear picture of where it goes or how effectively it’s spent. Flexera shines a light into this technology black hole so executives can understand exactly what’s in their IT ecosystem and manage it more effectively.
 
Flexera thrives on building a diverse workforce because it enables us to build our products with a wider appeal and serve our customers on a global scale.
 
Flexera is looking for an experienced Site Reliability Engineer to join our team. The SRE team works with product development to define our Service Level Objectives and performs the work required to ensure we meet those SLOs. These teams employ agile and lean principles in a culture of constant learning and improving.
 
We are seeking someone with extensive experience working on a SaaS/cloud product built on a microservices architecture and managed with Terraform and Kubernetes, in a DevOps culture with a strong CI/CD approach.
 
Responsibilities:
  • Help eliminate operational toil by automating repetitive operations work
  • Establish and enhance CI/CD pipelines
  • Create Grafana/Prometheus dashboards that communicate the metrics for a given product service
  • Collaborate with other teams
  • Investigate, debug and resolve customer issues
  • Mentor team members on cloud computing, infrastructure and best practices
  • Ensure the security and reliability of shared infrastructure in the Flexera cloud
  • Make reliability a first-class citizen
  • Design, develop and deploy new features for Flexera products/platforms, as defined by goals from the SRE organization
  • Define and manage the processes and tools that facilitate the program’s goals
  • Mentor and guide team members on best practices and recommended methods
  • Act as a liaison between the development team and the SRE organization
 
Minimum Qualifications:
  • Computer Science degree, or at least 2 years of related industry experience managing a mission-critical production system
  • Excellent written and verbal communication skills
  • Experience implementing fault detection and automating fixes
  • Experience designing scalable services
  • Experience designing distributed, fault-tolerant systems
  • Experience with microservices architecture
  • Experience managing services in AWS with Terraform and Kubernetes