Site Reliability Engineer - remote

Flexera

Posted 4 years ago

Companies spend millions on IT, year after year, without a clear picture of where it goes or how effectively it’s spent. Flexera shines a light into this technology black hole so executives can understand exactly what’s in their IT ecosystem and manage it more effectively.

Flexera thrives on building a diverse workforce because it enables us to build our products with a wider appeal and serve our customers on a global scale.

Flexera is looking for an experienced Site Reliability Engineer to join our team. The SRE team works with product development to define our Service Level Objectives (SLOs) and performs the work required to ensure we meet them. Our teams apply agile and lean principles in a culture of continuous learning and improvement.

We are seeking someone with extensive experience working on a SaaS/cloud product with a microservices architecture (Terraform and Kubernetes), in a DevOps culture, with a strong CI/CD approach.

Responsibilities:

Help eliminate operational toil by automating repetitive operations work
Establish and enhance CI/CD pipelines
Create Grafana/Prometheus dashboards that communicate the metrics for a given product service
Collaborate with other teams
Investigate, debug and resolve customer issues
Mentor team members on cloud computing, infrastructure and best practices
Ensure the security and reliability of shared infrastructure within the Flexera cloud
Make reliability a first-class citizen
Design, develop and deploy new features for Flexera products/platforms, as defined by goals from the SRE organization
Define and manage the processes and tools that support the program’s goals
Mentor and guide team members on best practices and recommended methods
Act as a liaison between the development team and the SRE organization

Minimum Qualifications:

Computer Science degree, or related industry experience managing a mission-critical production system for at least 2 years
Excellent written and verbal communication skills
Experience implementing fault detection and automating fixes
Experience designing scalable services
Experience designing distributed, fault-tolerant systems
Experience with microservices architecture
Experience managing services in AWS with Terraform and Kubernetes
