Senior Site Reliability Engineer (US Remote based)

Posted 3 years ago

My client is a VC-backed company whose SaaS-based solution is changing the way life science companies bring their products to market. They are going through a phase of high growth and are looking to bring on board a talented Senior Site Reliability Engineer to help build and scale their growing SaaS platform.

As a member of their Product Engineering Team, you'll be a resident expert on their cloud-based hosting environment, working as part of a cross-functional team that designs, builds, deploys, and runs the product. You will be responsible for reliability, security, and compliance.

The platform was designed as cloud-native from the outset and takes advantage of today’s current technologies including Docker, Kubernetes, AWS (EKS, S3, Aurora RDS, IAM, KMS, auto-scaling groups, and load balancers), NGINX, MongoDB, Kafka, APM instrumentation, and an automated CI/CD pipeline. Everything runs on Linux. Many of the tools are built using Bash and Python.

Role responsibilities

Develop, test, and deploy secure, reliable, highly available, and scalable infrastructure using AWS technologies with Terraform or AWS CDK
Navigate the application and infrastructure stack fluently
Build, configure, and maintain CI/CD pipelines to optimize application build, test, deployment, and operations
Collaborate and work closely with technology and product teams
Develop, deploy, and maintain tools to drive efficiency, repeatability, security, and compliance
Define, measure, elevate, and improve operational key performance indicators (e.g. MTTR, lead time, change failure rate)
Participate in on-call rotations with engineering
Deploy production code and configuration updates
Implement a complete backup and disaster recovery strategy for all test and production systems
Help drive DevOps as an integral part of a strategic software team

Candidate requirements

You have at least 3 years of hands-on experience in Site Reliability, DevOps or similar roles
When it comes to infrastructure, you always think of automation first: You avoid manual changes in production at all costs
You live and breathe AWS Cloud: services and acronyms such as EKS, S3, RDS Aurora, ELB, VPC, MSK, WAF, EBS, CloudFormation, and CloudWatch are part of your daily vocabulary, and you can’t wait to read about new services that could make your life even easier
You strive to be the first to know about production issues, even before they occur, through carefully designed monitoring, and you work with the team to resolve them before anyone else notices
You practice and preach good security: you would never think of sending credentials in cleartext, you use two-factor authentication whenever possible (credentials in code? never!), and you make sure that the latest security patches are always applied
You make sure there are always current backups of everything, and you know they are reliable because you routinely test restores
You are passionate about maintaining consistent staging and testing environments, because you wouldn’t dream of deploying something to production untested
You love working with developers and testers, and making their lives easier by setting up streamlined, reliable continuous integration and deployment systems
You can install and configure Linux in your sleep, and troubleshoot network issues just as easily
