Senior Site Reliability Engineer - remote

RevenueCat

Posted 3 years ago

About Us:

At RevenueCat, we make selling subscriptions in your mobile app easy. We launched as part of Y Combinator's summer 2018 batch and today handle more than 10 million mobile subscriptions across thousands of apps. We are a mission-driven, remote-first company that is building the foundation of mobile subscription infrastructure. Top companies like VSCO, Notion, WidgetSmith, Buffer, and Fishbrain count on RevenueCat to power their subscriptions at scale.

Our 30 team members (and growing!) are located all over the world, from San Francisco to Madrid to Taipei, and we're proud to be a remote-first company. We're a close-knit, product-driven team, and we love our core values: Always be Shipping, Own it, Be Customer-Obsessed, and Be Balanced.

This person will be the first member of RevenueCat's Reliability Engineering team. Until now, reliability work has been shared among all the product engineers. It's time to start a team solely focused on this area, as reliability is paramount to RevenueCat.

We want to bring on board somebody who is passionate about reliability, scalability and understanding the limits of computers and people. We need somebody who will help the rest of the product engineers learn reliability best practices and processes. This person should be excited about the technical challenges we will face as we grow our API throughput from 400K requests per minute to millions of requests per minute.

About You:

You have 5+ years of experience as a Software or Platform Engineer and are comfortable writing and analyzing code.
You understand data structures, can investigate incidents, and can differentiate between memory, I/O and CPU bottlenecks.
You have experience designing, maintaining and rolling out large and growing distributed systems.
You are extremely curious and excited about finding out how many more requests we can handle without any downtime.
You hate manual processes and love to automate all the things and reduce toil.

Preferred but Not Required:

Experience building and maintaining systems to monitor and improve availability and scalability
Experience with a container orchestration system (Kubernetes, AWS ECS, Nomad, ...)
Great communication skills and eagerness to educate the team about reliability best practices
Experience with AWS, Terraform and PostgreSQL
Experience with highly available, high-throughput REST APIs

In the first month, you'll:

Work with the CTO to learn about our current infrastructure and its evolution
Work with our product engineers to learn about the new product efforts and their infrastructure needs
Learn about our product, API, database and what is computationally cheap vs expensive
Learn about our current practices, alarms, monitoring tools and on-call rotations

In the first three months, you'll:

Detect our current bottlenecks, risks and single points of failure
Own and tune our alarms to maintain a healthy signal-to-noise ratio
Own blameless post-mortem analysis and action items coordination
Manage the on-call rotation
Help define SLOs

In the first six months, you'll:

Own risk assessment, disaster planning and response strategies
Be obsessed with our uptime
Detect our blind spots and add observability
Work closely with product engineers to design reliable rollouts of new features. You will contribute to writing and reviewing code as well as participating in architectural discussions.

Within a year, you'll:

Be the most knowledgeable person in the company about our infrastructure, and the main advocate for building a culture of security and reliability
Help recruit and build our SRE team
Educate the whole team about best practices and onboard new engineers to the on-call rotation
Be involved in the process of building new product features, from the design to rollout, maintenance and scaling

What We Offer:

$150,000-$170,000 USD + competitive equity across all geographies
Generous stipend for home workspace
Comprehensive medical, dental, and vision coverage for US team members
Matched 401(k) plans for US team members
Open vacation policy
