Senior Site Reliability Engineer - remote

Gorgias

Posted 3 years ago

Kubernetes, PostgreSQL, Google Cloud Platform, RabbitMQ, Python

Gorgias helps ecommerce companies transform support from painful to exceptional. Our product creates a unified profile of each customer by combining emails, live-chat, and social-media messages with ecommerce data such as purchase and delivery info. Combining all this data in a single application makes customer service more efficient and just better. Another fortunate side effect is that some requests are completely automated using ML. We've been around since 2015, and we currently serve over 5,000 ecommerce businesses, including Timbuk2, Steve Madden, and MVMT.

Join us! Resistance is futile!

You'll fit right in if you care about working on applications that put the customer's needs first. At Gorgias, we work hard so our customers have an easier job! We also like working with people who "get stuff done" and are confident about pushing their code to production many times a day. You're also one of those people who care about becoming a better engineer and human every day. Learning never stops; we help each other get better!
If getting your hands dirty in a real-world application that touches the lives of millions is your thing, then yeah, Gorgias is for you.

What are some of the things we work on?

As site reliability engineers, we're continually improving monitoring and alerting using Stackdriver, Datadog, and Sentry. With proper monitoring in place, we can make clear decisions about how to keep our infrastructure running smoothly for our customers on Google Cloud Platform. We treat infrastructure as code and provision it using Terraform and Helm 3. We also work a lot with Postgres, RabbitMQ, Google PubSub, and Kubernetes.
We are currently facing typical fast-growth challenges: the number of HTTP requests has grown by 200% in the last 6 months, our main database holds more than 10 TB of SQL data, our P99 response time is suboptimal, etc. So every month we need to balance rearchitecture, optimization, configuration changes, or just more/bigger servers.

Who are we at work?

We manage multiple k8s clusters in several regions, with around a hundred nodes and many TB of database storage. That is why we have a strong preference for people who have managed high-traffic web applications for the past 3+ years.
Also, because our apps have over 5,000 daily active users with sessions longer than 6 hours a day, we put great importance on quality, testing, and code review (yes, that includes infrastructure code). Sometimes, however, we have to go lower level and work directly with VMs, firewalls, and load balancers, which takes an excellent understanding of Linux and containers.
If this is the type of environment you're looking for, then you should consider applying.

About You

6-10 years of work experience
Programming experience, preferably in Python (a good third of your time will be spent reading Python code to investigate performance problems and bottlenecks, but you won't be writing the code)
You've built an infrastructure from scratch at some point in your career
Extensive experience working with a modern SRE stack (Kubernetes, Datadog, PubSub, etc.)
You are looking to be an individual contributor; managing or leading a group of engineers isn't your favorite thing
You like to bounce fresh ideas off other engineers and SREs on how to better automate things

Nice to Have

You have startup/small size company experience
How you became an SRE is a story worth telling, and we'd love to hear it

Perks & Benefits

Competitive salary
Equity package
Health coverage
Retirement benefits
4-week vacation
Parental leave
Latest MacBook Pro or equivalent included
Catered lunch 5x per week

More about Gorgias

Why You Should Join Us

Join a high-growth tech startup at a crucial time, with an unusually technical growth team
Work at the core of our most valuable tool: our growth "machine" (that is discussed at Growth conferences all over the world!)
Apply your engineering skills to concrete business problems, and have an impact on all stages of our business model (from Marketing to Success)
Join a company where automation and good, clean data are core beliefs shared by all

Engineering Team Culture

Getting Stuff Done, Ownership, Team Work, Excellence, and Agility. You should join us if you want to ship stuff fast without sacrificing quality. We put great importance on testing our code, cleaning it, and treating errors first and features later. We also value growth and ownership. People make mistakes. We learn from them to avoid them in the future. We cannot achieve excellence if there are no bumps in the road.
