Site Reliability Engineer - remote

Magnolia International Ltd.

Posted 1 year ago

Our Site Reliability Engineers ensure reliable, smooth interaction among the various systems Magnolia and our customers use every day. SREs apply software engineering approaches to infrastructure and operational challenges; the ultimate goal is to develop and maintain highly exposed, fault-tolerant systems in an ever-growing and complex IT landscape. We continuously innovate to make our product more reliable, robust and fault-tolerant to fulfill our business-critical mission. The SRE team plays a key role in Magnolia’s mission to further evolve its product in a world of automation and microservices.

This is a full-time, remote role. We are primarily looking for candidates based in the USA (East Coast) and/or Europe.

What will you be doing?

You will work with SRE practitioners at the heart of our production environment
You will be an internal advocate of DevOps culture and cloud technologies at Magnolia
You will monitor service performance and ensure its smooth operation
You will leverage your experience and knowledge of complex customer-facing systems to improve and automate existing tools, processes, and infrastructure
You will be involved in the design, build, and deployment of scalable and secure services and provide architectural reviews, vulnerability testing, security reviews, and availability and reliability assessments
You will participate in an on-call schedule

What do we need from you?

BA/BS in Computer Science or related technical field, or equivalent practical experience
Experience with Cloud infrastructure - preferably AWS and Kubernetes
Experience with Infrastructure as Code (Terraform/Terratest) and configuration management tools (e.g. Ansible)
Experience with container technologies (Docker)
Experience with monitoring tools (e.g. Datadog, Prometheus, CloudWatch)
Experience with Kubernetes Operators is a plus
A flair for automation, with experience in CI/CD pipelines, testing, automated releases and deployment strategies
Programming experience with one or more relevant languages: Java, Go, Python, Bash, JavaScript
Ability to analyze and troubleshoot large-scale distributed systems in production
Experience in operating, optimizing, and scaling SQL databases, Kubernetes clusters and/or Datadog
Familiarity with security and operation standards as well as best practices around personal data
Solid communication skills

We are for you if you like to:

take charge: You are in the driver’s seat and set the direction according to what customers, colleagues and cultures need. No matter the roadblocks you see ahead, you take charge in (re)shaping the destination.
connect: You never drive alone. Building meaningful connections means creating experiences together that form a foundation of trust so next time there’s a bump in the road, you know someone else has your back.
be you: Choose your own ways and means. You make every perspective count so that everyone feels safe enough to follow their purpose and at the same time pursue one common goal. Your way of growing is to mutually question yourself and others.
