Our Site Reliability Engineers ensure reliable, smooth interaction among the various systems that Magnolia and our customers use every day. SREs apply software engineering approaches to infrastructure and operational challenges; the ultimate goal is to develop and maintain highly exposed, fault-tolerant systems in an ever-growing and complex IT landscape. We continuously innovate to make our product more reliable and robust in order to fulfill our business-critical mission. The SRE team plays a key role in Magnolia’s mission to further evolve its product in a world of automation and microservices.
This is a full-time, remote role. We are primarily looking for candidates based in the USA (East Coast) and/or Europe.
What will you be doing?
- You will work with SRE practitioners at the heart of our production environment
- You will be an internal advocate of DevOps culture and cloud technologies at Magnolia
- You will monitor service performance and ensure its smooth operation
- You will leverage your experience and knowledge of complex customer-facing systems to improve and automate existing tools, processes, and infrastructure
- You will be involved in the design, build, and deployment of scalable and secure services and provide architectural reviews, vulnerability testing, security reviews, and availability and reliability assessments
- You will participate in an on-call schedule
What do we need from you?
- BA/BS in Computer Science or related technical field, or equivalent practical experience
- Experience with cloud infrastructure, preferably AWS and Kubernetes
- Experience with Infrastructure as Code (Terraform/Terratest) and configuration management tools (e.g. Ansible)
- Experience with container technologies (Docker)
- Experience with monitoring tools (e.g. Datadog, Prometheus, CloudWatch)
- Experience with Kubernetes Operators is a plus
- Experience with, and a flair for, automation: CI/CD pipelines, testing, automated releases and deployment strategies
- Programming experience with one or more relevant languages: Java, Go, Python, Bash, JavaScript
- Ability to analyze and troubleshoot large-scale distributed systems in production
- Experience in operating, optimizing, and scaling SQL databases, Kubernetes clusters and/or Datadog
- Familiarity with security and operation standards as well as best practices around personal data
- Solid communication skills
We are for you if you like to:
- take charge: You are in the driver’s seat and set the direction according to what customers, colleagues and cultures need. No matter the roadblocks you see ahead, you take charge in (re)shaping the destination.
- connect: You never drive alone. Building meaningful connections means creating experiences together that form a foundation of trust so next time there’s a bump in the road, you know someone else has your back.
- be you: Choose your own ways and means. You make every perspective count so that everyone feels safe enough to follow their purpose and at the same time pursue one common goal. You grow by questioning yourself and others.