Site Reliability Engineer (Remote)

Booming Games Malta Ltd.
Posted 3 years ago

Our Systems & Infrastructure Department is growing and offers you the chance to contribute your skills from the start. We are looking for a reliable colleague who works independently, takes responsibility, and feels comfortable in an environment with flat hierarchies and no micromanagement.

Responsibilities

  • Work daily with stacks in different geographical regions to ensure that hardware, software, applications and network are healthy, maintained and operating at peak performance.
  • Perform deep dives into both systemic and latent reliability issues; partner with software and systems engineers across the organization to produce and roll out fixes.
  • Troubleshoot issues across the entire stack: hardware, software, application and network.
  • Drive standardization efforts across multiple disciplines and services in conjunction with SREs throughout the organization.
  • Identify and drive opportunities to improve automation for the company; scope and create automation for deployment, management and visibility of our services.
  • Represent the SRE organization in design reviews and operational readiness exercises for new and existing services.
  • Work with software engineers to improve upon deployment processes.
  • Participate in the on-call rotation for production systems.

Requirements

  • Sound fundamentals in operating systems, networking, and distributed systems.
  • Strong familiarity with Linux systems administration and management / best practices.
  • Familiarity with OS container technology: Kubernetes, CRI, namespaces/cgroups.
  • Strong understanding of: Ethernet, VLAN, IPv4/IPv6, ARP, DHCP, DNS, and TCP.
  • Familiarity with distributed system problems: leader election, consensus, etc.
  • Solid understanding of systems and application design, including the operational trade-offs of various designs.
  • Expert-level understanding of at least one public or private cloud technology, such as Amazon AWS or OpenStack.
  • Practical knowledge of various aspects of service design, including messaging protocols & behavior, caching strategies, and software design practices.
  • Practical, intermediate knowledge of shell scripting; some Ruby is a plus.
  • Demonstrable knowledge of TCP/IP, HTTP, web application security, and experience supporting multi-tier web application architectures.
  • Excellent knowledge of Linux/UNIX systems administration and performance tuning.
  • Comfortable configuring DNS, DHCP, and LAN/WAN technologies.
  • Minimum of 5 years managing services in an internet-scale *nix environment.
  • Must be able to communicate well with technical as well as non-technical colleagues to achieve business goals.
  • Ability to prioritize tasks and work independently; must be able to work with multiple teams across multiple subjects.
  • Must be adaptable and able to focus on the simplest, most efficient & reliable solutions.
  • Track record of successful practical problem solving, excellent written and interpersonal communication, and documentation skills.
  • Curiosity and an interest in networking, systems software, and/or distributed systems.
  • Experience as a systems administrator or operations engineer.
  • Experience with a 24/7 production environment, including deploying code to and/or managing deployments that provide software, platforms, or infrastructure as a service.
  • Experience with Mellanox- and Vyatta-based networking gear is a plus.
  • Experience with Super Micro server and storage gear.