THE ROLE:
Quickly maturing startup seeking a like-minded Site Reliability Engineer! The technical team is a small, talented, close-knit group, and we need development and systems help to keep business and development operations flowing smoothly.
As a well-rounded site reliability engineer, you should be the type who appreciates variety in your day and challenges outside of your comfort zone!
WHAT YOU’LL BE DOING:
- Troubleshoot issues alongside developers, providing systems-level and architectural insight into the issue at hand.
- Extend configuration management systems with new features and assist developers in bringing new services and software to the appropriate devices.
- Work autonomously to solve complex or unintuitive system stability issues.
- Research, investigate, and provide justification for new technologies that would benefit development and systems.
WHAT YOU BRING:
You are a well-rounded systems engineer and scripter with a diverse set of skills, which makes you one of the best people to troubleshoot, monitor the platform, and stay on top of releases.
- Experience working in an environment that relies on remote communication and collaboration tools such as Slack and Zoom across multiple time zones
- Experience with Git in a multi-contributor/team environment
- High degree of drive to improve and automate your environment with minimal guidance
- Ability to solve immediate problems and plan for future ones
- Experience in automating tasks through scripting. You should be able to use Python and be familiar with a variety of packages.
- Extensive Ubuntu and systemd knowledge
- Extensive experience with a message queue system like RabbitMQ or Kafka
- Experience with time-series data stores
- Experience with Ansible, Salt, Terraform, Chef, Puppet, or CFEngine. Experience with Ansible and Terraform preferred
- Experience with build pipelines, integration testing, Jenkins, and GitHub Actions
- Experience administering a wide variety of *nix platforms, including multiple Linux variants
- Experience with Docker and Kubernetes
- Solid understanding of web protocols and technologies such as HTTP, HTTP/2, TLS, Server-Sent Events, and CDNs
- Solid understanding of nginx and SSL/TLS
PREFERRED EXPERIENCE:
- Familiarity with Arista/Cisco/Juniper/Nokia platforms
- Experience with extremely large-scale network management and monitoring
- Experience with Postgres and Grafana
- Experience with cloud platforms (public and/or self-hosted)
- Experience with PXE-based deployments