Site Reliability Engineer - remote

Petal
Posted 4 years ago
We Work Remotely
The Infrastructure team

The Infrastructure team is interested in building strong foundations on which the rest of Petal can pave its paths to success. The tools Infrastructure uses are at the forefront of industry practice and community-driven technology. We’re passionate about scalability, reliability, and simplicity, but most of all we’re interested in empowering the company as a whole.

The Infrastructure Software Engineer role

Infrastructure engineers will be the glue that binds our engineering teams together. Many teams rely on our foresight and expertise for the bigger picture to come together. We foster an environment where engineers can reach out with any question, not just those related to infrastructure. The position is best for curious, generalist programmers who are deeply familiar with web application infrastructure and love to apply software engineering principles to make their own and everyone else’s lives easier.

Here is our current tech stack: https://stackshare.io/petal

Key responsibilities

  • Be responsible for the overall health and performance of Petal’s underlying infrastructure.
  • Help improve the entire lifecycle of services: deployment, scaling, monitoring, and optimization.
  • Know standard security practices and identify any potential infrastructure-specific vulnerabilities.
  • Write code. We want engineers who can automate the deployment, administration, and monitoring of our container-centric Linux environments. We’re strong believers in writing code to solve mundane problems.
  • Gain deep application-level knowledge of our systems and contribute to their overall design.
  • Work with development teams to enhance, document, and establish processes and generally improve the operability and security of our systems.
  • Improve automation of operational processes (provisioning, replication, deployments, continuous integration).
  • Bring monitoring, alerting, and observability for production and nonproduction issues to the next level.
  • Answer questions about our tooling with kindness and compassion. We help others understand the work we do and how they can benefit from using it.

Characteristics of a successful candidate

  • At least 3 years of DevOps or site reliability engineering experience. Bonus points for experience in a rapidly growing tech startup.
  • Familiarity with open source. We use, learn from, and contribute to many open source products. Familiarity with concepts and principles that are popular throughout open source is a useful skill.
  • Capable programmer. Infrastructure remains nimble (and sane) by putting automation and software at the forefront of everything we do. We’re looking for candidates who think and act from a programmatic mindset and who can recognize when to automate, when duplication has become burdensome, how to keep things simple, and when it's appropriate to write code.
  • Strong Linux and networking knowledge. We walk the cloud-native walk, but still need to be deeply familiar with how the underlying systems work when things go awry.
  • Knowledge of web architecture design and scalability. With Petal’s current rapid expansion, we need candidates who are experienced in designing, building, and maintaining the web architectures of the future.
  • Sharp and critical eye for detail. The ability to think holistically while maintaining focus on small, intricate details is essential for the high-impact production work Infrastructure does.
  • Problem-solving versatility and resourcefulness. There will be many new and unexpected problems, and we need someone who can do the required research and networking to propose well-thought-out solutions.
  • Outstanding communication skills: verbal, written, and visual. We believe in excellent documentation, give frequent internal presentations, and help guide the organization on DevOps/SRE best practices.