Localize is seeking a Platform Reliability Engineer to join our growing engineering team. As Localize expands, the scalability, reliability, and performance of our infrastructure and applications have become paramount. This role is dedicated to overseeing and managing all aspects of Localize’s technical infrastructure, databases, and software tools, and to implementing systems for effective monitoring, alerting, and maintenance. You will be responsible for the scalability, stability, reliability, and performance of the Localize platform. You will also support DevOps and enhance the systems the engineering team uses to improve productivity.
Key Responsibilities:
- Oversee and manage Localize's infrastructure across AWS and Cloudflare.
- Ensure the scalability, reliability, performance, and security of Localize's data stores, specifically Redis and MongoDB, through effective configuration, monitoring, query optimization, and backup management.
- Oversee and automate the deployment process.
- Own and improve monitoring of uptime and performance using tools such as Bugsnag, Datadog, and New Relic.
- Identify, plan, and implement infrastructure improvements and optimizations for cost, reliability, and scalability.
- Develop and maintain detailed documentation and maps of our infrastructure and dependencies.
- Actively learn new technologies and systems to improve the efficiency and capabilities of our infrastructure.
Must-Have Skills:
- 5+ years of engineering experience. At least 2 years of experience in an SRE and/or DevOps role.
- Expertise in managing and optimizing infrastructure in AWS.
- Experience with Redis and MongoDB, including configuration, monitoring, optimization, and management of backups.
- Experience with manual or automated deployment and release management.
- Knowledge of best practices for securing infrastructure, including managing access controls and understanding potential vulnerabilities.
- Skill in assessing and optimizing infrastructure and applications for performance, reliability, scalability, and cost-efficiency.
- Proficiency in command-line scripting.
Nice-to-Have Skills or Experience:
- Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
- Understanding of continuous integration and continuous deployment (CI/CD) processes and tools.
- Elasticsearch configuration for advanced search capabilities.
- Disaster recovery planning, including data integrity and availability in case of emergencies.
- Experience with APM and logging tools such as Datadog, New Relic, or Kibana.