Tatango is looking for a passionate Site Reliability Engineer to join our growing engineering team. The ideal candidate can build, maintain, and extend high-quality, innovative, and scalable infrastructure in compliance with industry best practices.
Responsibilities:
- Support, maintain, and extend the Tatango cloud infrastructure.
- Implement monitoring, logging, and tracing solutions to maximize uptime and performance of Tatango’s applications.
- Maintain and improve the CI/CD pipelines supporting Tatango applications.
- Apply industry best practices in consultation with the engineering team.
Requirements:
- 2+ years of Terraform experience
- 3+ years of SRE and DevOps experience (preferably in a large-scale cloud infrastructure)
- 2+ years of AWS experience
- 2+ years of Docker experience
- 2+ years of scripting and automation experience (e.g. Bash, Ruby, or Python)
- Experience with Linux system administration
- Ability to work in a fast-paced environment and deal with ambiguity
- Solid verbal and written English communication skills
- Good analytical and problem-solving skills
- Ability to support weekly late-night deploys (typically 9-10 PM Pacific time)
Extra Credit:
- AWS Certifications
- Experience with monitoring tools (e.g. Graphite, TICK, Datadog, Prometheus)
- Experience with Elasticsearch, DynamoDB, and Amazon Kinesis
- Experience in capacity planning
- Strong sense of system architecture and design patterns
- Software development experience in Node.js, Ruby, or Python
About Tatango:
- We’re industry leaders in the text message marketing space.
- We communicate primarily through Slack, with the occasional videoconference to mix things up.