Incident Manager - Fintech - remote

Paymentology
Posted 11 months ago
As an Incident Manager in EMEA, you’ll play the key role in driving the right level of response to incidents: determining impact, coordinating and leading fellow Paymentologists to mitigate, communicating to users, ensuring appropriate remediations, and orchestrating the Root Cause Analysis (RCA) process.


What you get to do:

You’ll work together with other Incident Managers and Engineers globally to ensure solid 24/7 coverage on how we monitor, detect, respond, communicate, and mitigate incidents. 

When not managing incidents, you'll help scale our ability to respond to incidents, improve our operations, analyse data to provide insights, and deepen our technical expertise in our products. As a result, you’ll be seen as the protector of our users, minimizing the impact of incidents on their business and ensuring that Paymentology is always thinking of its customers.

  • Act as an on-call Incident Commander, responsible for driving and managing incident resolution & communications with a high level of urgency, cross-functional collaboration, and accuracy, while partnering with a global and diverse set of teams, including Engineering, Product, Customer Support, Account teams, Risk & Fraud, etc.
  • Lead all user-facing incidents across domains at Paymentology. 
  • Take a "User First" approach to determining impact, providing accurate situation reports, facilitating comms bridges, and ensuring useful and timely external communications to users.
  • Proactively update internal stakeholders and customers, and make decisions through data and influence by partnering with Engineering, Support, and other cross-functional teams.
  • Own the root cause analysis process by conducting post-mortems, identifying remediations, and ensuring problem management tasks meet SLA and user expectations.
  • Drive improvements in the incident handling process and incident management metrics and tooling based on trends and data of our incidents in collaboration with engineering, product, and other operations teams. 
  • Ensure the creation and progression of new problem tickets for recurrent service issues in a timely manner through to closure. 
  • Drive a culture that reduces repeat incidents, helping to join the dots up through shared learning. 
  • Support the review of all incidents across all priorities to identify the thematic root causes, impacts and actions, delivering accurate and timely reports to key forums to drive improved decision making.
  • Contribute ideas to evolve our processes, working practices and stakeholder relationships so that we continue to be recognised as a high performing, value adding team. 

What it takes to succeed:

We're looking for a customer-obsessed critical thinker who can join the dots up from multiple data points: someone who loves driving timely solutions to complex problems by facilitating, challenging, and getting the best out of the team you assemble during an incident to drive the right outcomes for our customers.

  • 4+ years of demonstrable major incident experience for organizations that run mission critical applications or always-on SaaS environments. 
  • Demonstrated ability to lead multiple incidents concurrently with authority and to influence responders, with the agency and reasoning skills to resolve ambiguous problems and drive to root cause.
  • Intermediate understanding of application development, application architectures, and applications deployed in cloud environments. 
  • Good understanding of infrastructure, including physical, virtual, and container-based platforms.
  • Demonstrated quantitative and analytical skills in data manipulation using SQL, Splunk or other tools.
  • Excellent task management skills and attention to detail, with the ability to remain composed, methodical, and think fast in a high-pressure environment.
  • Exceptional written and verbal English communication skills, with the ability to translate complex technical issues for internal and external stakeholders. 
  • Strong awareness of your team’s abilities and an understanding that our people are our biggest asset.
  • Proven ability to lead with influence, work methodically and calmly under pressure, facilitating and collaborating with colleagues to deliver the right outcomes for our business and customers. 
  • Ability to learn quickly – we provide a training programme that requires self-driven learning. This is a key component to help ramp-up in the job as well as progress your career quickly. 
  • A love of technology – an ideal candidate will have technology running through their veins and impart that passion to clients and the rest of the team. 
  • Self-motivated with the ability to work in a fast-moving environment. 
  • The role does require weekend support as part of a rotating shift-based coverage. As we mature, we may consider moving this to an on-call arrangement.

Preferred Experience:  
  • Domain expertise in classes of incidents such as technical, privacy, security, or crisis with a strong desire to continuously learn about our products, technical issues, and systems. 
  • Ability to review complex technical details regarding ongoing issues/events and convey the key details to senior stakeholders to facilitate real-time decision making. 
  • Experience with broad user-facing communications (e.g., status pages) and/or targeted communications (e.g., direct emails, support ticket responses). 
  • Familiarity with operating or managing distributed architectures, with the ability to correlate system behaviours based on known inter-dependencies.
  • Demonstrated understanding of full stack development and support. 
  • A solid and demonstrable understanding of, and proven experience working with, ITIL disciplines (Event, Incident, Problem, Change & CSI).

This is a full-time, remote contractor position and we are looking for candidates in EMEA. Working flexible hours and shifts is essential for our remote team to function.