Provide hands-on resolution of technical and functional incidents, including user requests and troubleshooting across applications, infrastructure, and production environments.
Respond rapidly to production issues using data-driven decision-making, leading root-cause analysis and conducting blameless post-mortems to minimize downtime and financial impact.
Enhance application health through improved monitoring and automation to reduce manual tasks and strengthen system resilience.
Deliver clear, timely communication to business stakeholders and senior technologists throughout incident management and follow-up activities.
Partner with development teams to design and deploy scalable, fault-tolerant solutions that align with business objectives, while strictly adhering to change, incident, and problem-management frameworks.
Champion engineering excellence by promoting continuous improvement, strengthening SRE capabilities within Production Support teams, and actively participating in resiliency, BCP, and component-failure testing.

Bachelor's degree in Computer Science or equivalent experience, with strong troubleshooting and collaboration skills in a mixed team.
Experience in controlled production environments, with strong knowledge of ITIL practices, global service operations, and financial-technology or client-facing support functions.
Technical proficiency in UNIX, Linux, Solaris, Java, J2EE, Python, PowerShell, SQL scripting, and databases (Oracle, MSSQL), plus debugging across infrastructure components and cloud technologies (GCP preferred).
Hands-on background in Site Reliability Engineering, including building SLIs and working with monitoring and scheduling tools such as Geneos, Splunk, Grafana, New Relic, and Control-M, as well as microservice APIs, CI/CD pipelines, and container platforms such as OpenShift and Kubernetes.