Must have: Splunk, Python
Good to Have: ELK, Java, NodeJS, ReactJS
The Sr. Engineer 2 (Site Reliability Engineer) role is a hands-on, Senior Architect-level position supporting Client's CTO Site Reliability Engineering team. The ideal candidate must have experience in full-stack engineering.
The CTO Site Reliability Engineering portfolio consists of several mission-critical applications, such as Manage Your Card Account, Digital Acquisition, Membership Rewards, and Client.com Mobile applications. The Mobile and Web engineering enterprise applications are highly available and run in an extremely high-throughput transactional system with strict performance requirements. The Site Reliability Engineering team works with various Product teams, Staff Architects, Engineering Leaders, and Engineering Teams across the Mobile and Web engineering platforms. The team's primary focus is to conceptualize, design, develop, and implement frameworks and common components for the enterprise that ensure high reliability, scalability, availability, and performance of the Mobile and Web applications. The team is embarking on a transformation journey to implement a "Robotics First" approach in Service Delivery and Site Reliability Engineering.
Conceptualize and implement Site Reliability Engineering frameworks and components to improve predictive monitoring and drive the SRE team's journey toward a "Robotics First" approach.
Research the latest technologies and concepts, conceptualize solutions, and develop proofs of concept that improve the resiliency and performance of the production infrastructure. Design and implement innovative solutions and frameworks that improve software engineering velocity, infrastructure resiliency and security, and data availability.
Develop common framework components (to be leveraged by enterprise applications) and define standards for configuration, monitoring, reliability, and performance engineering.
Work with operations team to resolve major incidents.
Continuously improve automated remediation tasks to ensure the highest
A BS degree in Computer Science, Computer Engineering, or another technical discipline, or equivalent work experience.
8+ years of hands-on technical experience with systems analysis, incorporating design methodology, production support and engineering, and enterprise-level technologies including, but not limited to, OpenShift, WebSphere Administration, JEE (JSP, Servlets, XML, Java), and internet-related technologies, to deliver complex Internet-facing solutions.
Broad technical exposure, with preference for the following skills: cloud infrastructure, VMs, load balancing, containers, Kubernetes, JVMs, web servers, application debugging, queuing technologies, caching technologies, databases, routing and switching, etc.
Experience working with relational and NoSQL databases such as DB2, Oracle, Cassandra, and Redis.
Strong knowledge of Linux internals and experience managing Linux systems in high traffic environments.
Fluent in programming languages: Python.
Strong interpersonal communication skills and the ability to work well in a diverse team-focused environment.
Experience with Splunk (Experience with ELK is a plus).
Familiarity with financial services and authorization systems.
Understanding of Agile practices in operations teams.