NOC Engineer / NOC Analyst

HCL Global Systems Inc.

Redmond, WA

JOB DETAILS
SKILLS
Amazon Web Services (AWS), Analysis Skills, Capacity and Performance Management, Communication Skills, DNS (Domain Name System), F5 Network Software, Failover, Firewalls, Hardware Virtualization, High Availability Software, Hybrid Cloud, IT Service Management (ITSM), ITIL (IT Infrastructure Library), Identify Issues, Incident Management, Internet Service Providers, Linux Administration, Linux Operating System, Load Balancing, Microsoft System Center Operations Manager (SCOM), Microsoft Windows Azure, Microsoft Windows Server, Nagios Monitoring Tool, Network Administration/Management, Network Attached Storage (NAS), Network Monitoring, Network Operations Center, Network Routers, Network Switching, Operational Support, Organizational Skills, Performance Management, Root Cause Analysis, Service Level Agreement (SLA), ServiceNow, Software Administration, Software Patches, Splunk, Storage Area Network (SAN), System Validation, Systems Administration/Management, TCP/IP (Transmission Control Protocol/Internet Protocol), Time Management, VMWare, VPN (Virtual Private Network), Virtualization, Web Server
LOCATION
Redmond, WA
POSTED
30+ days ago
NOC Engineer / NOC Analyst
Location: Redmond, WA
Local candidates, onsite; 24x7 rotational shifts (including weekends and on-call support), Monday to Sunday, 5 AM to 5 PM PT

Strong knowledge of Windows & Linux server administration (basic L1 and L1.5 troubleshooting)
Virtualization: VMware & Nutanix (L1 and L1.5)
Storage systems: SAN/NAS, Isilon, Quantum or similar PB-scale storage
Networking fundamentals: TCP/IP, DNS, VPN, Firewalls, Load Balancers (F5) (L1 and L1.5)
Experience with monitoring tools (New Relic, Splunk, Nagios, Zabbix, Dynatrace, SCOM, etc.)
Strong troubleshooting and analytical skills
Ability to work under pressure in critical outage scenarios

NOC Engineer / NOC Analyst – Job Description
Role Summary
Responsible for 24x7 monitoring, incident management, and operational support of a large-scale hybrid infrastructure including servers, virtualization platforms, storage systems, network devices, and applications. Ensure high availability, performance, and reliability across all environments (Prod, DR, Non-Prod).

THESE ARE MUST-HAVE SKILLS. DO NOT SUBMIT CANDIDATES WHO LACK THESE SKILLS.
_______________________________________
Required Skills
Technical Skills
• Strong knowledge of:
o Windows & Linux server administration (basic L1 and L1.5 troubleshooting)
o Virtualization: VMware & Nutanix (L1 and L1.5)
o Storage systems: SAN/NAS, Isilon, Quantum or similar PB-scale storage
o Networking fundamentals: TCP/IP, DNS, VPN, Firewalls, Load Balancers (F5) (L1 and L1.5)
• Experience with monitoring tools (New Relic, Splunk, Nagios, Zabbix, Dynatrace, SCOM, etc.)
• Understanding of ITSM tools (ServiceNow preferred) for incident, change, and problem management
• Familiarity with the Rubrik backup management tool
Operational Skills
• Incident management and escalation handling in 24x7 environments
• Strong troubleshooting and analytical skills
• Ability to correlate infrastructure, network, and application issues
• Strong communication and coordination skills
• Ability to work under pressure in critical outage scenarios
• Good documentation and reporting skills
________________________________________
Preferred Qualifications
• ITIL Foundation certification
• Experience in large-scale enterprise or MSP environments
• Exposure to cloud or hybrid environments (AWS/Azure) is a plus
________________________________________
Shift Requirement
• 24x7 rotational shifts (including weekends and on-call support)
________________________________________
Key Responsibilities
Infrastructure Monitoring & Operations
• Monitor 1,200+ servers (Windows/Linux), virtualization platforms (VMware, Nutanix), and web servers for performance and availability
• Oversee PB-scale storage systems (Quantum, Isilon, NAS, SAN), ensuring uptime and healthy capacity
• Monitor network infrastructure (1,200+ devices), including switches, routers, firewalls, VPN tunnels, WAPs, and ISP circuits
• Monitor and act on incidents and requests related to the infrastructure and tools hosted in the environment
Incident & Event Management
• Perform L1/L2 triage for alerts, incidents, and outages across infrastructure and applications
• Ensure timely incident resolution, escalation, and communication as per SLAs
• Correlate alerts across tools to identify root causes and reduce noise
Application & Service Monitoring
• Monitor 50+ applications across multiple environments (Prod, DR, UAT, Dev)
• Track service health, availability, and dependencies (web, middleware, backend systems)
Capacity & Performance Management
• Track utilization trends across compute, storage (multi-PB), and network
• Proactively identify bottlenecks and recommend optimizations
Change & Release Support
• Support infrastructure and application deployments, patches, and maintenance activities
• Validate system health before and after changes
Disaster Recovery & Resilience
• Support DR readiness for large-scale storage and application environments
• Participate in DR drills and failover validation
Reporting & Documentation
• Maintain operational dashboards, runbooks, and incident reports
• Provide daily/weekly health and SLA reports

About the Company

HCL Global Systems Inc.