Kafka Operations Administrator

Q1 Technologies, Inc

SEATTLE, WA

JOB DETAILS
SKILLS
Amazon Web Services (AWS), Analysis Skills, Ansible, Apache Avro, Apache Kafka, Automation, Benchmarking, Best Practices, Capacity and Performance Management, Cloud Computing, Continuous Deployment/Delivery, Continuous Integration, Cryptography, Data Formats, Database Backup, Disaster Recovery, Docker, Documentation, Ecosystems, Failover, Firewall Administration, Food and Beverage Industry, GCP (Google Cloud Platform), Git, High Availability, IP (Internet Protocol) Routing, ITIL (IT Infrastructure Library), Identify Issues, Incident Response, Input/Output, JMX (Java Management Extensions), JSON, Java, Linux Administration, Metrics, Microsoft Azure, Network Routing, Network Switching, On Call, Performance Tuning/Optimization, Production Support, Production Systems, Reporting Dashboards, Resource Utilization, SSL/TLS (Secure Sockets Layer/Transport Layer Security), Scripting (Scripting Languages), Service Level Agreement (SLA), Splunk, Systems Administration/Management, TCP/IP (Transmission Control Protocol/Internet Protocol), Testing, Virtual Machine (VM)
POSTED
30+ days ago
Job Title: Kafka Operations Administrator
Duration: Full-time, Permanent
Location: Seattle, WA / St. Louis, MO / Plano, TX (Onsite)
Job Description:
Must Have Technical/Functional Skills
• Production-grade Apache Kafka operations experience: managing, maintaining and upgrading Kafka clusters in production environments with a focus on high availability, disaster recovery, failover and overall reliability
• Proficiency in installing and configuring monitoring systems using Grafana (building dashboards), Prometheus, Splunk and JMX metrics
• Automation and orchestration experience: Terraform, Ansible, Helm, Kubernetes (EKS/AKS/GKE)
• Strong Linux system administration experience, including troubleshooting, automation and scripting for efficient infrastructure management
• Experience in Production Support (following ITIL processes), including participating in 24x7 on-call rotations and documenting incidents/postmortems
• Experience in supporting JVM tuning, GC analysis, and network and disk I/O diagnostics
• Experience in TCP/IP, routing, switching and firewall configurations relevant to Kafka operations

Good to Have:
• Deep Kafka performance tuning and capacity planning experience
• Knowledge of message delivery semantics and guarantees (at-least-once, exactly-once)
• Cloud-native security/compliance experience (IAM, VPC, KMS, Security Groups)
• Certifications: Confluent Certified Administrator, AWS/Azure/GCP certifications
• Experience with Apache Kafka in Food and Beverage mode, including set up, configuration, troubleshooting and cluster management
• Containerization and Container Orchestration Tools experience: Docker, Kubernetes
• Experience with CI/CD pipelines and Git-based workflows
• Experience building custom Kafka Connect libraries and understanding of data serialization formats (e.g., Avro, JSON)
• Knowledge of networking concepts across on-prem VMs and cloud environments, ensuring seamless integration and communication between services
• Strong understanding of topic management and security best practices for streaming platforms: TLS, ACLs, RBAC, encryption at rest/in transit
• Kafka ecosystem tooling experience: Kafka Connect, Schema Registry

Role and Responsibilities
• Deploy, configure and manage Kafka clusters and related services to meet SLA requirements
• Participate in 24x7 on-call rotation to respond to incidents, alerts, and escalations
• Triage, diagnose, and remediate production incidents; coordinate with stakeholders, developers and infrastructure teams
• Implement automation for provisioning, scaling, server/data backups, and disaster recovery
• Maintain monitoring, alerting thresholds, dashboards, and Kafka ecosystem health
• Harden Kafka deployments: configure TLS, ACLs, RBAC, encryption, and vulnerability remediation
• Perform routine maintenance: Kafka ecosystem upgrades (controllers, brokers, Connect, and Schema Registry), rolling restarts, etc.
• Create and maintain runbooks, runbook automation, and post-incident reports
• Optimize performance and resource utilization; benchmark and tune clusters
• Support Kafka Connect/Schema Registry services and troubleshoot connector issues
• Contribute to CI/CD pipeline improvements for infrastructure and deployment automation

About the Company

Q1 Technologies, Inc

Q1 consists of experienced and recognized experts who respond to market demand by providing professional services for our clients, including enterprise software implementations, application integration and technical/functional support.

Q1 has steadily grown into a quality IT services and solutions organization whose team members average over 10 years of experience. We have continuously met or exceeded client expectations by delivering professional services and project implementations on time and under budget, helping clients realize a return on investment.

COMPANY SIZE
500 to 999 employees
INDUSTRY
Computer/IT Services
FOUNDED
1990
WEBSITE
http://q1tech.com/