Senior Software Engineer - Kafka Platform

Balyasny Asset Management LP

IL

JOB DETAILS
SKILLS
Adoption, Amazon Web Services (AWS), Apache Kafka, Application Programming Interface (API), Asset Management, Automation, Bash Scripting, Best Practices, Brokerage, Budgeting, Capacity Management, Change Data Capture (CDC), Cloud Computing, Communication Skills, Continuous Deployment/Delivery, Continuous Integration, Cruise Control, Data Processing, Disaster Recovery, Distributed Computing, Documentation, Ecosystems, GitHub, Go Programming Language (Golang), Google Cloud Platform (GCP), Incident Response, Information/Data Security (InfoSec), Java, Jenkins, Linux Operating System, Mentoring, Metrics, Microsoft Azure, NFS (Network File System), OAuth, On Call, Performance Modeling, Performance Tuning/Optimization, Production Systems, Productivity Model, Python Programming/Scripting Language, Reliability Engineering, Replication and Remote Mirroring, Reporting Dashboards, SASL (Simple Authentication and Security Layer), SSL/TLS (Secure Sockets Layer / Transport Layer Security), Software Development, Software Engineering, Standards Strategy, Systems Maintenance
LOCATION
IL
POSTED
30+ days ago

Chicago, New York

Posted Yesterday

We are seeking a software engineer with strong Kafka experience to help build and evolve BAM's event-streaming platform. This role is ideal for an engineer who combines hands-on software development skills with deep knowledge of Kafka and distributed systems.

You will design, build, and operate the core services, tooling, and automation that power our Kafka platform. In addition to maintaining a reliable, secure, and scalable streaming environment, you will write production-quality code to improve self-service adoption, developer experience, and platform resilience across the firm.

What you'll do

  • Develop internal software, APIs, tooling, and automation to simplify Kafka provisioning, access management, topic lifecycle management, and operational workflows.
  • Architect, deploy, and operate production-grade Kafka clusters (self-managed and/or Confluent/MSK), including upgrades, capacity planning, multi-AZ and multi-region disaster recovery (DR), and performance tuning.
  • Operate Kafka on Kubernetes using Operators, Helm, and GitOps, and build IaC-driven automation with guardrails for repeatable, compliant, zero-downtime provisioning and deployments.
  • Implement and manage Kafka Connect, Schema Registry, and MirrorMaker 2/Cluster Linking; standardize connectors (e.g., Debezium) and build self-service patterns.
  • Drive reliability: define SLOs/error budgets, on-call rotations, incident response, postmortems, runbooks, and automated remediation.
  • Implement observability: metrics, logs, traces, lag monitoring, and capacity dashboards (e.g., Prometheus/Grafana, Burrow, Cruise Control, OpenTelemetry).
  • Secure the platform: TLS/mTLS, SASL (OAuth/SCRAM), RBAC/ACLs, secrets management, network policies, audit, and compliance automation.
  • Guide event-streaming best practices: topic design, partitioning, compaction/retention, idempotency, ordering, schema evolution/compatibility, dead-letter queues (DLQs), and exactly-once semantics (EOS).
  • Partner with app, data, and SRE teams; provide enablement, documentation, and internal tooling for a great developer experience.
  • Lead/mentor engineers and contribute to roadmap, standards, and platform strategy.

Required qualifications

  • Excellent communication and partnership skills with platform and application teams.
  • Deep hands-on experience operating Kafka in production at scale (brokers, controllers, partitions, ISR, tiered storage/retention, rebalancing, replication, recovery).
  • Strong software engineering fundamentals, with experience building and maintaining production systems.
  • Strong Kubernetes expertise running stateful systems.
  • Automation first: Infrastructure as Code (Terraform), Helm, Operators, GitOps (Argo CD/Flux), and CI/CD (e.g., GitHub Actions/Jenkins) for platform lifecycle.
  • Proficiency with one or more languages for tooling/automation: Python, Go, or Java; plus Bash and solid Linux fundamentals (networking, filesystems, JVM tuning basics).
  • Observability and reliability engineering for Kafka: Prometheus/Grafana, logging, alerting, lag monitoring, capacity/throughput modeling, performance tuning.
  • Security for data in motion: TLS/mTLS, SASL/OAuth, ACL/RBAC, secrets management (e.g., Vault), and audit/compliance practices.
  • Experience with Kafka ecosystem components: Kafka Connect, Schema Registry, MirrorMaker 2/Cluster Linking; familiarity with Cruise Control.
  • Cloud experience (AWS/Azure/GCP) with networking, IAM, and one or more managed offerings (e.g., Confluent Cloud or AWS MSK).
  • Proven track record designing runbooks, leading incidents/postmortems, and driving platform roadmaps.

Nice to have

  • Experience building internal developer platforms or self-service infrastructure products.
  • Data processing frameworks (Kafka Streams, Flink, Spark Structured Streaming) and exactly-once semantics (EOS).
  • Experience with Strimzi or Confluent for Kubernetes in production.
  • Knowledge of CDC patterns and tools (e.g., Debezium) and database connectors at scale.
  • Multi-region architectures, cluster linking strategies, and disaster recovery drills.

With respect to NY-, CA-, and IL-based applicants, the starting base pay range for this role is between USD 200,000 and USD 250,000 annually. The actual base pay is dependent upon several factors, including, but not limited to, relevant experience, business needs and market demands. This role may also be eligible for bonus compensation and employee benefits.
