Observability & Reliability Practices:
- Experience with reliability practices, including:
  - Metrics, logs, traces, alerting, dashboards, and service health monitoring
  - Incident response, root cause analysis, corrective action tracking, and runbooks
  - SLOs, SLIs, production readiness, resilience, and reliability improvement
- Use approved AI-assisted engineering tools such as GitHub Copilot and OpenAI Codex responsibly in day-to-day delivery, and share practical patterns that accelerate reliability improvements, troubleshooting, documentation, and automation.