Site Reliability Engineer

NOV Inc

Houston, TX

JOB DETAILS
SKILLS
Amazon Web Services (AWS), Analysis Skills, Automation, Bash Scripting, Budget Management, Cloud Computing, Continuous Deployment/Delivery, Continuous Integration, DevOps, Distributed Computing, Docker, Documentation, Drilling, Ecosystems, Energy Efficiency, GCP (Google Cloud Platform), GitHub, Incident Management, Incident Response, Leadership, Metrics, Microsoft .NET, Microsoft C# (C Sharp), Microsoft Windows Azure, Oil and Gas, Operational Support, Performance Tuning/Optimization, PostgreSQL, Problem Solving Skills, Production Control, Production Systems, Python Programming/Scripting Language, Quality Assurance Methodology, Query Optimization, Reliability Engineering, Root Cause Analysis, Scripting (Scripting Languages), Service Delivery, Source Code/Configuration Management (SCM), Telemetry, Test Automation, Time Management, Trend Analysis, Windows PowerShell
POSTED
11 days ago

As a Site Reliability Engineer, you will be responsible for:

Operational Excellence & Incident Management

  • Maintain and monitor production systems for availability, latency, and performance.
  • Lead incident response efforts, including communication, resolution, and postmortem documentation.
  • Design and implement health checks, alerting systems, and automated remediation workflows.
  • Drive root cause analysis and implement permanent resolutions for recurring issues.

Observability & Insights

  • Set up and maintain full observability stacks (logging, metrics, tracing) using tools like Prometheus, Grafana, Datadog, OpenTelemetry, or ELK.
  • Analyze telemetry and logs to identify trends, anomalies, and opportunities for improvement.
  • Conduct post-incident reviews and use insights to inform future engineering investments.

Performance & Systems Optimization

  • Tune and optimize distributed systems, including Akka.NET actors, for performance and resource efficiency.
  • Work with developers to evolve architecture and improve system throughput, latency, and stability.
  • Optimize PostgreSQL performance, queries, and maintenance strategies.

CI/CD & Automation

  • Design and maintain modern CI/CD pipelines using GitHub Actions, Azure Pipelines, or GitLab CI.
  • Automate deployment, testing, and rollback processes to reduce friction and increase deployment frequency.
  • Standardize infrastructure as code practices across environments.

We'd love to talk to you if you have:

  • 5+ years of experience in SRE, DevOps, or Infrastructure Engineering roles.
  • Expertise in Kubernetes and container orchestration at scale.
  • Strong experience with Akka.NET or similar actor-based frameworks.
  • Proficiency with scripting and automation (Bash, PowerShell, Python).
  • Experience with observability tools (Phobos, Datadog, Prometheus, Grafana, OpenTelemetry, ELK).
  • Hands-on experience with cloud platforms (AWS, Azure, or GCP).
  • Strong PostgreSQL knowledge: performance tuning, query optimization, and maintenance.
  • Proven ability to lead incident management and drive postmortem processes.
  • A builder's mindset with high standards for operational excellence and technical ownership.

Preferred Tools & Ecosystem Experience

  • CI/CD: GitHub Actions, Azure Pipelines, GitLab CI
  • Infrastructure: Kubernetes, Docker, Terraform
  • Monitoring: Phobos (Akka.NET), Datadog, Prometheus
  • Source Control: GitHub, GitLab, Azure DevOps
  • Programming: C#, Python, Bash, PowerShell

Every day, the oil and gas industry's best minds put more than 150 years of experience to work to help our customers achieve lasting success.

We Power the Industry that Powers the World

Throughout every region in the world and across every area of drilling and production, our family of companies has provided the technical expertise, advanced equipment, and operational support necessary for success, now and in the future.

Global Family

We are a global family of thousands of individuals, working as one team to create a lasting impact for ourselves, our customers, and the communities where we live and work.

Purposeful Innovation

Through purposeful business innovation, product creation, and service delivery, we are driven to power the industry that powers the world better.

Service Above All

This drives us to anticipate our customers' needs and work with them to deliver the finest products and services on time and on budget.

About the Company

NOV Inc