Strategic Healthcare Programs (SHP) is a leading provider of analytics and performance management solutions for the post-acute healthcare market. We are an industry leader in helping Home Health, Hospice, and Skilled Nursing providers improve their financial and quality performance while complying with many regulatory requirements. Additionally, we connect the post-acute world to the broader provider markets to allow for optimal management across the continuum of care.
Role Overview
We're hiring a strong Python engineer to build and operate our production ML platform end-to-end. You'll productionalize data science work by building robust on-premises infrastructure, establishing software engineering best practices, and creating the tooling that enables our data scientists to ship faster. All infrastructure is self-hosted.
This is a remote or hybrid position within the United States. Employees living within 75 miles of the Santa Barbara office are required to work in-person in the office every Wednesday.
ML experience is welcome but not required. We care most about your software engineering foundation: production Python, OOP, testing, and async/parallel performance. Our existing ML engineers will get you up to speed on the ML side - frameworks, LLMs, vector stores, vLLM, and the rest.
Team: You'll join a small, tight-knit ML team where every engineer owns meaningful surface area and their code end-to-end. We value people who deeply understand the systems they build, not just that they run.
What You'll Do Day-to-Day
Production ML Systems (40%)
- Build automated ML pipelines: data ingestion → training → evaluation → deployment → retraining
- Deploy and serve models (batch + real-time) via FastAPI/Flask APIs with auto-scaling and rollback
- Implement CI/CD for ML: model packaging, versioning, automated deployments
- Optimize workflows using async, parallelism, Ray, and Dask
ML Platform & Tooling (35%)
- Design reusable internal Python packages for preprocessing, training, inference, and evaluation
- Refactor data science notebooks into maintainable OOP modules
- Build workflow orchestration for training and inference pipelines
- Create standardized templates for model development
Observability & Reliability (15%)
- Monitor latency, drift, data quality, and model performance
- Build alerting for degradation and anomalies (Prometheus, Grafana)
- Create dashboards for production model health
- Set up automated retraining triggers
Code Quality & Collaboration (10%)
- Coach data scientists on production-grade Python: testing, OOP, async/parallel patterns
- Establish and enforce software best practices across the ML codebase
- Partner with data scientists to translate pain points into engineering solutions
Required Skills
Must Have:
- 5+ years of production Python engineering
- Strong OOP fundamentals: classes, inheritance, composition, design patterns
- Testing discipline: unit, integration, fixtures, mocking
- Demonstrated async and parallel optimization (asyncio, multiprocessing, threading)
- Building and operating production Python services (APIs, workers, background jobs)
- Familiarity with FastAPI or Flask
- Experience deploying to self-hosted/on-prem environments
Soft Skills:
- Translate engineering needs into clean, maintainable code
- Comfortable coaching peers on production engineering practices
- Curious about ML and motivated to ramp into it
Nice-to-Have
- Prior MLOps or ML platform experience
- ML frameworks: scikit-learn, XGBoost, PyTorch
- Observability stack: Prometheus, Grafana, structured logging/tracing
- RAG pipelines: vector stores, semantic search
- LLM serving: vLLM, Text Generation Inference
- GenAI/agentic frameworks: LangChain, LlamaIndex, DSPy
- Orchestration: Prefect, Kubeflow, Airflow, or similar
- Kubernetes and containerization in on-prem environments
- Experiment tracking: MLflow
- LLM observability: Phoenix, Langfuse, OpenLIT
- On-prem GPU infrastructure management
Pay
$140,000 - $175,000 annually, depending on experience.
Benefits
We value work/life balance. We offer comprehensive health benefits, a 401(k) plan with a company match, an employee stock purchase plan, vacation time, sick time, and paid holidays.
This position is not eligible for immigration sponsorship.