GCP Data Engineer (Healthcare Background Required)
SRI TECH SOLUTIONS
Atlanta, GA (Remote)
Location: Remote (any U.S. location)
Summary: We are seeking a candidate with strong experience architecting enterprise data platforms on Google Cloud Platform (GCP). The architect will act as a strategic technical partner, designing and building a BigQuery-based data lake and data warehouse ecosystem.
The role requires deep hands-on expertise in data ingestion, transformation, modeling, enrichment, and governance, combined with a strong understanding of clinical healthcare data standards, interoperability, and cloud architecture best practices.
Key Responsibilities:
1. Data Lake & Data Platform Architecture (GCP)
- Architect and design an enterprise-grade GCP-based data lakehouse leveraging BigQuery, GCS, Dataproc, Dataflow, Pub/Sub, Cloud Composer, and BigQuery Omni.
- Define data ingestion, hydration, curation, processing, and enrichment strategies for large-scale structured, semi-structured, and unstructured datasets.
- Create data domain models, canonical models, and consumption-ready datasets for analytics, AI/ML, and operational data products.
- Design federated data layers and self-service data products for downstream consumers.
2. Data Ingestion & Pipelines
- Architect batch, near-real-time, and streaming ingestion pipelines using GCP Cloud Dataflow, Pub/Sub, and Dataproc.
- Set up data ingestion for clinical datasets (EHR/EMR, LIS, RIS/PACS), including HL7 v2, FHIR, CCD/C-CDA, and DICOM formats.
- Build ingestion pipelines for non-clinical systems (ERP, HR, payroll, supply chain, finance).
- Architect ingestion from medical devices, IoT, remote patient monitoring, and wearables leveraging IoMT patterns.
- Manage on-prem → cloud migration pipelines, hybrid cloud data movement, VPN/Interconnect connectivity, and data transfer strategies.
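As a hedged illustration of the clinical ingestion work above, the sketch below parses an HL7 v2.x message into its pipe-delimited segments and pulls a patient identifier from the PID segment. It is a minimal example only; a production pipeline would more likely use a dedicated library (e.g., python-hl7) or a managed healthcare ingestion service, and the sample message content is invented.

```python
# Minimal HL7 v2.x segment parser -- illustrative only.
# HL7 v2 messages are carriage-return-delimited segments whose fields
# are separated by "|" and whose components are separated by "^".

def parse_hl7(message: str) -> dict:
    """Split an HL7 v2 message into {segment_id: [list of field lists]}."""
    segments = {}
    for line in message.strip().split("\r"):
        if not line:
            continue
        fields = line.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments

# Hypothetical ADT^A01 admission message (contents are placeholders).
sample = (
    "MSH|^~\\&|SENDER|FAC|RCVR|FAC|20240101120000||ADT^A01|MSG0001|P|2.5\r"
    "PID|1||12345^^^HOSP^MR||DOE^JOHN"
)
parsed = parse_hl7(sample)
# PID-3 holds the patient identifier list; the first component is the ID.
patient_id = parsed["PID"][0][3].split("^")[0]
```

In a real pipeline this parsing step would typically run inside a Dataflow transform, with the extracted fields mapped onto a canonical patient entity before landing in BigQuery.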
3. Data Transformation, Hydration & Enrichment
- Build transformation frameworks using BigQuery SQL, Dataflow, Dataproc, or dbt.
- Define curation patterns including bronze/silver/gold layers, canonical healthcare entities, and data marts.
- Implement data enrichment using external social determinants, device signals, clinical event logs, or operational datasets.
- Enable metadata-driven pipelines for scalable transformations.
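The metadata-driven transformation pattern above can be sketched as follows: a small metadata record describes a bronze-to-silver curation (column expressions plus a deduplication key), and a generator renders the corresponding BigQuery SQL. All table and column names here are hypothetical placeholders, and the rendered SQL is one plausible shape, not a prescribed standard.

```python
# Hedged sketch of a metadata-driven bronze -> silver curation step.
# Table/column names below are illustrative placeholders.

CURATION_METADATA = {
    "source": "bronze.raw_encounters",
    "target": "silver.encounters",
    "columns": {
        "encounter_id": "CAST(encounter_id AS STRING)",
        "patient_id": "UPPER(TRIM(patient_id))",
        "admit_ts": "TIMESTAMP(admit_ts)",
    },
    "dedupe_key": "encounter_id",
}

def render_curation_sql(meta: dict) -> str:
    """Render a BigQuery statement that cleans and deduplicates a table."""
    select_list = ",\n    ".join(
        f"{expr} AS {name}" for name, expr in meta["columns"].items()
    )
    key = meta["dedupe_key"]
    return (
        f"CREATE OR REPLACE TABLE {meta['target']} AS\n"
        f"SELECT\n    {select_list}\n"
        f"FROM {meta['source']}\n"
        # BigQuery's QUALIFY clause filters on window-function results,
        # keeping only the latest row per dedupe key.
        f"QUALIFY ROW_NUMBER() OVER "
        f"(PARTITION BY {key} ORDER BY admit_ts DESC) = 1"
    )

sql = render_curation_sql(CURATION_METADATA)
```

Driving transformations from metadata like this is what makes the pipeline scalable: adding a new silver table becomes a config change rather than new hand-written SQL.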
4. Data Governance & Quality
- Establish and operationalize a data governance framework encompassing data stewardship, ownership, classification, and lifecycle policies.
- Implement data lineage, data cataloging, and metadata management using tools such as Dataplex, Data Catalog, Collibra, or Informatica.
- Set up data quality frameworks for validation, profiling, anomaly detection, and SLA monitoring.
- Ensure HIPAA compliance and PHI protection through IAM/RBAC, VPC Service Controls, DLP, encryption, retention policies, and auditing.
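One common PHI-protection pattern implied by the governance responsibilities above is deterministic pseudonymization: replacing a direct identifier with a keyed hash before it reaches analytics layers. The sketch below uses HMAC-SHA256 from the Python standard library; the secret key shown is a placeholder (in practice it would come from a secret store such as Secret Manager), and this is one illustrative technique, not a complete de-identification strategy.

```python
import hashlib
import hmac

def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a stable, non-reversible surrogate for a PHI identifier.

    The same (identifier, key) pair always yields the same token, so
    joins across datasets still work, but the original value cannot be
    recovered without the key.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()

# Placeholder key for illustration only -- never hard-code real keys.
token = pseudonymize("MRN-12345", b"placeholder-key")
```

Because the mapping is deterministic per key, rotating the key re-pseudonymizes the whole dataset, which is one lever for honoring retention and re-identification policies.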
5. Cloud Infrastructure & Networking
- Work with cloud infrastructure teams to architect VPC networks, subnetting, ingress/egress, firewall policies, VPN/IPSec, Interconnect, and hybrid connectivity.
- Define storage layers, partitioning/clustering design, cost optimization, performance tuning, and capacity planning for BigQuery.
- Understand containerized processing (Cloud Run, GKE) for data services.
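The BigQuery partitioning/clustering and cost-optimization duties above can be made concrete with a small DDL generator. This is a hedged sketch: the dataset, table, and column names are invented, and the generated statement shows one standard layout (date partitioning plus clustering with a partition expiration) rather than a mandated design.

```python
# Hedged sketch: generate BigQuery DDL with partitioning and clustering,
# the two main levers for query cost and performance on large tables.

def bq_table_ddl(table: str, schema: dict, partition_col: str,
                 cluster_cols: list, expiration_days: int = 0) -> str:
    """Render a CREATE TABLE statement with partition/cluster options."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in schema.items())
    ddl = (
        f"CREATE TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"PARTITION BY DATE({partition_col})\n"
        f"CLUSTER BY {', '.join(cluster_cols)}"
    )
    if expiration_days:
        # Automatic partition expiry bounds storage cost and retention.
        ddl += f"\nOPTIONS (partition_expiration_days = {expiration_days})"
    return ddl

# Illustrative table: clinical observations partitioned by event time,
# clustered on the columns most often used in filters/joins.
ddl = bq_table_ddl(
    "analytics.observations",
    {"patient_id": "STRING", "code": "STRING", "observed_at": "TIMESTAMP"},
    partition_col="observed_at",
    cluster_cols=["patient_id", "code"],
    expiration_days=730,
)
```

Partitioning prunes whole date ranges from scans, while clustering sorts data within each partition so filters on the cluster columns read fewer blocks; together they are the core of BigQuery capacity and cost planning.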
6. Stakeholder Collaboration
- Work closely with clinical, operational, research, and IT stakeholders to define data use cases, schema, and consumption models.
- Partner with enterprise architects, security teams, and platform engineering teams on cross-functional initiatives.
- Guide data engineers and provide architectural oversight on pipeline implementation.
7. Hands-on Leadership
- Be actively hands-on in building pipelines, writing transformations, building POCs, and validating architectural patterns.
- Mentor data engineers on best practices, coding standards, and cloud-native development.
Required Skills & Qualifications
Technical Skills (Must-Have)
- 10+ years in data architecture, engineering, or data platform roles.
- Strong expertise in GCP data stack (BigQuery, Dataflow, Composer, GCS, Pub/Sub, Dataproc, Dataplex).
- Hands-on experience with data ingestion, pipeline orchestration, and transformations.
- Deep understanding of clinical data standards:
- HL7 v2.x, FHIR, CCD/C-CDA
- DICOM (for scans and imaging)
- LIS/RIS/PACS data structures
- Experience with device and IoT data ingestion (wearables, remote patient monitoring, clinical devices).
- Experience with ERP datasets (Workday, Oracle, Lawson, PeopleSoft).
- Strong SQL and data modeling skills (3NF, star/snowflake, canonical and logical models).
- Experience with metadata management, lineage, and governance frameworks.
- Solid understanding of HIPAA, PHI/PII handling, DLP, IAM, VPC security.
Cloud & Infrastructure
- Solid understanding of cloud networking, hybrid connectivity, VPC design, firewalling, DNS, service accounts, IAM, and security models.
- Experience with cloud-native data movement services (e.g., Storage Transfer Service, BigQuery Data Transfer Service).
- Experience with on-prem to cloud migrations.
- Integrate new data and big data management technologies and software engineering tools across the enterprise.
- Ensure the data architecture is extensible to future big data solutions.
- Implement data tooling to support the analytics and data science teams.
- Design the data architecture and data integration layers.
- Develop business-critical data solutions and manage the migration of data from legacy systems to new platforms.
- Troubleshoot data processing jobs and scheduled data loads; assist with data-related technical issues.
- Evangelize data best practices and implement analytics solutions.
- Discover, collect, and integrate data across disparate systems, data sources, and data types.
- Implement data extraction tools that integrate a variety of data sources and data formats.
- Solve big data problems with efficient algorithmic solutions.