Dallas, Texas (posted 30+ days ago)
Data Ingestion & Orchestration
· Experience building batch and streaming ingestion pipelines using GCP-native services
· Knowledge of Pub/Sub-based streaming architectures, event schema design, and versioning
· Strong understanding of incremental ingestion and CDC patterns, including idempotency and deduplication
· Hands-on experience with workflow orchestration tools (Cloud Composer / Airflow)
· Ability to design robust error handling, replay, and backfill mechanisms
Data Processing & Transformation
· Experience developing scalable batch and streaming pipelines using Dataflow (Apache Beam) and/or Spark (Dataproc)
· Strong proficiency in BigQuery SQL, including query optimization, partitioning, clustering, and cost control
About Position:
Identity & Access Management (IAM) Data Modernization: migration of an on-premises SQL data warehouse to a target-state Data Lake on Google Cloud (GCP), enabling metrics and reporting, advanced analytics, and GenAI use cases (natural language querying, accelerated summarization, cross-domain trend analysis). The work leverages PySpark-based processing, cloud-native DevOps CI/CD pipelines, and containerized deployments on OpenShift (OCP) to deliver scalable, secure, and high-performance data solutions.