Design and implement scalable batch and streaming data pipelines using PySpark within the Medallion Lakehouse architecture on Databricks, supporting credit risk analytics and reporting workflows
Build and maintain ingestion flows from upstream risk systems (loan origination, trading systems, market data feeds) into object storage (e.g., S3/ADLS) using Parquet and Delta formats, including partitioning, Z-ordering, and schema evolution
Develop and optimize PySpark transformation logic for large-scale credit and counterparty risk data processing, ensuring accuracy, auditability, and regulatory compliance across Bronze, Silver, and Gold layers
Model and optimize risk measures (e.g., PD, LGD, EAD, EPE, PFE, CVA) for efficient querying and consumption by BI tools, risk dashboards, notebooks, and downstream regulatory reporting applications
Integrate with external XVA/risk engines and implement orchestration logic to manage long-running risk computations and batch processing jobs
Ensure platform reliability, observability, data lineage, security (IAM roles, OIDC/Bearer-token auth, encryption at rest and in transit), and auditability to meet financial regulatory standards (Basel III/IV, FRTB, CECL)
Contribute to API and data contract design for internal risk consumers and external services; maintain thorough technical documentation aligned with audit and compliance requirements
Required Qualifications, Skills, and Capabilities
12 years of experience as a data developer, with significant exposure to financial services or banking environments
Strong domain expertise in Credit Risk and Counterparty Risk, including familiarity with regulatory frameworks such as Basel III/IV, IFRS 9, CECL, or FRTB
Expert-level proficiency in Python and PySpark/Apache Spark for large-scale financial data engineering, transformation, and analytics
Hands-on experience with Azure Databricks, including Medallion Architecture and Delta Lake for financial data workloads
Solid understanding of SQL, including joins, window functions, stored procedures, and query optimization on large risk datasets
Experience building and managing data ingestion pipelines from diverse financial source systems (core banking, trade repositories, market data vendors, risk engines)
Familiarity with workflow orchestration tools such as Airflow or Databricks Workflows for managing complex risk batch processes
In-depth knowledge of CI/CD pipelines using Git, Jenkins, and Azure DevOps in a regulated financial environment
AWS Certified Cloud Practitioner or equivalent cloud certification
Experience creating data architecture diagrams, pipeline design documents, and data dictionaries aligned with risk and compliance requirements
Proven experience in Agile software development using JIRA, Confluence, and Zephyr
Strong communication skills, with the ability to translate complex risk data concepts clearly for both technology and front-office/risk stakeholders
Collaborates effectively with quant teams, risk managers, and compliance officers; proactive self-starter
Demonstrated ability to learn and adopt new financial data technologies and regulatory requirements quickly
Passionate about data quality, financial data governance, and continuous improvement in a high-stakes regulated environment