PySpark Data Engineer

Axelon

Rutherford, NJ

JOB DETAILS
SALARY
$74–$78 Per Hour
SKILLS
Amazon Simple Storage Service (S3), Amazon Web Services (AWS), Analysis Skills, Apache, Apache Cassandra, Apache Hadoop, Apache Hive, Apache Spark, Application Programming Interface (API), Architectural Services, Best Practices, Big Data, Cloud Computing, Cloud Storage, Code Reviews, Communication Skills, Computer Science, Continuous Deployment/Delivery, Continuous Integration, Cost Effectiveness Analysis, Data Analysis, Data Management, Data Modeling, Data Processing, Data Quality, Data Science, Data Sets, Data Warehousing, Database Extract Transform and Load (ETL), Dimensional Modeling, Electronic Medical Records, GCP (Google Cloud Platform), Git, HDFS (Hadoop Distributed File System), Identify Issues, Information Technology & Information Systems, Information/Data Security (InfoSec), Microsoft Azure, MongoDB, NoSQL, Problem Solving Skills, Process Management, Process Modeling, Python Programming/Scripting Language, Quality Monitoring, SQL (Structured Query Language), Scalable System Development, Source Code/Configuration Management (SCM)
LOCATION
Rutherford, NJ
POSTED
Today

Summary:

  • Work Mode: Not specified

Responsibilities:

  • Design, build, and optimize data pipelines using PySpark to extract, transform, and load (ETL) data from various sources into data lakes and data warehouses.
  • Develop and maintain scalable data processing jobs and frameworks using Apache Spark with Python (PySpark).
  • Work closely with data scientists, analysts, and business stakeholders to understand data requirements and deliver high-quality data solutions.
  • Implement data quality checks, monitoring, and alerting for data pipelines to ensure data accuracy and reliability.
  • Optimize existing PySpark jobs for performance, efficiency, and cost-effectiveness.
  • Manage and process large datasets, ensuring data governance, security, and compliance.
  • Troubleshoot and resolve issues in data pipelines and data processing jobs.
  • Participate in code reviews, contribute to architectural discussions, and promote best practices in data engineering.
  • Stay informed about new PySpark features, big data technologies, and industry best practices.
  • Document data pipelines, data models, and processes.

Requirements:

  • Bachelor's degree in Computer Science, Engineering, Information Technology, or a related quantitative field.
  • Minimum 7 years of experience as a Data Engineer, with significant experience specifically in PySpark.
  • Strong proficiency in Python programming.
  • Extensive experience with Apache Spark, including Spark SQL, Spark Streaming, and the DataFrame API.
  • Solid understanding of data warehousing concepts, dimensional modeling, and ETL principles.
  • Proficiency in SQL for data querying and manipulation.
  • Experience with big data technologies such as Hadoop, HDFS, Hive, or similar.
  • Familiarity with cloud platforms (e.g., AWS, Azure, GCP) and their data services (e.g., S3, ADLS, Google Cloud Storage, EMR, Databricks, Glue).
  • Experience with version control systems (e.g., Git).
  • Excellent problem-solving, analytical, and communication skills.

Preferred Skills:

  • Master's degree in a related field.
  • Experience with workflow orchestration tools (e.g., Apache Airflow, Azure Data Factory, AWS Step Functions).
  • Knowledge of stream processing technologies (e.g., Kafka, Kinesis).
  • Experience with NoSQL databases (e.g., MongoDB, Cassandra, DynamoDB).
  • Familiarity with data governance tools and practices.
  • Experience in a CI/CD environment.

About the Company

Axelon