Jr Data Engineer - Onsite - W2
Cliff Services Inc
Mc Lean, TX
JOB DETAILS
SALARY
$30–$30 Per Year
JOB TYPE
Full-time, Employee
SKILLS
AWS Lambda, Amazon Simple Storage Service (S3), Amazon Web Services (AWS), Apache Spark, Best Practices, Cloud Computing, Code Reviews, Computer Programming, Continuous Improvement, Data Analysis, Data Cleaning, Data Management, Data Modeling, Data Partitioning, Data Processing, Data Quality, Data Science, Data Warehousing, Database Design, Database Extract Transform and Load (ETL), Documentation, Electronic Medical Records, Machine Learning, Maintain Compliance, Performance Tuning/Optimization, Python Programming/Scripting Language, SQL (Structured Query Language), Software Engineering, Validation Testing
LOCATION
Mc Lean, TX
POSTED
21 days ago
Job Title: Data Engineer (Python / Spark / AWS) on W2
Location: Richmond, VA / McLean, VA / Dallas, TX (onsite; local candidates only)
Duration: Long Term Contract
Interview Process: Internal screening round followed by an in-person (face-to-face) interview in VA or Dallas, TX
Job Summary
We are seeking talented and experienced Data Engineers with strong expertise in Python, Spark (PySpark), and AWS to contribute to large-scale data modernization and analytics initiatives. The selected candidates will design, develop, and optimize data pipelines and cloud-based data platforms that power enterprise reporting, analytics, and machine learning solutions. This role provides an excellent opportunity to work in a fast-paced, cloud-first environment leveraging modern AWS data technologies.
Key Responsibilities
Design, build, and maintain ETL / ELT pipelines and data ingestion workflows using Python and Spark (PySpark).
Develop and manage data solutions using AWS services such as S3, Glue, EMR, Redshift, Lambda, and Athena.
Implement efficient data modeling, schema design, and partitioning strategies for data lakes and warehouses.
Optimize Spark jobs for performance, scalability, and cost efficiency.
Collaborate with data science, analytics, and application teams to deliver reliable and clean data.
Establish data quality checks, validation frameworks, and observability mechanisms.
Ensure adherence to data governance, lineage, and security standards.
Participate in code reviews, documentation, and continuous improvement initiatives.
Required Skills
Strong programming skills in Python (including Pandas and PySpark).
Hands-on experience with Apache Spark / PySpark for distributed data processing.
Proficiency with AWS data services: S3, Glue, EMR, Lambda, Redshift, and Athena.
Strong SQL skills and understanding of data modeling and schema design.
Experience with workflow orchestration tools such as Airflow or AWS Step Functions.
Proven ability in ETL optimization, performance tuning, and pipeline monitoring.
Knowledge of data governance, lineage, and enterprise data management best practices.
About the Company
Cliff Services Inc
INDUSTRY
Banking