Data Engineer – Databricks

APN Consulting Group

Tewksbury, NJ

JOB DETAILS
JOB TYPE
Full-time
SKILLS
Apache Avro, Artificial Intelligence (AI), Best Practices, Business Solutions, Cloud Computing, Computer Science, Consulting, Continuous Deployment/Delivery, Continuous Integration, Cross-Functional, Data Analysis, Data Collection, Data Lake, Data Management, Data Processing, Data Quality, Data Science, Data Storage, Data Structures, Database Extract Transform and Load (ETL), Debugging Skills, DevOps, Dimensional Modeling, Documentation, Elasticsearch, Git, Identify Issues, Information Technology & Information Systems, Java, Microsoft Windows Azure, MongoDB, NoSQL, Oracle, Oracle PL-SQL, Performance Tuning/Optimization, Problem Solving Skills, Process Improvement, Production Systems, Proof of Concept, Python Programming/Scripting Language, R Programming Language, Redis, Relational Databases (RDBMS), Requirements Management, SQL (Structured Query Language), Scala Programming Language, Scalable System Development, ServiceNow, Software Engineering, Star Schema, Team Lead/Manager, Team Player, Technical Recruiting, Technical Support, Writing Skills
LOCATION
Tewksbury, NJ
POSTED
30+ days ago
APN Consulting, Inc. is a progressive IT staffing and services company offering innovative business solutions to improve client business outcomes. We focus on high-impact technology solutions in ServiceNow, Full-stack, Cloud & Data, and AI/ML. Due to our globally expanding service offerings, we are seeking top talent to join our teams and grow with us.

Position: Data Engineer – Databricks
Location: Oldwick, NJ (Hybrid)
Duration: FTE

Overview
The Data Engineer is responsible for designing, building, and optimizing scalable data solutions to support a wide range of business needs. This role requires a strong ability to work both independently and collaboratively in a fast-paced, agile environment. The ideal candidate will engage with cross-functional teams to gather data requirements, propose enhancements to existing data pipelines and structures, and ensure the reliability and efficiency of data processes.

Responsibilities
- Help lead the team's transition to the Databricks platform, making use of newer features such as Delta Live Tables and Workflows
- Design and develop data pipelines that extract data from Oracle, load it into the data lake, transform it into the desired format, and load it into the Databricks data lakehouse
- Optimize data pipelines and data processing workflows for performance, scalability, and efficiency
- Implement data quality checks and validations within data pipelines to ensure the accuracy, consistency, and completeness of data
- Help create and maintain documentation for data mappings, data definitions, architecture, and data flow diagrams
- Build proofs of concept to determine the viability of possible new processes and technologies
- Deploy and manage code in non-production and production environments
- Investigate and troubleshoot data-related issues; fix defects or provide solutions to fix them
- Identify and resolve performance bottlenecks, including suggesting ways to optimize and performance-tune databases and queries
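To illustrate the kind of data quality checks and validations the responsibilities above describe, here is a minimal plain-Python sketch of a quality gate between the extract and load steps of a pipeline. The field names (`policy_id`, `effective_date`, `amount`) are hypothetical, and plain dicts stand in for DataFrame rows; on Databricks this logic would typically be expressed as Delta Live Tables expectations or DataFrame filters instead.

```python
# Hypothetical data quality gate for a pipeline stage (plain-Python sketch).
# Field names are illustrative only, not taken from the posting.

REQUIRED_FIELDS = ("policy_id", "effective_date")

def validate_rows(rows):
    """Split rows into (valid, rejected) based on simple quality rules."""
    valid, rejected = [], []
    for row in rows:
        # Completeness: every required field must be present and non-null.
        if any(row.get(f) is None for f in REQUIRED_FIELDS):
            rejected.append((row, "missing required field"))
        # Consistency: amounts must be non-negative when present.
        elif row.get("amount", 0) < 0:
            rejected.append((row, "negative amount"))
        else:
            valid.append(row)
    return valid, rejected

if __name__ == "__main__":
    rows = [
        {"policy_id": 1, "effective_date": "2024-01-01", "amount": 100},
        {"policy_id": None, "effective_date": "2024-01-02"},
        {"policy_id": 3, "effective_date": "2024-01-03", "amount": -5},
    ]
    valid, rejected = validate_rows(rows)
    print(len(valid), len(rejected))  # 1 valid row, 2 rejected
```

Rejected rows would normally be routed to a quarantine table with their reasons, so that completeness and consistency issues are visible rather than silently dropped.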
Qualifications
- Bachelor's degree in Computer Science, Data Science, Software Engineering, Information Systems, or a related quantitative field
- 4+ years of experience as a Data Engineer, ETL Engineer, Data/ETL Architect, or a similar role
- Must hold a current/active Databricks Data Engineer/Analyst certification

Skills
- 4+ years of solid, continuous experience with Python
- 3+ years working with Databricks, with knowledge and expertise in data structures, data storage, and change data capture gained from prior production implementations of data pipelines, optimizations, and best practices
- 3+ years of experience in Kimball dimensional modeling (star schema comprising facts, Type 1 and Type 2 dimensions, aggregates, etc.) with a solid understanding of ELT/ETL
- 3+ years of solid experience writing SQL and PL/SQL code
- 2+ years of experience with Airflow
- 3+ years of experience working with relational databases (Oracle preferred)
- 2+ years of experience working with NoSQL databases: MongoDB, Cosmos DB, DocumentDB, or similar
- 2+ years of cloud experience (Azure preferred)
- Experience with CI/CD using Git/Azure DevOps
- Experience with storage formats including Parquet, Arrow, and Avro
- Ability to collaborate effectively with team members while working independently with minimal supervision
- A creative mindset, a knack for solving complex problems, a passion for working with data, and a positive attitude
- Ability to collaborate within and across teams of varying technical knowledge to support delivery and educate end users on data products
- Expert problem-solving skills, including debugging skills, allowing the determination of the source of issues in unfamiliar code or systems

Pluses, but not required: any work experience with the following:
- ETL/ELT tools: Spark, Kafka, Azure Data Factory (ADF)
- Languages: R, Java, Scala
- Databases: Redis, Elasticsearch

We are committed to fostering a diverse, inclusive, and equitable workplace where individuals from all backgrounds feel valued and empowered to contribute their unique perspectives. We strongly encourage applications from candidates of all genders, races, ethnicities, abilities, and experiences to join our team and help us build a culture of belonging.
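For readers unfamiliar with the Kimball terms in the skills list, a Type 2 dimension preserves history: when a tracked attribute changes, the current row is closed out and a new current row is inserted. The sketch below shows that logic in plain Python under assumed, hypothetical column names (`customer_id`, `city`, `valid_from`, `valid_to`, `is_current`); it is an illustration of the semantics, not a production implementation.

```python
from datetime import date

def apply_scd2(dim_rows, incoming, today=None):
    """Apply a Type 2 (history-preserving) change to a dimension table.

    dim_rows: list of dicts keyed by natural key `customer_id`, with a
    tracked attribute `city` and validity columns `valid_from`,
    `valid_to`, and `is_current`. (All names are hypothetical.)
    incoming: dict with the latest `customer_id` and `city`.
    """
    today = today or date.today().isoformat()
    for row in dim_rows:
        if row["customer_id"] == incoming["customer_id"] and row["is_current"]:
            if row["city"] == incoming["city"]:
                return dim_rows  # no change: keep the current row as-is
            # Close out the current version of this customer.
            row["valid_to"] = today
            row["is_current"] = False
            break
    # Insert the new current version (also handles brand-new customers).
    dim_rows.append({
        "customer_id": incoming["customer_id"],
        "city": incoming["city"],
        "valid_from": today,
        "valid_to": None,
        "is_current": True,
    })
    return dim_rows
```

In a Databricks lakehouse this would usually be done set-wise with a Delta Lake `MERGE` statement rather than row-by-row Python; the sketch only shows what "Type 2" means for the data.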

About the Company


APN Consulting Group