Data Engineer Profiles (Gen AI skilled)

Artech LLC

Whippany, NJ

JOB DETAILS
SKILLS
Amazon Web Services (AWS), Apache Kafka, Apache Spark, Application Programming Interface (API), Artificial Intelligence (AI), Cloud Computing, Continuous Deployment/Delivery, Continuous Integration, Data Cleaning, Data Management, Data Modeling, Data Quality, Data Sets, Database Extract Transform and Load (ETL), Docker, Google Cloud Platform (GCP), High Throughput, Mathematics, Microsoft Azure, Modeling Languages, Natural Language Processing (NLP), NoSQL, Performance Analysis, Performance Modeling, Performance Testing, Precision Testing, Python Programming/Scripting Language, Quality Management, SQL (Structured Query Language), Training Data Sets, Unstructured Data
LOCATION
Whippany, NJ
POSTED
13 days ago
DESCRIPTION:

Artech is currently seeking candidates for the below position.
Job Title: Data Engineer Profiles (Gen AI skilled)
Job ID:  26-09483
Location: Whippany, NJ
Duration: 6-12 Months
 


Job Description:
Experienced and skilled in designing, building, and maintaining the high-quality data pipelines, preprocessing workflows, and vector databases required for training, fine-tuning, and deploying Large Language Models (LLMs). The role involves building and maintaining high-throughput data pipelines, infrastructure, and storage solutions to feed, train, and deploy AI/ML models, as well as implementing Retrieval-Augmented Generation (RAG) systems, data cleaning, and model evaluation to ensure efficient, scalable, and reliable LLM applications.
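To give a flavor of the preprocessing work described above, here is a minimal sketch of a text-cleaning and exact-deduplication step in plain Python. The function names and the hashing-based dedup strategy are illustrative assumptions, not part of any stack named in this posting:

```python
import hashlib
import re

def clean_text(raw: str) -> str:
    """Replace control characters and collapse whitespace (illustrative only)."""
    text = re.sub(r"[\x00-\x1f\x7f]", " ", raw)  # drop control chars
    return re.sub(r"\s+", " ", text).strip()     # normalize whitespace

def deduplicate(docs: list[str]) -> list[str]:
    """Drop exact duplicates by hashing each cleaned document."""
    seen, unique = set(), []
    for doc in docs:
        digest = hashlib.sha256(doc.encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

docs = [clean_text(d) for d in ["Hello,\n  world!", "Hello, world!", "Other doc"]]
corpus = deduplicate(docs)  # the two "Hello, world!" variants collapse to one
```

Real pipelines would typically add near-duplicate detection (e.g., MinHash) and language/quality filtering on top of this.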

Required Skills & Qualifications

Strong proficiency in Python is essential, along with SQL and NoSQL for data management.
Experience with LangChain, LlamaIndex, Hugging Face Transformers, and OpenAI API
Experience with Apache Spark, Kafka, or modern data stack tools.
Knowledge of NLP techniques, word embeddings, tokenization, and vector mathematics.
Familiarity with TensorFlow, PyTorch, or Hugging Face
Familiarity with cloud platforms (AWS, GCP, Azure), CI/CD, Docker, and Kubernetes.
Key Responsibilities

Design and build robust ETL/ELT pipelines for unstructured text data, including scraping, cleaning, deduplication, and transformation for LLM training.
Build and maintain vector search solutions (e.g., Pinecone, Milvus, Weaviate, Chroma) to store and retrieve embeddings for RAG systems.
Prepare high-quality datasets for fine-tuning adapters (e.g., LoRA) and train LLMs using frameworks like PyTorch or TensorFlow.
Implement Retrieval-Augmented Generation using frameworks like LangChain or LlamaIndex to connect LLMs to company data.
Develop evaluation frameworks for model performance, testing for accuracy, hallucination, and bias, and monitor deployed models.
Create APIs and internal web tools for data annotation, curation, and model interaction.
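The RAG responsibility above can be sketched end to end in a few lines. This toy version uses a bag-of-words "embedding" and cosine similarity over an in-memory dictionary in place of a real embedding model and vector database (Pinecone, Milvus, etc.); every name here is a hypothetical stand-in, not an API from the listed frameworks:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, store: dict[str, Counter], k: int = 1) -> list[str]:
    """Return the k stored documents most similar to the query embedding."""
    q = embed(query)
    ranked = sorted(store, key=lambda doc: cosine(q, store[doc]), reverse=True)
    return ranked[:k]

docs = [
    "invoices are due in 30 days",
    "vacation policy allows 20 days",
    "the office is in Whippany NJ",
]
store = {d: embed(d) for d in docs}              # stands in for a vector DB index
context = retrieve("when are invoices due", store)
prompt = f"Answer using this context: {context[0]}"  # context fed to the LLM
```

In production, `embed` would call an embedding model, `store` would be a vector database, and `prompt` would be assembled by a framework such as LangChain or LlamaIndex before being sent to the LLM.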

Please apply on our company website ( www.artechinfo.com ) with reference to job ID, or contact me at Sachin.Kumar@artech.com

About the Company


Artech LLC