Data Engineer

Waystation AI

Redwood City, California

JOB DETAILS
POSTED
5 days ago

Data Engineer

The owner of the data layer the entire product is built on — from raw supplier email to structured system of record.

Location: Redwood City, CA (In-person, 5 days/week)

Experience: 8+ years building production data systems, including hands-on early-stage startup experience (required); document extraction / ML / NLP pipelines a strong plus

Company: Waystation AI

About Waystation AI

Waystation is building the operating system for procurement in consumer packaged goods (CPG).

Today, ingredient and packaging sourcing still runs through inboxes, PDFs, and spreadsheets. It's slow, opaque, and costly. Waystation replaces that chaos with an AI-powered procurement platform that creates structure, visibility, and leverage — without forcing suppliers into portals.

The result: real ROI. One customer saved over $200,000 in the first three months, paying for their annual contract in the first 30 days.

Waystation is led by repeat founder Ryan Caldbeck (previously founded CircleUp) and backed by Founder Collective, Homebrew, Slow Ventures, 87 Capital, Floodgate, and SuccessVP. We have paying customers, real usage, and a product that works.

The Role

Structured data isn't a feature of our product — it is the product. We take the messiest input imaginable (hundreds of thousands of disconnected supplier emails and PDFs — specs, COAs, pricing, certs) and turn it into a clean, queryable system of record shared across procurement, QA, and R&D.

You own that layer end to end. The extraction pipeline, the data model, the infrastructure the rest of engineering builds on — it's yours, not a slice of it. The quality of what every user sees, what every model trains on, and what every customer ROI claim rests on flows through what you build. No one will hold your hand. You'll have unusual access and unusual scope, and you'll be expected to use both. You'll move fast and ship scrappy — a rough system working today beats a perfect one next quarter. We don't have the resources to gold-plate, and neither do you.

What You'll Do

  • Own the extraction pipeline. Turn messy supplier emails and documents — specs, COAs, pricing, certs, multi-language, bad scans — into structured, validated data.

  • Push accuracy and prove it. Drive extraction accuracy past today's 85%+ and build the eval harness that measures it, per document type, so the number is real and not a vibe.

  • Own the data model. Unify suppliers, documents, RFPs, pricing, and certifications into one source of truth — and build for institutional memory, so every email compounds into leverage.

  • Build infrastructure others depend on. Ship reliable, observable pipelines and own data quality, lineage, and the monitoring that catches problems before customers do.

  • Treat extraction as an ML problem. Eval sets, regression testing, accuracy tracking over time — turn customer-reported errors into systematic improvements, not one-off patches.

  • Build leverage. Reach for models and agents first. Automate the long tail instead of grinding it.

What We're Looking For

We'll back the right engineer over the right résumé. We care about a defined edge, depth, and ownership — not polish.

You're a strong fit if you:

  • Have built in the chaos — required. You've done real work at an early-stage startup (seed or Series A), where there was no playbook, no infrastructure handed to you, and never enough hours. You know the difference between building from zero and maintaining someone else's system. A purely big-company background isn't a fit for this seat.

  • Move fast and stay scrappy. You ship, learn, and iterate in the open rather than polishing in private. Constraints — fewer people, less tooling, no time — energize you instead of stalling you. You find the version that works now and earn the polish later.

  • Have one superpower. There's a thing you're genuinely better at than almost anyone (data systems, extraction, ML pipelines), and you can name it and point to results that prove it. We want a sharp edge and the slope to outgrow the job, not someone evenly good at everything.

  • Have real depth. 8+ years building production data systems. Deep with Python, SQL, and modern data tooling. You can architect a system as easily as you can ship a fix — and you do both at startup speed.

  • Own whole problems. You take messy things start to finish and close them without being asked. When the data is wrong, you fix the system, not the symptom.

  • Build leverage. You reach for tools, automation, and agents to scale yourself instead of grinding manually. We live in Claude Code — you should want to, too.

  • Are all in. This is a rocket ship you want to plant a flag on and ride through the messy middle — not a stepping stone. We're betting on you; we need you betting on us.

  • Have grit. You've ground at something hard for a long time, through the part where it stopped being fun and the feedback loop ran far longer than your next review. You don't flinch when the work gets ugly.

Bonus: document extraction, NLP, or ML pipelines; regulated document-heavy domains; CPG, supply chain, or procurement; multi-language data (Chinese, Spanish).

What Success Looks Like

You'll ramp fast and work toward a scorecard built on four measures:

  • Extraction accuracy. A measurable climb past existing accuracy (precision & recall) across document types — proven by the evals you built, not asserted.

  • Pipeline reliability. Data quality and uptime the product can depend on. Bad or missing data gets flagged automatically, before a customer ever sees it.

  • Coverage of the long tail. More supplier formats and document types handled cleanly. The set of things that break the pipeline keeps shrinking.

  • Leverage for the team. The data layer becomes something the rest of engineering builds on without thinking about it.

Values

  • We are reliable, credible, and authentic

  • We are solution-oriented

  • We are proud of our work, our customers, and ourselves

What We Offer

  • Competitive base salary + meaningful equity — real ownership, with upside tied to the outcomes you drive

  • Ownership of the data layer the entire product is built on, working directly with a repeat founder & CEO — a front-row seat to how an AI-native company gets built

  • A real product with real ROI — value you can measure

  • Full health, dental, and vision coverage

  • Unlimited vacation — we care about outcomes, not hours

  • An in-person team that values craft and ambition

How to Apply

Don't send a cover letter. Send two things:

  • A hard system you owned. One pipeline or data problem, taken start to finish — what was true before, what you built, what was true after.

  • Something you automated or built with AI. An eval harness, an agent, a workflow that scaled you — anywhere you replaced manual work with a system.

Short is fine. We're reading for ownership and judgment, not polish.
