One of our start-up clients is hiring a full-time AI Engineer to own the prompts, agents, evals, and pipelines behind its user-facing features.
You'll take product requirements and turn them into working prompts, agents, and pipelines. You'll evaluate them rigorously, iterate until they're production-ready, and keep improving them once they ship. This role sits at the intersection of product and platform: you decide what the AI should do, prove it works, and get it in front of users.
Because we're an early-stage company moving fast, we're looking for someone who can work quickly through ambiguous AI problems, measure output quality, and ship only when the system is reliable enough for production. The ability to tell the difference between "looks good in the demo" and "works in production" is essential. This is an in-person role, 5 days a week in our office.
Key Responsibilities
- Build new AI features end to end, from prototype to production.
- Improve AI output quality through prompt engineering, model selection, retrieval, and evaluation.
- Design and run evals that measure real output quality, not just first impressions.
- Iterate fast on prompts, agent designs, and orchestration patterns.
- Partner with the Product Engineer to translate requirements into AI features that actually work.
- Partner with the AI Platform team to land features on solid infrastructure.
- Evaluate new models, tools, and techniques when they improve quality, latency, cost, or reliability.
What We Are Looking For
- Hands-on experience building LLM-powered features that shipped to real users
- Production engineering chops in TypeScript/Node (primary, especially in AWS Lambda) and/or Python
- Experience with multiple LLM providers such as Anthropic, OpenAI, Google Vertex, AWS Bedrock, or similar
- Practical judgment in prompt engineering, retrieval, and agent design, backed by evaluation results
- Track record of building evaluation systems that actually catch regressions
- Solid software engineering fundamentals: you can write production code, not just notebooks
Seniority
- 3 to 8 years of hands-on software engineering experience, including building LLM-powered features that shipped to real users
Work experience
- Has shipped LLM-powered features to real users in production at a reputable, high-growth startup with a high engineering bar, and can speak to what broke
- Has built agents and agentic systems, including LLM orchestration and tool use over large data sets
- Has kept up with the frontier of agentic AI methods; LangChain-only experience is a yellow flag
- Experience on a small team (<15 engineers) or as a founding engineer / former founder
- Enterprise vertical SaaS experience
- Forward-deployed engineer (FDE) or other explicitly customer-facing experience
Education
- Bachelor's degree in Computer Science.
Hard Skills
- Production engineering chops in TypeScript/Node (primary, especially in AWS Lambda) and/or Python
- Experience with eval systems, structured output, and function calling.
- Experience with Anyscale Ray or similar distributed compute frameworks for batch inference, eval pipelines, or scaling agent workloads
- Open source contributions in the LLM or agent tooling space
- Familiarity with pgvector or other vector retrieval systems
- Experience with post-training or fine-tuning
Soft skills
- Genuinely excited about early-stage work and the company/mission