Role in one line

Build the agent runtime — the brains of the platform: stateful LangGraph workflows, RAG, tool integrations, model routing, human-in-the-loop gates, and audit-grade logging — in Python.

Context

We are building a multi-agent AI platform for a regulated banking client. The agent runtime is a Python / FastAPI service exposing OpenAI-compatible endpoints, with agents implemented as LangGraph graphs and tools exposed over MCP. Sensitive workloads (KYC, PII) run on self-hosted open-weight models on EU infrastructure (Hetzner); less sensitive workloads route to frontier models via AWS Bedrock EU. The AI engineer owns the agent logic, retrieval, and the model / tool layer.
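
To make the routing rule above concrete, here is a minimal sketch of a sensitivity-based router. It is an illustration only, not the platform's actual code: the provider labels, the placeholder model names, the task names, and the fail-closed default (unknown task types stay on self-hosted models) are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    SENSITIVE = "sensitive"  # e.g. KYC, PII
    STANDARD = "standard"    # less sensitive workloads


@dataclass(frozen=True)
class ModelTarget:
    provider: str  # hypothetical label, not a real client identifier
    model: str     # placeholder, not a real model id


# Hypothetical routing table mirroring the split described above.
ROUTES = {
    Sensitivity.SENSITIVE: ModelTarget("self-hosted-eu", "open-weight-placeholder"),
    Sensitivity.STANDARD: ModelTarget("bedrock-eu", "frontier-placeholder"),
}

# Assumption: only explicitly allow-listed task types may leave the
# self-hosted path. "general_qa" is an invented example name.
STANDARD_TASKS = frozenset({"general_qa"})


def classify(task_type: str) -> Sensitivity:
    """Fail closed: anything not allow-listed is treated as sensitive."""
    if task_type.lower() in STANDARD_TASKS:
        return Sensitivity.STANDARD
    return Sensitivity.SENSITIVE


def route(task_type: str) -> ModelTarget:
    """Return the model target for a given task type."""
    return ROUTES[classify(task_type)]
```

With this sketch, route("kyc") resolves to the self-hosted target, and so does any task type the allow-list does not know about.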

What you will work on

Design and build stateful agent graphs in LangGraph — multi-stage workflows, tool calling, and interrupt-based human-in-the-loop gates (e.g. the six-stage KYC assistant).
Build RAG pipelines: ingestion, chunking, embeddings, retrieval, and grounded Q&A with source citations (e.g. meeting analysis, search lens).
Implement structured-output patterns where the LLM emits validated JSON only and deterministic engines do the rest — keeping the model out of the render path for document generation.
Integrate tools over MCP (streamable-http) and wire agents to bank data sources behind the tool layer.
Work across hosted and self-hosted models, with a per-task model router selecting between Bedrock EU and self-hosted open-weight models, and with prompt design, evaluation, and cost / latency awareness.
Build audit-grade logging and provenance for every AI operation, with PII handled by hashing rather than plaintext.
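
As a rough illustration of the last point, the sketch below builds one audit record in which PII fields are replaced by hashes before anything is written. The field names, the record layout, and the choice of a keyed hash (HMAC-SHA256) rather than a plain digest are assumptions made for this example, not a description of the project's logging design.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

# Hypothetical set of payload fields treated as PII.
PII_FIELDS = frozenset({"customer_name", "iban", "email"})


def hash_pii(value: str, key: bytes) -> str:
    """Keyed hash so a value can be correlated across records without storing it."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def audit_record(operation: str, model: str, payload: dict, key: bytes) -> str:
    """Return one JSON line describing an AI operation, with PII fields hashed."""
    safe_payload = {
        name: hash_pii(str(value), key) if name in PII_FIELDS else value
        for name, value in payload.items()
    }
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "operation": operation,
        "model": model,
        "payload": safe_payload,
    }
    return json.dumps(record, sort_keys=True)
```

A keyed hash is used here because short, structured identifiers can otherwise be guessed by hashing candidate values. In a real deployment the key would come from a secrets store, never from source code.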

Must-have

Strong Python: production-grade, typed, tested — not notebook-only.
LangChain and LangGraph: hands-on building stateful graphs, tool / function calling, and human-in-the-loop interrupts.
LLM application patterns: RAG, prompt design, structured output / JSON-schema validation, and evaluation of agent behaviour.
Retrieval stack: vector stores (pgvector and / or Qdrant) and embeddings.
Serving: FastAPI for agent endpoints; comfort with async Python.
Model access: working with hosted APIs (AWS Bedrock) and self-hosted open-weight models (Llama, Mistral, Qwen).

Nice-to-have

Banking / regulated experience; awareness of audit logging, PII handling, and EU data residency.
MCP (Model Context Protocol) tool integration.
ASR / speech-to-text pipelines (for meeting analysis).
MLOps for self-hosted inference (vLLM, Ollama, or similar) on EU infrastructure (Hetzner).
Workflow orchestration (e.g. Camunda) and deterministic document rendering (docxtpl, openpyxl).

Tech stack you will touch

Python, LangChain, LangGraph, FastAPI · pgvector / Qdrant, embeddings · AWS Bedrock EU and self-hosted open-weight LLMs on Hetzner · MCP servers (streamable-http) · Docker, Git, CI/CD.

Ways of working

Remote, distributed delivery team; English working language; scrum-light cadence.
Banking-grade rigor: every AI operation logged and auditable, human-in-the-loop by design, compliance built into the architecture — not bolted on.
Agents prepare, retrieve, draft, and propose; the trigger and the final action always stay with a human.
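
The "agents propose, a human decides" principle can be sketched in plain Python as a gate that refuses to run an action until a reviewer has approved it. This is a simplified illustration with invented names (Proposal, decide, execute). It does not use LangGraph's interrupt mechanism, which is what the runtime itself relies on.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Decision(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Proposal:
    action: str                       # what the agent suggests doing
    draft: str                        # the content the agent prepared
    decision: Decision = Decision.PENDING
    decided_by: Optional[str] = None  # reviewer id, recorded for the audit trail


def decide(proposal: Proposal, reviewer: str, approve: bool) -> None:
    """Record a human decision; a proposal can only be decided once."""
    if proposal.decision is not Decision.PENDING:
        raise ValueError("proposal has already been decided")
    proposal.decision = Decision.APPROVED if approve else Decision.REJECTED
    proposal.decided_by = reviewer


def execute(proposal: Proposal, action_fn: Callable[[str], str]) -> str:
    """Run the action only after explicit human approval."""
    if proposal.decision is not Decision.APPROVED:
        raise PermissionError("action requires human approval")
    return action_fn(proposal.draft)
```

Calling execute on a pending or rejected proposal raises PermissionError, so the agent cannot take the final step on its own.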

Important

As this is a Germany-based project, we are primarily seeking candidates based in Western Ukraine, with Vinnytsia and Lviv being our preferred locations.