About The Role
The role drives the architecture, implementation, and scaling of generative AI systems, moving beyond simple API wrappers to build robust, production-grade LLM applications. The engineer will design complex agentic workflows, orchestrate multi-step reasoning chains, and implement advanced retrieval systems to deliver highly accurate, low-latency AI solutions.
Operating at the intersection of machine learning and modern software engineering, the engineer collaborates closely with product and data platform teams. The focus is on establishing rigorous evaluation frameworks, optimizing inference performance, and scaling vector search infrastructure to support enterprise-grade demand.
Key Responsibilities
- Develop and optimize production-grade Retrieval-Augmented Generation (RAG) pipelines using LangChain, LlamaIndex, or custom orchestration frameworks
- Design and scale vector database architectures (Pinecone, Qdrant, or pgvector) for efficient semantic retrieval, indexing, and metadata filtering
- Implement automated LLM evaluation pipelines using frameworks such as Ragas or custom LLM-as-a-judge patterns to establish quantitative quality benchmarks
- Fine-tune and deploy open-source models such as LLaMA or Mistral using parameter-efficient methods like LoRA and QLoRA on domain-specific datasets
- Build secure, resilient backend microservices in Python using FastAPI to expose LLM agents and orchestration layers to client-facing web applications
- Monitor LLM usage, cost, and inference latency in production using observability platforms like LangSmith or Arize Phoenix, and act on the findings to reduce spend
What We Are Looking For
- 3+ years of professional software engineering experience, with at least 1.5 years dedicated to deploying LLMs and generative AI features in production
- Expert-level Python skills, with deep knowledge of asynchronous programming, API design, and modern backend frameworks
- Proven experience with vector search engines, chunking strategies, and embedding models, including managing semantic indexing pipelines at scale
- Solid understanding of deep learning concepts and transformer architectures, plus hands-on experience with PyTorch or Hugging Face Transformers
- Bachelor’s or Master’s degree in Computer Science, Data Science, or a related quantitative field
- Bonus: Experience with vLLM deployment, custom model fine-tuning, knowledge graph integrations, or building agentic frameworks from scratch
