About The Role

The role drives the architecture, implementation, and scaling of generative AI systems, moving beyond simple API wrappers to build robust, production-grade LLM applications. The engineer will design complex agentic workflows, orchestrate multi-step reasoning chains, and implement advanced retrieval systems to deliver highly accurate, low-latency AI solutions.

Operating at the intersection of machine learning and modern software engineering, the position collaborates closely with product and data platform teams. The focus is on establishing rigorous evaluation frameworks, optimizing inference performance, and scaling vector search infrastructure to support enterprise-grade demand.

Key Responsibilities

Develop and optimize production-grade Retrieval-Augmented Generation (RAG) pipelines using LangChain, LlamaIndex, or custom orchestration frameworks
Design and scale vector database architectures (Pinecone, Qdrant, or pgvector) for efficient semantic retrieval, indexing, and metadata filtering
Implement automated LLM evaluation pipelines using frameworks such as Ragas or custom LLM-as-a-judge patterns to establish quantitative quality benchmarks
Deploy and fine-tune open-source models such as LLaMA or Mistral using parameter-efficient methods like LoRA and QLoRA on domain-specific datasets
Build secure, resilient backend microservices in Python using FastAPI to expose LLM agents and orchestration layers to client-facing web applications
Monitor LLM usage, cost, and inference latency in production using observability platforms like LangSmith or Arize Phoenix

What We Are Looking For

3+ years of professional software engineering experience, with at least 1.5 years dedicated to deploying LLMs and generative AI features in production
Expert-level Python skills, with deep knowledge of asynchronous programming, API design, and modern backend frameworks
Proven experience with vector search engines, chunking strategies, embedding models, and semantic indexing pipelines at scale
Solid understanding of deep learning concepts and transformer architectures, plus hands-on experience with PyTorch or Hugging Face Transformers
Bachelor’s or Master’s degree in Computer Science, Data Science, or a related quantitative field
Bonus: Experience with vLLM deployment, custom model fine-tuning, knowledge graph integrations, or building agentic frameworks from scratch

LLM / GenAI Engineer