LLM / GenAI Engineer

Scale.jobs

🇺🇸 San Francisco Bay Area, US · Mid-level

  • arize phoenix
  • fastapi
  • hugging face transformers
  • langchain
  • langsmith
  • llama
  • llamaindex
  • lora
  • mistral
  • pgvector
  • pinecone
  • python
  • pytorch
  • qdrant
  • qlora
  • ragas
  • vllm

About The Role

The role drives the architecture, implementation, and scaling of generative AI systems, moving beyond simple API wrappers to build robust, production-grade LLM applications. The engineer will design complex agentic workflows, orchestrate multi-step reasoning chains, and implement advanced retrieval systems to deliver highly accurate, low-latency AI solutions.

Operating at the intersection of machine learning and modern software engineering, the position collaborates closely with product and data platform teams. The focus is on establishing rigorous evaluation frameworks, optimizing inference performance, and scaling vector search infrastructure to support enterprise-grade demand.

Key Responsibilities

  • Develop and optimize production-grade Retrieval-Augmented Generation (RAG) pipelines using LangChain, LlamaIndex, or custom orchestration frameworks
  • Design and scale vector database architectures (Pinecone, Qdrant, or pgvector) for efficient semantic retrieval, indexing, and metadata filtering
  • Implement automated LLM evaluation pipelines using frameworks such as Ragas or custom LLM-as-a-judge patterns to establish quantitative quality benchmarks
  • Deploy and fine-tune open-source models such as LLaMA or Mistral using parameter-efficient methods like LoRA and QLoRA on domain-specific datasets
  • Build secure, resilient backend microservices in Python using FastAPI to expose LLM agents and orchestration layers to client-facing web applications
  • Monitor LLM usage, cost, and inference latency in production using observability platforms like LangSmith or Arize Phoenix

What We Are Looking For

  • 3+ years of professional software engineering experience, with at least 1.5 years dedicated to deploying LLMs and generative AI features in production
  • Expert-level Python skills, with deep knowledge of asynchronous programming, API design, and modern backend frameworks
  • Proven experience with vector search engines, chunking strategies, embedding models, and managing semantic indexing pipelines at scale
  • Solid understanding of deep learning concepts, transformer architectures, and hands-on experience with PyTorch or Hugging Face Transformers
  • Bachelor’s or Master’s degree in Computer Science, Data Science, or a related quantitative field
  • Bonus: Experience with vLLM deployment, custom model fine-tuning, knowledge graph integrations, or building agentic frameworks from scratch