AI Engineer – Agentic AI & Model Infrastructure

Role Overview

We are looking for an AI Engineer who will own AI systems end-to-end — from speech intelligence and model evaluation to agentic workflows, self-hosted model serving, and production reliability.

This role requires strong ownership across the modern AI stack. You will be responsible for building, deploying, optimizing, and monitoring AI systems that operate reliably in real-world production environments.

This is not a model-tuning-only role: you are expected to own the complete AI system, including model selection, integration, infrastructure, evaluation, observability, and continuous improvement.

Key Responsibilities

AI Model Development & Optimization

Evaluate, integrate, and optimize AI models across different use cases.
Own model selection decisions based on accuracy, performance, cost, and scalability.
Build production-grade AI systems using modern LLM and ML technologies.

Speech AI Systems

Build and improve speech-to-text (STT) systems that convert audio into accurate transcripts.
Optimize STT models against quality metrics measured on real-world audio.
Improve speaker diarization and speaker identification accuracy for complex multi-speaker environments.
Build systems capable of handling noisy, real-world audio scenarios.

Agentic AI & LLM Systems

Build agentic AI workflows that transform conversations into memory, insights, and actions.
Develop memory and retrieval pipelines.
Design LLM orchestration workflows and agent architectures.
Work with LLM gateways and agent orchestration frameworks.
Build scalable AI applications using retrieval-augmented generation (RAG) patterns.

Model Serving & Infrastructure

Deploy and optimize self-hosted AI models.
Work with model serving frameworks such as vLLM, Triton, or similar production serving platforms.
Own model performance, including latency, throughput, infrastructure efficiency, and cost optimization.

Observability & Production Monitoring

Build visibility across the complete AI pipeline: speech-to-text, diarization, retrieval, and LLM interactions.
Define and monitor AI quality metrics and production SLOs.
Track model quality, transcription accuracy, retrieval performance, latency, throughput, and cost per user.
Build dashboards, monitoring, and alerting systems.
Identify and resolve failures across distributed AI systems using metrics, logs, and traces.
Create feedback loops to continuously improve model performance.

AI Evaluation Frameworks

Build evaluation systems to measure model quality before production release.
Create ground-truth datasets and benchmarking frameworks.
Measure AI performance using metrics such as word error rate (WER), diarization error rate (DER), recall, and F1 score.
Run regression testing and A/B evaluations for model changes, prompt updates, and pipeline improvements.
Drive decisions using measurable performance data.

Required Qualifications

3–5 years of experience as an AI/ML Engineer building production AI systems.
Strong software engineering skills with experience delivering scalable systems.
Strong programming skills in Python.
Hands-on experience with large language models (LLMs), agentic AI systems, vector databases and retrieval systems, and model serving infrastructure.
Mandatory experience with self-hosting AI models, LLM gateways, and agent orchestration frameworks.
Experience deploying AI systems in production environments.
Experience with cloud infrastructure (GCP preferred).
Experience with containerized deployments using Kubernetes.
Strong understanding of AI observability and evaluation practices.
Ability to use metrics and experimentation to drive AI improvements.

Must Have Skills

Agentic AI development
Self-hosted model deployment
LLM gateway experience
LLM orchestration
Production ML systems
Python
Model serving and optimization
AI evaluation frameworks

Good to Have Skills

Speech AI experience: speech-to-text, speaker diarization, and voice AI.
Research background or AI publications.
Experience optimizing open-source AI models.
Experience building AI monitoring platforms.
Experience with production model evaluation pipelines.

Preferred Mindset

Strong first-principles problem solving.
Habit of measuring before making decisions.
Focus on production reliability, not experimentation alone.
Ownership mindset for building AI systems from development to production.