Voice AI Engineer

Humyn Labs

🇺🇸 San Francisco, US

About Humyn Labs

At Humyn Labs, we believe the best AI is built on the best human judgment. We operate a global network of 1M+ verified experts who deliver high-quality, multimodal training datasets across domains — backed by reputation verification and multi-layer quality control.

Humyn Labs converts human action across sound, sight, movement, and touch into high-quality multimodal data signals for physical AI. We operate in 20+ countries across India, Southeast Asia, Latin America, and the Middle East: the real-world environments where physical AI deploys, not the labs where it is built.

Our data isn't just collected; it's evaluated, defended, and production-ready. Because before AI can be trusted, its training data must be.

Our work sits at the intersection of egocentric video understanding, embodied AI, robotics perception, and voice-driven interaction. We move fast, obsess over data quality, and ship at scale.

Role Overview

We are building structured, high-quality voice datasets for frontier AI companies working on speech-to-text, speech-to-speech, and multimodal AI systems.

We are looking for a Voice AI Engineer — someone who can design and implement production-grade pipelines for dataset construction, model integration, and automated evaluation across ASR/TTS systems. This role translates research insights into scalable engineering systems that power our data quality and benchmarking infrastructure.

This role sits at the intersection of audio engineering, ML infrastructure, and multilingual AI. If you love building systems that make model evaluation faster, more reproducible, and more impactful, this role is built for you.

What You Will Work On

ASR/TTS Pipeline Engineering

  • Design and build automated evaluation pipelines integrating ASR/TTS models (Whisper, Deepgram, Google STT, Azure Speech, open-source alternatives)
  • Implement WER, CER, MOS, SNR, and latency measurement modules within a unified evaluation framework
  • Engineer robust audio preprocessing utilities: noise reduction, format normalisation, channel splitting, and segmentation
  • Build dataset ingestion and transformation pipelines supporting diverse audio sources and annotation formats
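
To give a flavour of the metrics work listed above, here is a minimal sketch of word error rate (WER) and character error rate (CER), both defined as edit distance divided by reference length. It is an illustration only, not Humyn Labs' implementation: the function names are hypothetical, tokenisation is a plain whitespace split, and no text normalisation is applied before scoring.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (substitutions, insertions, and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, `wer("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words. In a production pipeline, both strings would typically pass through the same normalisation step (casing, punctuation, script handling) before scoring.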

Benchmarking Infrastructure

  • Develop a modular, extensible benchmarking framework enabling consistent cross-model comparisons
  • Automate experiment orchestration so new model versions can be evaluated with a single configuration change
  • Instrument pipelines with structured logging, reproducibility controls, and experiment tracking (e.g. MLflow, W&B)
  • Build dashboards or structured reports surfacing model performance shifts across language, accent, and noise conditions

Multilingual & Indic Language Support

  • Engineer evaluation tooling optimised for Indic languages and dialects (Hindi, Tamil, Telugu, Bengali, Kannada, and others)
  • Handle code-switching, transliteration, and script-level normalisation in text processing pipelines
  • Implement specialised evaluation logic for low-resource and dialect-heavy audio segments
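
As an illustration of the script-level normalisation mentioned above, the sketch below applies Unicode NFC normalisation and whitespace collapsing so that visually identical Indic strings compare equal before scoring. The function name is hypothetical and the choice of NFC is an assumption; a real pipeline would likely add language-specific rules (for example for numerals, punctuation, or joiner characters).

```python
import re
import unicodedata


def normalise_text(text):
    """Bring text to a canonical Unicode form and collapse whitespace.

    Assumption: NFC is used as the canonical form. Precomposed and
    combining-mark spellings of the same character then map to one
    representation.
    """
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


# Devanagari QA can be encoded as one code point (U+0958) or as
# KA (U+0915) followed by a nukta (U+093C); both render identically.
precomposed = "\u0958"
combining = "\u0915\u093c"
same_after_normalising = normalise_text(precomposed) == normalise_text(combining)
```

Without a step like this, two transcripts that look identical on screen can differ at the code-point level and inflate character-level error metrics.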

Dataset Quality Engineering

  • Build automated scoring systems measuring audio clarity, speaker diversity, annotation accuracy, and accent/dialect coverage
  • Develop supplier data validation pipelines with objective quality signals and tagging workflows
  • Engineer data versioning and lineage tracking so dataset iterations are fully auditable

You Must Have

  • 3–6 years of engineering experience in speech AI, audio ML, NLP pipelines, or ML infrastructure
  • Strong hands-on experience building and deploying ASR/TTS integrations in Python
  • Proven ability to design evaluation frameworks and automated experiment pipelines
  • Comfort working with audio data at scale — FFmpeg, librosa, torchaudio, soundfile
  • Experience integrating commercial and open-source speech APIs
  • Genuine interest in linguistic diversity, especially Indic languages and low-resource scenarios
  • Strong written communication skills — ability to document systems clearly for technical and non-technical stakeholders

Technical Skills

  • Python — audio processing, API integration, data pipelines
  • PyTorch or TensorFlow for model inference and fine-tuning workflows
  • Whisper, SpeechBrain, Kaldi, or equivalent ASR toolkits
  • WER / CER / MOS / SNR metrics implementation and interpretation
  • Docker, CI/CD, and reproducible ML environment management

Ideal Mindset

  • Systems thinker — you design for scale from day one, not as an afterthought
  • Obsessed with reproducibility — every experiment should be re-runnable with a single command
  • Curious about model failure modes and motivated to build tools that expose them rigorously
  • Collaborative and documentation-driven — your code and systems are readable by the whole team
  • Excited to work at the frontier of multilingual speech AI, especially for underrepresented languages