About Humyn Labs

At Humyn Labs, we believe the best AI is built on the best human judgment. We operate a global network of 1M+ verified experts who deliver high-quality, multimodal training datasets across domains — backed by reputation verification and multi-layer quality control.

Humyn Labs converts human action across sound, sight, movement, and touch into high-quality multimodal data signals for physical AI. We operate in 20+ countries spanning India, Southeast Asia, Latin America, and the Middle East: the real-world environments where physical AI is deployed, not the labs where it is built.

Our data isn't just collected; it's evaluated, defended, and production-ready. Because before AI can be trusted, its training data must be.

Our work sits at the intersection of egocentric video understanding, embodied AI, robotics perception, and voice-driven interaction. We move fast, obsess over data quality, and ship at scale.

Role Overview

We are building structured, high-quality voice datasets for frontier AI companies working on speech-to-text, speech-to-speech, and multimodal AI systems.

We are looking for a Voice AI Engineer — someone who can design and implement production-grade pipelines for dataset construction, model integration, and automated evaluation across ASR/TTS systems. This role translates research insights into scalable engineering systems that power our data quality and benchmarking infrastructure.

This role sits at the intersection of audio engineering, ML infrastructure, and multilingual AI. If you love building systems that make model evaluation faster, more reproducible, and more impactful, this role is built for you.

What You Will Work On

ASR/TTS Pipeline Engineering

Design and build automated evaluation pipelines integrating ASR/TTS models (Whisper, Deepgram, Google STT, Azure Speech, open-source alternatives)
Implement WER, CER, MOS, SNR, and latency measurement modules within a unified evaluation framework
Engineer robust audio preprocessing utilities: noise reduction, format normalisation, channel splitting, and segmentation
Build dataset ingestion and transformation pipelines supporting diverse audio sources and annotation formats

Benchmarking Infrastructure

Develop a modular, extensible benchmarking framework enabling consistent cross-model comparisons
Automate experiment orchestration so new model versions can be evaluated with a single configuration change
Instrument pipelines with structured logging, reproducibility controls, and experiment tracking (e.g. MLflow, W&B)
Build dashboards or structured reports surfacing model performance shifts across language, accent, and noise conditions

Multilingual & Indic Language Support

Engineer evaluation tooling optimised for Indic languages and dialects (Hindi, Tamil, Telugu, Bengali, Kannada, and others)
Handle code-switching, transliteration, and script-level normalisation in text processing pipelines
Implement specialised evaluation logic for low-resource and dialect-heavy audio segments

Dataset Quality Engineering

Build automated scoring systems measuring audio clarity, speaker diversity, annotation accuracy, and accent/dialect coverage
Develop supplier data validation pipelines with objective quality signals and tagging workflows
Engineer data versioning and lineage tracking so dataset iterations are fully auditable

You Must Have

3–6 years of engineering experience in speech AI, audio ML, NLP pipelines, or ML infrastructure
Strong hands-on experience building and deploying ASR/TTS integrations in Python
Proven ability to design evaluation frameworks and automated experiment pipelines
Comfort working with audio data at scale — FFmpeg, librosa, torchaudio, soundfile
Experience integrating commercial and open-source speech APIs
Genuine interest in linguistic diversity, especially Indic languages and low-resource scenarios
Strong written communication skills — ability to document systems clearly for technical and non-technical stakeholders

Technical Skills

Python — audio processing, API integration, data pipelines
PyTorch or TensorFlow for model inference and fine-tuning workflows
Whisper, SpeechBrain, Kaldi, or equivalent ASR toolkits
WER / CER / MOS / SNR metrics implementation and interpretation
Docker, CI/CD, and reproducible ML environment management

Ideal Mindset

Systems thinker — you design for scale from day one, not as an afterthought
Obsessed with reproducibility — every experiment should be re-runnable with a single command
Curious about model failure modes and motivated to build tools that expose them rigorously
Collaborative and documentation-driven — your code and systems are readable by the whole team
Excited to work at the frontier of multilingual speech AI, especially for underrepresented languages