About Humyn Labs
At Humyn Labs, we believe the best AI is built on the best human judgment. We operate a global network of 1M+ verified experts who deliver high-quality, multimodal training datasets across domains — backed by reputation verification and multi-layer quality control.
Humyn Labs converts human action across sound, sight, movement, and touch into high-quality multimodal data signals for physical AI. We operate in 20+ countries across India, Southeast Asia, Latin America, and the Middle East: the real-world environments where physical AI deploys, not the labs where it is built.
Our data isn't just collected; it's evaluated, defended, and production-ready. Because before AI can be trusted, its training data must be.
Our work sits at the intersection of egocentric video understanding, embodied AI, robotics perception, and voice-driven interaction. We move fast, obsess over data quality, and ship at scale.
Role Overview
We are building structured, high-quality voice datasets for frontier AI companies working on speech-to-text, speech-to-speech, and multimodal AI systems.
We are looking for a Voice AI Engineer — someone who can design and implement production-grade pipelines for dataset construction, model integration, and automated evaluation across ASR/TTS systems. This role translates research insights into scalable engineering systems that power our data quality and benchmarking infrastructure.
This role sits at the intersection of audio engineering, ML infrastructure, and multilingual AI. If you love building systems that make model evaluation faster, more reproducible, and more impactful, this role is built for you.
What You Will Work On
ASR/TTS Pipeline Engineering
- Design and build automated evaluation pipelines integrating ASR/TTS models (Whisper, Deepgram, Google STT, Azure Speech, open-source alternatives)
- Implement WER, CER, MOS, SNR, and latency measurement modules within a unified evaluation framework
- Engineer robust audio preprocessing utilities: noise reduction, format normalisation, channel splitting, and segmentation
- Build dataset ingestion and transformation pipelines supporting diverse audio sources and annotation formats
Benchmarking Infrastructure
- Develop a modular, extensible benchmarking framework enabling consistent cross-model comparisons
- Automate experiment orchestration so new model versions can be evaluated with a single configuration change
- Instrument pipelines with structured logging, reproducibility controls, and experiment tracking (e.g. MLflow, W&B)
- Build dashboards or structured reports surfacing model performance shifts across language, accent, and noise conditions
Multilingual & Indic Language Support
- Engineer evaluation tooling optimised for Indic languages and dialects (Hindi, Tamil, Telugu, Bengali, Kannada, and others)
- Handle code-switching, transliteration, and script-level normalisation in text processing pipelines
- Implement specialised evaluation logic for low-resource and dialect-heavy audio segments
Dataset Quality Engineering
- Build automated scoring systems measuring audio clarity, speaker diversity, annotation accuracy, and accent/dialect coverage
- Develop supplier data validation pipelines with objective quality signals and tagging workflows
- Engineer data versioning and lineage tracking so dataset iterations are fully auditable
You Must Have
- 3–6 years of engineering experience in speech AI, audio ML, NLP pipelines, or ML infrastructure
- Strong hands-on experience building and deploying ASR/TTS integrations in Python
- Proven ability to design evaluation frameworks and automated experiment pipelines
- Comfort working with audio data at scale, using tools such as FFmpeg, librosa, torchaudio, and soundfile
- Experience integrating commercial and open-source speech APIs
- Genuine interest in linguistic diversity, especially Indic languages and low-resource scenarios
- Strong written communication skills — ability to document systems clearly for technical and non-technical stakeholders
Technical Skills
- Python — audio processing, API integration, data pipelines
- PyTorch or TensorFlow for model inference and fine-tuning workflows
- Whisper, SpeechBrain, Kaldi, or equivalent ASR toolkits
- Implementation and interpretation of WER, CER, MOS, and SNR metrics
- Docker, CI/CD, and reproducible ML environment management
Ideal Mindset
- Systems thinker — you design for scale from day one, not as an afterthought
- Obsessed with reproducibility — every experiment should be re-runnable with a single command
- Curious about model failure modes and motivated to build tools that expose them rigorously
- Collaborative and documentation-driven — your code and systems are readable by the whole team
- Excited to work at the frontier of multilingual speech AI, especially for underrepresented languages
