Khmer_ASR_CodeSwitching
Khmer_ASR_CodeSwitching is an automatic speech recognition (ASR) model designed to transcribe Khmer speech that contains English code-switching. In everyday Cambodian conversation, speakers often mix Khmer and English within a single sentence, and monolingual ASR systems typically transcribe such speech poorly. This project addresses that gap by targeting accurate transcription of bilingual, mixed-language audio.
The model is intended for research, education, and practical applications such as voice assistants, transcription tools, and multilingual AI systems in Cambodia and Southeast Asia.
Model Overview
The Khmer_ASR_CodeSwitching model is built on a deep learning–based ASR pipeline that processes raw audio input and outputs text containing both Khmer and English tokens.
Key characteristics:
Supports bilingual transcription (Khmer + English)
Optimized for conversational speech
Designed for noisy, real-world audio
Architecture
The model follows a standard ASR pipeline:
Audio Preprocessing
Input audio is resampled to a fixed sampling rate
Noise normalization and feature extraction are applied
Acoustic Model
Learns speech-to-text mappings
Encodes phonetic patterns from Khmer and English speech
Language Modeling
Enables code-switching recognition
Maintains correct word boundaries across languages
Decoder
Converts model outputs into readable text
Produces mixed Khmer–English transcriptions
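As an illustration of the Decoder stage, the sketch below assumes the acoustic model is trained with a CTC objective, which the description above does not confirm. Greedy CTC decoding takes the best-scoring token for each audio frame, collapses consecutive repeats, drops the blank token, and joins what remains. The blank index (`BLANK_ID = 0`), the `|` word-boundary symbol, and the function name `ctc_greedy_decode` are hypothetical choices. The Language Modeling stage is not modeled here; a production decoder would more likely run beam search with a language model.

```python
from itertools import groupby
from typing import Sequence

BLANK_ID = 0  # hypothetical: index of the CTC blank token in the vocabulary


def ctc_greedy_decode(
    frame_logits: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_id: int = BLANK_ID,
) -> str:
    """Greedy CTC decoding: best token per frame, collapse repeats, drop blanks."""
    # 1. Pick the highest-scoring token id for every audio frame.
    best_ids = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_logits]
    # 2. Collapse consecutive duplicates (CTC repeats a token across adjacent frames).
    collapsed = [token_id for token_id, _ in groupby(best_ids)]
    # 3. Remove blanks and map the remaining ids to their text pieces.
    pieces = [vocab[i] for i in collapsed if i != blank_id]
    # 4. Join the pieces; "|" is assumed to mark a word boundary (wav2vec2-style).
    return "".join(pieces).replace("|", " ").strip()
```

For example, frames whose best tokens are `ថ្ងៃនេះ`, `ថ្ងៃនេះ`, blank, `|`, `meet`, `meet`, blank, `ing` decode to `ថ្ងៃនេះ meeting`: the repeated frames collapse, the blanks disappear, and Khmer and English pieces come from the same vocabulary.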
Code-Switching Support
Code-switching is a core feature of this model. It supports:
Khmer sentences with embedded English words
English technical terms inside Khmer speech
Natural switching without forcing a single language output
Example:
Audio (spoken): “ថ្ងៃនេះ meeting មាន importance ខ្លាំង”
Output (text): “ថ្ងៃនេះ meeting មាន importance ខ្លាំង” (roughly, “Today's meeting is very important”)
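To inspect where a transcript switches language, a small helper can label each token by script. This is a hypothetical utility, not part of the model. It assumes tokens are separated by spaces, as in the example above (written Khmer often omits spaces between words, so this will not hold for every transcript). It detects Khmer using the Unicode Khmer (U+1780 to U+17FF) and Khmer Symbols (U+19E0 to U+19FF) blocks and treats any token containing ASCII letters as English.

```python
from typing import List, Tuple

# Unicode blocks: Khmer (U+1780 to U+17FF) and Khmer Symbols (U+19E0 to U+19FF).
KHMER_RANGES = ((0x1780, 0x17FF), (0x19E0, 0x19FF))


def is_khmer_char(ch: str) -> bool:
    """Return True if the character lies in one of the Khmer Unicode blocks."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in KHMER_RANGES)


def tag_scripts(transcript: str) -> List[Tuple[str, str]]:
    """Label each whitespace-separated token as 'km', 'en' (ASCII letters), or 'other'."""
    tagged: List[Tuple[str, str]] = []
    for token in transcript.split():
        if any(is_khmer_char(ch) for ch in token):
            tagged.append((token, "km"))
        elif any(ch.isascii() and ch.isalpha() for ch in token):
            tagged.append((token, "en"))
        else:
            tagged.append((token, "other"))
    return tagged
```

Applied to the example output, it yields the tags `km`, `en`, `km`, `en`, `km`, marking each switch between Khmer and English.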
Create a virtual environment
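A virtual environment keeps the model's dependencies isolated from the system Python. The sketch below uses the standard-library `venv` module, the programmatic equivalent of `python -m venv <dir>`; the helper name `create_env` and any directory name you pass it are illustrative assumptions, not part of the original project.

```python
import os
import venv
from pathlib import Path


def create_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment and return the path to its Python interpreter."""
    # Equivalent to running `python -m venv <env_dir>` from a shell.
    venv.create(env_dir, with_pip=with_pip)
    # The interpreter lives in Scripts\ on Windows and bin/ on other platforms.
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe_name = "python.exe" if os.name == "nt" else "python"
    return Path(env_dir) / bin_dir / exe_name
```

For example, `create_env("khmer-asr-env")` creates the environment and returns the path of its interpreter, which can then be used to install packages into that environment.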
Install required packages
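The original does not list the required packages. Assuming a PyTorch and Hugging Face Transformers stack, a minimal sketch runs pip through the target interpreter with `python -m pip`, so packages are installed into that interpreter's environment rather than into whichever `pip` happens to be first on `PATH`. The package list and both helper names are hypothetical.

```python
import subprocess
import sys
from typing import List, Sequence

# Hypothetical dependency list: the original does not name the packages it needs.
ASSUMED_PACKAGES = ("torch", "transformers", "numpy")


def build_install_command(python_exe: str, packages: Sequence[str] = ASSUMED_PACKAGES) -> List[str]:
    """Return the argument list for `<python_exe> -m pip install <packages...>`."""
    return [python_exe, "-m", "pip", "install", *packages]


def install_packages(python_exe: str = sys.executable, packages: Sequence[str] = ASSUMED_PACKAGES) -> None:
    """Install packages into the environment that owns python_exe."""
    # check=True raises CalledProcessError if pip exits with a non-zero status.
    subprocess.run(build_install_command(python_exe, packages), check=True)
```

Passing the virtual environment's interpreter path as `python_exe` targets that environment even if it has not been activated.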
Load the Model
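The model's repository ID and framework are not given here. Assuming the checkpoint is compatible with Hugging Face Transformers, the generic `pipeline("automatic-speech-recognition", ...)` factory can load it without committing to a specific architecture (CTC or encoder-decoder). The helper `load_asr` is hypothetical. In Transformers pipelines, `device=-1` selects the CPU and `device=0` the first CUDA GPU.

```python
def load_asr(model_id: str, use_gpu: bool = False):
    """Load an ASR pipeline, assuming a Transformers-compatible checkpoint."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline

    # device=-1 runs on the CPU; device=0 uses the first CUDA GPU.
    device = 0 if use_gpu else -1
    return pipeline("automatic-speech-recognition", model=model_id, device=device)
```

Call it as `load_asr("<org>/Khmer_ASR_CodeSwitching")`, replacing `<org>` with the repository owner, which this document does not state.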
Audio Preprocessing
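The Audio Preprocessing stage resamples input to a fixed rate. The target rate is not stated; 16 kHz is assumed below because it is common for speech models. This dependency-free sketch uses linear interpolation followed by peak normalization. It is a simplified stand-in: linear interpolation applies no anti-aliasing filter when downsampling, so a band-limited resampler such as `librosa.resample` or `torchaudio.functional.resample` is the better choice in practice. Feature extraction (for example, log-Mel spectrograms) is usually handled by the model's own feature extractor and is not shown.

```python
from typing import List, Sequence

TARGET_SR = 16_000  # assumed target rate; the original does not state it


def resample_linear(samples: Sequence[float], orig_sr: int, target_sr: int = TARGET_SR) -> List[float]:
    """Resample a mono signal by linear interpolation (no anti-aliasing filter)."""
    if orig_sr == target_sr or len(samples) == 0:
        return list(samples)
    n_out = round(len(samples) * target_sr / orig_sr)
    step = orig_sr / target_sr  # input samples advanced per output sample
    last = len(samples) - 1
    out: List[float] = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Weighted average of the two neighbouring input samples.
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


def peak_normalize(samples: Sequence[float], target_peak: float = 0.95) -> List[float]:
    """Scale the signal so its largest absolute sample equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    scale = target_peak / peak
    return [s * scale for s in samples]
```

For example, one second of 32 kHz audio passed through `resample_linear(x, 32_000)` comes back as 16,000 samples, and `peak_normalize` then scales it so the loudest sample has magnitude 0.95.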
Model Inference
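Inference passes the preprocessed waveform to the loaded model. Assuming a Hugging Face Transformers ASR pipeline, the input can be a dict holding the 1-D float waveform under `"raw"` and its `"sampling_rate"`, and the result is a dict whose `"text"` field holds the transcript. The wrapper `transcribe` is hypothetical; it takes the pipeline as an argument so it can be exercised with a stand-in function.

```python
from typing import Any, Callable, Dict


def transcribe(
    asr: Callable[[Dict[str, Any]], Dict[str, Any]],
    audio: Any,
    sampling_rate: int = 16_000,
) -> str:
    """Run one utterance through an ASR pipeline and return the cleaned transcript."""
    # Transformers ASR pipelines accept {"raw": waveform, "sampling_rate": rate}.
    output = asr({"raw": audio, "sampling_rate": sampling_rate})
    # Collapse runs of whitespace so Khmer and English tokens are single-space separated.
    return " ".join(output["text"].split())
```

With a real pipeline, pass a NumPy `float32` array, e.g. `transcribe(asr, np.asarray(samples, dtype=np.float32))`. If the supplied rate differs from the model's, recent versions of the pipeline resample the audio themselves, which requires torchaudio.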