Khmer_ASR_CodeSwitching

Khmer_ASR_CodeSwitching is an automatic speech recognition (ASR) model designed to transcribe Khmer speech that contains English code-switching. In real-world Cambodian conversations, speakers often mix Khmer and English within the same sentence, and ASR systems built for a single language tend to transcribe such speech poorly. This project addresses that gap by enabling robust transcription of bilingual and mixed-language audio.

The model is intended for research, education, and practical applications such as voice assistants, transcription tools, and multilingual AI systems in Cambodia and Southeast Asia.

Model Overview

The Khmer_ASR_CodeSwitching model is built on a deep learning–based ASR pipeline that processes raw audio input and outputs text containing both Khmer and English tokens.

Key characteristics:

  • Supports bilingual transcription (Khmer + English)

  • Optimized for conversational speech

  • Designed for noisy, real-world audio


Architecture

The model follows a standard ASR pipeline:

  1. Audio Preprocessing

    • Input audio is resampled to a fixed sampling rate

    • Noise normalization and feature extraction are applied

  2. Acoustic Model

    • Learns speech-to-text mappings

    • Encodes phonetic patterns from Khmer and English speech

  3. Language Modeling

    • Enables code-switching recognition

    • Maintains correct word boundaries across languages

  4. Decoder

    • Converts model outputs into readable text

    • Produces mixed Khmer–English transcriptions
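The decoder step above can be illustrated with a small sketch. The source does not say which decoding method the model uses, so this assumes a CTC-style acoustic model that emits one score vector per audio frame, decoded greedily. The vocabulary `VOCAB`, the blank index, and the helper `one_hot` are hypothetical and exist only for illustration.

```python
def greedy_ctc_decode(frame_scores, vocab, blank_id=0):
    """Pick the best token per frame, merge consecutive repeats, drop blanks."""
    best_ids = [max(range(len(scores)), key=scores.__getitem__)
                for scores in frame_scores]
    pieces = []
    prev = None
    for idx in best_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the CTC blank symbol.
        if idx != prev and idx != blank_id:
            pieces.append(vocab[idx])
        prev = idx
    return "".join(pieces)


# Hypothetical shared vocabulary holding both Khmer and Latin characters.
VOCAB = ["<blank>", " ", "ម", "ា", "ន", "o", "k"]


def one_hot(index, size):
    """Build a fake score vector whose best entry is `index`."""
    return [1.0 if j == index else 0.0 for j in range(size)]


# Simulated frame-level outputs: repeats and blanks are collapsed by decoding.
frame_ids = [2, 2, 0, 3, 4, 0, 1, 5, 5, 6]
frames = [one_hot(i, len(VOCAB)) for i in frame_ids]
decoded = greedy_ctc_decode(frames, VOCAB)
print(decoded)
```

Because Khmer and English characters share one output vocabulary in this sketch, the decoder never has to choose a single output language. A real system might instead use beam search with a language model, which the "Language Modeling" step suggests.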


Code-Switching Support

Code-switching is a core feature of this model. It supports:

  • Khmer sentences with embedded English words

  • English technical terms inside Khmer speech

  • Natural switching without forcing a single language output

Example:

Audio: “ថ្ងៃនេះ meeting មាន importance ខ្លាំង”

Output: “ថ្ងៃនេះ meeting មាន importance ខ្លាំង”
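To inspect where a transcript switches language, one option is to tag each token by its script. The sketch below is not part of the model. It assumes tokens are separated by whitespace and relies only on the Unicode "Khmer" block (U+1780 to U+17FF). The function names are illustrative.

```python
KHMER_START, KHMER_END = 0x1780, 0x17FF  # Unicode "Khmer" block


def token_language(token):
    """Return 'km', 'en', 'mixed', or 'other' based on the characters in a token."""
    has_khmer = any(KHMER_START <= ord(ch) <= KHMER_END for ch in token)
    has_latin = any(ch.isascii() and ch.isalpha() for ch in token)
    if has_khmer and has_latin:
        return "mixed"
    if has_khmer:
        return "km"
    if has_latin:
        return "en"
    return "other"


def tag_transcript(text):
    """Split on whitespace and pair each token with its language tag."""
    return [(tok, token_language(tok)) for tok in text.split()]


transcript = "ថ្ងៃនេះ meeting មាន importance ខ្លាំង"
tags = tag_transcript(transcript)

# Count positions where the language tag changes between adjacent tokens.
switch_points = sum(1 for (_, a), (_, b) in zip(tags, tags[1:]) if a != b)

for token, lang in tags:
    print(lang, token)
print("switch points:", switch_points)
```

Khmer text is often written without spaces between words, so whitespace splitting only finds switch points that the transcript already separates with spaces, as in the example above.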

Create a virtual environment

Install required packages

Load the Model

Audio Preprocessing
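A minimal sketch of this step, using only the Python standard library. The source says audio is resampled to a fixed rate but does not state the rate, so the 16 kHz value here is an assumption. Peak normalization stands in for the unspecified normalization step, and feature extraction is omitted. The function names are illustrative, not the project's API.

```python
import math

TARGET_RATE = 16_000  # assumed target rate; the source does not specify one


def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono signal by linear interpolation (no anti-alias filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


def peak_normalize(samples, peak=0.95):
    """Scale the signal so its largest absolute value equals `peak`."""
    max_abs = max((abs(s) for s in samples), default=0.0)
    if max_abs == 0.0:
        return list(samples)  # silent input: nothing to scale
    scale = peak / max_abs
    return [s * scale for s in samples]


# One second of a quiet 440 Hz tone recorded at 48 kHz.
src_rate = 48_000
tone = [0.3 * math.sin(2 * math.pi * 440 * t / src_rate) for t in range(src_rate)]

audio = peak_normalize(resample_linear(tone, src_rate, TARGET_RATE))
print(len(audio), "samples at", TARGET_RATE, "Hz")
```

Linear interpolation keeps the example dependency-free, but it does not low-pass filter before downsampling. For real recordings, a dedicated resampler from an audio library is likely the better choice.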

Model Inference
