Yes, several language models support Khmer (the official language of Cambodia), though dedicated, high-performing models are still emerging compared with those for high-resource languages like English.

Key Options for Khmer LLMs

  • SEA-LION (Southeast Asian Languages in One Network): This is the most prominent open-source family of multilingual LLMs built specifically for Southeast Asian languages, including Khmer. Developed by AI Singapore with partners (including Cambodian collaborators), it covers 11 languages used in the region: English, Chinese, Indonesian, Vietnamese, Malay, Thai, Burmese, Lao, Filipino, Tamil, and Khmer.
    • Models include versions based on Llama, Gemma, Qwen, etc. (e.g., Gemma-SEA-LION-v4-27B, Apertus-SEA-LION).
    • Strong focus on cultural context and low-resource languages.
    • Available on Hugging Face (search “SEA-LION” or “aisingapore”) and at sea-lion.ai.
    • v4 includes multimodal (vision-language) capabilities and continued improvements for Khmer.
  • SeaLLMs: Another SEA-focused family that explicitly supports Khmer along with other regional languages. Models like SeaLLM-13B-Chat are tuned for chat in Khmer and those other languages.

Other Khmer NLP Resources

  • Smaller/fine-tuned models on Hugging Face: Khmer-specific BERT, ALBERT, Whisper (for speech), mT5 (summarization), NLLB (translation), and TTS models. Examples include seanghay/ models for ASR and others for text tasks.
  • Older efforts: Khmer BERT/ULMFiT models from researchers like Phylypo Tum.
  • Major general LLMs (GPT, Claude, Gemini, etc.): They handle basic Khmer but often struggle with nuance, tokenization (Khmer is written without spaces between words), and cultural context. Region-focused models such as SEA-LION and SeaLLMs are generally a better fit when native-quality Khmer matters.
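
To make the tokenization point concrete, here is a minimal Python sketch. It is only an illustration, not how any of the models above tokenize text: the helper names (is_khmer, khmer_stats) and the sample sentence are my own. It relies on two facts: Khmer letters live in the Unicode ranges U+1780–U+17FF and U+19E0–U+19FF, and each of those code points takes 3 bytes in UTF-8.

```python
# Hypothetical helpers for illustration; not part of any Khmer NLP library.

# Unicode blocks: Khmer (U+1780-U+17FF) and Khmer Symbols (U+19E0-U+19FF).
KHMER_BLOCKS = ((0x1780, 0x17FF), (0x19E0, 0x19FF))


def is_khmer(ch: str) -> bool:
    """Return True if a single character falls in a Khmer Unicode block."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in KHMER_BLOCKS)


def khmer_stats(text: str) -> dict:
    """Count whitespace tokens, code points, Khmer code points, and UTF-8 bytes."""
    chars = [c for c in text if not c.isspace()]
    return {
        # Splitting on whitespace finds phrases, not words, in Khmer text.
        "whitespace_tokens": len(text.split()),
        "code_points": len(chars),
        "khmer_code_points": sum(1 for c in chars if is_khmer(c)),
        # Byte-level tokenizers start from this many raw units.
        "utf8_bytes": len(text.encode("utf-8")),
    }


if __name__ == "__main__":
    sample = "ខ្ញុំស្រឡាញ់ភាសាខ្មែរ"  # roughly: "I love the Khmer language"
    print(khmer_stats(sample))
```

Because the sample sentence contains no spaces, a whitespace split treats it as a single token even though it holds several words, and its byte count is three times its Khmer code point count. That is roughly why a tokenizer trained mostly on English tends to spend many more tokens per Khmer word, and why proper Khmer word segmentation or a region-aware vocabulary helps.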

Ongoing Development

Cambodia-Singapore partnerships and AI Forum Cambodia are actively working on Khmer LLM improvements (e.g., via SEA-LION). Progress includes better data collection, machine translation as a bridge, and community efforts.

For the best results today, try SEA-LION models on Hugging Face or their platform. If you’re building something specific (chatbot, translation, ASR), let me know for more targeted recommendations!