Research Overview · Entermind AI
Rukun-32B-V (Rukun Ready AI)
A Malaysia-Aligned Structured Policy Validation Model
Abstract
Rukun-32B-V is a 33-billion-parameter large language model fine-tuned with Low-Rank Adaptation (LoRA) on Qwen2.5-32B-Instruct for structured policy validation aligned to Malaysia's national philosophy, Rukun Negara. The model returns strictly schema-conformant JSON containing principle-level status and severity scores across the five Rukun Negara principles, together with an aggregate severity band, a natural-language explanation, derived classification fields, and a policy-aligned rewrite for non-compliant inputs.
Training data comprises 66,516 training and 1,353 validation records assembled from four sub-corpora through a stratified pipeline with deduplication, normalisation, and audit passes. The corpus is multilingual, covering Bahasa Malaysia, English, Mandarin, Tamil, and code-switched Bahasa Rojak. Fine-tuning uses LoRA (r=32, alpha=64) with completion-only masking on 2×B200 GPUs over 8,284 steps, converging to a training loss of 0.2501 and an evaluation loss of 0.2147. On a held-out benchmark (n=50), the model achieves 88.0% accuracy, 83.3% precision, 90.9% recall, and 86.96% F1 on the violating class. Deployed on vLLM/RunPod, it serves at sub-second latencies with deterministic decoding.
Evaluation Results
88.0%
Accuracy
Held-out benchmark (n=50)
83.3%
Precision
Violating class
90.9%
Recall
Violating class
86.96%
F1 Score
Violating class
Training Configuration
66,516
Training Records
Stratified multi-corpus pipeline
1,353
Validation Records
Held-out labeled benchmark
8,284
Training Steps
2 × B200 GPUs
0.2147
Eval Loss
Train loss: 0.2501
LoRA Hyperparameters
Base model
Qwen2.5-32B-Instruct
LoRA rank
r = 32
LoRA alpha
alpha = 64
Masking
Completion-only
Hardware
2 × B200 GPU
Release
Rukun-32B-v1.5
Dataset Composition
Multilingual corpus covering Bahasa Malaysia, English, Mandarin, Tamil, Bahasa Rojak.
Teacher-Core
Primary instruction-response pairs aligned to all five principles
Rewrite-Boost
Non-compliant inputs paired with policy-aligned rewrites
Principle-Boost
Hard examples targeting under-represented principle combinations
Format-Guard
Schema-conformance reinforcement for deterministic JSON output
Output Schema
Strictly schema-conformant JSON. Every field is deterministically populated on each inference call.
| Field | Type | Description |
|---|---|---|
| principles | array | Per-principle status and severity across all five Rukun Negara |
| severityBand | string | "safe" | "caution" | "violation" |
| violationCount | number | Derived aggregate from principle-level results |
| severityScore | float | Normalised 0.00–1.00 composite score |
| isProblematic | boolean | Deterministic flag for downstream routing |
| explanation | string | Natural-language rationale for classification |
| rewrite | string | null | Policy-aligned rewrite for non-compliant inputs only |
Rukun Negara: Five Principles
Belief in God
Kepercayaan kepada Tuhan
Loyalty to King and Country
Kesetiaan kepada Raja dan Negara
Upholding the Constitution
Keluhuran Perlembagaan
Rule of Law
Kedaulatan Undang-Undang
Good Behaviour and Morality
Kesopanan dan Kesusilaan
Deployment
The model is deployed on vLLM/RunPod with deterministic decoding (temperature = 0) to guarantee schema-conformant JSON on every call. Sub-second latency makes it viable as a real-time moderation layer in production pipelines. Publicly released as EntermindAI/Rukun-32B-V on HuggingFace.
Research output · Entermind AI · 2025 · Content available for academic reference
Background