Research Overview · Entermind AI

Rukun-32B-V (Rukun Ready AI)

A Malaysia-Aligned Structured Policy Validation Model

Ibn Zaman Fahad·Entermind AI·2025

Model Card · HuggingFace Product · RukunNegara.ai

Abstract

Rukun-32B-V is a 33-billion-parameter large language model fine-tuned with Low-Rank Adaptation (LoRA) on Qwen2.5-32B-Instruct for structured policy validation aligned to Malaysia's national philosophy, Rukun Negara. The model returns strictly schema-conformant JSON containing principle-level status and severity scores across the five Rukun Negara principles, together with an aggregate severity band, a natural-language explanation, derived classification fields, and a policy-aligned rewrite for non-compliant inputs.

Training data comprises 66,516 training and 1,353 validation records assembled from four sub-corpora through a stratified pipeline with deduplication, normalisation, and audit passes. The corpus is multilingual, covering Bahasa Malaysia, English, Mandarin, Tamil, and code-switched Bahasa Rojak. Fine-tuning uses LoRA (r=32, alpha=64) with completion-only masking on 2×B200 GPUs over 8,284 steps, converging to a training loss of 0.2501 and an evaluation loss of 0.2147. On a held-out benchmark (n=50), the model achieves 88.0% accuracy, 83.3% precision, 90.9% recall, and 86.96% F1 on the violating class. Deployed on vLLM/RunPod, it serves at sub-second latencies with deterministic decoding.

Evaluation Results

88.0%

Accuracy

Held-out benchmark (n=50)

83.3%

Precision

Violating class

90.9%

Recall

Violating class

86.96%

F1 Score

Violating class

Training Configuration

66,516

Training Records

Stratified multi-corpus pipeline

1,353

Validation Records

Held-out labeled benchmark

8,284

Training Steps

2 × B200 GPUs

0.2147

Eval Loss

Train loss: 0.2501

LoRA Hyperparameters

Base model

Qwen2.5-32B-Instruct

LoRA rank

r = 32

LoRA alpha

alpha = 64

Masking

Completion-only

Hardware

2 × B200 GPU

Release

Rukun-32B-v1.5

Dataset Composition

Multilingual corpus covering Bahasa Malaysia, English, Mandarin, Tamil, Bahasa Rojak.

Teacher-Core

Primary instruction-response pairs aligned to all five principles

Rewrite-Boost

Non-compliant inputs paired with policy-aligned rewrites

Principle-Boost

Hard examples targeting under-represented principle combinations

Format-Guard

Schema-conformance reinforcement for deterministic JSON output

Output Schema

Strictly schema-conformant JSON. Every field is deterministically populated on each inference call.

Field	Type	Description
principles	array	Per-principle status and severity across all five Rukun Negara
severityBand	string	"safe" \| "caution" \| "violation"
violationCount	number	Derived aggregate from principle-level results
severityScore	float	Normalised 0.00–1.00 composite score
isProblematic	boolean	Deterministic flag for downstream routing
explanation	string	Natural-language rationale for classification
rewrite	string \| null	Policy-aligned rewrite for non-compliant inputs only

Rukun Negara: Five Principles

Belief in God

Kepercayaan kepada Tuhan

Loyalty to King and Country

Kesetiaan kepada Raja dan Negara

Upholding the Constitution

Keluhuran Perlembagaan

Rule of Law

Kedaulatan Undang-Undang

Good Behaviour and Morality

Kesopanan dan Kesusilaan

Deployment

The model is deployed on vLLM/RunPod with deterministic decoding (temperature = 0) to guarantee schema-conformant JSON on every call. Sub-second latency makes it viable as a real-time moderation layer in production pipelines. Publicly released as EntermindAI/Rukun-32B-V on HuggingFace.

Research output · Entermind AI · 2025 · Content available for academic reference

Background

Abstract

Training Configuration

66,516

Training Records

Stratified multi-corpus pipeline

1,353

Validation Records

Held-out labeled benchmark

8,284

Training Steps

2 × B200 GPUs

0.2147

Eval Loss

Train loss: 0.2501

LoRA Hyperparameters

Base model

Qwen2.5-32B-Instruct

LoRA rank

r = 32

LoRA alpha

alpha = 64

Masking

Completion-only

Hardware

2 × B200 GPU

Release

Rukun-32B-v1.5

Dataset Composition

Multilingual corpus covering Bahasa Malaysia, English, Mandarin, Tamil, Bahasa Rojak.

Teacher-Core

Primary instruction-response pairs aligned to all five principles

Rewrite-Boost

Non-compliant inputs paired with policy-aligned rewrites

Principle-Boost

Hard examples targeting under-represented principle combinations

Format-Guard

Schema-conformance reinforcement for deterministic JSON output

Output Schema

Strictly schema-conformant JSON. Every field is deterministically populated on each inference call.

Field	Type	Description
principles	array	Per-principle status and severity across all five Rukun Negara
severityBand	string	"safe" \| "caution" \| "violation"
violationCount	number	Derived aggregate from principle-level results
severityScore	float	Normalised 0.00–1.00 composite score
isProblematic	boolean	Deterministic flag for downstream routing
explanation	string	Natural-language rationale for classification
rewrite	string \| null	Policy-aligned rewrite for non-compliant inputs only