Fine-tuning large language models in enterprise environments requires both algorithmic innovation and production-grade execution. Even the most advanced post-training methods fall short if teams struggle with distributed systems, infrastructure complexity, or framework glue code. Training Hub bridges that gap by turning cutting-edge training techniques into scalable, enterprise-ready workflows.
What is Training Hub
Training Hub is an open source, algorithm-centered Python library for LLM post-training, maintained by Red Hat's AI Innovation team. Rather than forcing teams to discover, learn, and wire together a growing collection of independent training libraries, each with its own API, setup, and inevitable quirks, Training Hub provides a single interface with each post-training algorithm exposed as a simple Python function. Each algorithm maps to a production-tested backend implementation. The interface is constant across all of them, so you get the same common arguments for model path, data, learning rate, and distributed setup. Switching algorithms never means rewriting your pipeline. You focus on what training algorithm you want, not how it runs.
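To make the "one function per algorithm, shared arguments" idea concrete, here is a minimal sketch of the pattern. This is illustrative only, not Training Hub's actual implementation, and the `run_sft`, `run_osft`, and `train` names are hypothetical; the point is that every algorithm accepts the same core keyword arguments, so swapping algorithms means changing one name rather than rewriting the pipeline.

```python
# Illustrative sketch only -- not Training Hub's real internals.
# Each algorithm is a plain function taking the same common arguments.

def run_sft(model_path, data_path, ckpt_output_dir, **kwargs):
    # A real backend would launch training here; we just echo the routing.
    return {"algorithm": "sft", "model": model_path, "out": ckpt_output_dir}

def run_osft(model_path, data_path, ckpt_output_dir, **kwargs):
    return {"algorithm": "osft", "model": model_path, "out": ckpt_output_dir}

ALGORITHMS = {"sft": run_sft, "osft": run_osft}

def train(algorithm, **common_args):
    # Route the shared argument set to whichever algorithm was requested.
    return ALGORITHMS[algorithm](**common_args)

result = train(
    "osft",
    model_path="Qwen/Qwen2.5-7B-Instruct",
    data_path="./training_data.jsonl",
    ckpt_output_dir="./outputs",
)
print(result["algorithm"])  # -> osft
```

Switching from OSFT to SFT in this sketch is a one-token change to the call site, which is the property the unified interface is designed to preserve.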
Training Hub is available as a stand-alone open source project on PyPI and GitHub. It is also included as part of Red Hat AI, where enterprise customers can access a fully supported build through the Red Hat Python Index.
What's new: LoRA and QLoRA using Unsloth
An LLM can contain billions of parameters. Unlike full supervised fine-tuning (SFT), which updates every one of them, low-rank adaptation (LoRA) trains a small set of lightweight "adapter" weights and leaves the base model untouched. Quantized low-rank adaptation (QLoRA) takes this further by quantizing the base model to 4-bit precision during training, slashing memory requirements even more. The result is fast, cost-effective model adaptation that runs on hardware most teams already have.
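A back-of-envelope calculation shows why the adapter approach is so cheap. For one frozen weight matrix W of shape d_out x d_in, LoRA trains two thin matrices A (r x d_in) and B (d_out x r) instead of W itself. The dimensions below are illustrative, typical of a 7B model's attention projections, not taken from any specific architecture:

```python
# Trainable-parameter comparison for a single weight matrix:
# full fine-tuning updates all of W; LoRA updates only A and B.

d_in, d_out = 4096, 4096  # shape of one frozen weight matrix W (illustrative)
r = 16                    # LoRA rank (a common default)

full_params = d_in * d_out        # every entry of W is trainable in full SFT
lora_params = r * (d_in + d_out)  # LoRA trains A (r x d_in) plus B (d_out x r)

print(full_params)                         # 16777216
print(lora_params)                         # 131072
print(f"{lora_params / full_params:.2%}")  # 0.78%
```

At rank 16, the adapter is under 1% of the original matrix's parameter count, which is why many task-specific adapters can be kept around for the cost of one full training run.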
As of v0.4.0, Training Hub supports both LoRA and QLoRA fine-tuning, powered by the Unsloth backend. This is a significant addition for teams that need to adapt models quickly and cheaply without the compute requirements of full fine-tuning.
With Unsloth as the backend, training uses roughly 70% less VRAM than full fine-tuning and runs about twice as fast as standard LoRA pipelines. You can fine-tune 7B models on a single 24GB GPU using 4-bit NF4 quantization, or scale up with multi-GPU support for both data and model parallelism. All of it is production-tested, with full support available through Red Hat AI.
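A rough weight-storage estimate shows why the 24GB figure works out. These numbers cover base-model weights only; activations, adapter optimizer state, and CUDA overhead add more, but the quantized weights are what make the difference:

```python
# Back-of-envelope VRAM estimate for a 7B-parameter base model.
# Weight storage only -- activations and overhead come on top.

params = 7e9  # 7B parameters

fp16_gb = params * 2 / 1e9   # 16-bit weights: 2 bytes per parameter
nf4_gb = params * 0.5 / 1e9  # 4-bit NF4 weights: 0.5 bytes per parameter

print(f"{fp16_gb:.1f} GB")  # 14.0 GB -- little headroom left on a 24 GB card
print(f"{nf4_gb:.1f} GB")   # 3.5 GB -- ample room for adapters and activations
```

At half a byte per parameter, the 4-bit base model occupies a small fraction of a 24GB card, leaving the rest for gradients on the (tiny) adapter weights and for activations.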
What this looks like in practice
Unsloth provides kernel-level CUDA optimizations, memory-efficient attention, gradient checkpointing, and mixed-precision training. In Training Hub, all of that collapses into:
```python
from training_hub import lora_sft

result = lora_sft(
    model_path="Qwen/Qwen2.5-7B-Instruct",
    data_path="./training_data.jsonl",
    ckpt_output_dir="./outputs",
    lora_r=16,
    lora_alpha=32,
    load_in_4bit=True,
    num_epochs=3,
    learning_rate=2e-4,
    micro_batch_size=8,
    gradient_accumulation_steps=4,
)
```

The intricate wiring of the upstream backend is abstracted away entirely. Instead of managing low-level details such as quantization boilerplate, you get a unified interface that mirrors the rest of the Training Hub ecosystem: a single function call triggers the full workflow, keeping your implementation clean, consistent, and free of infrastructure overhead.
When to use LoRA or QLoRA vs. OSFT and SFT
LoRA is the right choice when GPU memory is constrained, when you need fast experimentation cycles, or when you want to maintain multiple task-specific adapters without the cost of full training runs. It gets you most of the way at a fraction of the compute.
When LoRA isn't enough, Training Hub gives you a path forward without switching tools. Full SFT delivers maximum task performance when you have the compute to support it. And OSFT, an orthogonal subspace fine-tuning method developed by Red Hat's AI Innovation team and now part of Hugging Face PEFT, is designed for continual learning: teaching a model new capabilities while strictly preserving its existing performance. Each method maps to its own production-tested backend, and the interface remains consistent across all of them.
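The guidance above reduces to a small decision table. The helper below is purely illustrative, not part of Training Hub, and the goal strings are made up for this sketch:

```python
# Illustrative decision helper summarizing the guidance above:
# choose the algorithm from the adaptation goal, not the infrastructure.

def choose_algorithm(goal: str) -> str:
    return {
        "fast, memory-constrained adaptation": "lora",  # or QLoRA via load_in_4bit=True
        "maximum task performance": "sft",
        "add capability, preserve existing behavior": "osft",
    }[goal]

print(choose_algorithm("add capability, preserve existing behavior"))  # -> osft
```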
Backend support without lock-in
Training Hub is not a new training framework; it's a routing layer to the right backend for the job. Today it supports:
- Unsloth for LoRA / QLoRA
- InstructLab-Training for SFT
- RHAI Mini-Trainer for OSFT / continual learning
All backends are production-tested. For Red Hat AI customers, they are fully supported and integrated with OpenShift AI.
Why this matters for data scientists
Training Hub is built around a simple idea: LLM training should be algorithm-driven rather than framework-driven. Select the right training algorithm — LoRA, SFT, or OSFT — based on your adaptation goal, not your infrastructure constraints. By using production-tested backends without coupling workflows to a single framework, teams can compose multi-stage training pipelines as their models and data evolve. Your training logic remains stable even as the underlying infrastructure, scale, and backends continue to change.
Get started
Training Hub remains your algorithm-focused entry point for LLM fine-tuning on Red Hat AI, now with production-grade LoRA/QLoRA support via Unsloth. To learn more, check out these resources:
- Documentation
- Source code and examples
- A runnable LoRA/QLoRA example using Unsloth in the repo