A practical guide to Llama Stack for Node.js developers
Explore how to run tools with Node.js using Llama Stack's completions API, agent API, and support for in-line tools, local MCP tools, and remote MCP tools.
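As a starting point, here is a minimal sketch of calling a locally running Llama Stack server from Node.js. It assumes the llama-stack-client npm package; the exact method and field names shown (inference.chatCompletion, model_id, completion_message) are assumptions and may differ across Llama Stack versions.

```ts
// Minimal sketch: a chat completion against a local Llama Stack server from Node.js.
// Assumption: the llama-stack-client npm package exposes an inference.chatCompletion
// method; names and response fields may differ by Llama Stack version.
import LlamaStackClient from 'llama-stack-client';

const client = new LlamaStackClient({
  baseURL: 'http://localhost:8321', // adjust to match your Llama Stack deployment
});

async function main() {
  const response = await client.inference.chatCompletion({
    model_id: 'meta-llama/Llama-3.1-8B-Instruct', // any model registered with your stack
    messages: [{ role: 'user', content: 'What tools can you call?' }],
  });
  console.log(response.completion_message?.content);
}

main().catch(console.error);
```

The full article builds on this pattern to register in-line tools, local MCP tools, and remote MCP tools with the agent API.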
Learn how quantized vision-language models enable faster inference, lower costs, and scalable AI deployment without compromising capability.
This article demonstrates how to run the Microsoft TRELLIS AI workload using Podman on RHEL to generate 3D assets.
This article demonstrates how to fine-tune LLMs in a distributed environment with open source tools and the Kubeflow Training Operator on Red Hat OpenShift AI.
Learn how to integrate NVIDIA NIM with OpenShift AI to build, deploy, and monitor AI-enabled applications efficiently within a unified, scalable platform.
Enable hardware-accelerated networking for containerized workloads using Red Hat OpenShift, BlueField DPUs, and the NVIDIA DOCA Platform Framework.
Explore inference performance improvements that help vLLM serve DeepSeek AI models more efficiently in this technical deep dive.
Podman AI Lab, which integrates with Podman Desktop, provides everything you need to start developing Node.js applications that leverage large language models.
Explore new open source quantized reasoning models based on the DeepSeek-R1-Distill suite that deliver near-perfect accuracy and inference speed improvements.
Discover Sparse Llama: A 50% pruned, GPU-optimized Llama 3.1 model with 2:4 sparsity, enabling faster, cost-effective inference without sacrificing accuracy.
Explore how vLLM's new multimodal AI inference capabilities enhance performance, scalability, and flexibility across diverse hardware platforms.
Learn about an efficient inference scaling method that can improve your model's reasoning ability and performance at runtime while saving on compute costs.
Learn how to securely integrate Microsoft Azure OpenAI Service with Red Hat OpenShift Lightspeed using temporary child credentials.
Discover how the RamaLama open source project can help isolate AI models for testing and experimentation.
Explore multimodal model quantization in LLM Compressor, a unified library for optimizing models for deployment with vLLM.
Progress in small LLM reasoning: Our Qwen-32B model, using particle filtering, now surpasses o1-preview on Math500.
A summary of all the Node.js AI posts covering the Parasol application.
On reproducing R1-like reasoning in small LLMs: the LIMO dataset proves ineffective for Llama and Granite models; synthetic data generation shows promise, but fine-tuning remains tricky.
An update on reproducing R1-like reasoning in small LLMs: Granite models show big gains with particle filtering, outperforming GPT-4o on benchmarks.
Explore how distributed inference works within vLLM in this recap of Neural Magic's vLLM Office Hours with Michael Goin and Murali Andoorveedu, a vLLM committer from CentML.