The state of open source AI models in 2025

January 7, 2026
Cedric Clyburn
Related topics: Artificial intelligence, Open source
Related products: Red Hat AI Inference, Red Hat AI, Red Hat OpenShift

    2025 was an exciting year for AI hobbyists running large language models (LLMs) on their own hardware and organizations that need on-premises and sovereign AI. These use cases require open models you can download locally from a public registry like Hugging Face. You can then run them on inference engines such as Ollama or RamaLama (for simple deployments) or production-ready inference servers such as vLLM.

    As we help developers deploy these models for customer service and knowledge management (using patterns like retrieval-augmented generation) or code assistance (through agentic AI), we see a trend toward specific models for specific use cases. Let's look at which models are used most in real-world applications and how you can start using them.

    Leading to 2025: The pre-DeepSeek landscape

    Before DeepSeek gained popularity at the beginning of 2025, the open model ecosystem was simpler (Figure 1). Meta's Llama family of models was quite dominant, and these dense models (ranging from 7 to 405 billion parameters) were easy to deploy or customize. Mistral was also competing (certainly in the EU market), but models from Asia, such as DeepSeek (with its V3) or Qwen, were not yet popular.

    Timeline of open model releases from 2023 to 2025, featuring Llama, Mistral, and IBM Granite, concluding with the OpenAI gpt-oss release in August 2025.
    Figure 1: A brief recap of the open model ecosystem for 2025 and prior years.

    Amid the stock market shock and media attention, DeepSeek's reasoning model validated that open weights can deliver high-value reasoning. It showed that open models are capable options for teams that need cost control or air-gapped deployments. In fact, many of the models I'll discuss here come from Chinese labs and lead in total downloads per region. According to The ATOM Project, total model downloads shifted from USA-dominant to China-dominant during the summer of 2025.

    The highest-performing open models

    Benchmarks show a model's capabilities on certain predefined tasks, but you can also measure capabilities through the LMArena. This crowdsourced AI evaluation platform lets users vote for a result from two models through a "battle." Figure 2 shows what this leaderboard looks like.

    Leaderboard table from LMArena ranking AI models like Gemini-3-pro and Grok-4.1-thinking across categories including coding, math, and creative writing.
    Figure 2: The LMArena leaderboard aggregates user votes into an interactive dashboard to understand model capabilities across writing, long queries, and more.
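    The "battle" mechanic can be sketched with a simple pairwise rating update. LMArena's real methodology is more sophisticated (it has used Elo-style and Bradley-Terry rankings), so the following is only an illustrative Elo update, not the platform's actual scoring code:

```python
# Illustrative Elo-style update for pairwise model "battles".
# A sketch of the general technique, not LMArena's actual methodology.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return new (rating_a, rating_b) after one user vote."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# Both models start at 1000; model A wins one battle.
a, b = update(1000.0, 1000.0, a_won=True)
print(round(a, 1), round(b, 1))  # 1016.0 984.0
```

    Aggregated over many votes from many users, updates like this converge toward a ranking of model strength without any predefined benchmark tasks.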

    After filtering out the proprietary models such as Gemini, Claude, and ChatGPT, we're left with a few contenders. These include Kimi K2 from the Moonshot lab, Qwen3 from the Alibaba team, and of course, DeepSeek. This is quite interesting, as most folks know DeepSeek, but they might not be familiar with the others.

    Qwen, Llama, and Gemma

    Different AI use cases require different model sizes and capabilities, which is why open models are so useful. Instead of a one-size-fits-all, general-purpose model, families such as Qwen offer various sizes (starting as small as 0.5 billion parameters) and modalities (text or vision). The Qwen team maintains a transparent strategy for documentation and deployment instructions on GitHub and is active on X (formerly Twitter) to tease upcoming releases (Figure 4).

    X post by Nathan Lambert: Airbnb CEO Brian Chesky notes they use Alibaba Qwen in production because it is faster and cheaper than OpenAI’s models.
    Figure 4: A testimonial showing how Qwen, thanks to the team's active presence and transparency online, is being used by large organizations in their AI strategies.

    Llama and Gemma offer similar families of models, but the Qwen ecosystem has seen especially impressive adoption. While Qwen models might not top every benchmark, the team's commitment to the open model community has made them among the most used local models available. Farther down The ATOM Project webpage, cumulative download metrics show the Qwen family becoming the most used as 2025 closes out.

    Frontier models for RAG, agents, and AI-assisted coding

    While labs like Qwen build models for specific use cases, other frontier labs are building capable models that perform like proprietary models (think ChatGPT or Gemini) at a fraction of the cost. A fair way to understand their capabilities, speed, and price is through Artificial Analysis, which incorporates evaluations (like MMLU-Pro and LiveCodeBench) to compare all models, both proprietary and open (Figure 5).

    Comparison charts showing Gemini 3 Pro as the most intelligent model, while gpt-oss-120B is the fastest and most cost-effective option available.
    Figure 5: The Artificial Analysis dashboard combines intelligence, speed, and price metrics to help you understand a model's strengths and weaknesses.

    Let's look at the two with the highest intelligence score: Kimi K2 from Moonshot AI and gpt-oss from OpenAI.

    Kimi: For tool calling and AI-assisted coding

    Kimi K2 is one of the largest open models by total parameter count (about 1 trillion). As a mixture-of-experts design, it activates only roughly 32 billion parameters per token, giving it a smaller runtime footprint that can run on NVIDIA A100s, an H100, or even an A6000 (at 48 GB of VRAM when using 4-bit quantization). It performs quite well in agentic workflows, where you might need an AI assistant to search data, analyze trends and patterns, summarize, and generate a report. See Figure 6.

    Grouped bar charts comparing Kimi K2 and GPT-5 Thinking models. Kimi K2 leads in expert reasoning and agentic web search, achieving a state-of-the-art 44.9% on Humanity’s Last Exam.
    Figure 6: Kimi models meet or surpass proprietary and paid models in certain benchmarks.
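    The VRAM figures above follow from back-of-envelope arithmetic: weight memory is roughly parameter count times bits per weight, divided by 8. A quick sketch (my own illustration, not from the model's documentation):

```python
# Back-of-envelope weight-memory estimate. Ignores KV cache, activations,
# and runtime overhead, which add substantially on top.

def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight storage in decimal gigabytes."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# ~32 billion parameters active per token in a mixture-of-experts model:
print(round(weight_memory_gb(32, 16), 1))  # 64.0 GB at FP16
print(round(weight_memory_gb(32, 4), 1))   # 16.0 GB at 4-bit
```

    Note that all of a mixture-of-experts model's weights still have to be stored somewhere (in system RAM or across GPUs); the active-parameter count mainly governs per-token compute and memory bandwidth.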

    Additionally, the "thinking" variant has a context window of up to 256,000 tokens. This is helpful in "vibe" or "spec" coding, where you need to generate code and tests and integrate Model Context Protocol (MCP) servers for additional capabilities.

    OpenAI's gpt-oss: A (surprise) high-performing open model

    Because OpenAI is the de facto brand name for AI, its release of an open model came as a surprise. The gpt-oss model matches the performance of slightly older ChatGPT models. After a bit of a rough launch (due to its "harmony" chat template breaking tools), this model is now known for accurate tool use. Its 120B variant fits on a single 80 GB GPU (like an H100), and the 20B version fits on consumer hardware. It also provides a solid alternative to Qwen for organizations that are still evaluating their model decisions (Figure 7).

    X post by Sam Altman: OpenAI releases gpt-oss, an open-weights reasoning model with performance similar to o4-mini, designed for local use on PCs and phones.
    Figure 7: OpenAI's Sam Altman promoting the model on X (formerly Twitter) due to its strong performance for model size.

    Small models for consumer devices and the edge

    Perhaps the biggest win for AI in 2025 has been the advancement of small language models (SLMs) that can run on almost any consumer device, including mobile phones. Small models are improving faster than most people realize (Figure 8). Although parameter counts might not be changing, their capabilities are increasing. This is the result of improved attention kernels, efficient block layouts, and synthetic data generation techniques that were not available two years ago.

    Scatter plot showing IBM Granite models outperforming larger SLMs. Granite 4.0 1B reaches nearly 70% accuracy, while 300M models surpass competing 1B parameter models.
    Figure 8: An example of model improvements in SLMs, which typically couldn't perform this well in years past.

    For example, the Granite 4 family from IBM focuses on edge and on-device deployments. It is even ISO 42001 certified for responsible development and governance. Models from Qwen, Gemma (Google), and Llama are also part of this trend, providing small models for developers who need air-gapped inference with predictable costs and no API key requirements.

    Real-world use cases for open models

    While smaller models have helped many people run and experiment with their own AI, open models are also used in the enterprise (Figure 9). We see this especially in highly regulated sectors (like telecommunications or banking) that have strict requirements for on-premises deployment and data sovereignty. For example, where data residency regulations require AI usage to stay local, open models are a necessity.

    Stacked diagram of an integrated AI platform. It shows capabilities like MLOps and resource management running on container engines across physical, cloud, and edge environments.
    Figure 9: AI deployment in the enterprise requires not just inferencing a model, but many other capabilities to monitor, automate, and scale AI workloads.

    In areas ranging from customer service automation (call centers and chatbots) to internal knowledge management (legal and document processing), we see a combination of tools. Teams use data processing tools like Docling along with a "smaller" language model like Llama 4 Scout, DeepSeek R1, or Llama 3 to process and respond to requests. Retrieval-augmented generation (RAG) workflows become important here (Figure 10), as these AI pipelines typically need unstructured data for accurate responses.

    Flowchart of a RAG pipeline: a user question triggers data retrieval from a vector database; relevant text is then combined with a prompt and fed into an LLM to generate an answer.
    Figure 10: RAG, or retrieval-augmented generation, remains the top way language models are customized to meet customer needs.
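    At its core, the RAG flow in Figure 10 is: embed the question, retrieve the most similar chunks from a vector store, and prepend them to the prompt. Here is a minimal, self-contained sketch; the documents, bag-of-words "embeddings," and cosine scoring stand in for a real embedding model and vector database, which a production pipeline would use instead:

```python
# Minimal RAG retrieval sketch. A real pipeline would use an embedding
# model and a vector database; bag-of-words vectors and cosine
# similarity are used here only to show the overall flow.
import math
from collections import Counter

DOCS = [
    "Refunds are processed within 5 business days of the return.",
    "Our support line is open Monday through Friday, 9am to 5pm.",
    "Shipping is free on orders over $50 within the continental US.",
]

def embed(text: str) -> Counter:
    """Toy 'embedding': term-frequency vector of lowercase tokens."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the question."""
    q = embed(question)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How long do refunds take?"))
```

    The assembled prompt would then be sent to one of the language models discussed above, served locally or through an inference server.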

    How to run these models on your own hardware

    You can easily test and evaluate these models. To run locally on your own device (using a GPU or even just a CPU), use Ollama or RamaLama. These command-line interface (CLI) tools, both built on the llama.cpp inference engine, can pull and run a model with a single command.

    Beyond just chatting, you can also replace run with serve to get an OpenAI-compatible API for your applications instead of relying on a remote endpoint.

    ollama run gpt-oss

    If you're using Docker or Podman for containerized applications, RamaLama is a great option. It runs models in containers for enhanced isolation and security.

    ramalama run gpt-oss
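    Once a model is served locally, any OpenAI-style client can talk to it. The sketch below uses only the Python standard library; the base URL matches Ollama's default port (11434) and its OpenAI-compatible /v1 path, but adjust the address and model name for your own setup:

```python
# Build a chat-completions request for a locally served model's
# OpenAI-compatible API. Assumes `ollama serve` (or `ramalama serve`)
# is running; Ollama's default address is http://localhost:11434.
import json
import urllib.request

def chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )

req = chat_request("http://localhost:11434", "gpt-oss",
                   "Explain RAG in one sentence.")
# With the server running, send it and read the reply:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

    Because the API shape is the same, pointing the base URL at a vLLM deployment instead requires no client changes.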

    vLLM is suited for large-scale inference with concurrent users and repeated requests, where caching is useful. Originally a UC Berkeley project, now with Red Hat as the main corporate contributor, it supports "any model, on any accelerator, on any cloud." It was also GitHub's top open source project by contributors in 2025 (Figure 11).

    Split graphic for vLLM. Left: Line graph showing GitHub stars growing from 0 to over 60,000 by late 2025. Right: Technical specs highlighting over 1,700 contributors and broad hardware support.
    Figure 11: For scaling up model deployments, vLLM provides extensive model and hardware support across accelerators and model families.

    Through a full application platform like Red Hat OpenShift and OpenShift AI, your containerized applications run alongside open models, with the observability, guardrails, and other capabilities needed for enterprise AI deployments.

    Wrapping up

    The power of open has provided a huge ecosystem of models for right-sized use cases and inference capabilities that power everything from Raspberry Pis to distributed Kubernetes environments. You've learned about models like Qwen, DeepSeek, gpt-oss, and platforms like LMArena and Artificial Analysis that make it easy to select the right one. Now, it's time for you to test them out, see what works for you, and control your AI narrative.
