We’ve been learning a lot about large language models (LLMs) and how they can be used with Node.js and JavaScript. If you want to follow what we’ve learned, you can check out this learning path that we put together on our journey so far: How to get started with large language models and Node.js.
In our initial learning, we followed the most common path: we picked one of the most popular models from Hugging Face and used tools like ollama and node-llama-cpp to run it. However, not all models are created equal, and there are a number of aspects to consider when choosing one. The article "Why trust open source AI" is a good introduction to several of them.
In that context, we wanted to try out the Granite model and see what differences, if any, there were from the models we’d used previously. We also thought it would be a good opportunity to explore Podman AI Lab as another way to run models.
Getting the Granite large language model running with Podman
We started by installing Podman Desktop. Podman Desktop provides a nice GUI where you can start and manage containers. It supports Windows, macOS, and Linux. Downloads are available from podman-desktop.io/downloads.
Once Podman Desktop was installed, we went to the extensions page, searched for “Podman AI,” and installed the Podman AI Lab extension as shown in Figure 1.
Once the AI Lab extension was installed we could see a new icon in the bar on the left as shown in Figure 2.
From that page we can go to the Catalog page. The Catalog page allows us to download the Granite model along with many other popular options, as shown in Figure 3.
We used the download option on the far right-hand side to download the Granite model. This can take a bit of time, as the download is 3.8 GB in size. Once you request the model, the download begins as shown in Figure 4.
Models can also be downloaded from Hugging Face and imported, but it is nice that the Granite model and other popular models can be downloaded directly without that extra step.
Once the model is downloaded, Podman AI Lab makes it easy to serve it through an OpenAI-compatible endpoint by creating a service.
Figure 5 shows the Services page.
We asked for a new service to be created to serve the Granite model we had downloaded earlier. This is shown in Figure 6.
Once complete, we can open the service details to get the URL that we’ll need to access the model, as shown in Figure 7.
The URL is http://localhost:36851/v1/chat/completions. We now have the Granite model being served by an endpoint available locally.
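Before wiring the endpoint into an application, you can sanity-check it with a small standalone script. The following is a minimal sketch using the built-in fetch in Node.js 18+; the port (36851 here) is specific to our machine, and the model field is a placeholder that the locally served model will typically ignore:

// check-endpoint.mjs - quick check of the OpenAI-compatible chat completions endpoint
const response = await fetch("http://localhost:36851/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "granite", // placeholder; the service serves the model it was configured with
    messages: [{ role: "user", content: "Say hello in one sentence." }],
  }),
});
const result = await response.json();
console.log(result.choices?.[0]?.message?.content);

If the service is running, this should print a short response generated by the Granite model.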
Trying out the Granite large language model with Node.js
Next we wanted to see how the Granite model worked with the Node.js and Langchain.js based Retrieval Augmented Generation (RAG) example that we had experimented with earlier.
We started by cloning the ai-experimentation repository:
git clone https://github.com/mhdawson/ai-experimentation.git
We then went into the lesson-3-4 directory and edited the file langchainjs-backends.mjs to point to the local URL on which the Granite model was being served by Podman AI Lab. This diff shows the changes we made:
diff --git a/lesson-3-4/langchainjs-backends.mjs b/lesson-3-4/langchainjs-backends.mjs
index dd71cb7..e6048aa 100644
--- a/lesson-3-4/langchainjs-backends.mjs
+++ b/lesson-3-4/langchainjs-backends.mjs
@@ -42,9 +42,9 @@ console.log("Augmenting data loaded - " + new Date());
////////////////////////////////
// GET THE MODEL
-const model = await getModel('llama-cpp', 0.9);
+//const model = await getModel('llama-cpp', 0.9);
//const model = await getModel('openAI', 0.9);
-//const model = await getModel('Openshift.ai', 0.9);
+const model = await getModel('Openshift.ai', 0.9);
////////////////////////////////
@@ -112,7 +112,7 @@ async function getModel(type, temperature) {
{ temperature: temperature,
openAIApiKey: 'EMPTY',
modelName: 'mistralai/Mistral-7B-Instruct-v0.2' },
- { baseURL: 'http://vllm.llm-hosting.svc.cluster.local:8000/v1' }
+ { baseURL: 'http://localhost:36851/v1' }
);
};
return model;
If you had looked at langchainjs-backends.mjs earlier, you would have seen that it already supported switching between a model served through node-llama-cpp, OpenAI, or Red Hat OpenShift AI. Since the option for OpenShift AI used an OpenAI-compatible endpoint, and the Podman AI Lab service also provides an OpenAI-compatible endpoint, all we had to do was:
- Switch the call to getModel() to use the "Openshift.ai" option.
- Switch the baseURL configured for the OpenShift.ai option in getModel() to point to the base URL served by Podman AI Lab. From the URL we shared earlier, that would be: http://localhost:36851/v1.
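For context, the "Openshift.ai" option in getModel() simply constructs a Langchain.js chat model that talks to an OpenAI-compatible endpoint. A minimal sketch of that pattern, assuming a Langchain.js release where ChatOpenAI comes from the @langchain/openai package, looks roughly like this (the port is specific to our machine, and the modelName is typically ignored by the local service):

// Sketch of an OpenAI-compatible model configuration similar to the
// "Openshift.ai" branch in langchainjs-backends.mjs (not the exact code).
import { ChatOpenAI } from "@langchain/openai";

function getLocalModel(temperature) {
  return new ChatOpenAI(
    {
      temperature: temperature,
      openAIApiKey: "EMPTY", // no real key is needed for the locally served model
      modelName: "mistralai/Mistral-7B-Instruct-v0.2", // typically ignored by the local service
    },
    { baseURL: "http://localhost:36851/v1" }, // base URL from the Podman AI Lab service
  );
}

const model = getLocalModel(0.9);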
With those changes, we then ran the application with:
node langchainjs-backends.mjs
From the output we can see the data for the Node.js reference architecture being loaded to support Retrieval Augmented Generation, the question being asked of the model, and the model responding with an answer that has been influenced by the Node.js reference architecture:
Loading and processing augmenting data - Wed Jul 03 2024 15:51:33 GMT-0400 (Eastern Daylight Saving Time)
Unknown file type: cors-error.png
Unknown file type: _category_.json
Unknown file type: _category_.json
Unknown file type: _category_.json
Augmenting data loaded - Wed Jul 03 2024 15:51:44 GMT-0400 (Eastern Daylight Saving Time)
Loading model - Wed Jul 03 2024 15:51:44 GMT-0400 (Eastern Daylight Saving Time)
2024-07-03T19:51:44.341Z
{
input: 'Should I use npm to start a node.js application',
chat_history: [],
context: [
Document {
pageContent: '## avoiding using `npm` to start application\n' +
'\n' +
'While you will often see `CMD ["npm", "start"]` in docker files\n' +
'used to build Node.js applications there are a number\n' +
'of good reasons to avoid this:',
metadata: [Object]
},
Document {
pageContent: "- One less component. You generally don't need `npm` to start\n" +
' your application. If you avoid using it in the container\n' +
' then you will not be exposed to any security vulnerabilities\n' +
' that might exist in that component or its dependencies.\n' +
'- One less process. Instead of running 2 process (npm and node)\n' +
' you will only run 1.\n' +
'- There can be issues with signals and child processes. You\n' +
' can read more about that in the Node.js docker best practices',
metadata: [Object]
},
Document {
pageContent: '```\n' +
'\n' +
'It should be noted that users and organizations can modify how `npm init` works, tailoring the resulting package.json to their needs. For more information on this, check out the [official docs](https://docs.npmjs.com/cli/v9/commands/npm-init)',
metadata: [Object]
},
Document {
pageContent: '* [Introduction to the Node.js reference architecture: Node Module Development](https://developers.redhat.com/articles/2023/02/22/installing-nodejs-modules-using-npm-registry)',
metadata: [Object]
}
],
answer: 'It is generally recommended to avoid using `npm` to start a Node.js application. While it may be convenient to use the `CMD ["npm", "start"]` syntax in Dockerfiles, there are several reasons to consider using an alternative approach:\n' +
'\n' +
'- **Simplified deployment**: By not relying on `npm`, you can avoid potential security vulnerabilities and reduce the number of components required to run your application. Additionally, you will only need to manage one process instead of two.\n' +
'- **Easier signal handling**: With a direct `node` command, you can better control signal handling and child processes, which can be especially important in production environments.\n' +
'\n' +
'However, it is essential to acknowledge that users and organizations can customize the `npm init` process to suit their specific needs. For more information on tailoring package.json, refer to the official [docs](https://docs.npmjs.com/cli/v9/commands/npm-init).\n' +
'\n' +
'In summary, while `npm` can be a convenient option, there are valid reasons to consider using an alternative approach for starting a Node.js application. The choice ultimately depends on your specific use case and requirements.'
}
2024-07-03T19:52:27.207Z
Note that the run took a bit longer since we were not using GPU acceleration and were running on a smaller machine.
Just like past runs with the Mistral model, we get an answer (the part after answer:) telling us to avoid using the npm command to start Node.js applications. This is a different answer than we get without the additional context provided by the Node.js reference architecture.
If you’ve not gone through the learning path and want to dive deeper into what a Node.js application using Langchain.js looks like, you can look through the code in langchainjs-backends.mjs. It was good to see that we could use the existing Langchain.js based application with another method for serving a model (Podman AI Lab) and with a different model (Granite).
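For reference, the overall flow in that file follows the standard Langchain.js retrieval pattern. The sketch below is a simplified, self-contained version of that pattern, not the code from the repository: it uses a single in-memory document in place of the reference architecture content, and it assumes a Langchain.js release that provides these helpers at the import paths shown (the embeddings class also requires the @xenova/transformers package):

// rag-sketch.mjs - simplified retrieval augmented generation flow
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { Document } from "@langchain/core/documents";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { HuggingFaceTransformersEmbeddings } from "@langchain/community/embeddings/hf_transformers";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
import { createRetrievalChain } from "langchain/chains/retrieval";

// Stand-in for the Node.js reference architecture content; the real
// application loads and splits the markdown files from disk.
const docs = [
  new Document({
    pageContent: "Avoid using npm to start a Node.js application in containers; run node directly.",
  }),
];

// Build a small in-memory vector store and retriever over the documents.
const vectorStore = await MemoryVectorStore.fromDocuments(
  docs,
  new HuggingFaceTransformersEmbeddings({ modelName: "Xenova/all-MiniLM-L6-v2" }),
);
const retriever = vectorStore.asRetriever();

// Chat model served locally by Podman AI Lab (port is machine specific).
const model = new ChatOpenAI(
  { temperature: 0.9, openAIApiKey: "EMPTY" },
  { baseURL: "http://localhost:36851/v1" },
);

// Prompt that stuffs the retrieved documents into the question.
const prompt = ChatPromptTemplate.fromTemplate(
  "Answer the question based only on the following context:\n{context}\n\nQuestion: {input}",
);

const combineDocsChain = await createStuffDocumentsChain({ llm: model, prompt });
const chain = await createRetrievalChain({ retriever, combineDocsChain });

const result = await chain.invoke({
  input: "Should I use npm to start a node.js application",
  chat_history: [],
});
console.log(result.answer);

The object returned by chain.invoke() has the same shape as the output shown above: the input, the chat history, the retrieved context documents, and the generated answer.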
Wrapping up
As we mentioned in the introduction, we wanted to try out the Granite model and see what differences, if any, there were from the models we’d used previously, and to do that using Podman AI Lab as another way to run models.
As you can see, the experience of using Podman AI Lab and the Granite model went well. It was easy to download and serve the model with Podman AI Lab, and the Granite model worked as expected with the Node.js and Langchain.js based application that implemented Retrieval Augmented Generation (RAG).
If you want to learn more about Node.js and AI you can check out AI & Node.js on Red Hat Developer.
If you would like to learn more about what the Red Hat Node.js team is up to in general, you can check out the Node.js topic page and the Node.js reference architecture.