How to get started with large language models and Node.js

Learn how to access a large language model using Node.js and LangChain.js. You’ll also explore LangChain.js APIs that simplify common requirements like retrieval-augmented generation (RAG).

In the first two lessons, we took you through a basic LangChain.js example and then helped you speed up the locally running model. When running that basic example, we mentioned that the answer we got to the question “Should I use npm to start a node.js application?” was not the guidance we’d like people to follow. This answer is based on the data fed to the model when it was created, but we’d like to extend that knowledge base with Red Hat’s guidance for building Node.js applications. A common way to do this is through retrieval-augmented generation.

If you are familiar with large language models, you likely know that two key aspects of working with a model are crafting the question posed to the model and managing the context that goes along with the question. These are often covered as part of prompt engineering. Retrieval-augmented generation (RAG) is one way you can provide context that helps the model respond with the most appropriate answer. The basic concept is:

  • Data that provides additional context (in our case, the Markdown files from the Node.js Reference Architecture) is transformed into a format suitable for model augmentation. This often includes breaking the data up into chunks of a limited maximum size that can later be provided as additional context to the model.
  • An embedding model is used to convert the source data into a set of vectors that represent the words in the data. These vectors are stored in a database so that the data can be retrieved through a query against the matching vectors (a short sketch of this matching follows the list). Most commonly, the data is stored in a vector database like Chroma.
  • The application is enhanced so that before passing on a query to the model, it queries the database for matching documents. The most relevant documents are then added to the context sent along with the query to the model as part of the prompt.
  • The model returns an answer which in part is based on the context provided.
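To make the second and third bullets more concrete, here is a small sketch of what “matching vectors” means: the embedding model turns each chunk and the query into numeric vectors, and the retriever keeps the chunks whose vectors are most similar to the query vector (cosine similarity is a common measure). This is purely illustrative; in the example later in this lesson, the vector store does this work for us:

    // Illustrative only: how a retriever might rank chunks by similarity.
    // In the lesson, the vector store performs this matching internally.
    function cosineSimilarity(a, b) {
      let dot = 0, normA = 0, normB = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
      }
      return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Rank chunk vectors against the query vector and keep the top k matches.
    function topKChunks(queryVector, chunkVectors, k) {
      return chunkVectors
        .map((vector, index) => ({ index, score: cosineSimilarity(queryVector, vector) }))
        .sort((a, b) => b.score - a.score)
        .slice(0, k);
    }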

So why not simply pass the content of all of the documents to the model? The problem is that the size of the context that can be passed to a model is limited. For that reason, we need to identify the most relevant context and pass only that subset.

When implementing RAG, it is important to consider the sensitivity of the data being used to build additional context. Often this data is proprietary, and it’s important to understand that if you are using a public service like OpenAI’s GPT-4, you might effectively be disclosing this data to the world. This is often one of the motivations for running models locally or within an organization.

In Lesson 4, we’ll show how you can easily switch an application from using a local model to a public service like OpenAI or a model running on OpenShift AI within your organization. When choosing which approach to use, it’s important to factor in both the sensitivity of any context data that will be sent along with queries and the cost. For our example, we’ll be using the public data from the Node.js Reference Architecture. Therefore, it should be safe to use a local model or a public service, although even the questions asked might leak information.

If you don’t have a GPU and the queries from the earlier examples are still running slowly for you, you might want to take a peek at Lesson 4 before moving forward. If you have an existing OpenAI account or access to a vLLM model running in OpenShift AI, you might want to use the code in the example for Lesson 4. That code is the same as the code in this example except that it allows for easy switching between a model running locally, a model in the OpenAI cloud service, and a model hosted in OpenShift AI.

In order to get full benefit from taking this lesson, you need to:

  • Set up the environment as explained in earlier lessons.

In this lesson, you will:

  • Explore a more advanced LangChain.js example that implements retrieval-augmented generation and learn about some additional LangChain.js APIs.
  • Run the application and see how relevant context is matched to the query.
  • Run the application and observe that we get a more appropriate answer to our question based on the content from the Node.js Reference Architecture.

Set up the environment

  1. Start by changing into the lesson-3-4 directory. Copy over the node_modules and models directory from lesson-1-2 into the lesson-3-4 directory so that we don’t need to rebuild node-llama-cpp or redownload the model:

    cp -R ../lesson-1-2/models .
    cp -R ../lesson-1-2/node_modules .
  2. Install the additional packages used in this lesson by running:

    npm install
  3. Copy over the Markdown files for the Node.js Reference Architecture into the directory from which we’ll read additional documents to be used for context:

    mkdir SOURCE_DOCUMENTS
    git clone https://github.com/nodeshift/nodejs-reference-architecture.git
    cp -R nodejs-reference-architecture/docs SOURCE_DOCUMENTS 
  4. On Windows, use the file manager to copy the docs directory and all of its subdirectories into SOURCE_DOCUMENTS.

Exploring the RAG LangChain.js example

There are two main additions in the RAG example: loading the augmenting data, and then using that data when building the prompt.

  1. We are going to explore the code in langchainjs-rag.mjs. This code starts by loading the Markdown files from the Node.js Reference Architecture into an in-memory vector database available in LangChain.js:

    ////////////////////////////////
    // LOAD AUGMENTING DATA
    // typically this is stored in a database versus being loaded every time
    
    console.log("Loading and processing augmenting data - " + new Date());
    
    const docLoader = new DirectoryLoader(
      "./SOURCE_DOCUMENTS",
      {
        ".md": (path) => new TextLoader(path),
      }
    );
    const docs = await docLoader.load();
    const splitter = await new MarkdownTextSplitter({
      chunkSize: 500,
      chunkOverlap: 50
    });
    const splitDocs = await splitter.splitDocuments(docs);
    const vectorStore = await MemoryVectorStore.fromDocuments(
      splitDocs,
      new HuggingFaceTransformersEmbeddings()
    );
    const retriever = await vectorStore.asRetriever();
    console.log("Augmenting data loaded - " + new Date());
  2. The first part uses the DirectoryLoader API to recursively load all of the documents we copied into the SOURCE_DOCUMENTS directory. For each Markdown file ending in .md, it uses the TextLoader API to load the document. There is built-in support for a number of different document types, including CSV, JSON, PDF, and more. You can read about these in the Document Loaders section of the LangChain.js documentation.

    Once loaded, the documents need to be split into chunks that can be indexed and retrieved in order to provide additional context to queries made to the model. To do this, we use the MarkdownTextSplitter API to break the documents up into chunks of at most 500 characters, with a 50-character overlap between chunks.

    Once the documents are split into chunks, the MemoryVectorStore API is used to create an in-memory database of the split documents, using the HuggingFaceTransformersEmbeddings API to generate the vectors that index each chunk. LangChain.js supports a number of different embeddings, and the most appropriate one might depend on the model being used. In our case, the HuggingFaceTransformersEmbeddings seemed to be effective.

    Finally, we get an instance of the Retriever API that can be used to look up chunks based on the query.

    In a real application we would not read the documents every time, but instead use a persistent database. We’ll cover that in a future learning path. For this example, loading the documents every time seems to take about 10 to 20 seconds, which is a reasonable tradeoff for keeping the example simple.

    Using the retriever we created, we can find the chunks matching the question from our earlier examples with:

    await retriever.getRelevantDocuments("Should I use npm to start a node.js application?");

    We’ll see the documents returned for that query later on in the lesson.
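    If you want to see how strongly each chunk matches before anything is handed to the model, you can also query the vector store directly. The following is a small illustrative sketch, not part of langchainjs-rag.mjs; it assumes the vectorStore created above is in scope and that each document's metadata carries the source file path:

    // Sketch: inspect which chunks match the question and how closely.
    // similaritySearchWithScore returns [document, score] pairs.
    const question = "Should I use npm to start a node.js application?";
    const matches = await vectorStore.similaritySearchWithScore(question, 4);
    for (const [doc, score] of matches) {
      console.log(score.toFixed(3), doc.metadata.source);
    }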

  3. The second major addition is the creation of a chain using the StuffDocumentsChain and RetrievalChain APIs. 
////////////////////////////////
// CREATE CHAIN
const prompt =
  ChatPromptTemplate.fromTemplate(`Answer the following question based only on the provided context, if you don't know the answer say so:
  
<context>
{context}
</context>
Question: {input}`);
const documentChain = await createStuffDocumentsChain({
  llm: model,
  prompt,
});
const retrievalChain = await createRetrievalChain({
  combineDocsChain: documentChain,
  retriever,
});

This shows the use of the LangChain Expression Language (LCEL) to compose chains.

  1. The createStuffDocumentsChain takes a list of documents, formats them into a prompt, and sends that prompt on to the model. We phrase the prompt in a way that frames the question and includes the documents that are passed in through the {context} key.
  2. The retrievalChain takes the query to be sent to the model and uses the retriever passed in to look up related document chunks. We also pass it the StuffDocumentsChain, which then uses those chunks to format the full prompt sent to the model.

    You can read more about chains and how to compose them in the chains section of the LangChain.js documentation.
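    The helper functions above are themselves built from LangChain Expression Language runnables. As a rough illustration of what that composition could look like if written by hand (this is an assumption on our part and not the code in langchainjs-rag.mjs; import paths depend on your LangChain.js version), the same flow could be expressed as:

    // Illustrative LCEL-style composition of the same retrieval flow.
    // Assumes `retriever`, `prompt`, and `model` from the example are in scope.
    import { RunnableSequence } from "@langchain/core/runnables";
    import { StringOutputParser } from "@langchain/core/output_parsers";

    const manualChain = RunnableSequence.from([
      {
        // Look up matching chunks and join them into a single context string.
        context: async (question) => {
          const docs = await retriever.getRelevantDocuments(question);
          return docs.map((doc) => doc.pageContent).join("\n\n");
        },
        // Pass the question through unchanged so the prompt can fill in {input}.
        input: (question) => question,
      },
      prompt,
      model,
      new StringOutputParser(),
    ]);

    const answer = await manualChain.invoke("Should I use npm to start a node.js application?");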

  3. Now that we’ve built the chain, we can ask it our question just like before:

    ////////////////////////////////
    // ASK QUESTIONS
    
    console.log(new Date());
    let result = await retrievalChain.invoke({
      input: "Should I use npm to start a node.js application",
    });
    console.log(result);
    console.log(new Date());

Running the RAG LangChain.js example

  1. You can run the RAG example with:

    node langchainjs-rag.mjs
  2. The answer can vary, but this time you should get an answer that reflects the recommendations in the Node.js Reference Architecture:

    'Assistant: It is generally not necessary to use `npm` to start a Node.js application. If you avoid using it in the container, you will not be exposed to any security vulnerabilities that might exist in that component or its dependencies. However, it is important to build security into your software development process when developing Node.js modules and applications. This includes managing dependencies, managing access and content of public and private data stores such as npm and github, writing defensive code, limiting required execution privileges, supporting logging and monitoring, and externalizing secrets.'
  3. Looking at the output, we can see that in addition to the answer, we’ve also printed the document chunks that were included in the context sent to the model. As mentioned before, the total size of the prompt, including the context, is limited. The retriever helps us select the document chunks most relevant to the question and include them in the context.

      context: [
        Document {
          pageContent: '## avoiding using `npm` to start application\r\n' +
            '\r\n' +
            'While you will often see `CMD ["npm", "start"]` in docker files\r\n' +
            'used to build Node.js applications there are a number\r\n' +
            'of good reasons to avoid this:\r\n' +
            '\r\n' +
            "- One less component. You generally don't need `npm` to start\r\n" +
            '  your application. If you avoid using it in the container\r\n' +
            '  then you will not be exposed to any security vulnerabilities\r\n' +
            '  that might exist in that component or its dependencies.',
          metadata: [Object]
        },
        Document {
          pageContent: '* [Introduction to the Node.js reference architecture: Node Module Development](https://developers.redhat.com/articles/2023/02/22/installing-nodejs-modules-using-npm-registry)',
          metadata: [Object]
        },
        Document {
          pageContent: '# Secure Development Process\r\n' +
            '\r\n' +
            'It is important to build security into your software development process. Some of the key elements to address in the development process for Node.js modules and applications include:\r\n' +
            '\r\n' +
            '* Managing dependencies\r\n' +
            '* Managing access and content of public and private data stores\r\n' +
            '  such as npm and github \r\n' +
            '* Writing defensive code\r\n' +
            '* Limiting required execution privileges\r\n' +
            '* Supporting logging and monitoring\r\n' +
            '* Externalizing secrets',
          metadata: [Object]
        },
        Document {
          pageContent: '## Further Reading\r\n' +
            '\r\n' +
            '* [Introduction to the Node.js reference architecture: Node Module Development](https://developers.redhat.com/articles/2023/02/22/installing-nodejs-modules-using-npm-registry)\r\n' +
            '\r\n' +
            '* https://github.blog/changelog/2020-10-02-npm-automation-tokens/\r\n' +
            '\r\n' +
            '* https://medium.com/@alex.birsan/dependency-confusion-4a5d60fec610',
          metadata: [Object]
        }
      ],

    Looking at the answer and the context, you can see that the answer is based in part on the matching document chunks.
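    If you only want the final answer plus a pointer to where the supporting chunks came from, the result object returned by the retrieval chain can be unpacked directly. This is a small illustrative sketch, assuming the answer and context fields shown above and that each document's metadata includes the source file path:

    // Sketch: print the answer and the source file of each supporting chunk.
    console.log(result.answer);
    for (const doc of result.context) {
      console.log(" - " + doc.metadata.source);
    }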

Conclusion

In this lesson, we explored using retrieval-augmented generation with LangChain.js. We introduced the APIs for document loading, splitting, and retrieval, along with the StuffDocumentsChain and RetrievalChain APIs, which are composed together using the LangChain Expression Language. Finally, we reran our query to get an answer based on the context from the Node.js Reference Architecture.

Now that you’ve seen how easy it is to interact with a model using LangChain.js, we’ll build on this by showing how LangChain.js makes it simple to develop, experiment, and test in one environment and deploy to another environment with minimal changes to your application.
