Building a local Retrieval-Augmented Generation (RAG) application has become a powerful way to provide accurate, up-to-date, and contextually relevant AI-generated content. With the growing interest in using smaller, specialized models, Ollama has emerged as a viable alternative to larger AI providers like OpenAI or Google. This blog will guide you through building a localized RAG application using Ollama, an efficient and user-friendly platform for deploying and managing custom AI models.
Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval (Retrieval) with generative models (Generation). By retrieving relevant information from external knowledge bases, it enhances the knowledge accuracy and response quality of generative AI models (such as GPT). RAG technology is highly effective in solving complex tasks that require large amounts of external knowledge and is widely applied in scenarios such as question-answering systems, document generation, and knowledge retrieval.
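To make the retrieve-then-generate flow concrete, here is a rough sketch against Ollama's REST API using curl. The model names and the question are placeholders, and the retrieved context would normally come from a vector store such as Chroma DB rather than being pasted in by hand:
# 1. Embed the user question (document chunks are embedded the same way at indexing time)
$ curl http://localhost:11434/api/embeddings -d '{"model": "nomic-embed-text", "prompt": "What is our refund policy?"}'
# 2. Look up the most similar chunks in the vector store, then pass them to the chat model as context
$ curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Answer using only the context below.\nContext: <retrieved chunks>\n\nQuestion: What is our refund policy?", "stream": false}'
ChatOllama automates exactly this loop: it stores the embeddings in Chroma DB and injects the retrieved chunks into the prompt for you.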
Ollama is designed for developers who want to create domain-specific, localized AI applications without relying on big tech companies like Microsoft or Google. It supports running language models locally or in a private cloud environment, offering more control and flexibility in terms of data privacy and model tuning.
Some advantages of using Ollama for building a RAG application:
Localized Deployment: Ollama allows you to deploy models on-premises, which is ideal for businesses or individuals who need localized control over their data.
Smaller, Domain-Specific Models: Ollama focuses on creating smaller, efficient models that are more targeted to specific use cases.
Cost-Effective: By reducing dependency on large cloud infrastructures, Ollama can help reduce operational costs.
Privacy: Ollama’s local deployment ensures that sensitive or proprietary data stays on your servers.
CUDA-capable GPU: Ensure you have an NVIDIA GPU installed.
Docker: Ensure that Docker is installed on your system. You can install Docker by following the official installation guide for your operating system (https://docs.docker.com/engine/install/).
NVIDIA Driver: Ensure that the appropriate NVIDIA driver is installed on your system. You can verify the installation with the nvidia-smi command, as shown after this list.
Other Toolkits: Node.js v20.11.1, pnpm v9.12.2
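A quick way to confirm these prerequisites from a terminal (the exact version output depends on your system):
$ nvidia-smi
$ docker --version
$ node -v
$ pnpm -v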
Ollama provides the backend infrastructure needed to run LLMs locally. To get started, head to Ollama's website (https://ollama.com) and download the application, then follow the instructions to set it up on your local machine. Once started, the Ollama server listens on http://localhost:11434 by default.
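Once Ollama is installed, you can check that the server is reachable and pull a chat model from the terminal; llama3 is just an example here, any model from the Ollama library will work:
$ curl http://localhost:11434
$ ollama pull llama3
$ ollama list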
Please refer to https://docs.trychroma.com/getting-started for Chroma DB installation instructions. We recommend running it in a Docker container:
# https://hub.docker.com/r/chromadb/chroma/tags
$ docker pull chromadb/chroma
$ docker run -d -p 8000:8000 chromadb/chroma
Note: There are many options for storing vector data locally in a RAG application, such as Chroma DB and Milvus. ChatOllama uses Chroma DB, so a Chroma server must be running locally whenever ChatOllama is running. Chroma DB is now available at http://localhost:8000.
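To confirm the Chroma server is reachable, you can call its heartbeat endpoint; the API path differs between Chroma releases, so use whichever matches your version:
$ curl http://localhost:8000/api/v1/heartbeat
# or, on newer Chroma releases:
$ curl http://localhost:8000/api/v2/heartbeat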
ChatOllama is an open-source chatbot based on LLMs. It supports a wide range of language models as well as knowledge base management. Now we can complete the setup needed to run ChatOllama. Clone the repository from GitHub and copy the .env.example file to .env:
$ git clone https://github.com/sugarforever/chat-ollama.git
$ cd chat-ollama
$ cp .env.example .env
Make sure to install the dependencies:
$ pnpm install
Run a migration to create your database tables with Prisma Migrate:
$ pnpm prisma-migrate
Make sure both Ollama Server and Chroma DB are running. Start the development server on http://localhost:3000:
$ pnpm dev
Open the ChatOllama web interface by pointing your browser to http://localhost:3000.
Go to Settings to set the host of your Ollama server.
Click the Models tab to manage models: list, download, or delete them. Note: Ollama supports nomic-embed-text, a popular text embedding model with a large context window. This is also the model I will use in ChatOllama.
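If you prefer the command line, the embedding model can also be pulled directly with the Ollama CLI:
$ ollama pull nomic-embed-text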
Click the Knowledge Bases tab to manage knowledge bases: select files or folders (or directly enter URLs), specify the text embedding model, set a name, and create a knowledge base.
Click on the created knowledge base to go to the chat screen.
Using Ollama to build a localized RAG application gives you the flexibility, privacy, and customization that many developers and organizations seek. By combining powerful retrieval tools with efficient generative models, you can provide highly relevant and up-to-date responses tailored to your specific audience or region.