PrivateGPT is a production-ready AI project that lets you ask questions about your documents using the power of Large Language Models (LLMs), even without an Internet connection. It is 100% private: no data leaves your execution environment at any point.

Designed for businesses that need to access relevant information in an intuitive, simple, and secure way, PrivateGPT integrates with a company's data and tools, addressing privacy concerns and adapting to each organization's unique needs and use cases.
The table below gives the approximate memory needed just to hold a model's weights at different precisions; quantization (e.g. GPTQ) shrinks the footprint roughly in proportion to the bit width:

| Model size (billions of params) | float32 | float16 | GPTQ 8-bit | GPTQ 4-bit |
|---|---|---|---|---|
| 7B | 28 GB | 14 GB | 7–9 GB | 3.5–5 GB |
| 13B | 52 GB | 26 GB | 13–15 GB | 6.5–8 GB |
| 32B | 130 GB | 65 GB | 32.5–35 GB | 16.25–19 GB |
| 65B | 260.8 GB | 130.4 GB | 65.2–67 GB | 32.6–35 GB |
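These numbers follow directly from the parameter count: memory is roughly parameters times bytes per weight, plus some overhead. The `awk` one-liner below is a back-of-the-envelope sketch (illustrative only, not part of PrivateGPT) that reproduces the 7B row:

```bash
# Rule of thumb: memory (GB) ≈ parameters (billions) × bytes per weight
# float32 = 4 B, float16 = 2 B, 8-bit ≈ 1 B, 4-bit ≈ 0.5 B (plus overhead)
awk 'BEGIN {
  p = 7                                   # model size in billions of parameters
  printf "float32: %5.1f GB\n", p * 4
  printf "float16: %5.1f GB\n", p * 2
  printf "8-bit:   %5.1f GB\n", p * 1
  printf "4-bit:   %5.1f GB\n", p * 0.5
}'
```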
Install Python 3.11 (if you do not have it already), ideally through a Python version manager such as conda. Earlier Python versions are not supported.
```bash
conda create -n privateGPT python=3.11
conda activate privateGPT
```
Install Poetry for dependency management:
```bash
# Method 1
pip install poetry

# Method 2, Windows (PowerShell)
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -

# Add Poetry to your PATH, then check it
poetry --version
# If you see something like "Poetry (version 1.2.0)", your install is ready to use!
```
Install make so you can run the project's scripts:
```bash
# Windows (using Chocolatey)
choco install make

# macOS (using Homebrew)
brew install make
```
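As with Poetry, you can sanity-check the install from a fresh terminal (the exact version string will vary by platform):

```bash
make --version
```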
Install Ollama. The easiest way to run PrivateGPT fully locally is to rely on Ollama for the LLM. Ollama makes local LLMs and embedding models easy to install and use, abstracting away the complexity of GPU support. It is the recommended setup for local development.
Go to ollama.ai and follow the instructions to install Ollama on your machine. After the installation, pull the models to be used. The default settings-ollama.yaml is configured to use the Mistral 7B LLM (~4 GB) and nomic-embed-text embeddings (~275 MB), therefore:
```bash
ollama pull mistral
ollama pull nomic-embed-text
```
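If you want to confirm both models were downloaded before moving on, Ollama ships a listing command:

```bash
ollama list   # should show mistral and nomic-embed-text with their sizes
```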
Now start the Ollama service (it runs a local inference server that serves both the LLM and the embeddings):

```bash
ollama serve
```
Note: check localhost:11434; Ollama should report that it is running.
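You can also probe the server from the command line: hitting the root URL returns a plain status string, and /api/tags (part of Ollama's REST API) lists the installed models:

```bash
curl http://localhost:11434            # -> "Ollama is running"
curl http://localhost:11434/api/tags   # JSON list of pulled models
```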
```bash
git clone https://github.com/imartinez/privateGPT
cd privateGPT
```
```bash
poetry install --extras "ui llms-ollama embeddings-ollama vector-stores-qdrant"
```
```
# PowerShell
$env:PGPT_PROFILES="ollama"
# or CMD
set PGPT_PROFILES=ollama

make run
```
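On macOS or Linux, the equivalent is to set the profile inline when launching:

```bash
PGPT_PROFILES=ollama make run
```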
PrivateGPT will use the existing settings-ollama.yaml settings file, which is already configured to use the Ollama LLM and embeddings and the Qdrant vector store. Review it and adapt it to your needs (different models, a different Ollama port, etc.).
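For orientation, the relevant part of the file looks roughly like the sketch below. The exact keys can differ between PrivateGPT versions, so treat this as an assumed layout and verify it against the file in your checkout:

```yaml
# Assumed excerpt of settings-ollama.yaml -- verify against your checkout
llm:
  mode: ollama              # route LLM calls through Ollama

embedding:
  mode: ollama              # route embedding calls through Ollama

ollama:
  llm_model: mistral                  # swap in any model you have pulled
  embedding_model: nomic-embed-text
  api_base: http://localhost:11434    # change if Ollama runs on another port

vectorstore:
  database: qdrant
```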
The UI will be available at http://localhost:8001
Go to localhost:8001 to open the Gradio client for PrivateGPT, and ask the LLM a question directly by choosing the LLM Chat option.
Now choose Query Files and click Upload files; in this example I have uploaded a PDF file. Ask it to summarise the document, and you will get a summary of the PDF.
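The Gradio UI sits on top of a FastAPI app, so the same functionality is also exposed over HTTP. The endpoint paths below are assumptions based on PrivateGPT's API; confirm them in the interactive reference at http://localhost:8001/docs before relying on them:

```bash
# Assumed endpoints -- verify at http://localhost:8001/docs
curl http://localhost:8001/health             # liveness check
curl -X POST http://localhost:8001/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Summarise the uploaded document"}'
```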
Changing the Model: Modify settings.yaml (or the active profile file, such as settings-ollama.yaml) in the root folder to switch between different models.
Running on GPU: If you want to utilize your GPU, ensure you have PyTorch installed with CUDA support.
```bash
# Install PyTorch with CUDA support:
pip install torch==2.0.0+cu118 --index-url https://download.pytorch.org/whl/cu118
```
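A quick way to confirm PyTorch can actually see the GPU before launching (run it inside the project's Poetry environment):

```bash
poetry run python -c "import torch; print(torch.cuda.is_available())"
# Should print: True
```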
Now, launch PrivateGPT with GPU support:
```bash
poetry run python -m uvicorn private_gpt.main:app --reload --port 8001
```