How to Run Llama 3.1 8B with Ollama

Llama 3.1 is Meta's state-of-the-art model family, available in 8B, 70B, and 405B parameter sizes. Let's see how to run Llama 3.1 8B with Ollama.
3-day Free Trial: Gift for New Users!

We’re excited to offer a free trial for new clients to test 20+ NVIDIA GPU Servers. Once we receive your trial request, we’ll send you the login details within 30 minutes to 2 hours.

What is Ollama?

Ollama is an open-source tool for serving large language models that helps users quickly run LLMs locally. After a simple installation, users can launch open-source models such as Llama or Qwen with a single command. Ollama also greatly simplifies deploying and managing LLMs in Docker containers, letting users get a model running locally in minutes.

What is Llama 3.1 8B?

Llama 3.1 is a new state-of-the-art model from Meta available in 8B, 70B and 405B parameter sizes. Specifically, the "8B" denotes that this model has 8 billion parameters, which are the variables the model uses to make predictions.

Llama 3.1 8B balances performance and computational efficiency, making it suitable for a range of applications such as text generation, question answering, language translation, and code generation. Despite having fewer parameters than larger models like Llama 3.1 70B, it delivers impressive results across a variety of natural language processing tasks. Additionally, Meta's smaller models are competitive with closed and open models that have a similar number of parameters.

Model evaluations: Meta's published benchmark tables show Llama 3.1 8B performing competitively with open and closed models of similar size.

System Requirements

CPU >= 8 cores

RAM >= 16 GB

VRAM >= 8 GB

NVIDIA RTX 3070 or better is recommended for optimal performance.
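If you are not sure whether your GPU meets the VRAM requirement, NVIDIA's standard nvidia-smi utility (installed with the GPU driver) reports each GPU's model and memory:

# Show GPU model, driver version, and total/used VRAM
nvidia-smi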

Download and Install Ollama

Ollama is available for macOS, Linux (including Ubuntu), and Windows (preview). On Linux you can install it from the terminal, as shown below; on Windows, use the graphical installer.
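A one-line install on Linux, using Ollama's official install script:

# Download and run the official Ollama install script
curl -fsSL https://ollama.com/install.sh | sh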

On the download page, click Windows and then click the Download button. Once the download is complete, double-click the installer and click Install. The installation completes without further prompts; when it finishes, open a terminal.

How to run Llama 3.1 8B with Ollama

Open the Terminal

Let's open a terminal; this article uses Windows CMD as an example. Now that Ollama is installed, enter the following command in the terminal to run the Llama 3.1 8B large language model as a test.

Note: The initial run takes a while, because several gigabytes of model files must be downloaded first. Once the download completes, you can interact with the llama3.1:8b model directly from the terminal.

# Run Llama 3.1 (the llama3.1 tag defaults to the 8B model)
ollama run llama3.1
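
If you prefer to pin the exact 8B tag, you can pull and verify the model first; pull, list, and run are all standard subcommands of the same CLI:

# Download the 8B variant explicitly
ollama pull llama3.1:8b

# Confirm the model is present locally
ollama list

# Start an interactive session with the pinned tag
ollama run llama3.1:8b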
Ask a Question

You can type the question you want to ask directly at the interactive prompt.

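The interactive session also accepts a few built-in slash commands; two useful ones are sketched below (type /? inside the session for the full list):

/show info   # print details about the loaded model
/bye         # exit the session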

Make an API Query

Ollama serves a REST API on port 11434 by default. Use curl to send a generate request to the running server:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
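The same server also exposes a chat endpoint that accepts a message history, which is usually more convenient for multi-turn conversations. A minimal, non-streaming example against the documented /api/chat endpoint:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [
    { "role": "user", "content": "Why is the sky blue?" }
  ],
  "stream": false
}'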

For more information on using the API, see the Ollama API documentation: https://github.com/ollama/ollama/blob/main/docs/api.md#list-local-models
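For instance, the endpoint described in that section of the docs lists the models you have pulled locally:

# List locally available models
curl http://localhost:11434/api/tags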

Additional: Some Good GPU Plans for Ollama AI
Professional GPU VPS - A4000

$90.30/mo (Summer Sale: save 50%, was $179.00)
  • 32GB RAM
  • 24 CPU Cores
  • 320GB SSD
  • 300Mbps Unmetered Bandwidth
  • Once per 2 Weeks Backup
  • OS: Linux / Windows 10
  • Dedicated GPU: Quadro RTX A4000
  • CUDA Cores: 6,144
  • Tensor Cores: 192
  • GPU Memory: 16GB GDDR6
  • FP32 Performance: 19.2 TFLOPS
  • Available for Rendering, AI/Deep Learning, Data Science, CAD/CGI/DCC.

Advanced GPU - A4000

$209.00/mo
  • 128GB RAM
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro RTX A4000
  • Microarchitecture: Ampere
  • Max GPUs: 2
  • CUDA Cores: 6,144
  • Tensor Cores: 192
  • GPU Memory: 16GB GDDR6
  • FP32 Performance: 19.2 TFLOPS
  • Good Choice for 3D Rendering, Video Editing, AI/Deep Learning, Data Science, etc.
Advanced GPU - A5000

$242.10/mo (Summer Sale: save 31%, was $349.00)
  • 128GB RAM
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro RTX A5000
  • Microarchitecture: Ampere
  • Max GPUs: 2
  • CUDA Cores: 8,192
  • Tensor Cores: 256
  • GPU Memory: 24GB GDDR6
  • FP32 Performance: 27.8 TFLOPS

Enterprise GPU - RTX A6000

$409.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro RTX A6000
  • Microarchitecture: Ampere
  • Max GPUs: 1
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 38.71 TFLOPS

Enterprise GPU - A40

$439.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia A40
  • Microarchitecture: Ampere
  • Max GPUs: 1
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 37.48 TFLOPS
Multi-GPU - 3xRTX A5000 (New Arrival)

$539.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: 3 x Quadro RTX A5000
  • Microarchitecture: Ampere
  • Max GPUs: 3
  • CUDA Cores: 8,192 (per GPU)
  • Tensor Cores: 256 (per GPU)
  • GPU Memory: 24GB GDDR6 (per GPU)
  • FP32 Performance: 27.8 TFLOPS
Multi-GPU - 3xRTX A6000 (New Arrival)

$899.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: 3 x Quadro RTX A6000
  • Microarchitecture: Ampere
  • Max GPUs: 3
  • CUDA Cores: 10,752 (per GPU)
  • Tensor Cores: 336 (per GPU)
  • GPU Memory: 48GB GDDR6 (per GPU)
  • FP32 Performance: 38.71 TFLOPS