How to Run Llama 3.1 8B with Ollama

Llama 3.1 is Meta's state-of-the-art model family, available in 8B, 70B, and 405B parameter sizes. Let's see how to run Llama 3.1 8B with Ollama.
3-day Free Trial: Gift for New Users!

We’re excited to offer a free trial for new clients to test 20+ NVIDIA GPU Servers. Once we receive your trial request, we’ll send you the login details within 30 minutes to 2 hours.

What is Ollama?

Ollama is an open-source tool for serving large language models that helps users quickly run LLMs locally. After a simple installation, users can launch open-source models such as Llama or Qwen with a single command. Ollama also greatly simplifies deploying and managing LLMs in Docker containers, letting users get a model running locally in minutes.

What is Llama 3.1 8B?

Llama 3.1 is a new state-of-the-art model from Meta available in 8B, 70B and 405B parameter sizes. Specifically, the "8B" denotes that this model has 8 billion parameters, which are the variables the model uses to make predictions.

Llama 3.1 8B balances performance and computational efficiency, making it suitable for a range of applications such as text generation, question answering, language translation, and code generation. Despite having fewer parameters than larger models like Llama 3.1 70B, it delivers impressive results across a variety of natural language processing tasks. Additionally, Meta's smaller models are competitive with closed and open models that have a similar number of parameters.

Model evaluations: Meta's published benchmark tables show Llama 3.1 8B performing competitively with open and closed models of similar size.

System Requirements

CPU >= 8 cores

RAM >= 16 GB

VRAM >= 8 GB

NVIDIA RTX 3070 or better is recommended for optimal performance.
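If you are not sure whether your GPU meets the VRAM requirement, NVIDIA's standard nvidia-smi utility (installed with the GPU driver) reports each GPU's model and memory:

# Show GPU model, driver version, and total/used VRAM
nvidia-smi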

Download and Install Ollama

Ollama is available for macOS, Linux (including Ubuntu), and Windows (preview). On Linux you can install it from the terminal, as shown below; on Windows, use the graphical installer.
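A one-line install on Linux, using Ollama's official install script:

# Download and run the official Ollama install script
curl -fsSL https://ollama.com/install.sh | sh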

On the download page, click Windows and then click the Download button. Once the download is complete, double-click the installer and click Install. The installation completes without further prompts; when it finishes, open a terminal.

How to run Llama 3.1 8B with Ollama

Open the Terminal

Let's open a terminal; this article uses Windows CMD as an example. Now that Ollama is installed, enter the following command in the terminal to run the Llama 3.1 8B large language model as a test.

Note: The initial run takes a while, because several gigabytes of model files must be downloaded first. Once the download completes, you can interact with the llama3.1:8b model directly from the terminal.

# Run Llama 3.1 (the llama3.1 tag defaults to the 8B model)
ollama run llama3.1
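
If you prefer to pin the exact 8B tag, you can pull and verify the model first; pull, list, and run are all standard subcommands of the same CLI:

# Download the 8B variant explicitly
ollama pull llama3.1:8b

# Confirm the model is present locally
ollama list

# Start an interactive session with the pinned tag
ollama run llama3.1:8b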
Ask a Question

You can type the question you want to ask directly at the interactive prompt.

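The interactive session also accepts a few built-in slash commands; two useful ones are sketched below (type /? inside the session for the full list):

/show info   # print details about the loaded model
/bye         # exit the session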

Make an API Query

Ollama serves a REST API on port 11434 by default. Use curl to send a generate request to the running server:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
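The same server also exposes a chat endpoint that accepts a message history, which is usually more convenient for multi-turn conversations. A minimal, non-streaming example against the documented /api/chat endpoint:

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [
    { "role": "user", "content": "Why is the sky blue?" }
  ],
  "stream": false
}'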

For more information on using the API, see the Ollama API documentation: https://github.com/ollama/ollama/blob/main/docs/api.md#list-local-models
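For instance, the endpoint described in that section of the docs lists the models you have pulled locally:

# List locally available models
curl http://localhost:11434/api/tags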

Additional: Some Good GPU Plans for Ollama AI
Professional GPU VPS - A4000

$90.30/mo (Summer Sale: save 50%, was $179.00)
  • 32GB RAM
  • 24 CPU Cores
  • 320GB SSD
  • 300Mbps Unmetered Bandwidth
  • Once per 2 Weeks Backup
  • OS: Linux / Windows 10
  • Dedicated GPU: Quadro RTX A4000
  • CUDA Cores: 6,144
  • Tensor Cores: 192
  • GPU Memory: 16GB GDDR6
  • FP32 Performance: 19.2 TFLOPS
  • Available for Rendering, AI/Deep Learning, Data Science, CAD/CGI/DCC.

Advanced GPU - A4000

$209.00/mo
  • 128GB RAM
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro RTX A4000
  • Microarchitecture: Ampere
  • Max GPUs: 2
  • CUDA Cores: 6,144
  • Tensor Cores: 192
  • GPU Memory: 16GB GDDR6
  • FP32 Performance: 19.2 TFLOPS
  • Good Choice for 3D Rendering, Video Editing, AI/Deep Learning, Data Science, etc.
Advanced GPU - A5000

$242.10/mo (Summer Sale: save 31%, was $349.00)
  • 128GB RAM
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro RTX A5000
  • Microarchitecture: Ampere
  • Max GPUs: 2
  • CUDA Cores: 8,192
  • Tensor Cores: 256
  • GPU Memory: 24GB GDDR6
  • FP32 Performance: 27.8 TFLOPS

Enterprise GPU - RTX A6000

$409.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro RTX A6000
  • Microarchitecture: Ampere
  • Max GPUs: 1
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 38.71 TFLOPS

Enterprise GPU - A40

$439.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia A40
  • Microarchitecture: Ampere
  • Max GPUs: 1
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 37.48 TFLOPS
Multi-GPU - 3xRTX A5000 (New Arrival)

$539.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: 3 x Quadro RTX A5000
  • Microarchitecture: Ampere
  • Max GPUs: 3
  • CUDA Cores: 8,192 (per GPU)
  • Tensor Cores: 256 (per GPU)
  • GPU Memory: 24GB GDDR6 (per GPU)
  • FP32 Performance: 27.8 TFLOPS
Multi-GPU - 3xRTX A6000 (New Arrival)

$899.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: 3 x Quadro RTX A6000
  • Microarchitecture: Ampere
  • Max GPUs: 3
  • CUDA Cores: 10,752 (per GPU)
  • Tensor Cores: 336 (per GPU)
  • GPU Memory: 48GB GDDR6 (per GPU)
  • FP32 Performance: 38.71 TFLOPS