How to Run LLaMA 3 with Ollama

Meta Llama 3 is a state-of-the-art open LLM, available in both 8B and 70B parameter sizes. Let's see how to run Llama 3 with Ollama.

What's LLaMA 3?

Meta Llama 3: The most capable openly available LLM to date

LLaMA 3 is a family of large language models developed by Meta AI, a research laboratory that focuses on natural language processing (NLP) and other AI-related areas.

What makes LLaMA 3 special is its ability to understand and respond to a wide range of topics and questions, often with a high degree of accuracy and coherence. It's been trained on a massive dataset of text from the internet and can adapt to different contexts and styles.

Key features of LLaMA 3

LLaMA 3 has many potential applications, such as chatbots, virtual assistants, language translation, and content generation. Its key features include:

  • Conversational dialogue: LLaMA 3 can engage in natural-sounding conversations, using context and understanding to respond to questions and statements.
  • Knowledge retrieval: It can access a vast knowledge base to provide accurate information on a wide range of topics.
  • Common sense: LLaMA 3 has been designed to understand common sense and real-world concepts, making its responses more relatable and human-like.
  • Fine-tuned and optimized: Llama 3 instruction-tuned models are fine-tuned and optimized for dialogue/chat use cases and outperform many of the available open-source chat models on common benchmarks.

[Figures: Meta Llama 3 Instruct and Pre-trained model benchmark performance]

The most capable model

Llama 3 represents a large improvement over Llama 2 and other openly available models:

  • Trained on a dataset seven times larger than that of Llama 2
  • A context length of 8K tokens, double that of Llama 2
  • Encodes language much more efficiently, using a larger token vocabulary of 128K tokens
  • Fewer than one-third the false "refusals" of Llama 2

How to run LLaMA 3 with Ollama

Llama 3 is now available to run using Ollama. To get started, download Ollama and then run Llama 3.
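If Ollama isn't installed yet, here is a minimal setup sketch for Linux using the official install script (on macOS or Windows, download the installer from ollama.com instead):

# Install Ollama on Linux via the official install script
curl -fsSL https://ollama.com/install.sh | sh

# Verify the install
ollama --version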

CLI

Open a terminal and run:

ollama run llama3

The initial release of Llama 3 includes two sizes, 8B and 70B parameters:

# 8B Parameters
ollama run llama3:8b

# 70B Parameters
ollama run llama3:70b
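ollama run downloads the model weights on first use, which can take a while for the 70B variant. To fetch a model ahead of time and check what is available locally, a quick sketch:

# Download the 8B model without starting a chat session
ollama pull llama3:8b

# List the models installed locally
ollama list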

API

Example using curl:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'
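By default, /api/generate streams the answer back as a series of JSON objects. For a single JSON response, set "stream" to false; the sketch below also pipes the output through jq (assuming it is installed) to print only the generated text:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}' | jq -r '.response'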

Model variants

Instruct is fine-tuned for chat/dialogue use cases. Example:

ollama run llama3
ollama run llama3:70b

Pre-trained is the base model. Example:

ollama run llama3:text
ollama run llama3:70b-text
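The practical difference shows up in prompting: instruct models answer questions and follow instructions, while the base (text) models simply continue the text they are given. A quick one-shot sketch, assuming the models are already pulled (ollama run accepts a prompt as its final argument):

# Instruct model: answers the question
ollama run llama3 "Why is the sky blue?"

# Base model: continues the prompt
ollama run llama3:text "The sky is blue because"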

Additional - Some Good GPU Plans for Ollama AI

Express GPU VPS - K620

$21.00/mo
  • 12GB RAM
  • 9 CPU Cores
  • 160GB SSD
  • 100Mbps Unmetered Bandwidth
  • Once per 4 Weeks Backup
  • OS: Linux / Windows 10
  • Dedicated GPU: Quadro K620
  • CUDA Cores: 384
  • GPU Memory: 2GB DDR3
  • FP32 Performance: 0.863 TFLOPS

Lite GPU Dedicated Server - K620

$49.00/mo
  • 16GB RAM
  • Quad-Core Xeon E3-1270v3
  • 120GB + 960GB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro K620
  • Microarchitecture: Maxwell
  • CUDA Cores: 384
  • GPU Memory: 2GB DDR3
  • FP32 Performance: 0.863 TFLOPS
  • Ideal for lightweight Android emulators, small LLMs, graphic processing, and more. More powerful than a GPU VPS.

Express GPU Dedicated Server - P620

$59.00/mo
  • 32GB RAM
  • Eight-Core Xeon E5-2670
  • 120GB + 960GB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro P620
  • Microarchitecture: Pascal
  • CUDA Cores: 512
  • GPU Memory: 2GB GDDR5
  • FP32 Performance: 1.5 TFLOPS

Professional GPU VPS - A4000

$129.00/mo
  • 32GB RAM
  • 24 CPU Cores
  • 320GB SSD
  • 300Mbps Unmetered Bandwidth
  • Once per 2 Weeks Backup
  • OS: Linux / Windows 10
  • Dedicated GPU: Quadro RTX A4000
  • CUDA Cores: 6,144
  • Tensor Cores: 192
  • GPU Memory: 16GB GDDR6
  • FP32 Performance: 19.2 TFLOPS
  • Available for Rendering, AI/Deep Learning, Data Science, CAD/CGI/DCC.

Advanced GPU Dedicated Server - A5000

$244.00/mo
30% OFF Recurring (Was $349.00)
  • 128GB RAM
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro RTX A5000
  • Microarchitecture: Ampere
  • CUDA Cores: 8192
  • Tensor Cores: 256
  • GPU Memory: 24GB GDDR6
  • FP32 Performance: 27.8 TFLOPS

Enterprise GPU Dedicated Server - RTX A6000

$409.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro RTX A6000
  • Microarchitecture: Ampere
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 38.71 TFLOPS
  • Optimized for running AI, deep learning, data visualization, HPC, etc.

Multi-GPU Dedicated Server - 3xV100

$359.00/mo
40% OFF Recurring (Was $599.00)
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: 3 x Nvidia V100
  • Microarchitecture: Volta
  • CUDA Cores: 5,120
  • Tensor Cores: 640
  • GPU Memory: 16GB HBM2
  • FP32 Performance: 14 TFLOPS
  • Well-suited for deep learning and AI workloads thanks to its high tensor core count

Enterprise GPU Dedicated Server - A100

$639.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia A100
  • Microarchitecture: Ampere
  • CUDA Cores: 6912
  • Tensor Cores: 432
  • GPU Memory: 40GB HBM2e
  • FP32 Performance: 19.5 TFLOPS
  • A good alternative to the A800, H100, H800, and L40. Supports FP64 precision computation and large-scale inference, AI training, ML, etc.