Ollama is an open-source project that provides a powerful, user-friendly platform for running LLMs on local machines. It acts as a bridge between the complexity of LLM technology and the desire for an accessible, customizable AI experience.
Essentially, Ollama simplifies the process of downloading, installing, and interacting with a wide range of LLMs, enabling users to explore their capabilities without extensive technical expertise or reliance on cloud-based platforms.
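To make that concrete: once the Ollama server is running, it exposes a simple REST API on local port 11434. The minimal Python sketch below sends a prompt to a locally pulled model via that API; the model name and prompt are placeholders, and it assumes the requests library is installed.

```python
import requests

# Ollama's local REST API listens on port 11434 by default.
OLLAMA_URL = "http://localhost:11434/api/generate"

def generate(model: str, prompt: str) -> str:
    """Send a prompt to a locally running Ollama model and return its reply."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    # Assumes the model has already been pulled, e.g. via `ollama pull qwen2:7b`.
    print(generate("qwen2:7b", "Explain what Ollama does in one sentence."))
```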
1. VRAM (Video RAM): The amount of VRAM required depends on the size and quantization of the models being used. Here’s a general guide based on common LLMs (a rough way to derive figures like these is sketched after this list):
Small Models (up to 7B parameters): Require approximately 8-12GB of VRAM.
Medium Models (8B to 14B parameters): Require around 12-16GB of VRAM.
Large Models (15B+ parameters): May require 20GB or more, with some very large models needing 48+GB of VRAM.
2. CUDA Cores: A higher number of CUDA cores helps in parallel processing, which speeds up training and inference tasks. For efficient performance, a GPU with at least 3,000 CUDA cores is recommended, although more cores are beneficial for larger models.
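As a rough rule of thumb (our own approximation, not a vendor figure), a quantized model's memory footprint is about parameter count × bytes per parameter, plus headroom for the KV cache and runtime buffers. A minimal sketch:

```python
def estimate_vram_gb(params_billions: float,
                     bits_per_param: float = 4.0,
                     overhead_factor: float = 1.2) -> float:
    """Rough VRAM estimate for a quantized LLM.

    params_billions: model size in billions of parameters (e.g. 7 for a 7B model)
    bits_per_param:  4 for Q4 quantization, 16 for fp16, etc.
    overhead_factor: headroom for KV cache and runtime buffers (assumed ~20%)
    """
    weight_gib = params_billions * 1e9 * (bits_per_param / 8) / (1024 ** 3)
    return weight_gib * overhead_factor

# A 7B model at 4-bit quantization: about 3.3 GiB of weights plus ~20% overhead,
# roughly 3.9 GiB, in the same ballpark as the 4.4GB Ollama lists for qwen2:7b.
print(f"{estimate_vram_gb(7):.1f} GiB")
```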
GPUMart is a leading provider of GPU Virtual Private Servers (VPS), specializing in high-performance configurations tailored for intensive AI and machine learning tasks. Their plans cater to a range of needs, from small-scale experimentation to large-scale model training and inference. GPUMart's offerings stand out due to their integration of powerful NVIDIA GPUs, robust CPU resources, and high-speed storage, all designed to deliver peak performance for demanding workloads.
One of GPUMart's standout VPS plans is built around the NVIDIA RTX A4000 GPU, which is designed to handle substantial AI workloads efficiently. Below are the specifications of the GPUMart GPU VPS with RTX A4000:
Dedicated GPU: NVIDIA RTX A4000
CUDA Cores: 6,144
VRAM: 16GB GDDR6
RAM: 32GB
CPU Cores: 24
SSD Storage: 320GB
The RTX A4000 is known for its balance of power and efficiency, making it a formidable choice for training and running large language models.
To evaluate the performance of GPUMart’s RTX A4000 plan, we tested four large language models on Ollama. These models vary in size and computational demands, providing a comprehensive view of the plan's capabilities.
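For context, each model's prompt evaluation rate can be measured from the timing fields Ollama returns with every non-streaming generation (durations are reported in nanoseconds). The sketch below assumes the same four models have already been pulled:

```python
import requests

def benchmark(model: str, prompt: str) -> None:
    """Run one generation and report Ollama's own token-rate statistics."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    ).json()

    # Ollama reports token counts and durations (in nanoseconds) per request.
    prompt_rate = resp["prompt_eval_count"] / (resp["prompt_eval_duration"] / 1e9)
    eval_rate = resp["eval_count"] / (resp["eval_duration"] / 1e9)
    print(f"{model}: prompt eval {prompt_rate:.2f} tokens/s, "
          f"generation {eval_rate:.2f} tokens/s")

for model in ("qwen2:7b", "llama3:8b", "phi3:14b", "mixtral:8x7b"):
    benchmark(model, "Summarize the benefits of running LLMs locally.")
```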
Qwen2:7b
Size: 4.4GB
Prompt Eval Rate: 63.91 tokens/s
The Qwen2:7b model, at 4.4GB, runs efficiently on the RTX A4000, delivering a prompt evaluation rate of 63.91 tokens per second and demonstrating the plan's ability to handle 7B-class models with ease.
Llama3:8b
Size: 4.7GB
Prompt Eval Rate: 50.84 tokens/s
The Llama3:8b model, slightly larger at 4.7GB, achieves a somewhat lower prompt evaluation rate of 50.84 tokens per second. With CUDA utilization at 60% and VRAM usage around 6GB, this model highlights the RTX A4000's efficient use of compute and memory.
Phi3:14b
Size: 7.9GB
Prompt Eval Rate: 49.12 tokens/s
For the larger Phi3:14b model, which is 7.9GB, the prompt evaluation rate drops to 49.12 tokens per second. Despite the increased computational load, the RTX A4000 manages CUDA utilization effectively at 66%, with VRAM usage reaching 10GB.
Mixtral:8x7b
Size: 26GB
Prompt Eval Rate: 5.93 tokens/s
The Mixtral:8x7b model, with a substantial size of 26GB, pushes the RTX A4000 to its limits. The prompt evaluation rate falls sharply to 5.93 tokens per second: dedicated GPU memory usage reaches 15GB, and a further 14GB spills into shared GPU memory (system RAM), so part of the model runs off the card. The A4000 can still serve very large models this way, albeit at much reduced performance.
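To check how much of a loaded model actually resides in VRAM versus system memory, you can query the Ollama server's process listing. A small sketch (the /api/ps endpoint and its size/size_vram fields reflect current Ollama builds; treat the exact field names as assumptions):

```python
import requests

# Ask the local Ollama server which models are loaded and where they live.
ps = requests.get("http://localhost:11434/api/ps", timeout=10).json()

for m in ps.get("models", []):
    total = m["size"]            # total bytes occupied by the loaded model
    in_vram = m["size_vram"]     # bytes resident in dedicated GPU memory
    offloaded = total - in_vram  # remainder held in shared/system memory
    print(f"{m['name']}: {in_vram / 1e9:.1f} GB in VRAM, "
          f"{offloaded / 1e9:.1f} GB offloaded to system memory")
```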
The GPUMart RTX A4000 GPU VPS proves to be a robust solution for running a variety of large language models on Ollama. It excels in balancing CPU, GPU, and memory resources, ensuring efficient handling of models ranging from moderate to very large sizes. Whether you are deploying the compact Qwen2:7b or the hefty Mixtral:8x7b, this VPS configuration offers a dependable platform for your AI development needs.
For AI practitioners seeking a high-performance, reliable GPU VPS, the GPUMart RTX A4000 plan stands out as an excellent choice, providing the power and flexibility required to drive advanced AI applications forward.