Gemma 3 includes the following key features:
Image and Text Input: With multimodal capabilities, you can submit both images and text, letting the model understand and analyze visual data (see the request sketch after this list).
128K Token Context: The input context window has been expanded 16-fold, enabling the model to take in far more data and tackle more complex problems.
Extensive Language Support: Supports over 140 languages, allowing you to operate in your preferred language or expand the linguistic capabilities of your AI applications.
Developer-Friendly Model Sizes: Choose the model size (1B, 4B, 12B, 27B) and precision level that best fit your task and computing resources.
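For instance, here is a minimal sketch of a multimodal request against a locally running Ollama instance, assuming the default REST endpoint at localhost:11434, the gemma3:4b model pulled as shown below, and an arbitrary local image file photo.jpg (a hypothetical name):

# send an image plus a text prompt to a local Gemma 3 vision model
# (photo.jpg is a placeholder; base64 -w0 is the GNU coreutils flag on Linux)
curl http://localhost:11434/api/chat -d '{
  "model": "gemma3:4b",
  "messages": [{
    "role": "user",
    "content": "Describe what is in this picture.",
    "images": ["'"$(base64 -w0 photo.jpg)"'"]
  }]
}'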
This model requires Ollama 0.6 or later.
# install Ollama on Linux
curl -fsSL https://ollama.com/install.sh | sh
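After installing, you can confirm that the installed version meets the 0.6 requirement:

# print the installed Ollama version
ollama --version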
Text only - 1B parameter model (32k context window)
ollama run gemma3:1b
Multimodal (Vision) - 4B parameter model (128k context window)
ollama run gemma3:4b
12B parameter model (128k context window)
ollama run gemma3:12b
27B parameter model (128k context window)
ollama run gemma3:27b
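Note that ollama run typically loads a model with a context window much smaller than the advertised 128K to conserve memory; to actually use a long context, you can raise num_ctx per request. A minimal sketch via the REST API, assuming a local server on the default port and enough GPU memory for the larger KV cache:

# request a larger context window for a single generation call
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:27b",
  "prompt": "Summarize the following report: ...",
  "options": {"num_ctx": 131072}
}'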
By default, vLLM downloads the model weights from Hugging Face in BF16 precision, roughly four times the size of the 4-bit quantized files in the Ollama library: at 2 bytes per parameter, the 27B weights alone take about 54 GB before the KV cache, so a GPU card with more memory is required.
# Prerequisites: an A100 80GB or H100 GPU dedicated server
uv pip install vllm
vllm serve google/gemma-3-27b-it --max-model-len 131072
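Once the server is up, vLLM exposes an OpenAI-compatible API, by default on port 8000. A minimal sketch of a chat request against it:

# query the vLLM OpenAI-compatible endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemma-3-27b-it",
    "messages": [{"role": "user", "content": "Explain the key features of Gemma 3 in three bullet points."}]
  }'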