Ollama is a tool that simplifies creating, running, and managing large language models (LLMs). Hugging Face is a machine learning platform that's home to nearly 500,000 open-source models. This tutorial walks through importing a new model from Hugging Face and creating a custom Ollama model from it, so you can use the latest models without waiting for them to be published to Ollama's model library. Let's get started!
Before getting started, make sure you have the following:
Ollama installed on your system
Hugging Face account (to download models)
Enough RAM/VRAM to load the model (16GB recommended for the 7B-parameter model used in this tutorial)
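The tool prerequisites can be checked from a terminal before going further. The following is a minimal sketch, assuming a POSIX shell; check_tool is a hypothetical helper name, and the list of tools (ollama, git, git-lfs) is taken from the commands used later in this tutorial.

```shell
#!/bin/sh
# check_tool is a hypothetical helper: it reports whether a command is on PATH
# and returns non-zero when the command is missing.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Tools used by the steps in this tutorial.
for tool in ollama git git-lfs; do
    check_tool "$tool" || true
done
```

Any line that starts with "missing:" names a tool to install before continuing.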
To download a model from the Hugging Face model hub and run it locally using Ollama on your GPU server, you can follow these steps:
First, you need to download the GGUF file of the model you want from Hugging Face. For this tutorial, we’ll use the bartowski/Starling-LM-7B-beta-GGUF model as an example.
You can use git to clone the repository:
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/bartowski/Starling-LM-7B-beta-GGUF
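Cloning fetches every file in the repository, and GGUF repositories typically hold several quantizations of the same model. If only one file is needed, it may be quicker to download it directly. The sketch below assumes the file sits at the root of the repository's main branch and uses the file name from the Modelfile in the next step; download_gguf is a hypothetical helper, and the call is left commented out because the download is several gigabytes.

```shell
#!/bin/sh
REPO="bartowski/Starling-LM-7B-beta-GGUF"
FILE="Starling-LM-7B-beta-Q6_K.gguf"   # assumed to exist at the repo root

# Hugging Face serves raw files under /resolve/<branch>/<path>.
URL="https://huggingface.co/${REPO}/resolve/main/${FILE}"
echo "$URL"

# -L follows redirects, -C - resumes a partial download, -o sets the output name.
download_gguf() {
    curl -L -C - -o "$FILE" "$URL"
}

# download_gguf   # uncomment to start the download
```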
Next, create a Modelfile configuration that defines the model's behavior. Here's an example:
# Modelfile
FROM "./Starling-LM-7B-beta-Q6_K.gguf"
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
TEMPLATE """
<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
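Replace ./Starling-LM-7B-beta-Q6_K.gguf with the path to the GGUF file you downloaded. The TEMPLATE instruction defines the prompt format using system, user, and assistant roles, and the two PARAMETER stop lines tell Ollama to stop generating when the model emits either of those tokens. You can customize both based on your use case.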
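To preview what a prompt in this format looks like once text is filled in, the sketch below substitutes sample values into a template with sed. It assumes the template uses Ollama's {{ .System }} and {{ .Prompt }} variables. Ollama itself renders templates with Go's template engine, so this is only an approximation for checking the layout; render is a hypothetical helper, and it does not escape characters such as / or & in its arguments.

```shell
#!/bin/sh
# The prompt layout, with Ollama's template variables as placeholders.
TEMPLATE='<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
'

# render SYSTEM_TEXT USER_TEXT: print the template with both placeholders filled.
render() {
    printf '%s' "$TEMPLATE" \
        | sed -e "s/{{ \.System }}/$1/" -e "s/{{ \.Prompt }}/$2/"
}

render 'You are a helpful assistant.' 'What is a GGUF file?'
```

The model's reply is generated after the final assistant line, which is why the template ends there.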
Now, build the Ollama model using the ollama create command:
ollama create "Starling-LM-7B-beta-Q6_K" -f Modelfile
Replace Starling-LM-7B-beta-Q6_K with the name you want to give your model, and Modelfile with the path to your Modelfile.
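After the build finishes, it can be worth confirming that the model was registered. The sketch below is one way to do that, assuming the model name used above; has_model is a hypothetical helper that searches the output of ollama list, and the match is case-insensitive in case the name is displayed with different casing.

```shell
#!/bin/sh
MODEL="Starling-LM-7B-beta-Q6_K"

# has_model NAME: succeed if NAME appears in `ollama list` output read from stdin.
has_model() {
    grep -qiF -- "$1"
}

if command -v ollama >/dev/null 2>&1; then
    if ollama list 2>/dev/null | has_model "$MODEL"; then
        echo "model is available: $MODEL"
        # Print the Modelfile that Ollama stored for this model.
        ollama show "$MODEL" --modelfile || echo "could not read the Modelfile"
    else
        echo "model not found: $MODEL"
    fi
else
    echo "ollama is not on PATH"
fi
```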
Finally, you can run and try your model using the ollama run command:
ollama run Starling-LM-7B-beta-Q6_K:latest
The :latest tag is the default that Ollama assigns when you create a model without specifying a tag, so you can also omit it. That's it! You have successfully imported a Hugging Face model and created a custom Ollama model.
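Besides the interactive session, ollama run also accepts a prompt as an argument and exits after printing the reply. The sketch below shows that form, assuming the model name from the previous step; the prompt text is only an example.

```shell
#!/bin/sh
NAME="Starling-LM-7B-beta-Q6_K"
TAG="latest"                      # the default tag when none was given at create time
MODEL_REF="${NAME}:${TAG}"

if command -v ollama >/dev/null 2>&1; then
    # One-shot, non-interactive run: the prompt is passed as an argument.
    ollama run "$MODEL_REF" "Explain what a GGUF file is in one sentence." \
        || echo "run failed; check that the Ollama server is running and the model exists"
else
    echo "ollama is not on PATH; would run: ollama run $MODEL_REF"
fi
```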
Explore the Ollama model library to find other models to try beyond Starling-LM.
Further customize model behavior by modifying the Modelfile:
Change the prompt template
Set parameters such as temperature and the maximum number of generated tokens
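As an illustration of these customizations, the sketch below writes a second Modelfile that builds on the model created earlier and sets two parameters plus a system message. The file name Modelfile.custom, the model name starling-concise, the parameter values, and the system message are all hypothetical choices; num_predict is the Ollama parameter that caps the number of generated tokens.

```shell
#!/bin/sh
# Write a Modelfile that starts from the model created earlier in this tutorial.
cat > Modelfile.custom <<'EOF'
FROM Starling-LM-7B-beta-Q6_K
# Sampling temperature; lower values give more deterministic output.
PARAMETER temperature 0.7
# Upper bound on the number of tokens generated per response.
PARAMETER num_predict 256
SYSTEM """You are a concise technical assistant."""
EOF

# Build the variant under a new name if Ollama is available.
if command -v ollama >/dev/null 2>&1; then
    ollama create starling-concise -f Modelfile.custom || echo "create failed"
else
    echo "ollama is not on PATH; wrote Modelfile.custom only"
fi
```

To change the prompt format as well, add a TEMPLATE block to the same file.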
For more information, refer to the Ollama documentation and the Hugging Face model hub.