How to Manage LLM Models with Ollama API

Learn to efficiently manage LLM models with the Ollama API. Explore our detailed resources and elevate your understanding of AI model management today.

Introduction

Ollama is an AI tool platform dedicated to running and managing AI models locally. It allows users to download, run, and manage AI models on their own devices without relying on the cloud.

The Ollama API is an interface provided by the Ollama platform for integrating and calling AI models in a local environment. The API allows developers to run, manage, and customize AI models on local devices and embed these models' functionality into applications. It offers a flexible, easy way to use AI models locally, with performance and privacy benefits.

We can easily download, start, stop, and delete local AI models through the API, which interacts with the Ollama platform's local model repository to help users manage the model lifecycle.

Manage LLM Models via Ollama API

1. Create a Model

Create a model from a Modelfile. It is recommended to set modelfile to the content of the Modelfile rather than just setting path. This is a requirement for remote create.

POST /api/create

1.1 Parameters

name: name of the model to create

modelfile (optional): contents of the Modelfile

stream (optional): if false the response will be returned as a single response object, rather than a stream of objects

path (optional): path to the Modelfile


1.2 Examples

Request:

curl http://localhost:11434/api/create -d '{
  "name": "Queta",
  "modelfile": "FROM llama3\nSYSTEM You are Queta from Super Mario Bros."}'

Response:

A stream of JSON objects is returned. Notice that the final JSON object shows "status": "success".

{"status":"pulling manifest"}
……
{"status":"verifying sha256 digest"}
{"status":"writing manifest"}
{"status":"success"}

2. Generate a completion

Generate a response for a given prompt with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.

POST /api/generate

2.1 Parameters

model: (required) the model name

prompt: the prompt to generate a response for

suffix: the text after the model response

images: (optional) a list of base64-encoded images (for multimodal models such as llava)


2.2 Examples

Request:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?"
}'

Response:

A stream of JSON objects is returned. If stream is set to false, the response will be a single JSON object.

{
  "model": "llama3.2",
  "created_at": "2024-09-28T08:00:07.724299416Z",
  "response": "The sky appears blue because ……"
  "total_duration": 53348326117,
  "load_duration": 18347472,
  "prompt_eval_count": 31,
  "prompt_eval_duration": 163546000,  
  "eval_count": 320,
  "eval_duration": 53123753000
}

For more information on how to use these parameters, please refer to: generate-a-completion.
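
As a rough sketch of consuming the streaming form from code (Python with the requests library is an assumption, not part of the Ollama docs), the snippet below collects the "response" fragments into a full answer and reads the statistics from the final object, where "done" is true.

import json
import requests

# Sketch: stream a completion and assemble the answer from the chunks.
payload = {"model": "llama3.2", "prompt": "Why is the sky blue?"}
answer = []
with requests.post("http://localhost:11434/api/generate", json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        answer.append(chunk.get("response", ""))
        if chunk.get("done"):
            # The final object carries the timing statistics shown above.
            print("eval_count:", chunk.get("eval_count"))

print("".join(answer))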

3. Generate a chat completion

Generate the next message in a chat with a provided model. This is a streaming endpoint, so there will be a series of responses. Streaming can be disabled using "stream": false. The final response object will include statistics and additional data from the request.

POST /api/chat

3.1 Parameters

model: (required) the model name

messages: the messages of the chat; this can be used to keep a chat memory

tools: tools for the model to use if supported. Requires stream to be set to false

The message object has the following fields:

role: the role of the message, either system, user, assistant, or tool

content: the content of the message

images (optional): a list of images to include in the message (for multimodal models such as llava)

tool_calls (optional): a list of tools the model wants to use


3.2 Examples - Chat Request (Streaming)

Request:

Send a chat message with a streaming response.

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"    }
  ]
}'

Response:

A stream of JSON objects is returned. Final response:

{
  "model": "llama3.2",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "done": true,
  "total_duration": 4883583458,
  "load_duration": 1334875,
  "prompt_eval_count": 26,
  "prompt_eval_duration": 342546000,
  "eval_count": 282,
  "eval_duration": 4535599000
}
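
Because the messages parameter carries the whole conversation, chat memory is kept on the client side. The sketch below (Python with the requests library, using the non-streaming form for brevity; both are assumptions) appends the assistant's reply to the messages list so the next request includes the full history.

import requests

# Sketch: keep chat memory by growing the messages list between turns.
messages = [{"role": "user", "content": "why is the sky blue?"}]
payload = {"model": "llama3.2", "messages": messages, "stream": False}
resp = requests.post("http://localhost:11434/api/chat", json=payload)
resp.raise_for_status()
reply = resp.json()["message"]   # {"role": "assistant", "content": "..."}

messages.append(reply)           # remember the assistant's answer
messages.append({"role": "user", "content": "Summarize that in one sentence."})
# Posting the grown messages list again gives the model the full conversation.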

4. List Local Models

List models that are available locally.

GET /api/tags


Examples

Request:

curl http://localhost:11434/api/tags

Response:

A single JSON object will be returned.

{
  "models": [
    {
      "name": "codellama:13b",
      "modified_at": "2023-11-04T14:56:49.277302595-07:00",
      "size": 7365960935,
      "digest": "9f438cb9cd581fc025612d27f7c1a6669ff83a8bb0ed86c94fcf4c5440555697",
      "details": {
        "format": "gguf",
        "family": "llama",
        "families": null,
        "parameter_size": "13B",
        "quantization_level": "Q4_0"
      }
    },
    {
      "name": "llama3:latest",
      "modified_at": "2023-12-07T09:32:18.757212583-08:00",
      "size": 3825819519,
      "digest": "fe938a131f40e6f6d40083c9f0f430a515233eb2edaa6d72eb85c50d64f2300e",
      "details": {
        "format": "gguf",
        "family": "llama",
        "families": null,
        "parameter_size": "7B",
        "quantization_level": "Q4_0"
      }
    }
  ]
}
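
A small sketch (Python with the requests library, an assumption) that turns the response above into a readable inventory of local models:

import requests

# Sketch: print each local model with its size converted from bytes to GB.
resp = requests.get("http://localhost:11434/api/tags")
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(f'{model["name"]}: {model["size"] / 1e9:.1f} GB')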

5. Show Model Information

Show information about a model, including its details, Modelfile, template, parameters, license, and system prompt.

POST /api/show

5.1 Parameters

name: name of the model to show

verbose: (optional) if set to true, returns full data for verbose response fields


5.2 Examples

Request:

curl http://localhost:11434/api/show -d '{
  "name": "llama3.2"
}'

Response:

A single JSON object will be returned.

{
  "modelfile": "# Modelfile generated by ……",
  ……
  "model_info": {
    "general.architecture": "llama",
    ……
  }
}
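
As a brief sketch (Python with the requests library, an assumption), the fields illustrated above can be read directly from the returned object:

import requests

# Sketch: fetch model metadata and print two of the fields shown above.
resp = requests.post("http://localhost:11434/api/show", json={"name": "llama3.2"})
resp.raise_for_status()
info = resp.json()
print(info["model_info"]["general.architecture"])   # e.g. "llama"
print(info["modelfile"].splitlines()[0])             # first line of the Modelfile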

6. Copy a Model

Copy a model. Creates a model with another name from an existing model.

POST /api/copy


Examples

Request:

curl http://localhost:11434/api/copy -d '{
  "source": "llama3.2",
  "destination": "llama3-backup"
}'

Response:

Returns a 200 OK if successful, or a 404 Not Found if the source model doesn't exist.
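
Since the endpoint signals success or failure only through the HTTP status code, a client just needs to check it. A minimal sketch (Python with the requests library, an assumption):

import requests

# Sketch: copy a model under a new name and check the status code.
resp = requests.post(
    "http://localhost:11434/api/copy",
    json={"source": "llama3.2", "destination": "llama3-backup"},
)
print("copied" if resp.status_code == 200 else f"failed with {resp.status_code}")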

7. Delete a Model

Delete a model and its data.

DELETE /api/delete

7.1 Parameters

name: model name to delete


7.2 Examples

Request:

curl -X DELETE http://localhost:11434/api/delete -d '{
  "name": "llama3:13b"
}'

Response:

Returns a 200 OK if successful, or a 404 Not Found if the model to be deleted doesn't exist.
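
The same status-code handling applies here. A minimal sketch (Python with the requests library, an assumption) that distinguishes the two cases described above:

import requests

# Sketch: delete a model and report whether it existed.
resp = requests.delete("http://localhost:11434/api/delete", json={"name": "llama3:13b"})
if resp.status_code == 200:
    print("deleted")
elif resp.status_code == 404:
    print("model not found")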

8. Pull a Model

Download a model from the ollama library. Cancelled pulls are resumed from where they left off, and multiple calls will share the same download progress.

POST /api/pull

8.1 Parameters

name: name of the model to pull

insecure: (optional) allow insecure connections to the library. Only use this if you are pulling from your own library during development.

stream: (optional) if false the response will be returned as a single response object, rather than a stream of objects


8.2 Examples

Request:

curl http://localhost:11434/api/pull -d '{
  "name": "llama3.2"
}'

Response:

If stream is not specified, or set to true, a stream of JSON objects is returned. If stream is set to false, then the response is a single JSON object:

{
  "status": "success"
}
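
When streaming, the pull endpoint reports progress as a series of status objects; entries for layers that are still downloading also carry completed and total byte counts (not shown above, and they may vary by version), which makes it easy to display a percentage. A sketch using Python and the requests library (an assumption):

import json
import requests

# Sketch: pull a model and print progress from the streamed status objects.
payload = {"name": "llama3.2"}
with requests.post("http://localhost:11434/api/pull", json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        status = json.loads(line)
        if "total" in status and "completed" in status:
            pct = 100 * status["completed"] / status["total"]
            print(f'{status["status"]}: {pct:.0f}%')
        else:
            print(status["status"])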

9. List Running Models

List models that are currently loaded into memory.

GET /api/ps


Examples

Request:

curl http://localhost:11434/api/ps

Response:

A single JSON object will be returned.

{
  "models": [
    {
      "name": "mistral:latest",
      "model": "mistral:latest",
      "size": 5137025024,
      "digest": "2ae6f6dd7a3dd734790bbbf58b8909a606e0e7e97e94b7604e0aa7ae4490e6d8",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "llama",
        "families": [
          "llama"
        ],
        "parameter_size": "7.2B",
        "quantization_level": "Q4_0"
      },
      "expires_at": "2024-06-04T14:38:31.83753-07:00",
      "size_vram": 5137025024
    }
  ]
}
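
To round things off, a short sketch (Python with the requests library, an assumption) that summarizes which models are currently loaded, how much VRAM they occupy, and when they are due to be unloaded, using the fields shown above:

import requests

# Sketch: summarize the currently loaded models.
resp = requests.get("http://localhost:11434/api/ps")
resp.raise_for_status()
for model in resp.json().get("models", []):
    vram_gb = model["size_vram"] / 1e9
    print(f'{model["name"]}: {vram_gb:.1f} GB in VRAM, expires {model["expires_at"]}')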