Ollama is an AI tool platform focused on running and managing AI models locally. It lets users download, run, and manage models on their own devices without relying on the cloud.
The Ollama API is the interface the Ollama platform provides for integrating and calling AI models in a local environment. Through it, developers can run, manage, and customize models on local devices and embed their capabilities into applications, gaining the performance and privacy benefits of keeping everything local.
Through the API we can easily download, start, stop, and delete local AI models; it interacts with the Ollama platform's local model repository and helps users manage the full model lifecycle.
Create a model from a Modelfile. It is recommended to set modelfile to the contents of the Modelfile rather than just setting path; this is a requirement for remote create.
POST /api/create
1.1 Parameters
name: name of the model to create
modelfile (optional): contents of the Modelfile
stream (optional): if false the response will be returned as a single response object, rather than a stream of objects
path (optional): path to the Modelfile
1.2 Examples
Request:
curl http://localhost:11434/api/create -d '{ "name": "Queta", "modelfile": "FROM llama3\nSYSTEM You are Queta from Super Mario Bros."}'
Response:
A stream of JSON objects is returned. Note that the final object reports "status": "success".
{"status":"pulling manifest"}
……
{"status":"verifying sha256 digest"}
{"status":"writing manifest"}
{"status":"success"}
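To drive this endpoint from code, the same request can be issued with a short Python sketch that streams the status objects as they arrive. This is a minimal sketch, assuming an Ollama server at http://localhost:11434, the third-party requests library, and that the llama3 base model referenced in the Modelfile is already available locally.

import json
import requests

# Create a model from an inline Modelfile and print each streamed status object.
# Assumes an Ollama server at http://localhost:11434 and the requests library.
payload = {
    "name": "Queta",
    "modelfile": "FROM llama3\nSYSTEM You are Queta from Super Mario Bros.",
}
with requests.post("http://localhost:11434/api/create", json=payload, stream=True) as r:
    r.raise_for_status()
    for line in r.iter_lines():
        if line:
            print(json.loads(line)["status"])   # ends with "success" on completion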
Generate a response for a given prompt with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.
POST /api/generate
2.1 Parameters
model: (required) the model name
prompt: the prompt to generate a response for
suffix: the text after the model response
images: (optional) a list of base64-encoded images (for multimodal models such as llava)
2.2 Examples
Request:
curl http://localhost:11434/api/generate -d '{ "model": "llama3.2", "prompt": "Why is the sky blue?" }'
Response:
A stream of JSON objects is returned. If stream is set to false, the response will be a single JSON object.
{
  "model": "llama3.2",
  "created_at": "2024-09-28T08:00:07.724299416Z",
  "response": "The sky appears blue because ……",
  "total_duration": 53348326117,
  "load_duration": 18347472,
  "prompt_eval_count": 31,
  "prompt_eval_duration": 163546000,
  "eval_count": 320,
  "eval_duration": 53123753000
}
For more details on the available parameters, refer to the generate-a-completion section of the Ollama API documentation.
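The same request can be made from Python. The sketch below streams the partial responses and assembles the full answer, printing the eval_count statistic from the final object; it is a minimal sketch assuming a server at http://localhost:11434, the requests library, and that llama3.2 has been pulled.

import json
import requests

# Stream a completion and assemble the full answer from the partial responses.
payload = {"model": "llama3.2", "prompt": "Why is the sky blue?"}
answer = []
with requests.post("http://localhost:11434/api/generate", json=payload, stream=True) as r:
    r.raise_for_status()
    for line in r.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        answer.append(chunk.get("response", ""))
        if chunk.get("done"):                      # the final object carries the statistics
            print("eval_count:", chunk.get("eval_count"))
print("".join(answer))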
Generate the next message in a chat with a provided model. This is a streaming endpoint, so there will be a series of responses. Streaming can be disabled using "stream": false. The final response object will include statistics and additional data from the request.
POST /api/chat
3.1 Parameters
model: (required) the model name
messages: the messages of the chat; this can be used to keep a chat memory
tools: tools for the model to use, if supported. Requires stream to be set to false
The message object has the following fields:
role: the role of the message, either system, user, assistant, or tool
content: the content of the message
images (optional): a list of images to include in the message (for multimodal models such as llava)
tool_calls (optional): a list of tools the model wants to use
3.2 Examples - Chat Request (Streaming)
Send a chat message with a streaming response.
Request:
curl http://localhost:11434/api/chat -d '{ "model": "llama3.2", "messages": [ { "role": "user", "content": "why is the sky blue?" } ] }'
Response:
A stream of JSON objects is returned. Final response:
{
  "model": "llama3.2",
  "created_at": "2023-08-04T19:22:45.499127Z",
  "done": true,
  "total_duration": 4883583458,
  "load_duration": 1334875,
  "prompt_eval_count": 26,
  "prompt_eval_duration": 342546000,
  "eval_count": 282,
  "eval_duration": 4535599000
}
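For a non-streaming variant ("stream": false), a minimal Python sketch looks like the following. It keeps the conversation history in a list so later calls retain chat memory; the assistant reply comes back in the response's message field. It assumes a server at http://localhost:11434, the requests library, and that llama3.2 has been pulled.

import requests

# One non-streaming chat turn; the history list preserves chat memory for later turns.
history = [{"role": "user", "content": "why is the sky blue?"}]
r = requests.post(
    "http://localhost:11434/api/chat",
    json={"model": "llama3.2", "messages": history, "stream": False},
)
r.raise_for_status()
reply = r.json()["message"]   # {"role": "assistant", "content": "..."}
history.append(reply)         # append so the next request sees the full conversation
print(reply["content"])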
List models that are available locally.
GET /api/tags
Examples
Request:
curl http://localhost:11434/api/tags
Response:
A single JSON object will be returned.
{
  "models": [
    {
      "name": "codellama:13b",
      "modified_at": "2023-11-04T14:56:49.277302595-07:00",
      "size": 7365960935,
      "digest": "9f438cb9cd581fc025612d27f7c1a6669ff83a8bb0ed86c94fcf4c5440555697",
      "details": {
        "format": "gguf",
        "family": "llama",
        "families": null,
        "parameter_size": "13B",
        "quantization_level": "Q4_0"
      }
    },
    {
      "name": "llama3:latest",
      "modified_at": "2023-12-07T09:32:18.757212583-08:00",
      "size": 3825819519,
      "digest": "fe938a131f40e6f6d40083c9f0f430a515233eb2edaa6d72eb85c50d64f2300e",
      "details": {
        "format": "gguf",
        "family": "llama",
        "families": null,
        "parameter_size": "7B",
        "quantization_level": "Q4_0"
      }
    }
  ]
}
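A short Python sketch (assuming a server at http://localhost:11434 and the requests library) can turn this listing into a readable summary:

import requests

# List locally available models with their size and parameter count.
r = requests.get("http://localhost:11434/api/tags")
r.raise_for_status()
for m in r.json()["models"]:
    size_gb = m["size"] / 1e9
    print(f'{m["name"]}: {size_gb:.1f} GB, {m["details"]["parameter_size"]}')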
Show information about a model including details, modelfile, template, parameters, license, system prompt.
POST /api/show
5.1 Parameters
name: name of the model to show
verbose: (optional) if set to true, returns full data for verbose response fields
5.2 Examples
Request:
curl http://localhost:11434/api/show -d '{ "name": "llama3.2" }'
Response:
A single JSON object will be returned.
{
  "modelfile": "# Modelfile generated by ……",
  ……
  "model_info": {
    "general.architecture": "llama",
    ……
  }
}
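In Python, the fields shown above can be read directly from the returned object. This is a minimal sketch assuming a server at http://localhost:11434, the requests library, and that llama3.2 is available locally.

import requests

# Fetch model metadata and print a couple of the fields from the example above.
r = requests.post("http://localhost:11434/api/show", json={"name": "llama3.2"})
r.raise_for_status()
info = r.json()
print(info["model_info"]["general.architecture"])   # e.g. "llama"
print(info["modelfile"][:200])                       # start of the generated Modelfile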
Copy a model. Creates a model with another name from an existing model.
POST /api/copy
Examples
Request:
curl http://localhost:11434/api/copy -d '{ "source": "llama3.2", "destination": "llama3-backup" }'
Response:
Returns a 200 OK if successful, or a 404 Not Found if the source model doesn't exist.
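Because the endpoint signals success or failure only through the status code, a Python sketch just needs to check it (a server at http://localhost:11434 and the requests library are assumed):

import requests

# Copy an existing model under a new name and report the outcome.
r = requests.post(
    "http://localhost:11434/api/copy",
    json={"source": "llama3.2", "destination": "llama3-backup"},
)
if r.status_code == 200:
    print("copy created")
elif r.status_code == 404:
    print("source model not found")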
Delete a model and its data.
DELETE /api/delete
7.1 Parameters
name: model name to delete
7.2 Examples
Request:
curl -X DELETE http://localhost:11434/api/delete -d '{ "name": "llama3:13b" }'
Response:
Returns a 200 OK if successful, 404 Not Found if the model to be deleted doesn't exist.
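The same pattern works for deletion; the sketch below (again assuming http://localhost:11434 and the requests library) checks the two documented status codes:

import requests

# Delete a model and report whether it existed.
r = requests.delete("http://localhost:11434/api/delete", json={"name": "llama3:13b"})
if r.status_code == 200:
    print("deleted")
elif r.status_code == 404:
    print("model not found")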
Download a model from the ollama library. Cancelled pulls are resumed from where they left off, and multiple calls will share the same download progress.
POST /api/pull
8.1 Parameters
name: name of the model to pull
insecure: (optional) allow insecure connections to the library. Only use this if you are pulling from your own library during development.
stream: (optional) if false the response will be returned as a single response object, rather than a stream of objects
8.2 Examples
Request:
curl http://localhost:11434/api/pull -d '{ "name": "llama3.2" }'
Response:
If stream is not specified, or set to true, a stream of JSON objects is returned. If stream is set to false, then the response is a single JSON object:
{ "status": "success" }
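A Python sketch can also report download progress while pulling. Besides "status", the streamed objects for the download phase typically carry total and completed byte counts (an assumption not shown in the example above); the sketch assumes a server at http://localhost:11434 and the requests library.

import json
import requests

# Pull a model and print progress for each streamed status object.
payload = {"name": "llama3.2"}
with requests.post("http://localhost:11434/api/pull", json=payload, stream=True) as r:
    r.raise_for_status()
    for line in r.iter_lines():
        if not line:
            continue
        obj = json.loads(line)
        if "total" in obj and "completed" in obj:       # byte counts during download (assumed fields)
            print(f'{obj["status"]}: {obj["completed"] / obj["total"]:.0%}')
        else:
            print(obj["status"])                        # e.g. "pulling manifest", "success"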
List models that are currently loaded into memory.
GET /api/ps
Examples
Request:
curl http://localhost:11434/api/ps
Response:
A single JSON object will be returned.
{
  "models": [
    {
      "name": "mistral:latest",
      "model": "mistral:latest",
      "size": 5137025024,
      "digest": "2ae6f6dd7a3dd734790bbbf58b8909a606e0e7e97e94b7604e0aa7ae4490e6d8",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "llama",
        "families": [
          "llama"
        ],
        "parameter_size": "7.2B",
        "quantization_level": "Q4_0"
      },
      "expires_at": "2024-06-04T14:38:31.83753-07:00",
      "size_vram": 5137025024
    }
  ]
}
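The fields in this response make it easy to check how much of each loaded model sits in VRAM. The minimal sketch below assumes a server at http://localhost:11434 and the requests library.

import requests

# Show loaded models, the share of each held in VRAM, and when they expire.
r = requests.get("http://localhost:11434/api/ps")
r.raise_for_status()
for m in r.json()["models"]:
    vram_share = m["size_vram"] / m["size"] if m["size"] else 0.0
    print(f'{m["name"]}: {vram_share:.0%} in VRAM, expires {m["expires_at"]}')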