How to Run LLMs Locally with LM Studio



Introdcution of LM Studio

What is LM Studio?

LM Studio is an open-source, free desktop application designed to simplify the installation and usage of open-source Large Language Models (LLMs) locally on users' computers. With LM Studio, individuals can easily access and utilize various LLMs without requiring extensive computational knowledge, such as managing commands within a terminal or complex Web User Interfaces (WebUIs). By providing a user-friendly interface, LM Studio enables users to explore the capabilities of open-source LLMs, including models like Llama 2, Vicuna, Mistral, and others, while enjoying benefits such as offline operation, increased privacy, and experimental opportunities.

What You Can Do with LM Studio?

- Run LLMs on your PC and laptop, entirely offline

- Use models through the in-app Chat UI or an OpenAI compatible local server

- Download any compatible model files from HuggingFace repositories

- Discover new & noteworthy LLMs in the app's home page

- Evaluate and fine-tune your model: Once your model is trained, you can evaluate its performance on a test set to see how well it is doing.

- LM Studio provides a range of pre-trained models and architectures to get you started, or you can create your own custom model.

- LM Studio supports any ggml Llama, MPT, and StarCoder model on Hugging Face (Llama 2, Orca, Vicuna, Nous Hermes, WizardCoder, MPT, etc.)

Can LM studio use GPU?

Yes, LM Studio provides options to enable GPU acceleration for improved inference speed. LM Studio supports NVIDIA/AMD GPUs, 6GB+ of VRAM is recommended.

Does LM Studio collect data?

Privacy is one of the core values behind LM Studio. No data is collected, monitored, or stored outside your local machine. It's also free for personal use.

Does LM Studio have an API?

A notable feature of LM Studio is the ability to create Local Inference Servers with just a click. The Automatic Prompt Formatting option simplifies prompt construction to match the model's expected format. The exposed API aligns with the OpenAI format.

System Requirements

To run LM Studio, a user-friendly tool for working with Large Language Models (LLMs) locally, the following system requirements apply:

Mac:

- Apple Silicon Mac (M1/M2/M3) with macOS 13.6 or later

Windows & Linux (Beta):

- Processor supporting Advanced Vector Extensions 2 (AVX2) instructions

- 16GB+ of RAM is recommended. For PCs, 6GB+ of VRAM is recommended

- NVIDIA or AMD GPUs supported, with at least 6 GB of Video RAM (VRAM) being advantageous

Please note that although LM Studio does not require an active internet connection during normal operations, an initial internet connection might be needed for downloading models from sources like Hugging Face.

How to Install and Use LM Studio?

To get started with LM Studio, follow these steps:

Step 1. Visit the official LM Studio website to download the appropriate installer for your operating system (Mac, Windows, or Linux)

Step 2. After downloading the approximately 400 MB package, proceed with the installation according to your OS's standard procedure.

Step 3. Launch LM Studio after successful installation.

Step 4. You can search for different models and their details, including formats and quantization levels, directly within the LM Studio interface without having to visit external websites like Hugging Face. Select a model to download based on your GPU VRAM. Keep in mind that larger models may take longer to download due to their size.

Note: LM Studio allows you to manage and delete downloaded models, change storage location, and serve models through an API.

Step 5. Once the desired model is downloaded, click the speech bubble icon on the left panel and select the loaded model to begin interaction.

LM Studio chat with your preferred LLM models

You can create new chats, view the time it took for each token, and customize options using the toggle setting sidebar in LM Studio.

Step 6. There you have it, that quick and simple to set up an LLM locally. If you would like to speed up the response time, you can do so by enabling the GPU acceleration on the right-hand side. You can adjust parameters like model loading, batch size, and context length, and select the GPU hardware settings for running local LLMs.

Additionally

LM Studio supports models compatible with the ggml tensor library from the llama.cpp project, enabling integration with various LLMs from the Hugging Face repository. The current version of LM Studio is not open source, but it remains a helpful tool for those seeking to manage and interact with LLMs on their local machines, offering features such as local servers and automatic prompt formatting for seamless integration with frontends or workflow solutions.

The app does not collect data nor monitor your actions. Your data stays local on your machine. It's free for personal use. For business use, please get in touch. Enjoy exploring the world of Large Language Models with LM Studio!

How to Run LLMs with LM Studio