Whether you are a developer or an AI enthusiast, you might be glad to know that you can run Large Language Models locally.
Benefits of running LLMs locally
There are various reasons to run LLMs locally:
- Privacy: Running LLMs locally ensures that sensitive data stays under the user's control, which may be a requirement imposed by data privacy regulators.
- Reduced latency: Since data is not sent to an LLM in the cloud, response times are generally faster than with cloud-based counterparts (assuming you're running on capable hardware).
- Cost efficiency: If you have decent hardware to run LLMs locally, you avoid the high usage costs of cloud-based LLMs.
Running LLMs locally requires capable hardware with enough storage space. If you're running on a regular PC, you'll want GPU acceleration to run models efficiently.
I'll highlight popular tools that allow you to run LLMs locally. I've specifically chosen tools that let developers run the LLM as a local server so that they can code and run their AI agents locally via APIs. They can then switch back to cloud-based LLMs later (by changing the endpoint and providing their API keys).
For each of the tools, we'll download and play with the open-source DeepSeek R1 model.
Ollama with Chatbox AI client
Ollama is a lightweight tool designed to download and run open-source large language models (LLMs) directly on your computer. It also exposes a REST API for creating, modifying, running, and managing models.
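As a quick sketch of that REST API, the Python snippet below (using the `requests` package) calls Ollama's documented `/api/generate` endpoint on its default port. It assumes the server is already running and that the model named has been pulled (we do that in the getting-started steps further down).

```python
import requests

# Ask a locally running Ollama server for a one-shot completion.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:8b",   # any model you have pulled locally
        "prompt": "Why is the sky blue?",
        "stream": False,             # return one JSON object instead of a stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```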
The Ollama library provides a list of models supported by Ollama.
Requirement: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
Download: Ollama is available for Windows, macOS, and Linux.
Source code: Ollama on GitHub.
Getting started.
- Download and install the latest release of Ollama on your PC.
- Download and install Chatbox AI client.
- Once Ollama is fully installed, open the console, command line, PowerShell, or terminal (depending on your operating system of choice). If you're running Windows, you will notice a white Ollama "llama" logo in the notification tray.
- Enter the command `ollama --help` to see the available Ollama commands.
- Go to the Ollama library. Click on the model version tag (in our case `8b`), copy any of the commands listed for the model, and paste it into the console/terminal on your local machine:

> ollama run deepseek-r1:8b

- Once the download is complete, the model will be running locally. You will see the message `>>> Send a message (/? for help)` showing that the model is running.
- To prompt the model from the command line/console, enter your message in double quotes. Example:

> >>> "What model are you?"
> <think>
>
> </think>
>
> Hi! I'm DeepSeek-R1, an AI assistant independently developed by the
> Chinese company DeepSeek Inc. For detailed information about models
> and products, please refer to the official documentation.
>
> >>> Send a message (/? for help)
- To terminate the model session on your command line/terminal, enter `/bye`.
- Once you're running a model locally, open the Chatbox AI client.
- In settings (bottom left), select `OLLAMA API` as the "Model Provider" and click "Save".
- Select the `deepseek-r1:8b` model next to the blue send button.
- Now, you can chat away! (where it says "Type your question here...")
For developers: you can interface with your installed LLM using an OpenAI client (base URI: `http://localhost:11434/v1` for Ollama's OpenAI-compatible endpoint, and model `deepseek-r1:8b`).
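For example, here is a minimal sketch using the official `openai` Python package pointed at the local Ollama server. It assumes the `deepseek-r1:8b` model from the steps above is already pulled; the API key can be any placeholder string, since Ollama does not check it.

```python
from openai import OpenAI

# Point the OpenAI client at Ollama's OpenAI-compatible endpoint.
# The api_key is required by the client but ignored by Ollama.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="deepseek-r1:8b",  # the model pulled with `ollama run`
    messages=[{"role": "user", "content": "What model are you?"}],
)
print(response.choices[0].message.content)
```

To switch back to a cloud provider later, you only need to change the `base_url`, the `api_key`, and the model name.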
Jan.ai
Jan is an open-source AI assistant that runs 100% offline on your computer. It also includes a local API server (Cortex Server) that exposes an OpenAI-equivalent API.
Requirement: PC with Nvidia GPU support (for better performance).
Download: Jan.ai is available for Windows, Linux, and macOS.
Source code: Jan.ai on GitHub.
Getting started.
- Download and install the latest release of Jan.ai on your PC.
- You will need to go to Hugging Face to get the model's GGUF (GPT-Generated Unified Format) link. In this case, we'll use the `unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF` link (https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF).
- Open Jan once it is installed. In settings, select "My Models", paste the GGUF link into the search bar, and choose the model you wish to run.
- Once the GGUF model download is complete, select "Thread", and select the downloaded model from the "Select a model" button.
- Now, you can chat away! (where it says "Ask me anything") π
For developers: you can interface with your installed LLM using an OpenAI client (base URI: `http://localhost:1337`).
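Since Jan's local server speaks the same OpenAI-style API, the earlier Python sketch works with only two changes. The `/v1` path is an assumption based on Jan's defaults, and the model identifier below is hypothetical; use the exact name Jan shows for your downloaded model.

```python
from openai import OpenAI

# Jan's Cortex Server listens on port 1337 by default.
client = OpenAI(base_url="http://localhost:1337/v1", api_key="jan")

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-8b",  # hypothetical id; copy the name from Jan's UI
    messages=[{"role": "user", "content": "Hello from Jan!"}],
)
print(response.choices[0].message.content)
```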
LM Studio
LM Studio is a free desktop application designed to simplify the installation and usage of open-source Large Language Models (LLMs) locally on users' computers. The LM Studio GUI app itself is not open source; however, LM Studio's CLI `lms`, Core SDK, and its MLX inferencing engine are all MIT-licensed and open source. You can run any compatible Large Language Model (LLM) from Hugging Face, both in the `GGUF` (llama.cpp) format and in the `MLX` format (Mac only). You can also run `GGUF` text embedding models. Some models might not be supported, while others might be too large to run on your machine. Image generation models are not yet supported.
You can also run an OpenAI-like HTTP server on your local machine (localhost).
Requirement: If you have 16 GB of RAM, you can run the 7B or 8B parameter distilled models. If you have ~192 GB+ of RAM, you can run the full 671B parameter model.
Download: You can download LM Studio from the official website.
Getting started.
- Download and install the latest release of LM Studio.
- Select Discover on the sidebar and search for `DeepSeek-R1` in the search bar. A list of all GGUF files will be displayed (it searches Hugging Face for you).
- Download the model that suits your PC's requirements.
- Once the model download is complete, go to the chat icon, select your DeepSeek-R1 model (at the top of the chat), and start asking away!
- For more details, you can read this blog post.
For developers: you can interface with your installed LLM using an OpenAI client (base URI: `http://localhost:1234/v1/`).
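Again, the same Python client works; only the base URL and model name change. The model identifier below is a placeholder; LM Studio displays the exact id of each loaded model in its server view.

```python
from openai import OpenAI

# LM Studio's local server defaults to port 1234.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="deepseek-r1-distill-qwen-7b",  # placeholder; use the id LM Studio displays
    messages=[{"role": "user", "content": "Hello from LM Studio!"}],
)
print(response.choices[0].message.content)
```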
Conclusion
There has never been a better time to start playing with and experiencing LLMs without worrying about the data privacy and cost implications of cloud-based LLMs.
For developers and enthusiasts, using LLMs locally has practical applications, such as building AI agents and creating custom or fine-tuned models, all at a reduced cost.