Prerequisites
This setup has been tested on Ubuntu. To enable GPU acceleration, install the NVIDIA Container Toolkit before running the project:
sudo apt install nvidia-container-toolkit
sudo systemctl restart docker
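If the NVIDIA runtime is not yet registered with Docker, configure it and restart the daemon again, then confirm that a container can actually see the GPU. This is a minimal check; the CUDA image tag below is only an example:
# Register the NVIDIA runtime with Docker, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Verify GPU visibility from inside a container (image tag is illustrative)
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi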
Motivation
With the growing demand for AI-powered applications, running large language models locally is becoming increasingly common. However, in some cases, it is essential to ensure confidentiality and prevent unintended data leaks by running Ollama in a fully sandboxed environment without internet access. This setup allows organizations or individuals to maintain full control over their AI models while still leveraging their capabilities.
Additionally, this setup enables fine-grained control over networking, allowing a dedicated model-downloading instance to have internet access while keeping the model-serving instance completely isolated.
Project Overview
This project provides a Docker-based solution to run Ollama models in an isolated environment while keeping a separate model-downloading instance on a network with internet access. The architecture consists of:
- Ollama Runner: Runs AI models in a sandboxed network without internet access.
- Ollama Updater: Responsible for downloading models from the internet and sharing them with the runner.
- Nginx Reverse Proxy: Bridges the network gap, allowing access to the runner while maintaining isolation.
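Once the stack is running, you can see how these pieces fit together by listing which Docker networks each container joins. The network and container names come from the project's docker-compose.yml, so treat the commands below as a sketch:
# List the Docker networks created for the project
docker network ls
# Show which networks each container is attached to
docker inspect -f '{{range $net, $cfg := .NetworkSettings.Networks}}{{$net}} {{end}}' ollama-runner
docker inspect -f '{{range $net, $cfg := .NetworkSettings.Networks}}{{$net}} {{end}}' ollama-updater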
Tested with VS Code's Continue Extension
To ensure practical usability, I tested this configuration with the Continue extension in Visual Studio Code, and it worked correctly. This means developers can integrate the local Ollama instance seamlessly with their workflow while keeping the model execution isolated.
Demonstrating Network Isolation
One of the key aspects of this project is network isolation, ensuring that the Ollama Runner does not have internet access while the Ollama Updater does. To verify this, we can run the following script:
bash network-isolation-test-script.sh
This script performs the following tests:
- Check internet access:
  - The ollama-updater container should be able to reach google.com.
  - The ollama-runner container should be fully isolated and unable to reach the internet.
- Check Ollama API accessibility:
  - The ollama-runner should be able to access its API internally.
  - The ollama-updater should be able to access its own API.
Expected output:
✓ ollama-updater network has internet access (Expected)
✓ ollama-runner network is properly isolated (Expected)
✓ Ollama API accessible from runner network
✓ Ollama API accessible from updater network
This confirms that the model execution environment remains secure and offline, while updates can still be managed efficiently.
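If you prefer to verify the isolation by hand, the checks boil down to a few docker exec calls. This sketch assumes the container names match the service names above and that curl is available inside the images:
# The updater should reach the internet
docker exec ollama-updater curl -sI --max-time 5 https://google.com >/dev/null && echo "updater: online"
# The runner should fail or time out
docker exec ollama-runner curl -sI --max-time 5 https://google.com >/dev/null || echo "runner: isolated"
# Both should reach their own Ollama API
docker exec ollama-runner curl -s http://localhost:11434/api/version
docker exec ollama-updater curl -s http://localhost:11434/api/version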
Technical Setup
Key Features
- Fully sandboxed Ollama model execution
- Separate container for downloading models with internet access
- Nginx reverse proxy for controlled access
- Custom small Docker networks (/28) to ensure isolation
- GPU acceleration via NVIDIA toolkit (if available)
Deployment Steps
- Install Docker and NVIDIA Container Toolkit (if using GPU).
- Clone the repository:
git clone https://github.com/ietxaniz/ollama-local.git
cd ollama-local
- Start the services:
docker-compose up -d
- Verify running containers:
docker ps
- Use the Ollama Runner to serve models, while the Ollama Updater manages downloads.
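At this point the runner should answer API requests through the reverse proxy. Assuming the proxy publishes the Ollama API on the host's port 11434 (adjust if the compose file maps a different port), a quick smoke test:
# List the models visible to the runner via the proxy
curl -s http://localhost:11434/api/tags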
Managing Models
To list available models:
docker exec -it ollama-runner ollama ls
To pull new models:
docker exec -it ollama-updater ollama pull deepseek-r1:14b
To check active models:
docker exec -it ollama-runner ollama ps
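To exercise a model end to end, you can send a generation request to the runner's API. This again assumes the proxy exposes the API on port 11434 and that the model has already been pulled by the updater:
# Ask the runner for a single, non-streamed completion
curl -s http://localhost:11434/api/generate \
  -d '{"model": "deepseek-r1:14b", "prompt": "Hello, are you running offline?", "stream": false}'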
Future Possibilities
While this project provides a robust foundation for securely running Ollama, I am considering extending it further to explore Retrieval-Augmented Generation (RAG). This could enhance local AI capabilities by integrating external knowledge bases while keeping execution sandboxed.
If you're interested in contributing or have suggestions, feel free to open an issue in the GitHub repository.
Conclusion
This setup ensures that AI models can run securely in an isolated environment while maintaining the flexibility to update and manage them efficiently. Whether for security reasons or simply to experiment with self-hosted AI, this approach provides a reliable solution.
Would you like to see more extensions of this project? Let me know in the comments!