Iñigo Etxaniz

Running Ollama in a Container Without Internet Access

Prerequisites

This setup has been tested on Ubuntu. To enable GPU acceleration, install the NVIDIA Container Toolkit before running the project:

sudo apt install nvidia-container-toolkit
sudo systemctl restart docker
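
Depending on your distribution, you may also need to register the NVIDIA runtime with Docker before restarting the daemon (sudo nvidia-ctk runtime configure --runtime=docker). A quick way to confirm that containers can actually see the GPU is to run nvidia-smi inside a throwaway container:

docker run --rm --gpus all ubuntu nvidia-smi

If the driver table prints, GPU passthrough is working and Ollama will be able to use it.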

Motivation

With the growing demand for AI-powered applications, running large language models locally is becoming increasingly common. However, in some cases, it is essential to ensure confidentiality and prevent unintended data leaks by running Ollama in a fully sandboxed environment without internet access. This setup allows organizations or individuals to maintain full control over their AI models while still leveraging their capabilities.

Additionally, this setup enables fine-grained control over networking, allowing a dedicated model-downloading instance to have internet access while keeping the model-serving instance completely isolated.

Project Overview

This project provides a Docker-based solution to run Ollama models in an isolated environment while keeping the model-downloading instance on a network with internet access. The architecture consists of the following components (a docker-compose sketch follows the list):

  • Ollama Runner: Runs AI models in a sandboxed network without internet access.
  • Ollama Updater: Responsible for downloading models from the internet and sharing them with the runner.
  • Nginx Reverse Proxy: Bridges the network gap, allowing access to the runner while maintaining isolation.
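
To make that split concrete, here is roughly how such a docker-compose file can look. This is a minimal sketch, not the repository's actual file: the service and network names, the shared volume and the published port are assumptions, and GPU settings are omitted.

services:
  ollama-runner:
    image: ollama/ollama
    volumes:
      - ollama-models:/root/.ollama    # model store shared with the updater
    networks:
      - runner-net                     # internal-only network, no route to the internet

  ollama-updater:
    image: ollama/ollama
    volumes:
      - ollama-models:/root/.ollama    # same volume, so pulled models become visible to the runner
    networks:
      - updater-net                    # regular network with internet access for pulls

  proxy:
    image: nginx:alpine                # serves an nginx.conf (not shown) that forwards to ollama-runner:11434
    ports:
      - "11434:11434"                  # host-facing entry point for the runner's API
    networks:
      - runner-net                     # can reach the isolated runner
      - proxy-net                      # non-internal network, so the published port is reachable

networks:
  runner-net:
    internal: true                     # Docker adds no gateway, so outbound traffic is blocked
  updater-net: {}
  proxy-net: {}

volumes:
  ollama-models: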

Tested with VS Code's Continue Extension

To ensure practical usability, I tested this configuration with the Continue extension in Visual Studio Code, and it worked correctly. This means developers can integrate the local Ollama instance seamlessly with their workflow while keeping the model execution isolated.
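
Continue only needs to know where the Ollama API is exposed. An entry along these lines in its config.json should be enough; treat it as illustrative, since the field layout can differ between Continue versions and the host/port depend on how the reverse proxy is published:

{
  "models": [
    {
      "title": "DeepSeek R1 14B (local)",
      "provider": "ollama",
      "model": "deepseek-r1:14b",
      "apiBase": "http://localhost:11434"
    }
  ]
}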

Demonstrating Network Isolation

One of the key aspects of this project is network isolation, ensuring that the Ollama Runner does not have internet access while the Ollama Updater does. To verify this, we can run the following script:

bash network-isolation-test-script.sh

This script performs the following tests:

  • Check internet access:
    • The ollama-updater container should be able to reach google.com.
    • The ollama-runner container should be fully isolated and unable to reach the internet.
  • Check Ollama API accessibility:
    • The ollama-runner should be able to access its API internally.
    • The ollama-updater should be able to access its own API.

Expected output:

✓ ollama-updater network has internet access (Expected)
✓ ollama-runner network is properly isolated (Expected)
✓ Ollama API accessible from runner network
✓ Ollama API accessible from updater network

This confirms that the model execution environment remains secure and offline, while updates can still be managed efficiently.
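
If you prefer to reproduce these checks by hand, a rough manual equivalent looks like this (the script itself may do things differently). The commands borrow each container's network namespace via --network container:<name> and use the curlimages/curl image, so nothing extra needs to be installed inside the Ollama containers:

# The updater's network should reach the internet
docker run --rm --network container:ollama-updater curlimages/curl -sS --max-time 5 https://google.com > /dev/null && echo "updater: online"

# The runner's network should fail the same request
docker run --rm --network container:ollama-runner curlimages/curl -sS --max-time 5 https://google.com > /dev/null || echo "runner: isolated"

# The Ollama API should still answer inside each namespace
docker run --rm --network container:ollama-runner curlimages/curl -s http://localhost:11434/api/tags
docker run --rm --network container:ollama-updater curlimages/curl -s http://localhost:11434/api/tags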

Technical Setup

Key Features

  • Fully sandboxed Ollama model execution
  • Separate container for downloading models with internet access
  • Nginx reverse proxy for controlled access
  • Custom small Docker networks (/28) to ensure isolation (sketched after this list)
  • GPU acceleration via NVIDIA toolkit (if available)
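
The small /28 networks mentioned in the list can be declared directly in docker-compose. The subnet values below are placeholders; pick ranges that do not clash with anything else on your host:

networks:
  runner-net:
    internal: true                 # no gateway to the outside world
    ipam:
      config:
        - subnet: 172.28.0.0/28    # 16 addresses: plenty for the runner and the proxy
  updater-net:
    ipam:
      config:
        - subnet: 172.28.0.16/28   # small dedicated range that keeps internet access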

Deployment Steps

  1. Install Docker and NVIDIA Container Toolkit (if using GPU).
  2. Clone the repository:
   git clone https://github.com/ietxaniz/ollama-local.git
   cd ollama-local
  3. Start the services:
   docker-compose up -d
  4. Verify running containers:
   docker ps
  5. Use the Ollama Runner to serve models, while the Ollama Updater manages downloads.
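
Once the stack is up, the runner's API should be reachable through the reverse proxy. Assuming the proxy publishes port 11434 on the host (adjust the address to your compose file), a quick smoke test is to list the installed models over HTTP:

curl http://localhost:11434/api/tags

This is the same endpoint that clients such as the Continue extension talk to.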

Managing Models

To list available models:

docker exec -it ollama-runner ollama ls

To pull new models:

docker exec -it ollama-updater ollama pull deepseek-r1:14b

To check active models:

docker exec -it ollama-runner ollama ps

Future Possibilities

While this project provides a robust foundation for securely running Ollama, I am considering extending it further to explore Retrieval-Augmented Generation (RAG). This could enhance local AI capabilities by integrating external knowledge bases while keeping execution sandboxed.

If you're interested in contributing or have suggestions, feel free to open an issue in the GitHub repository.

Conclusion

This setup ensures that AI models can run securely in an isolated environment while maintaining the flexibility to update and manage them efficiently. Whether for security reasons or simply to experiment with self-hosted AI, this approach provides a reliable solution.

Would you like to see more extensions of this project? Let me know in the comments!
