Amdadul Haque Milon

How to Install and Run QwQ-32B Locally on Windows, macOS, and Linux

Imagine having a powerful AI model running on your own computer — no endless API calls, no cloud bills, and best of all, complete privacy for your sensitive data. With Alibaba’s QwQ-32B, you can bring enterprise-grade AI right to your desk. In this guide, I’ll walk you through installing and running QwQ-32B locally on Windows, macOS, and Linux. Plus, I’ll show you how the process is nearly identical for any model available on Ollama, making it super versatile. And if you’re curious to explore not only QwQ-32B but also other groundbreaking models like DeepSeek-R1, GPT-4o, and Claude 3.7, you can check them out on Anakin AI — a one-stop hub for all things AI.

Why Run QwQ-32B Locally?

Before diving into the nuts and bolts, let’s quickly talk about why you might want to run QwQ-32B on your own hardware:

  • Privacy: Keep all your data on your computer. No need to worry about sending sensitive info to a cloud service.
  • Cost Savings: With a local installation you bypass recurring API fees entirely; even as a hosted service, QwQ-32B is priced at as little as $0.25 per million tokens, far below most proprietary cloud models.
  • Customization: Fine-tune the model with your own datasets and tweak it for your unique needs.
  • Flexibility: Switch between different models — like Llama 3, Mistral, and more — using the same simple process.

Running QwQ-32B locally gives you full control over the model, and the setup process is surprisingly beginner-friendly. Even if you’ve never opened a Terminal before, you can get this up and running in about 10 minutes!

Hardware Requirements for QwQ-32B

Running QwQ-32B locally demands robust hardware to ensure smooth installation and efficient inference. Below are the minimum requirements for each platform; a quick way to check your own machine against them is sketched after the lists.

Mac

  • Processor: Apple Silicon — M1 Pro or M1 Max is recommended for optimal performance.
  • RAM: 48GB of unified memory at a minimum; more headroom helps noticeably with larger context windows.
  • Storage: Sufficient free disk space (at least 100GB recommended for model files and additional data).

Windows

  • Processor: Modern multi-core CPU with AVX2/AVX-512 support.
  • GPU: For quantized versions, an NVIDIA GeForce RTX 3060 (12GB VRAM) or higher; for full-precision inference, an NVIDIA RTX 4090 (24GB VRAM) is recommended.
  • RAM: At least 32GB for smooth operation.
  • Storage: Minimum of 100GB free space for model files and related resources.

Linux

  • Processor: Multi-core CPU with AVX2/AVX512 support. ARM chips are also compatible.
  • GPU: For quantized versions: NVIDIA RTX 3090 or RTX 4090 (24GB VRAM) is sufficient. For larger contexts or higher precision settings, GPUs like the NVIDIA A6000 are recommended.
  • RAM: Minimum of 32GB.
  • Storage: At least 100GB of free space for model storage.
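
If you’re not sure whether your machine meets these numbers, a few quick commands can tell you. The sketch below assumes a Linux shell with NVIDIA drivers installed; macOS users can find the same details under About This Mac, and Windows users in Task Manager.

# CPU: look for AVX2/AVX-512 support
grep -o 'avx2\|avx512' /proc/cpuinfo | sort -u

# RAM: total installed memory
free -h

# GPU: model name and VRAM (requires NVIDIA drivers)
nvidia-smi --query-gpu=name,memory.total --format=csv

# Disk: free space on the current drive
df -h .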

How to Install QwQ-32B on Windows

Step 1: Download and Install Ollama

The first step is to download Ollama, a free tool that makes local AI installations a breeze. Here’s how:

  • Visit ollama.com and click on the download button for Windows.
  • Run the downloaded .exe file (no admin rights needed) and follow the on-screen instructions to install Ollama. If the installer asks for your system password, that’s normal.

Step 2: Open the Command Prompt

Next, open the Command Prompt on your Windows machine. You can do this by searching for “Command Prompt” in the Start menu. This may look a bit technical, but don’t worry; just follow along.
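
Before installing the model, you can confirm that Ollama is ready by asking for its version (the exact number you see will differ):

ollama --version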

Step 3: Install Your Chosen Model

With Ollama installed, you can now install QwQ-32B. In the Command Prompt, type the command:

ollama run qwq

This command tells Ollama to download (on first run) and start the default QwQ-32B build from the Ollama library, which is a quantized version sized for consumer GPUs rather than the full-precision FP16 weights.

After pressing Enter, the model will begin downloading. The default build is roughly 20GB, so this can take anywhere from a few minutes to much longer depending on your connection. Once installed, you can test it by asking a simple question like:

> What’s the integral of x² from 0 to 5?


The Terminal should display the answer, proving your model is up and running.
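
You don’t have to use the interactive session every time. Ollama also accepts a prompt directly on the command line and exits once the answer is printed, which is handy for quick checks or scripting:

ollama run qwq "What's the integral of x^2 from 0 to 5?"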

How to Install QwQ-32B on macOS

Step 1: Install Ollama and Run the Model

Mac users, especially those on Apple Silicon, follow a very similar process. Download Ollama for macOS from ollama.com, open the downloaded file, and follow any prompts that appear during installation. Once Ollama is installed, open the Terminal and run:

ollama run qwq

As on Windows, this downloads the default QwQ-32B build on the first run and then starts an interactive session.

Step 2: Testing the Model

After installation, test your setup by entering a query in the Terminal:

What’s your name?

You should receive an answer from the model, confirming that everything is working as expected.
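
To see which models are now installed locally, along with their size on disk, you can list them at any time:

ollama list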

How to Install QwQ-32B on Linux

To install and run the QwQ-32B model through Ollama on Linux, follow these steps:

Step 1: Install Ollama

Ollama simplifies the setup process for running advanced AI models like QwQ-32B. Use the following command to install it:

curl -fsSL https://ollama.com/install.sh | sh


Step 2: Verify the Installation

After installation, verify that Ollama is installed by running:

ollama
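
On most systemd-based distributions, the install script also registers Ollama as a background service. If the command above reports that it can’t reach the server, the following checks are a reasonable starting point (a sketch; adapt to your distribution):

# Check whether the Ollama service is running
systemctl status ollama

# Start it if it isn't
sudo systemctl start ollama

# Or run the server in the foreground instead
ollama serve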

Step 3: Pull the QwQ-32B Model

Use Ollama to download the QwQ-32B model. Run the following command:

ollama pull qwq:32b


This will fetch the quantized version of QwQ-32B optimized for efficient inference.
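
To confirm exactly what was downloaded (parameter count, quantization level, and context length), you can inspect the model’s metadata:

ollama show qwq:32b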

Step 4: Run the Model

Once the model is downloaded, you can interact with it directly in the terminal. Use this command to start running the model:

ollama run qwq:32b

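Beyond the interactive prompt, Ollama also exposes the model through a local HTTP API on port 11434, which is what graphical front ends like the one in the next section talk to. A minimal example with curl (the prompt text is just a placeholder):

curl http://localhost:11434/api/generate -d '{
  "model": "qwq:32b",
  "prompt": "Explain the Pythagorean theorem in one sentence.",
  "stream": false
}'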

Optional But Recommended: Setting Up a Web Interface with Docker

If you’d prefer a graphical interface similar to ChatGPT rather than using the command line, you can set up a web UI using Docker. This approach is slightly more technical but only needs to be done once.

Step 1: Install Docker Desktop

Download and install Docker Desktop from Docker’s website. Once it is installed, leave it running in the background with the default settings.

Step 2: Install Open WebUI

The last step is to install Open WebUI, an open-source web interface for AI models. It supports multi-modal AI integration, self-hosting for privacy, and API connectivity for customization; you can chat, save your conversation history, and tweak the UI for a smoother experience.

To install Open WebUI, run this command in your terminal:

docker run -d -p 8080:8080 --gpus all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main


This command pulls the container, sets up GPU access, and maps necessary volumes. Once completed, open your web browser and navigate to http://localhost:8080. You’ll see a ChatGPT-like interface where you can interact with your local model.
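
The command above keeps an Ollama data volume inside the container. If you would rather have Open WebUI talk to the Ollama instance you already installed on the host earlier in this guide, a commonly used variant looks like this (a sketch; adjust the port and volume names to your setup):

docker run -d -p 8080:8080 --add-host=host.docker.internal:host-gateway -e OLLAMA_BASE_URL=http://host.docker.internal:11434 -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main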

Cloud Alternative for Underpowered Hardware

If your computer doesn’t meet the required specs, consider a cloud alternative. For example, NodeShift offers GPU instances:

  • Sign Up at NodeShift and create an account.
  • Launch a GPU Instance with an A100 or A6000 GPU.
  • Install QwQ-32B Using the Auto-Installer:
curl -sL nodeshift.com/qwq32b-install | bash


This sets up QwQ-32B on a cloud instance, allowing you to bypass hardware limitations while still enjoying local-like control.

Fine-Tuning and Customization

Once your model is operational, you can customize it to suit your needs. Note that Ollama’s Modelfile mechanism doesn’t retrain the weights on a dataset; it lets you layer your own system prompt and parameters on top of the base model. For instance, you can create a custom variant of QwQ-32B like this:

ollama create qwq-custom -f Modelfile

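If you haven’t written a Modelfile before, here is a minimal, hypothetical example: it starts from the qwq base model, adjusts two sampling parameters, and sets a custom system prompt. The file contents and prompt are placeholders to adapt to your own use case.

# Contents of a file named Modelfile
FROM qwq
PARAMETER temperature 0.6
PARAMETER num_ctx 8192
SYSTEM "You are a concise assistant that explains its reasoning step by step."

# After running the ollama create command above, start your custom variant with:
ollama run qwq-custom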

For additional guidance, explore Alibaba’s official Hugging Face repository where you’ll find sample configurations and community contributions.

Bringing It All Together

Running QwQ-32B locally is more than a technical exercise — it’s a gateway to harnessing enterprise-grade AI on your own hardware. This guide covered the basics for Windows, macOS, and Linux, along with tips on setting up a web interface and even cloud alternatives for those without high-end hardware.

Imagine the freedom of being able to run AI models offline, privately analyzing your own documents, and experimenting with different models all from your local machine. And remember, the same simple process can be used to install any model available on Ollama. Whether you’re working with QwQ-32B, Llama 3, Mistral, or any other model, the steps remain remarkably similar.

If you’re eager to try out these exciting possibilities, don’t forget to explore Anakin AI. With access to a whole suite of advanced models like QwQ-32B, DeepSeek-R1, GPT-4o, Claude 3.7, and more, Anakin AI is your ultimate hub for cutting-edge AI innovation.

A Final Word: Embrace the Power of Local AI

As we move deeper into 2025, the landscape of AI is evolving rapidly. Running models like QwQ-32B locally empowers you with privacy, cost savings, and the freedom to innovate without limitations. Whether you’re a seasoned developer or just starting out, setting up your own local AI environment opens up a world of creative possibilities.

So why wait? Take the leap, follow this guide, and install QwQ-32B on your computer today. And if you’re curious to explore an even wider range of AI models, Anakin AI awaits — with a treasure trove of powerful tools ready to transform your ideas into reality.

Happy experimenting, and here’s to a future where advanced AI is accessible to all — right from the comfort of your own home!
