Running Qwen Locally on a MacBook Pro
Qwen (Qwen GitHub Repository) has been gaining attention recently as a powerful open-source large language model (LLM). I decided to give it a spin on my MacBook Pro using Ollama, a platform designed for running local LLMs. While Qwen2.5-Max boasts the highest performance, my setup could only handle the smaller Qwen2.5 (32B) model. Here's what I found!
Hardware and Software Environment
- Device: MacBook Pro (M3 Pro)
- Memory: 36GB RAM
- Software: Latest version of Ollama
Installing Ollama
To get started, I installed Ollama. With Homebrew, the installation process is straightforward:
brew install ollama
# Verify installation
ollama --version
# Output: ollama version is 0.5.7
Downloading and Running Qwen2.5 (32B)
Once Ollama was set up, I proceeded to download and run the Qwen2.5 (32B) model:
ollama run qwen2.5:32b
# Output during the first run:
pulling manifest
pulling eabc98a9bcbf... 100% ▕█████████████████████████████████████████████████████████████████████████████▏ 19 GB
...
success
The model weighs about 20GB, so depending on your internet speed, downloading may take some time.
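To get a feel for how long the download might take, here's a rough back-of-envelope estimate. The 100 Mbps link speed is an illustrative assumption, not a measurement from my setup:

```shell
# Rough download-time estimate for the ~20GB model pull.
# SIZE_GB and SPEED_MBPS are illustrative assumptions, not measured values.
SIZE_GB=20
SPEED_MBPS=100
# 1 GB ~= 8000 megabits (decimal units); divide by link speed for seconds.
SECONDS_NEEDED=$(( SIZE_GB * 8000 / SPEED_MBPS ))
echo "$(( SECONDS_NEEDED / 60 )) minutes"
```

At 100 Mbps that works out to roughly 26 minutes; halve or double it to match your own connection.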
Memory Usage Insights
Running the Qwen2.5 (32B) model requires substantial memory. With my MacBook Pro M3 and its 36GB RAM, these were the key observations:
- Memory Usage: Total system memory climbed to about 30-34GB while the model was running, up from a baseline of ~15GB before launch.
- Performance: Each response took around 1–2 minutes, varying based on the complexity of the prompt.
- Extended use made the fans spin up, but the overall experience remained stable.
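The numbers above can be sanity-checked with a quick back-of-envelope calculation; the 19GB figure is the weight size reported during the pull, and the other values are the rough figures I observed:

```shell
# Back-of-envelope memory headroom check (all values in GB, rough observations).
TOTAL_RAM=36      # MacBook Pro configuration
BASELINE=15       # system + other apps before launching the model
MODEL_WEIGHTS=19  # size reported by the ollama pull
HEADROOM=$(( TOTAL_RAM - BASELINE - MODEL_WEIGHTS ))
echo "Headroom left for KV cache and inference overhead: ${HEADROOM} GB"
```

Only a couple of gigabytes remain for the KV cache and inference overhead, which is why total usage peaked so close to the 36GB ceiling.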
Caveats to Keep in Mind
- Large Initial Download: Be prepared for the ~20GB download on the first launch.
- High Memory Usage: Ensure you manage other applications to free up resources.
- Slower Responses Without a GPU: Without GPU acceleration, expect noticeably longer inference times.
Example Task: Round-Trip Train Calculation
To test the model’s reasoning and computation abilities, I used the following prompt:
ollama run qwen2.5:32b
>>> A train travels from city A to city B at 60 km/h and takes 2 hours. On the return journey, due to headwind, it travels at 40 km/h.
... 1) What is the total distance between cities A and B?
... 2) How long does the return journey take?
... 3) What is the average speed for the entire round trip?
... Please show all calculations.
Qwen's Response
Below is the model's detailed response:
### 1) Total Distance Between Cities A and B
Distance = Speed × Time
= 60 km/h × 2 h
= **120 km**
The total distance is 120 km.
### 2) Duration of the Return Journey
Time = Distance ÷ Speed
= 120 km ÷ 40 km/h
= **3 hours**
The return journey takes 3 hours.
### 3) Average Speed for the Entire Round Trip
- Total Distance = 120 km (to B) + 120 km (return) = **240 km**
- Total Time = 2 hours (to B) + 3 hours (return) = **5 hours**
- Average Speed = Total Distance ÷ Total Time
= 240 km ÷ 5 h
= **48 km/h**
The average speed for the round trip is 48 km/h.
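The model's arithmetic checks out. As a quick way to reproduce the three steps, here is the same calculation in bash integer arithmetic:

```shell
# Reproduce the round-trip calculation from the prompt.
OUT_SPEED=60   # km/h, A -> B
OUT_TIME=2     # hours
RET_SPEED=40   # km/h, B -> A
DISTANCE=$(( OUT_SPEED * OUT_TIME ))                    # 1) 120 km
RET_TIME=$(( DISTANCE / RET_SPEED ))                    # 2) 3 hours
AVG_SPEED=$(( 2 * DISTANCE / (OUT_TIME + RET_TIME) ))   # 3) 240 km / 5 h
echo "distance=${DISTANCE} km, return=${RET_TIME} h, average=${AVG_SPEED} km/h"
```

Note that the average speed (48 km/h) is the harmonic mean of the two speeds, not the arithmetic mean (50 km/h), because the train spends more time at the slower speed.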
I cross-verified the results with Gemini, GPT-4, and Claude 3.5, all of which produced the same answers.
Uninstalling Qwen
If you no longer need the model, you can remove it with a single command:
ollama rm qwen2.5:32b
# Output: deleted 'qwen2.5:32b'
Summary
Running Qwen2.5 (32B) on a MacBook Pro (M3 Pro, 36GB RAM) was possible, albeit with tight memory constraints; I had to close other applications to free up resources. Despite the slow response times on laptop-class hardware, Qwen did a commendable job handling complex queries.
Although my benchmark was limited to Qwen2.5 (32B), the official Qwen2.5-Max benchmark suggests its performance is on par with models like DeepSeek-V3 and GPT-4. This makes Qwen an exciting open-source alternative worth watching!
Are you planning to try Qwen on your setup? Let me know your experience in the comments below! 🚀