Alexander Uspenskiy

Mistral’s ‘Small’ 24B Parameter Model Blows Minds—No Data Sent to China, Just Pure AI Power!

I've taken a look at the latest release from Mistral: Mistral-Small-24B-Instruct. It is bigger and slower than deepseek-ai/deepseek-r1-distill-qwen-7b, but it also shows how it is thinking, and it doesn't send your sensitive data to Chinese soil :)

So let's start.

This project provides an interactive chat interface for the mistralai/Mistral-Small-24B-Instruct-2501 model using PyTorch and the Hugging Face Transformers library.

Requirements

  • Python 3.8+
  • PyTorch
  • Transformers
  • An Apple Silicon device (optional, for MPS support; a quick availability check follows this list)
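
If you're not sure whether your machine exposes the MPS backend, here is a quick check using plain PyTorch (nothing beyond the requirements above is assumed):

import torch

# True on Apple Silicon with a recent PyTorch build;
# the chat script below falls back to CPU when this is False
print(torch.backends.mps.is_available())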

Setup
Clone the repository:

git clone https://github.com/alexander-uspenskiy/mistral.git
cd mistral

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Install the required packages (the script below loads the model with device_map, which requires accelerate):

pip install torch transformers accelerate

Set your Hugging Face Hub token:

export HUGGINGFACE_HUB_TOKEN=your_token_here
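
Alternatively, if you'd rather not export an environment variable, you can authenticate programmatically through huggingface_hub (installed as a dependency of transformers); a minimal sketch with the same placeholder token:

from huggingface_hub import login

# Authenticates this machine with the Hugging Face Hub;
# replace the placeholder with your actual token
login(token="your_token_here")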

Usage

Run the chat interface:

python mistral.py

Features

  • Interactive chat interface with the Mistral-Small-24B-Instruct-2501 model.
  • Progress indicator while generating responses.
  • Supports Apple Silicon GPU (MPS) for faster inference (see the memory note below).
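
One practical caveat before you run it: in float16 the weights alone take roughly 24 billion parameters × 2 bytes ≈ 48 GB, so you realistically need a Mac with 64 GB or more of unified memory to keep the whole model on the GPU. On smaller machines, expect heavy swapping or very slow CPU inference.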

Code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import os
import time
import threading

# Check if MPS (Apple Silicon GPU) is available
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Load the Mistral-Small-24B-Instruct-2501 model
model_name = "mistralai/Mistral-Small-24B-Instruct-2501"
token = os.getenv("HUGGINGFACE_HUB_TOKEN")

tokenizer = AutoTokenizer.from_pretrained(model_name, token=token)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map={"": device},
    torch_dtype=torch.float16,  # Optimized for M1 GPU
    token=token
)

def show_progress():
    # Spinner shown while the model generates; exits promptly once stop_event is set
    while not stop_event.is_set():
        for char in "|/-\\":
            if stop_event.is_set():
                break
            print(f"\rGenerating response {char}", end="", flush=True)
            time.sleep(0.1)

# Interactive terminal loop
print("mistralai/Mistral-Small-24B-Instruct-2501 Chat Interface (type 'exit' to quit)")
interaction_count = 0  # Initialize counter
while True:
    user_input = input("You: ")
    if user_input.lower() in ["exit", "quit"]:
        break

    inputs = tokenizer(user_input, return_tensors="pt").to(device)

    stop_event = threading.Event()
    progress_thread = threading.Thread(target=show_progress)
    progress_thread.start()

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=200,  # caps prompt + response combined
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id  # Added to avoid warning
        )

    stop_event.set()
    progress_thread.join()
    print("\r", end="")  # Clear the progress line

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"Mistral: {response}\n")
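
One optional refinement, not in the script above: since this is an Instruct checkpoint, you can format the prompt with the tokenizer's chat template and decode only the newly generated tokens, which avoids the prompt echo you'll see in the example below. A minimal sketch of the loop body under that change (max_new_tokens replaces max_length so the cap applies to the response alone):

# Format the user turn with the model's chat template
messages = [{"role": "user", "content": user_input}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(device)

with torch.no_grad():
    outputs = model.generate(
        input_ids,
        max_new_tokens=200,  # caps the response length only
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the tokens generated after the prompt,
# so the reply doesn't start by repeating the question
response = tokenizer.decode(
    outputs[0][input_ids.shape[-1]:], skip_special_tokens=True
)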

Example:

You: The bat and the ball costs $1.10 in total, bat is expensive than a ball costs $1.00 more than a ball, How much the ball costs                         
Mistral: The bat and the ball costs $1.10 in total, bat is expensive than a ball costs $1.00 more than a ball, How much the ball costs?
Let's denote the cost of the ball as \( B \) and the cost of the bat as \( B + 1.00 \).

According to the problem, the total cost of the bat and the ball is $1.10. Therefore, we can write the equation:

\[ B + (B + 1.00) = 1.10 \]

Simplifying the equation:

\[ 2B + 1.00 = 1.10 \]

Subtract 1.00 from both sides:

\[ 2B = 0.10 \]

Divide both sides by 2:

\[ B = 0.05 \]

So, the ball costs $0.05.
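Notice that the reply starts by repeating the question: tokenizer.decode(outputs[0], ...) decodes the prompt tokens together with the generated ones. The chat-template variant sketched above avoids this by slicing the prompt off before decoding.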

Summary:

As you can see, modern models can run locally and solve logical tasks with excellent performance.

Happy coding!
