Alexander Uspenskiy

Unlock AI-Powered Image Processing on Your Laptop with Stable Diffusion v1.5 – It’s Easier Than You Think!

This script leverages Stable Diffusion v1.5 from Hugging Face's Diffusers library to generate image variations based on a given text prompt. By using torch and PIL, it processes an input image, applies AI-driven transformations, and saves the results.

You can clone this repo to get the code: https://github.com/alexander-uspenskiy/image_variations
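To run the script locally you'll need a few packages. This is a sketch of a typical setup for the Diffusers image-to-image workflow; the repo may ship its own requirements file with pinned versions, so check there first:

```shell
# Clone the repo and install the usual dependencies for this script
git clone https://github.com/alexander-uspenskiy/image_variations
cd image_variations
pip install torch diffusers transformers accelerate pillow requests
```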

Source code:

import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image
import requests
from io import BytesIO

def load_image(image_path, target_size=(768, 768)):
    """
    Load and preprocess the input image
    """
    if image_path.startswith('http'):
        response = requests.get(image_path)
        image = Image.open(BytesIO(response.content))
    else:
        image = Image.open(image_path)

    # Resize and preserve aspect ratio
    image = image.convert("RGB")
    image.thumbnail(target_size, Image.Resampling.LANCZOS)

    # Create new image with padding to reach target size
    new_image = Image.new("RGB", target_size, (255, 255, 255))
    new_image.paste(image, ((target_size[0] - image.size[0]) // 2,
                           (target_size[1] - image.size[1]) // 2))

    return new_image

def generate_image_variation(
    input_image_path,
    prompt,
    model_id="stable-diffusion-v1-5/stable-diffusion-v1-5",
    num_images=1,
    strength=0.75,
    guidance_scale=7.5,
    seed=None
):
    """
    Generate variations of an input image using a specified prompt

    Parameters:
    - input_image_path: Path or URL to the input image
    - prompt: Text prompt to guide the image generation
    - model_id: Hugging Face model ID
    - num_images: Number of variations to generate
    - strength: How much to transform the input image (0-1)
    - guidance_scale: How closely to follow the prompt
    - seed: Random seed for reproducibility

    Returns:
    - List of generated images
    """
    # Set random seed if provided
    if seed is not None:
        torch.manual_seed(seed)

    # Load the model
    device = "cuda" if torch.cuda.is_available() else "cpu"
    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.float16 if device == "cuda" else torch.float32
    ).to(device)

    # Load and preprocess the input image
    init_image = load_image(input_image_path)

    # Generate images
    result = pipe(
        prompt=prompt,
        image=init_image,
        num_images_per_prompt=num_images,
        strength=strength,
        guidance_scale=guidance_scale
    )

    return result.images

def save_generated_images(images, output_prefix="generated"):
    """
    Save the generated images with sequential numbering
    """
    import os

    # Make sure the output directory exists before saving
    os.makedirs("images-out", exist_ok=True)

    for i, image in enumerate(images):
        image.save(f"images-out/{output_prefix}_{i}.png")

# Example usage
if __name__ == "__main__":
    # Example parameters
    input_image = "images-in/Image_name.jpg"  # or URL
    prompt = "Draw the image in modern art style, photorealistic and detailed."

    # Generate variations
    generated_images = generate_image_variation(
        input_image,
        prompt,
        num_images=3,
        strength=0.75,
        seed=42  # Optional: for reproducibility
    )

    # Save the results
    save_generated_images(generated_images)

How It Works:

1. Load & preprocess the input image

  • Accepts both local file paths and URLs.
  • Converts the image to RGB and resizes it to 768×768 while maintaining aspect ratio.
  • Adds white padding to fill the target size.

2. Initialize Stable Diffusion v1.5

  • Loads the model on CUDA (if available) or falls back to CPU.
  • Uses StableDiffusionImg2ImgPipeline to process the input image.

3. Generate AI-modified image variations

  • Takes a text prompt to guide the transformation.
  • Parameters such as strength (0–1, how much to transform the input) and guidance_scale (higher = stricter prompt adherence) allow customization.
  • Supports multiple output images per prompt.

4. Save results to the images-out directory

  • Outputs generated images with a sequential naming scheme (generated_0.png, generated_1.png, etc.).
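The resize-and-pad step in load_image centers the scaled-down image inside a white 768×768 canvas. A minimal sketch of that letterbox arithmetic (the helper name letterbox_offsets is mine, not from the script):

```python
def letterbox_offsets(width, height, target=768):
    """Compute the scaled size and centered paste offsets used when
    padding a thumbnail into a square target-size canvas."""
    # Same scale factor PIL's thumbnail() uses: fit within the target box
    scale = min(target / width, target / height)
    new_w, new_h = int(width * scale), int(height * scale)
    # Center the scaled image on the canvas
    x_off = (target - new_w) // 2
    y_off = (target - new_h) // 2
    return (new_w, new_h), (x_off, y_off)

# A 1024x512 photo scales to 768x384 and is pasted 192 px from the top:
print(letterbox_offsets(1024, 512))  # → ((768, 384), (0, 192))
```

This is why a wide input image ends up with white bars above and below it in the padded result, rather than being stretched.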

Example Use Case

You can transform an image of a person into a medieval king using a prompt like:
prompt = "Draw this person as a powerful king, photorealistic and detailed, in a medieval setting."

Initial image:


Result:


Cons & Pros

Cons:

  • Generation can be slow on modest hardware, especially CPU-only machines.
  • Output quality is limited by the relatively small model size.

Pros:

  • Runs locally (no need for cloud services).
  • Customizable parameters for fine-tuning output.
  • Reproducibility with optional random seed.
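On the reproducibility point: the script seeds the global RNG with torch.manual_seed. A hedged alternative is to scope the seed to a single call with a torch.Generator, which Diffusers pipelines accept via their generator parameter (the commented pipe call below is a sketch based on the script above, not code from the repo):

```python
import torch

# Two generators seeded identically produce identical random streams,
# without touching the global RNG state.
g1 = torch.Generator().manual_seed(42)
g2 = torch.Generator().manual_seed(42)
assert torch.equal(torch.randn(4, generator=g1), torch.randn(4, generator=g2))

# In the script, the generator would be passed alongside the other
# pipeline arguments, e.g.:
#   result = pipe(prompt=prompt, image=init_image,
#                 generator=torch.Generator().manual_seed(seed),
#                 strength=strength, guidance_scale=guidance_scale)
```

This keeps the seeding local to the generation call, so other random operations in the same process are unaffected.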
