If you're interested in the world of AI image generation, you may have heard of Stable Diffusion, a powerful tool from Stability AI for generating unique images from text or image prompts. They recently released their latest XL version, which responds beautifully to prompt engineering and produces high-quality images. Alongside the new version, they also released a REST API for students and developers, making it easy to integrate into an app you might be building. The number one question on my mind was: do I need a dedicated GPU and a ton of storage on my local machine to use this API? The answer is nope! The API gives you access to Stable Diffusion's models without having to host them on your local machine! In this blog post, I will share how I stumbled across the API, and how I plan to use it in the full-stack project I am building at Flatiron School.
I use a MacBook Air with an M1 chip. When I first started looking into Stable Diffusion, I downloaded AUTOMATIC1111's web UI in order to run a simple drawing-to-AI-image-generation program a friend had recommended. After getting Stable Diffusion's models onto my machine, plus the AUTOMATIC1111 source code (which I had to integrate the SD models into) running in the background, I literally had NO STORAGE LEFT to download the paint program! Determined to find a better way, I came across Google Colab, a sweet service for running Python code on Google's amazing GPUs. Stable Diffusion had a Jupyter Notebook for their source code, so I could easily open it up in Colab. It ran great and fast (maybe 3-5 seconds to generate an image). However, I am looking for a way to build an app with Stable Diffusion integrated in, and using Google Colab seemed like it might be possible, but not necessarily practical (if I am wrong on that, please let me know!).
Feeling somewhat defeated yet still optimistic, I kept digging for a solution. I found an excellent tutorial from Skolo Online that walks you through how to use the API from Python. She says right at the beginning that her computer is slow, so apologies if it takes a minute for the code to process. I followed her step-by-step guidance and generated an image from my command line using a prefabricated prompt. The generation time was a matter of seconds, and the image quality is astounding.
Prompt: create a high resolution picture image of a luxary car in a studio setting showcasing its sleek lines and high-end features. perfect lighting with highlights.
Note: Yes, I spelled luxury wrong in the prompt I entered for this image. I wanted to show you exactly what I told it in order to generate the above image. It can function with typos, which is quite a plus for a not-so-great speller like myself.
If you know Python, incorporating these models into your code is astonishingly simple. If you feel confident running the Python script flying solo, I will give you a few steps to get you started. However, if you want some hand-holding like I did, go ahead to the tutorial above. It takes about an hour to get through, but it is worth it.
Step 1: Go to DreamStudio and make an account. They start you off with some credits, which equal about 150 images to generate. Play with the software a little; this is the FUN PART! Get familiar with the left toolbar on the page: there is a prompt section where you engineer the text prompt the AI uses to generate its images. (Tip: if you want an easy way to make a prompt, head to ChatGPT and ask it to engineer a creative prompt for text-to-image AI generation.)
Take a look at the advanced settings: there is something called steps; the higher the steps, the better the image quality, and the higher the cost to run the generation. This is the downside to this API: you need to pay for it. But in reality, it is much cheaper than others, like DALL-E, coming in at about $10 for around 1,000 images. If you had the GPU and storage to download Stable Diffusion and use the models directly on your machine, that would be the way to go to get SD "for free". However, if you are like me with a limited machine, this is the price we pay to play around with the tech. Not bad. (Note: I am unsure if SDXL, the latest model we are talking about here, is available for download. I ended up playing with v1.5, which produces significantly lower-quality images, on Google Colab. Please let me know if XL is available and you have tried it either locally or with Colab.)
There is also a dropdown called Model in the advanced settings. These are the different versions of SD that are available. You can play around and test the same prompt in different models; do the results differ?
Step 2: In your personal account info in DreamStudio, there is an API key auto-generated for you. Click on the documentation tab below it, and it will take you to the code you need to get started.
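Since that key is tied to your paid credits, it is worth keeping it out of version control from the start. Here is a minimal sketch of what my config.py will look like once I move the key into a dotenv file (mentioned again in Step 3). It assumes you have installed the python-dotenv package, and STABILITY_API_KEY is just the placeholder name I chose for the variable in my .env file:

# config.py - loads the API key from a .env file instead of hard-coding it
# Assumes: `pip install python-dotenv`, plus a .env file containing a line like
# STABILITY_API_KEY=sk-... (and the .env file itself is listed in .gitignore)
import os
from dotenv import load_dotenv

load_dotenv()  # read the variables in .env into the environment
api_key = os.getenv("STABILITY_API_KEY")

if api_key is None:
    raise Exception("Missing Stability API key.")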
Step 3: Here is a link to the getting-started page for Stability AI's REST API. It is very comprehensive; they basically provide the Python code necessary for accessing the API. Below is the source code I ended up with. I also have a config.py file that contains my API key, which I will move into a dotenv file and add to .gitignore as I build out my project. I know this gets said a lot, but remember not to share your API key with the world. Ok, source code:
import os
import requests
import config
import base64
import json

api_host = 'https://api.stability.ai'
api_key = config.api_key
engine_id = 'stable-diffusion-xl-beta-v2-2-2'

if api_key is None:
    raise Exception("Missing Stability API key.")

def getModelList():
    # List the engines (models) available to your account
    url = f"{api_host}/v1/engines/list"
    response = requests.get(url, headers={
        "Authorization": f"Bearer {api_key}"
    })

    if response.status_code != 200:
        raise Exception("Non-200 response: " + str(response.text))

    # Do something with the payload...
    payload = response.json()
    print(payload)

def generateStableDiffusionImage(prompt, height, width, steps):
    url = f"{api_host}/v1/generation/{engine_id}/text-to-image"
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": f"Bearer {api_key}"
    }

    payload = {}
    payload['text_prompts'] = [{"text": f"{prompt}"}]
    payload['cfg_scale'] = 7
    payload['clip_guidance_preset'] = 'FAST_BLUE'
    payload['height'] = height
    payload['width'] = width
    payload['samples'] = 1
    payload['steps'] = steps

    response = requests.post(url, headers=headers, json=payload)

    # processing the response
    if response.status_code != 200:
        raise Exception("Non-200 response: " + str(response.text))

    data = response.json()
    # print(data)

    # decode each base64 artifact and save it as a PNG file
    for i, image in enumerate(data["artifacts"]):
        with open(f"v1_txt2img_{i}.png", "wb") as f:
            f.write(base64.b64decode(image["base64"]))
I ran my code in the Python shell from my base directory, feeding the shell my code line by line, then setting the variables of prompt, width (512, or whatever dimensions you desire), height (512), and steps (50; remember, more steps means better images and more expense). Note that engine_id is the model you would like to use (I want to use the beta model with XL), and the last few lines of code are responsible for taking the image as a string (it is a really, really, really long string) and converting it into a PNG image. My image saved directly into my base directory, and I was able to view it there.
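If you would rather save all of this as a script instead of feeding the shell line by line, here is a minimal sketch of how the two functions could be called. The module name stability_client.py and the prompt text are just placeholders of mine, and you can swap engine_id in the source file to compare the different models mentioned in Step 1:

# run.py - a minimal sketch that calls the two functions above
# (assumes the code above was saved in a module I'm calling stability_client.py)
from stability_client import getModelList, generateStableDiffusionImage

# See which engines (models) your key has access to
getModelList()

prompt = "a luxury car in a studio setting, sleek lines, perfect lighting"
width = 512   # image dimensions in pixels
height = 512
steps = 50    # more steps = better quality, but a higher credit cost

# Saves the result as v1_txt2img_0.png in the current directory
generateStableDiffusionImage(prompt, height, width, steps)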
In conclusion, I am so excited to explore this API more and integrate it into my app. I am not exactly sure what I will make with it, but the possibilities are extensive! Just know that you can implement this API and use the models from Stable Diffusion to have a wonderful image generator inside your app. I hope this post sparks the same excitement I found, and that you feel inspired by the accessibility of this API. Think of different ways you might be able to use this tech: a simple text-to-image generator would be sweet, but what if you could use Python to make a simple drawing, and as you sketch, the API recognizes your sketch (plus a prompt) and generates an AI image? What if you use their image-to-image endpoint and create a totally new image from a fed prompt (see the rough sketch below)? There is also an in-painting option; what type of image could you describe with those algorithms?
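For the image-to-image idea, my understanding from the documentation is that you post your starting picture as multipart form data to an image-to-image endpoint instead of sending a JSON body. This is an untested sketch based on that reading; double-check the exact field names (init_image, image_strength, text_prompts[0][text]) against Stability AI's docs before relying on it:

# image-to-image sketch: start from an existing picture and re-imagine it with a prompt
# Field names below are my reading of the docs -- verify them before using.
import base64
import requests
import config

api_host = 'https://api.stability.ai'
engine_id = 'stable-diffusion-xl-beta-v2-2-2'

def imageToImage(init_image_path, prompt, steps=50):
    url = f"{api_host}/v1/generation/{engine_id}/image-to-image"
    headers = {
        "Accept": "application/json",
        "Authorization": f"Bearer {config.api_key}"
    }

    # image-to-image uses multipart form data: the starting image is a file upload
    with open(init_image_path, "rb") as img:
        response = requests.post(
            url,
            headers=headers,
            files={"init_image": img},
            data={
                "text_prompts[0][text]": prompt,
                "image_strength": 0.35,  # how strongly the original image influences the result
                "cfg_scale": 7,
                "samples": 1,
                "steps": steps,
            },
        )

    if response.status_code != 200:
        raise Exception("Non-200 response: " + str(response.text))

    # Same decode-and-save step as the text-to-image function
    for i, image in enumerate(response.json()["artifacts"]):
        with open(f"v1_img2img_{i}.png", "wb") as f:
            f.write(base64.b64decode(image["base64"]))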
So happy you were here. If there is anything you think I should know, let me know, and most of all, Happy Coding!