When serving generative image models in production, the tensor representation of an image needs to be converted to a common image format like PNG, JPEG, or WebP. However, this conversion can be costly, and anyone interested in fast inference needs to know the quickest way to get a file back to their users.

When choosing how to deliver these image files, there is a trade-off between inference speed, image quality, and file size.

In this post, I'll lay out some options and share some benchmarks. All of my benchmarks were taken on a c6a.4xlarge EC2 instance, using a single 702x1248 image for each run.

See the appendix for the list of package versions used in my benchmarks.
Library options:
- Python Imaging Library (PIL)
- TorchVision
- OpenCV

Formats:
- PNG
- WebP
I won't consider lossy formats like JPEG, because I think diffusion model services shouldn't risk any quality degradation from lossy image compression.
## PNG Benchmarks
Here's a sample of the code I used for the Python Imaging Library (PIL). I assume the reader can figure out how to modify it to produce all the results reported in the table that follows.
```python
import io
import time

from PIL import Image
import torchvision.transforms.functional as F

path = "/home/user/00000.png"
pil_image = Image.open(path)
pil_image = pil_image.convert("RGB")
image_tensor = F.to_tensor(pil_image)

def pil_png(out):
    pil_image: Image.Image = F.to_pil_image(image_tensor)
    pil_image.save(out, format="PNG", compress_level=4)

t0 = time.time()
for i in range(100):
    out = io.BytesIO()
    pil_png(out)
print(f"Bytes in file: {len(out.getvalue())}")
print(f"Average time: {(time.time() - t0) / 100}")
```
Results for the Python Imaging Library:

| Options | Time (s) | File size (bytes) |
| --- | --- | --- |
| optimize=True | 2.153 | 1066223 |
| optimize=False | 0.368 | 1098439 |
| compress_level=0 | 0.057 | 2766507 |
| compress_level=1 | 0.085 | 1273428 |
| compress_level=4 | 0.15 | 1114614 |
| compress_level=9 | 2.13 | 1114614 |
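Since PNG is lossless at every compression level, the level should only affect speed and size, never pixel data. Here's a minimal sanity-check sketch (using a small synthetic gradient image rather than the benchmark image) that round-trips each level through PIL and confirms the pixels survive unchanged:

```python
import io

from PIL import Image

# Synthetic 64x64 gradient standing in for a real model output.
img = Image.new("RGB", (64, 64))
img.putdata([(x * 4 % 256, y * 4 % 256, (x + y) * 2 % 256)
             for y in range(64) for x in range(64)])

sizes = {}
for level in (0, 1, 4, 9):
    buf = io.BytesIO()
    img.save(buf, format="PNG", compress_level=level)
    sizes[level] = len(buf.getvalue())
    # PNG is lossless: every level must decode back to identical pixels.
    decoded = Image.open(io.BytesIO(buf.getvalue())).convert("RGB")
    assert decoded.tobytes() == img.tobytes()

print(sizes)  # level 0 is largest; higher levels trade time for size
```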
I also tested TorchVision, which has its own PNG encoder. Code for TorchVision:
```python
import time

from PIL import Image
import torch
import torchvision.io
import torchvision.transforms.functional as F

path = "/home/user/00000.png"
pil_image = Image.open(path)
pil_image = pil_image.convert("RGB")
image_tensor = F.to_tensor(pil_image) * 255.0
image_tensor = image_tensor.to(torch.uint8)

def tv_png():
    return torchvision.io.encode_png(image_tensor, compression_level=1)

t0 = time.time()
for i in range(100):
    val = tv_png()
print(f"Bytes in file: {len(val)}")
print(f"Average time: {(time.time() - t0) / 100}")
```
Results for TorchVision:

| Options | Time (s) | File size (bytes) |
| --- | --- | --- |
| compression_level=0 | 0.036 | 2770032 |
| compression_level=1 | 0.071 | 1272572 |
| compression_level=4 | 0.117 | 1116922 |
| compression_level=9 | 1.535 | 1069100 |
The third library I considered was OpenCV. Here's the code I used to test that option:

```python
import time

import cv2
import torch
import torchvision.transforms.functional as F
from PIL import Image

path = "/home/user/00000.png"
pil_image = Image.open(path)
pil_image = pil_image.convert("RGB")
image_tensor = F.to_tensor(pil_image) * 255.0
image_tensor = image_tensor.to(torch.uint8)
# OpenCV expects an HWC numpy array rather than a CHW tensor.
# Note: OpenCV also assumes BGR channel order, so for correct colors you
# would swap channels first, e.g. with cv2.cvtColor(..., cv2.COLOR_RGB2BGR).
image_tensor = image_tensor.numpy()
image_tensor = image_tensor.transpose((1, 2, 0))

def cv_png():
    result, buffer = cv2.imencode(".png", image_tensor, [cv2.IMWRITE_PNG_COMPRESSION, 0])
    return buffer

t0 = time.time()
for i in range(100):
    out = cv_png()
print(f"Bytes in file: {len(out.tobytes())}")
print(f"Average time: {(time.time() - t0) / 100}")
```
Results for OpenCV:
| Options | Time (s) | File size (bytes) |
| --- | --- | --- |
| cv2.IMWRITE_PNG_COMPRESSION, 0 | 0.035 | 2770047 |
| cv2.IMWRITE_PNG_COMPRESSION, 1 | 0.063 | 1272600 |
| cv2.IMWRITE_PNG_COMPRESSION, 4 | 0.093 | 1186740 |
| cv2.IMWRITE_PNG_COMPRESSION, 9 | 2.088 | 1107430 |
## WebP Benchmarks

WebP is an image format developed by Google that supports both lossless and lossy compression. It was designed to compress more efficiently than JPEG and PNG. It also supports animation like GIF does, but improves on GIF's compression.

The downside of WebP is that it may not be supported by older browsers and devices. The WebP encoder also exposes a few more levers than PNG, so it may take a bit more time to find the right settings for your use case. It's worth exploring the list of encoding options before reviewing the results.

I'll stick with lossless encoding for WebP, for the same reasons mentioned above.
Here's my code for benchmarking WebP with the Python Imaging Library (PIL):

```python
import io
import time

from PIL import Image
import torch
import torchvision.transforms.functional as F

path = "/home/user/00000.png"
pil_image = Image.open(path)
pil_image = pil_image.convert("RGB")
image_tensor = F.to_tensor(pil_image) * 255.0
image_tensor = image_tensor.to(torch.uint8)

def pil_webp(out):
    pil_image: Image.Image = F.to_pil_image(image_tensor)
    pil_image.save(out, format="WebP", lossless=True, quality=0, method=0)

t0 = time.time()
for i in range(100):
    out = io.BytesIO()
    pil_webp(out)
print(f"Bytes in file: {len(out.getvalue())}")
print(f"Average time: {(time.time() - t0) / 100}")
```
Results for PIL:
| Options | Time (s) | File size (bytes) |
| --- | --- | --- |
| quality=0, method=0 | 0.047 | 1046120 |
| quality=0, method=3 | 0.218 | 814500 |
| quality=0, method=6 | 0.270 | 808084 |
| quality=50, method=0 | 0.080 | 1046734 |
| quality=50, method=3 | 0.324 | 811578 |
| quality=50, method=6 | 0.397 | 804762 |
| quality=100, method=0 | 0.304 | 1033038 |
| quality=100, method=3 | 0.745 | 809758 |
| quality=100, method=6 | 7.203 | 791030 |
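Note that with `lossless=True`, the `quality` knob controls how hard the encoder works to shrink the file, not fidelity. A minimal sketch (again with a synthetic image rather than the benchmark file) confirming that every setting decodes back to identical pixels:

```python
import io

from PIL import Image

# Synthetic image standing in for a model output.
img = Image.new("RGB", (64, 64))
img.putdata([((x * 3) % 256, (y * 5) % 256, (x ^ y) % 256)
             for y in range(64) for x in range(64)])

for quality in (0, 50, 100):
    buf = io.BytesIO()
    # With lossless=True, quality trades encoding effort for file size only.
    img.save(buf, format="WebP", lossless=True, quality=quality, method=0)
    decoded = Image.open(io.BytesIO(buf.getvalue())).convert("RGB")
    assert decoded.tobytes() == img.tobytes()
```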
As of this writing, TorchVision does not support encoding images as WebP.
While OpenCV does support WebP, the only way to get lossless encoding is to set the quality level above 100. For the curious, I got the following results at quality level 101:
- Average time: 0.462
- Bytes in file: 811340
## Conclusion
In general, OpenCV is the fastest way to encode images. Because OpenCV lacks support for the finer-grained WebP encoding flags, OpenCV with PNG at compression level 0 or 1 seems like a great way to go.

If file size really matters for your use case, WebP provides superior lossless compression to PNG, saving hundreds of kilobytes to a few megabytes depending on the configuration.