Crop Bounding Box + Rotate

#tutorial #python #computervision

Abstract

We are pre-processing photos to ultimately enable OCR. We have bounding boxes for the rear ends of cars; now let’s square up the license plates by cropping and rotating the photos.

ToC (Step-by-Step)

Overview: OpenCV in Python for End-to-end License Plate Detection.
Camera and computer setup. Raspberry Pi (RPi), Canon DSLR, f-stop and ISO.
Capturing images from DSLR to RPi. Automating capture of images and transfer to RPi.
Model for detecting cars. Train model from scratch using YOLO and labeled images.
Crop bounding box + rotate.
TBD -- Model for finding license plates. Train a second model with YOLO.
TBD -- Crop bounding box.
TBD -- Read license plate with OCR. Pre-process image, and extract text with Paddle OCR.

Recipe

We now have a model identifying cars in the images streamed from our camera. Our high-level goal is an image cropped down to just the license plate, ready for OCR. That means finding cars in the image (article 3), cropping the image to just the rear bumper of each car (this article; 4), so that we can find the license plate (next article; 5), and cropping the image to just the license plate (article 6).

This intermediate step is a processing step to ultimately enable OCR. Note that my captured images look down on the cars and are also taken from off to the side, at a distance. That skew is an important feature to reduce. OCR expects a straight-on view, so part of this step is also correcting the skew (by rotation).
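The rotation angle has to come from somewhere. One way to estimate it, shown here only as a sketch, is to pick two points along an edge that should be horizontal (say, the bottom edge of a plate) and measure the slope between them. The helper name `skew_angle_degrees` and the pixel coordinates are hypothetical and not part of the original pipeline.

```python
import math

def skew_angle_degrees(left_point, right_point):
    """Angle in degrees of the line from left_point to right_point.

    Points are (x, y) in image coordinates, where y grows downward.
    A positive result means the edge slopes down toward the right.
    """
    (x1, y1), (x2, y2) = left_point, right_point
    return math.degrees(math.atan2(y2 - y1, x2 - x1))

# Hypothetical pixel coordinates of a plate's two bottom corners
angle = skew_angle_degrees((100, 200), (300, 216))
print(round(angle, 1))
```

With OpenCV's convention that a positive angle rotates counter-clockwise, passing this value to cv2.getRotationMatrix2D should level an edge that slopes down toward the right.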

At this point we have a bounding box per car. We potentially have multiple cars per image, so this step contains a loop that runs all the following steps once per car (for each car: detect license plate, crop, OCR, repeat).

First, we crop the image to just the bounding box from the model in step 3, and rotate it to compensate for the camera’s skewed perspective in relation to the highway.
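Because the model runs on a scaled-down copy of the photo, the box coordinates have to be scaled back up before cropping the original. Here is a minimal sketch of that mapping. The helper `scale_box` is hypothetical, and the clamping to the image bounds is a defensive addition of mine, not something the original script does.

```python
def scale_box(box_xyxy, ratio, image_width, image_height):
    """Map (x1, y1, x2, y2) from the resized image to the original image.

    Returns integer pixel coordinates clamped to the original image bounds,
    so they are always safe to use as slice indices.
    """
    x1, y1, x2, y2 = (int(ratio * value) for value in box_xyxy)
    x1 = max(0, min(x1, image_width))
    x2 = max(0, min(x2, image_width))
    y1 = max(0, min(y1, image_height))
    y2 = max(0, min(y2, image_height))
    return x1, y1, x2, y2

# Hypothetical sizes: a 5120x3840 original resized to width 1280
ratio = 5120 / 1280
print(scale_box((100.5, 50.2, 400.9, 300.0), ratio, 5120, 3840))
```

The clamp matters mostly for negative values: a negative index in a NumPy slice counts from the far end of the array, which would silently give the wrong crop.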

For each bounding box (i.e., for each car):

Input image. Run model, find bounding boxes (cars).
Crop to rear bumper, and rotate to correct skew.

Output images, cropped down to just the rear bumper, with a clearer view of the license plate. Note the rotation artifacts in the corners. Now the license plate is straight-on for easier OCR.

from ultralytics import YOLO
import cv2 
import imutils
import numpy as np
from os import path

baseDir = '/Users/japollock/Projects/TrainHighwayCarDetector/'
inputFilePath = baseDir + 'photos/yolo_cars/all/IMG_4553.jpg'
inputFileName = path.basename(inputFilePath)
outputPhotosDir = baseDir + 'photos/yolo_cars/licensePlates/'
model = YOLO(baseDir + 'src/runs/detect/yolov8m_v21_100e/weights/best.pt')

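imageOriginal = cv2.imread(inputFilePath)
# cv2.imread returns None (no exception) when the file is missing or unreadable
if imageOriginal is None:
    raise FileNotFoundError('Could not read ' + inputFilePath)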
imageScaled = imutils.resize(imageOriginal, width=1280)

imgRatio = imageOriginal.shape[1] / imageScaled.shape[1]

results = model.predict(source=imageScaled, imgsz=1280)

# j numbers the cars found in this image, starting at 1
for j, box in enumerate(results[0].boxes, start=1):
    # bounding box corners (x1, y1, x2, y2) in the scaled-down image
    x1Float, y1Float, x2Float, y2Float = box.xyxy[0].tolist()

    # calc bounding box in original (scaled-up) image
    x1 = int(imgRatio * x1Float)
    y1 = int(imgRatio * y1Float)
    x2 = int(imgRatio * x2Float)
    y2 = int(imgRatio * y2Float)

    # crop the full-resolution original (NumPy indexing is [rows, cols], i.e. [y, x])
    imageCropped = imageOriginal[y1:y2,x1:x2]

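    # rotate about the crop's center; in OpenCV a positive angle (degrees) rotates counter-clockwise
    # 4.7 is hard-coded to match this camera's skew relative to the highway
    # the output keeps the crop's size, so the rotated corners get clipped
    image_center = tuple(np.array(imageCropped.shape[1::-1]) / 2)
    rot_mat = cv2.getRotationMatrix2D(image_center, 4.7, 1.0)
    imageRotated = cv2.warpAffine(imageCropped, rot_mat, imageCropped.shape[1::-1], flags=cv2.INTER_LINEAR)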

    # output name: input name without its extension, plus a zero-padded car number
    outputFilePath = outputPhotosDir + path.splitext(inputFileName)[0] + '_' + format(j,'04d') + '.jpg'
    cv2.imwrite(outputFilePath, imageRotated)
    print('Wrote ' + outputFilePath)
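
The rotation artifacts in the corners come from rotating into an output of the same size as the crop. If they ever become a problem, one option is to enlarge the output canvas so the whole rotated crop fits. The sketch below builds that matrix in plain Python, using the same layout OpenCV documents for cv2.getRotationMatrix2D at scale 1.0. The function names are hypothetical, and this is an alternative, not what the script above does.

```python
import math

def rotation_matrix_expanded(width, height, angle_degrees):
    """Rotation about the image center, shifted so nothing is clipped.

    Returns (matrix, new_width, new_height), where matrix is a 2x3 nested
    list with a positive angle meaning counter-clockwise rotation.
    """
    cx, cy = width / 2, height / 2
    cos_a = math.cos(math.radians(angle_degrees))
    sin_a = math.sin(math.radians(angle_degrees))
    matrix = [
        [cos_a, sin_a, (1 - cos_a) * cx - sin_a * cy],
        [-sin_a, cos_a, sin_a * cx + (1 - cos_a) * cy],
    ]
    # Size of the axis-aligned box that contains the rotated rectangle
    new_width = int(math.ceil(height * abs(sin_a) + width * abs(cos_a)))
    new_height = int(math.ceil(height * abs(cos_a) + width * abs(sin_a)))
    # Move the old center onto the center of the larger canvas
    matrix[0][2] += new_width / 2 - cx
    matrix[1][2] += new_height / 2 - cy
    return matrix, new_width, new_height

def transform_point(matrix, x, y):
    """Apply the 2x3 affine matrix to a single (x, y) point."""
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])

# Hypothetical 400x200 crop rotated by the same 4.7 degrees
matrix, new_width, new_height = rotation_matrix_expanded(400, 200, 4.7)
print(new_width, new_height)
```

To use it with the listing above, the matrix could be converted with np.array(matrix, dtype=np.float32) and passed to cv2.warpAffine together with (new_width, new_height) as the output size. The corners of the larger canvas are then filled with the border color instead of cutting into the image.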

Now we have an image clearly showing the license plate. We still need to enhance-enhance before handing off to OCR. So we repeat as before: find the license plate in the now-cropped image, in order to subsequently crop to just the license plate. Next: YOLO model for license plates!

Next Link: TBD -- Model for detecting license plates in cropped image

Appendix: Interesting Points

Rotation (2D), and not perspective warping (3D), was an evolution of the process. Initially I thought I would use perspective warping via homography, as I knew how. But long story short, after tinkering with the solution I realized it was over-engineering. Why go through the effort of calculating a perspective warp from my camera to the highway when a simple rotation of the image would work just as well, and is a lot simpler?

DEV Community
