Auto Alignment for Product Photography

Stylized Team

tl;dr

Aligning the bottom of product images is crucial for virtual staging tools like https://stylized.ai. We use a combination of affine and rotation transformations to achieve this alignment. The process involves extracting points of interest from the input image, computing the affine transform matrix using OpenCV's cv2.getAffineTransform() function, and applying the transform to the image using cv2.warpAffine(). To fine-tune the alignment, we also compute the rotation matrix using cv2.getRotationMatrix2D().

Figure 1: The original product image.

Figure 2: The aligned product image.

Problem Statement

As part of our work on Stylized, a product photography tool that virtually stages product images in 3D scenes, we encountered a challenge in ensuring that the bottom of the product was flush with the virtual platform.

To solve this problem, we needed a way to automatically align the images so that our users wouldn't have to take perfect input photos.

Solution

In this post, we'll describe the process we followed to extract points of interest from the images, apply an affine transformation to align the bottom of the image, and fine-tune the alignment using rotation. Let’s get started!

Necessary Inputs

Before we can apply the affine and rotation transformations, we need to have three things:

The input image: This is simply the original photograph of the product that we want to align.

The foreground mask: This is a binary image that segments the foreground (i.e., the product) from the background. To obtain the foreground mask, we applied a simple threshold to the input image, setting all pixels above a certain value to 1 and all others to 0.

The points of interest to align: These are the coordinates of the non-zero elements in the foreground mask.

Extraction of Points of Interest

To extract the necessary points to align, we wrote a function called closest_to_corners(). This function takes in an array of coordinates and the coordinates of the left and right corners. It then finds the point in coords that is closest to the left corner and the point that is closest to the right corner. These two points represent the points of interest that we want to align. To find the closest points, we used the np.argmin() function along with the Euclidean distance between the points. Finally, the function returns the closest points as a tuple. This process allows us to easily extract the necessary points to align and use them as input to the affine and rotation transformations.
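The description above can be sketched in a few lines of NumPy. This is a minimal illustration rather than our production code: it assumes coords is an (N, 2) array and that the corners use the same axis order as the points, and the sample points are made up for the demonstration.

```python
import numpy as np


def closest_to_corners(coords, left_corner, right_corner):
    """Return the points in coords nearest to the left and right corners.

    Assumes coords has shape (N, 2) and shares its axis order with the corners.
    """
    coords = np.asarray(coords, dtype=float)
    # Euclidean distance from every point to each corner
    dist_left = np.linalg.norm(coords - np.asarray(left_corner, dtype=float), axis=1)
    dist_right = np.linalg.norm(coords - np.asarray(right_corner, dtype=float), axis=1)
    # np.argmin() gives the index of the smallest distance
    return coords[np.argmin(dist_left)], coords[np.argmin(dist_right)]


# Made-up points, with corners at (0, 5) and (5, 5)
points = np.array([[1, 4], [2, 1], [4, 5], [5, 2]])
left, right = closest_to_corners(points, left_corner=(0, 5), right_corner=(5, 5))
print(left, right)
```

With these sample points the function picks (1, 4) for the left corner and (4, 5) for the right corner.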

To extract the points of interest from the image, we first applied a binary mask to the image to segment the foreground from the background. We then used the np.argwhere() function to find the coordinates of the non-zero elements in the mask. These coordinates were stored in a 2D NumPy array, which we transposed to obtain the desired shape for the input to the affine transformation.
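As a rough sketch of this step, the snippet below thresholds a tiny hand-made grayscale array and collects the foreground coordinates. The function name foreground_coords and the sample array are hypothetical, and the default threshold of 128 is only an assumed value. One detail worth knowing: np.argwhere() returns (row, column) pairs, while OpenCV works with (x, y) points, so the columns are reversed here.

```python
import numpy as np


def foreground_coords(gray, threshold=128):
    """Threshold a grayscale image and return the mask plus (x, y) coordinates."""
    # Pixels above the threshold become 1 (foreground), all others 0
    mask = (gray > threshold).astype(np.uint8)
    # np.argwhere() yields (row, col) pairs, i.e. (y, x)
    row_col = np.argwhere(mask == 1)
    # Reverse the columns to get (x, y) order
    xy = row_col[:, ::-1]
    return mask, xy


# Hand-made 3x4 "image" with three bright pixels
gray = np.array([[0, 0, 0, 0],
                 [0, 200, 210, 0],
                 [0, 0, 220, 0]], dtype=np.uint8)
mask, xy = foreground_coords(gray)
print(xy)          # one (x, y) point per row
print(xy.T.shape)  # transposed layout: one row of x values, one row of y values
```

Which layout is needed depends on the consumer: the (N, 2) form is convenient for distance computations, while the transposed (2, N) form separates the x and y values.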

Transformations

We first computed the affine transform matrix from the points of interest using OpenCV's cv2.getAffineTransform() function and applied it with cv2.warpAffine(). To further fine-tune the alignment, we applied a rotation transformation using OpenCV's cv2.getRotationMatrix2D() function. This function takes in the center of rotation, the angle of rotation (in degrees), and the scale of the transformation as input and returns the 2x3 rotation matrix. To calculate the angle of rotation, we used the np.arctan2() function and passed in the elements from the transform matrix that correspond to the rotation.
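To make the angle calculation concrete, here is a small NumPy-only sketch. OpenCV documents the matrix returned by cv2.getRotationMatrix2D() as [[α, β, (1 − α)·cx − β·cy], [−β, α, β·cx + (1 − α)·cy]], with α = scale·cos(angle) and β = scale·sin(angle). The helper names rotation_matrix_2d and angle_from_matrix are hypothetical; the first rebuilds that documented form so the example runs without OpenCV, and the second shows that np.arctan2() applied to the first row recovers the angle.

```python
import numpy as np


def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """Build a 2x3 rotation matrix in the form documented for cv2.getRotationMatrix2D()."""
    cx, cy = center
    theta = np.deg2rad(angle_deg)
    alpha = scale * np.cos(theta)
    beta = scale * np.sin(theta)
    return np.array([
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ])


def angle_from_matrix(matrix):
    """Recover the rotation angle in degrees from the first row of the matrix."""
    a, b = matrix[0, 0], matrix[0, 1]
    return float(np.rad2deg(np.arctan2(b, a)))


m = rotation_matrix_2d(center=(50, 40), angle_deg=7.5)
recovered = angle_from_matrix(m)

# The center of rotation should map onto itself
center_after = m @ np.array([50, 40, 1.0])
print(recovered, center_after)
```

The round trip only holds when the matrix really is a rotation (optionally with uniform scale). For a general affine matrix, the same two elements can also contain shear, so the recovered value is at best an approximation of the tilt.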

Here's a toy example that applies the affine transform and then the rotation transformation:

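import cv2
import numpy as np


def closest_to_corners(coords, left_corner, right_corner):
    # Return the points in coords (N x 2, in x/y order) closest to each corner
    left = coords[np.argmin(np.linalg.norm(coords - left_corner, axis=1))]
    right = coords[np.argmin(np.linalg.norm(coords - right_corner, axis=1))]
    return left, right


# Load the image as grayscale and threshold it to segment the foreground from the background
img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
mask = (img > 128).astype(np.uint8)

# Find the coordinates of the non-zero elements in the mask.
# np.argwhere() returns (row, col) pairs, so reverse them into the (x, y) order OpenCV expects
coords = np.argwhere(mask == 1)[:, ::-1].astype(np.float32)

# Define the bottom-left and bottom-right corner coordinates
rows, cols = img.shape[:2]
left_corner = (0, rows - 1)
right_corner = (cols - 1, rows - 1)

# Extract points of interest using closest_to_corners() function
left_point, right_point = closest_to_corners(coords, left_corner, right_corner)

# Apply affine transform: move the left point to the height of the right point.
# cv2.getAffineTransform() needs three float32 point pairs; the origin stays fixed
src = np.array([left_point, right_point, (0, 0)], dtype=np.float32)
dst = np.array([[left_point[0], right_point[1]], right_point, (0, 0)], dtype=np.float32)
transform_matrix = cv2.getAffineTransform(src, dst)
# cv2.transform() expects points shaped (N, 1, 2)
transformed_coords = cv2.transform(coords.reshape(-1, 1, 2), transform_matrix)

# Calculate center of rotation
center = (cols // 2, rows // 2)

# Extract the rotation elements from the transform matrix.
# Note: this src/dst pair leaves x coordinates unchanged, so the first row is
# close to [1, 0] and the resulting angle comes out near zero
a, b = transform_matrix[0, 0], transform_matrix[0, 1]

# Calculate the angle of rotation (in degrees) from the transform matrix
angle = float(np.rad2deg(np.arctan2(b, a)))

# Apply rotation transformation
rotation_matrix = cv2.getRotationMatrix2D(center, angle, 1.0)
rotated_coords = cv2.transform(transformed_coords, rotation_matrix)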

Summary

Aligning images is a crucial step in product photography and can be done automatically using affine and rotation transformations. By extracting the points of interest and finding the closest points to the corners, we were able to apply the transformations and fine-tune the alignment. This process allowed us to create a seamless virtual staging experience for our users on https://stylized.ai. Visit us if you want to try it out and see the example in action! If you're interested in learning more about image processing techniques, we'll keep updating this page with more.