Scroll to bottom to view output images
Written by Quentin Adolphe (qadolphe1@swarthmore.edu) and Cole Smith for our Swarthmore College Computer Vision course in 2023.
Taught by Stephen Philips, Visiting Assistant Professor of Engineering at Swarthmore College. Coursework adapted from Matt Zucker, Associate Professor in the Engineering Department at Swarthmore College.
All code in the "Your Code" section was completed by Quentin Adolphe and Cole Smith. The project description was written by Matt Zucker and Stephen Philips.
In this project, we will investigate two applications: Laplacian pyramid blending and hybrid images. You can use the former to make smooth transitions between arbitrary images, such as the apple/orange blend depicted here:
The hallmark of the Laplacian pyramid blend is that low-frequency features (like the constant green or orange hues of the fruits) are blended over larger distances than high-frequency features (such as the tiny dots on the apple or the dimples on the orange).
Hybrid images are interesting optical illusions described in the section below.
Knowing how image features are distributed across the frequency spectrum is critical for understanding both of these applications.
Similar to the gradient, the Laplacian operator is a derivative of a scalar function such as a grayscale image. Whereas the gradient is a first derivative that maps an image to a vector at each point, the Laplacian is a second derivative that maps an image to a scalar at each point.
The definition of the Laplacian of a function $f: \mathbb{R}^2 \mapsto \mathbb{R}$ is given by $$ \nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}. $$ That is, the Laplacian is the sum of second partial derivatives of the image.
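For a discrete image, the Laplacian can be approximated with the standard 5-point finite-difference stencil. Here is a minimal sketch using OpenCV (blep.jpg, which appears later on this page, stands in for any grayscale image):

import cv2
import numpy as np

img = cv2.imread('blep.jpg', cv2.IMREAD_GRAYSCALE).astype(np.float32)

# 5-point finite-difference stencil for d2f/dx2 + d2f/dy2
kernel = np.array([[0,  1, 0],
                   [1, -4, 1],
                   [0,  1, 0]], dtype=np.float32)
lap = cv2.filter2D(img, cv2.CV_32F, kernel)

# OpenCV's built-in Laplacian uses the same stencil by default
lap_cv = cv2.Laplacian(img, cv2.CV_32F)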
Just as the partial derivative of a blurred image can be computed by filtering with the derivative of a Gaussian, the Laplacian of a blurred image can be computed by filtering with the Laplacian of a Gaussian (LoG) filter, defined as the Laplacian of the Gaussian kernel.
Here is the LoG kernel in 1D and 2D:
As it turns out, the LoG filter can be well-approximated by a difference of Gaussians (DoG). On the left below are two Gaussians $g_1$ and $g_2$ whose widths are given by $\sigma_1$ and $\sigma_2 = 2 \sigma_1$; on the right is a similarly-scaled LoG filter.
As you can see, the difference $g_2 - g_1$ closely approximates the LoG kernel (and the same holds for 2D Gaussians/LoG kernels as well).
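You can check this numerically by sampling both kernels in 1D. Since the approximation holds up to scale, the sketch below normalizes both curves to unit peak magnitude before plotting (a quick illustration, not part of the project code):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-6, 6, 601)
s1 = 1.0
s2 = 2.0 * s1

def gauss(x, s):
    return np.exp(-x**2 / (2.0 * s**2)) / (s * np.sqrt(2.0 * np.pi))

dog = gauss(x, s2) - gauss(x, s1)              # difference of Gaussians
log = (x**2 - s1**2) / s1**4 * gauss(x, s1)    # second derivative of the sigma_1 Gaussian

plt.plot(x, dog / np.abs(dog).max(), label='DoG (normalized)')
plt.plot(x, log / np.abs(log).max(), label='LoG (normalized)')
plt.legend()
plt.show()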
Hence, by linearity of filtering, taking the difference of two blurs of the same image is approximately the same as filtering with the LoG: $$ (I \ast g_2) - (I \ast g_1) = I \ast (g_2 - g_1) \approx I \ast \mathrm{LoG}. $$ The equation above forms the basis of the so-called Laplacian pyramid (see section 3.5.3 of the Szeliski textbook), which encodes an image as a succession of progressively smaller Laplacian-filtered images. The coarsest layer of the pyramid consists of a blurred and reduced copy of the original image.
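In code, this identity says that subtracting two Gaussian-blurred copies of the same image yields a band-pass result. A quick sketch (again using blep.jpg as a stand-in input):

import cv2
import numpy as np

img = cv2.imread('blep.jpg', cv2.IMREAD_GRAYSCALE).astype(np.float32)

blur1 = cv2.GaussianBlur(img, (0, 0), 1.0)   # I * g1
blur2 = cv2.GaussianBlur(img, (0, 0), 2.0)   # I * g2, with sigma_2 = 2 * sigma_1
dog = blur2 - blur1                          # approximately I * LoG, by linearity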
An alpha mask is a special type of single-channel image where every pixel represents a coefficient for a weighted average of two RGB input images.
In the grid below, the top row shows two input RGB images. The bottom-left image is a continuous mask where 0 corresponds to pixels from the left image, 1 corresponds to pixels from the right image, and any value in between corresponds to a proportional mix of the respective pixels from each image. The bottom-right image is the alpha blend result.
Assuming a floating-point mask, code for an alpha blend might look something like this:

for y in range(height):
    for x in range(width):
        for c in range(3):
            result[y, x, c] = img1[y, x, c] * (1.0 - mask[y, x]) + img2[y, x, c] * mask[y, x]
This algorithm is implemented much more efficiently as alpha_blend (defined in the "Utility functions" section).
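For reference, the triple loop collapses to a single broadcasted NumPy expression, which is essentially what that utility does:

# same weighted average, vectorized over all pixels and channels at once
result = img1 * (1.0 - mask[:, :, None]) + img2 * mask[:, :, None]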
The main advantage of alpha blending over strictly boolean masks is that you can achieve smoother transitions between regions by combining image pixels continuously.
See https://en.wikipedia.org/wiki/Alpha_compositing or Szeliski section 3.1.3 for more details.
The Laplacian pyramid is a useful data structure for understanding images at multiple scales. Here is a Laplacian pyramid of a cat (source image at blep.jpg):
As you can see, the images are decreasing in size, and all but the smallest/coarsest are derivative images (i.e. their intensity values can be either positive or negative, with middle gray representing zero intensity).
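A pyramid like the one above can be built by repeatedly blurring, downsampling, and recording the detail lost at each step. Here is a minimal sketch using OpenCV's pyrDown/pyrUp (a simplified version of the idea, not our exact project code):

import cv2
import numpy as np

def build_laplacian_pyramid(img, levels):
    # each band-pass layer holds the detail lost between one level and the next
    pyramid = []
    cur = img.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(cur)
        up = cv2.pyrUp(down, dstsize=(cur.shape[1], cur.shape[0]))
        pyramid.append(cur - up)   # Laplacian (band-pass) layer
        cur = down
    pyramid.append(cur)            # coarsest layer: blurred, reduced copy
    return pyramid

def reconstruct_from_pyramid(pyramid):
    # invert the process: upsample and add back each detail layer
    cur = pyramid[-1]
    for layer in reversed(pyramid[:-1]):
        cur = cv2.pyrUp(cur, dstsize=(layer.shape[1], layer.shape[0])) + layer
    return cur

Blending two images then amounts to building a pyramid for each input, alpha-blending the pyramids layer by layer with a correspondingly reduced mask, and reconstructing the result.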
The image below is an example of an interesting optical illusion (shown in two sizes).
When viewed close up, it appears to be NASA mathematician Katherine Johnson. However, if you view it from far away (or use your browser's "zoom out" functionality to shrink it), it begins to resemble actor Taraji P. Henson, who portrayed Johnson in the film Hidden Figures.
Such a hybrid image can be obtained from two source images $A$ and $B$ by following these steps:
Obtain $A_{lopass} = g(A, \sigma_A)$ by blurring $A$ with a Gaussian kernel with width $\sigma_A$.
Obtain $B_{hipass} = B - g(B, \sigma_B)$ by blurring $B$ with a Gaussian kernel of width $\sigma_B$, and subtracting the result from $B$.
The resulting image is obtained as $I = A_{lopass} + k \, B_{hipass}$, where $k$ is a gain controlling how strongly the high-frequency detail shows through (see the sketch after these steps).
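A minimal sketch of these three steps, assuming grayscale inputs (the file names, $\sigma$ values, and gain $k$ below are placeholders to tune per image pair):

import cv2
import numpy as np

sigma_a, sigma_b, k = 8.0, 4.0, 1.0   # placeholder values; tune per image pair

A = cv2.imread('A.jpg', cv2.IMREAD_GRAYSCALE).astype(np.float32)
B = cv2.imread('B.jpg', cv2.IMREAD_GRAYSCALE).astype(np.float32)

A_lopass = cv2.GaussianBlur(A, (0, 0), sigma_a)                      # step 1
B_hipass = B - cv2.GaussianBlur(B, (0, 0), sigma_b)                  # step 2
hybrid = np.clip(A_lopass + k * B_hipass, 0, 255).astype(np.uint8)   # step 3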
One important caveat is that hybrid images only work well if the two input images $A$ and $B$ are well-aligned in the first place! You simply won't get a good result if you try to make a hybrid image of a skyscraper and a hot air balloon. We've also found that black-and-white images tend to work better than color, but both can work fine if you choose your inputs carefully.
Our code used for the project is attached below with a process explanation in the Write Up:
import os
import sys
from collections import namedtuple
import json
import cv2
import numpy as np
import matplotlib.pyplot as plt
MAX_DISPLAY_W = 1200
MAX_DISPLAY_H = 700
FIRST_IMSHOW = True
############################################################################################
# Image manipulation functions
######################################################################
def draw_image_with_mask(image, mask):
    """Return a copy of image with the mask overlaid for display."""
    assert image.shape[:2] == mask.shape
    return alpha_blend(image // 2, image // 2 + 128, mask)
######################################################################
def alpha_blend(img1, img2, mask):
    """Perform alpha blend of img1 and img2 using mask.

    Result is an image of same shape as img1 and img2. Wherever mask
    is 0, result pixel is same as img1. Wherever mask is 255 (or 1.0
    for float mask), result pixel is same as img2. For values in
    between, mask acts as a weight for a weighted average of img1
    and img2.

    See https://en.wikipedia.org/wiki/Alpha_compositing
    """
    (h, w) = img1.shape[:2]
    assert img2.shape == img1.shape
    assert mask.shape == img1.shape or mask.shape == (h, w)
    result = np.empty_like(img1)
    if mask.dtype == np.uint8:
        # convert an 8-bit mask to floating-point weights in [0, 1]
        mask = mask.astype(np.float32) / 255.0
    if len(mask.shape) == 2 and len(img1.shape) == 3:
        # add a channel axis so the mask broadcasts across RGB
        mask = mask[:, :, None]
    result[:] = img1 * (1 - mask) + img2 * mask
    return result
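# Example usage sketch (hypothetical inputs): blend two same-size images
# with a horizontal ramp mask for a smooth left-to-right transition.
#
#   h, w = img1.shape[:2]
#   ramp = np.tile(np.linspace(0.0, 1.0, w, dtype=np.float32), (h, 1))
#   blended = alpha_blend(img1, img2, ramp)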
############################################################################################
# Functions for creating ROIs
######################################################################
def ellipse_mask_from_roi(src_image, src_roi, wh_scales=(1.0, 1.0), flip=False):
    """Return an 8-bit elliptical mask for src_image covering src_roi,
    with the ellipse axes scaled by wh_scales (as fractions of the ROI
    height, per roi_draw_ellipse)."""
    src = src_roi
    wsz, hsz = wh_scales
    h, w = src_image.shape[:2]
    src_size = (w, h)
    ellipse_mask = roi_draw_ellipse(src, wsz, hsz, src_size)
    return ellipse_mask
######################################################################
def roi_from_points(top_left, top_right, bottom):
    """Create an ImageROI struct from three points given by user.

    Returns a namedtuple with fields:

      * center: center of ROI rectangle as (float, float) tuple
      * angle: angle of ROI rectangle in radians
      * width: width of ROI rectangle
      * height: height of ROI rectangle, also used as
        scaling factor for warps
    """
    p0 = np.array(top_left, dtype=np.float32)
    p1 = np.array(top_right, dtype=np.float32)
    p2 = np.array(bottom, dtype=np.float32)
    u = p1 - p0
    width = np.linalg.norm(u)
    u /= width                       # unit vector along the top edge
    v = p2 - p0
    if u[0] * v[1] - u[1] * v[0] < 0:
        # cross product is negative: swap ends to keep consistent orientation
        u = -u
        top_left, top_right = top_right, top_left
    v -= u * np.dot(u, v)            # project out u so v is perpendicular to it
    assert np.abs(np.dot(u, v)) < 1e-4
    height = np.linalg.norm(v)
    cx, cy = p0 + 0.5 * u * width + 0.5 * v
    angle = np.arctan2(u[1], u[0])
    return ImageROI((float(cx), float(cy)),
                    float(angle), float(width), float(height))
############################################################################################
# Region of interest handlers
######################################################################
ImageROI = namedtuple(
    'ImageROI',
    ['center', 'angle', 'width', 'height']
)  # Region of Interest container object
######################################################################
def roi_from_center_angle_dims(center, angle, width, height):
    """Simple ROI constructor from center, angle, width, height."""
    center = (float(center[0]), float(center[1]))
    angle = float(angle)
    width = float(width)
    height = float(height)
    return ImageROI(center, angle, width, height)
######################################################################
def roi_get_matrix(image_roi):
    """Get a 3x3 matrix mapping local object points (x, y) in the ROI to
    image points (u, v) according to the formulas:

        x' = image_roi.height * x
        y' = image_roi.height * y

        c = cos(image_roi.angle)
        s = sin(image_roi.angle)

        u = c * x' - s * y' + image_roi.center[0]
        v = s * x' + c * y' + image_roi.center[1]
    """
    c = np.cos(image_roi.angle)
    s = np.sin(image_roi.angle)
    tx, ty = image_roi.center
    h = image_roi.height
    return np.array([[c*h, -s*h, tx],
                     [s*h,  c*h, ty],
                     [0,    0,   1]])
######################################################################
def roi_map_points(image_roi, opoints):
    """Map from local object points to image points using the matrix
    established by roi_get_matrix(). The opoints parameter should be an
    n-by-2 array of (x, y) object points. The return value is an
    n-by-2 array of (u, v) pixel locations in the image.
    """
    M = roi_get_matrix(image_roi)
    opoints = opoints.reshape(-1, 1, 2)
    ipoints = cv2.perspectiveTransform(opoints, M)
    return ipoints.reshape(-1, 2)
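# Sanity-check sketch: the object-space origin (0, 0) should map to the
# ROI center (the ROI values here are arbitrary examples).
#
#   roi = roi_from_center_angle_dims((100.0, 50.0), 0.3, 80.0, 40.0)
#   pts = roi_map_points(roi, np.zeros((1, 2), dtype=np.float32))
#   assert np.allclose(pts, [roi.center], atol=1e-4)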
######################################################################
def draw_roi_on_image(image, image_roi, color=(255, 255, 0), thickness=10):
    """Draws ROI box on image, accounting for angle. Takes in optional
    color and thickness."""
    opoints = np.array([
        [-0.5, -0.5],
        [ 0.5, -0.5],
        [ 0.5,  0.5],
        [-0.5,  0.5],
        [-0.2,  0.0],
        [ 0.2,  0.0],
        [ 0.0, -0.2],
        [ 0.0,  0.2],
        [ 0.0,  0.5]
    ]) * np.array([image_roi.width / image_roi.height, 1])
    ipoints = roi_map_points(image_roi, opoints).astype(int)
    display = image.copy()
    scl = thickness
    # rectangle outline through the four corner points
    cv2.polylines(display, [ipoints[:4]], True,
                  color, scl, cv2.LINE_AA)
    # circles at the top corners and the bottom-edge midpoint
    for i in [0, 1, -1]:
        cv2.circle(display, tuple(ipoints[i]), 4 * scl,
                   color, scl, cv2.LINE_AA)
    # crosshair through the ROI center
    cv2.line(display, tuple(ipoints[4]), tuple(ipoints[5]),
             color, scl, cv2.LINE_AA)
    cv2.line(display, tuple(ipoints[6]), tuple(ipoints[7]),
             color, scl, cv2.LINE_AA)
    return display
######################################################################
def roi_draw_ellipse(img_roi, wsz, hsz, size=None):
    """Draw an ellipse into an 8-bit single-channel mask image centered
    on the given ROI and rotated to align with it. The given dimensions
    are as fractions of the total height of the original ROI.
    """
    w, h = size
    mask = np.zeros((h, w), dtype=np.uint8)
    # The source was truncated past this point; the lines below are a
    # plausible completion consistent with the docstring above.
    axes = (int(0.5 * wsz * img_roi.height), int(0.5 * hsz * img_roi.height))
    center = (int(round(img_roi.center[0])), int(round(img_roi.center[1])))
    cv2.ellipse(mask, center, axes, np.degrees(img_roi.angle),
                0, 360, 255, -1, cv2.LINE_AA)
    return mask