Let's assume you want to resize an image without content distortion but also you wish to preserve the relevant image parts. The normal image resize, but also the content cropping technique is not really suitable for this kind of task, since the first one will simply resize the image by preserving the aspect ratio and the last one will crop the image on the defined coordinate section, which might results in content loss, especially on photos with the relevant information scattered trough the image. Not even the smart cropping technique will help too much in this case.
This is what Caire, my content aware image resizing library developed in Go is trying to remedy. I'is pretty much based on the article Seam Carving for Content-Aware Image Resizing by Shai Avidan and Ariel Shamir.
The background
Let’s consider the image below (Fig.1). It’s a nice and clean picture with a wide open background. Now suppose that we want to make it smaller. We have two options: either to crop it, or to scale it. Cropping is limited since it can only remove pixels from the image periphery. Also advanced cropping features like smart cropping cannot resolve our issue, since it will remove the person from the left margin or will crop a small part from the castle. Certainly we do not want this to happen. Scaling also is not sufficient since it is not aware of the image content and typically can be applied only uniformly.
Fig.1: Sample image
Seam carving was developed typically for this kind of use cases. It works by establishing a number of seams (a connected path of low energy pixels) crossing the image from top to down or from left to right defining the importance of pixels. By successively removing or inserting seams we can reduce or enlarge the size of the image in both directions. Fig.2. Illustrates the process.
Fig.2: The seam carving method illustrated
First let’s skim through the details and summarize the important steps.
- An energy map (edge detection) is generated from the provided image.
- The algorithm tries to find the least important parts of the image taking into account the lowest energy values.
- Using a dynamic programming approach the algorithm will generate individual seams crossing the image from top to down, or from left to right (depending on the horizontal or vertical resizing) and will allocate for each seam a custom value, the least important pixels having the lowest energy cost and the most important ones having the highest cost.
- Traverse the image from the second row to the last row and compute the cumulative minimum energy for all possible connected seams for each entry.
- The minimum energy level is calculated by summing up the current pixel value with the lowest value of the neighboring pixels from the previous row.
- Traverse the image from top to bottom (or from left to right in case of vertical resizing) and compute the minimum energy level. For each pixel in a row we compute the energy of the current pixel plus the energy of one of the three possible pixels above it.
- Find the lowest cost seam from the energy matrix starting from the last row and remove it.
- Repeat the process.
Implementation
Seam carving can support several types of energy functions such as gradient magnitude, entropy, sobel filter, each of them having something in common: it values a pixel by measuring its contrast with its neighboring pixels. We will use the Sobel filter operator in our case. Typically in image processing the Sobel operator is used to detect image edges. It’s working with an energy distribution matrix to differentiate the sensitive image information from the less sensitive ones. (Fig. 3)
Fig.3: Sobel threshold applied to the original image
Once we obtained the energy distribution matrix we can advance to the next step to generate individual seams of one pixel wide. We’ll use a dynamic programming approach to store the results of sub-calculations in order to simplify calculating the more complex ones. For this purpose we define some setter and getter methods whose role are to set and get the pixel energy value.
After generating the energy map there are two important steps we need to follow:
Traverse the image from the second row to the last one and compute the cumulative minimum energy M for all possible connected seams for each entry (i, j). M is the two dimensional array of cumulative energies we are building up. The minimum energy level is calculated by summing up the current pixel value with the minimum pixel value of the neighboring pixels obtained from the previous row. This can be done via Dijkstra's algorithm. Suppose that we have a matrix with the following values:
Original matrix
To compute the minimum cumulative energies of the second row we start with the columns from the first row and sum up with the minimum value of the neighboring cells from the second row. After the above operation is carried out for every pixel in the second row, we go to the third row and so on.
Using Dijkstra’s algorithm to calculate the minimum energy values.
Once we calculated the minimum energy values for each row, we select the last row as starting position and search for the pixel with the smallest cumulative energy value. Then we traverse up on the matrix table one row at once and again search for the minimum cumulative energy value up until the first row. The obtained values (pixels) make up the seam which should be removed.
To remove the seam is only a matter of obtaining the pixel coordinates of the seam values and checking on each iteration if the processed pixel coordinates corresponds with the seams pixel values position. We can check the accuracy of our logic by drawing the removable seams on top of the image. (Fig.4)
Fig.4: Low energy seams crossing the image
The same logic can be applied to enlarge images, only this time we compute the optimal vertical or horizontal seam (s) and duplicate the pixels of s by averaging them with their left and right neighbors (top and bottom in the horizontal case).
Fig.5: The final resized image
Prevent face deformation with face detection
For certain content such as faces when relations between features are important, automatic face detection can be used to identify the areas that needs protection. This is an important requirement since in certain situations when sensible image regions like faces (detected by the sobel filter operator) are compressed in a small area it might happen to get distorted (see Fig.6). To prevent this, once these regions are detected we can increase the pixel intensity to a very high level before running the edge detector. This way we can assure that the sobel detector will consider these regions as important ones, which also means that they will receive high energy values.
Fig.6: Resizing image with and without face detection
Below you can see the seam carving algorithm in action, first checking for human faces prior resizing the image. It's visible from the image that the seams are trying to avoid the face zone normally marked with a rectangle. And this is done with a simple trick: if face zone is detected that rectangle is marked with a white background which makes the detector to believe that it's an area with high energy map. For face detection it was used Pigo, another Go library developed by me.
Conclusion
Seam carving is a technique which can be used on a variety of image manipulations including aspect ratio change, image retargeting, object removal or even content amplification. Also it can be seamlessly integrated with a Convolutional Neural Network which can be trained for specific object recognition, making the perfect toolset for every content delivery solution. There are other numerous possible domains this technique could be applied or even extended like video resizing or the ability for continuous resizing in real time.
Of course as every technology has its limitations, like the case when the processed image is very condensed, in the sense that it does not contain “less” important areas, ugly artifacts might appear. Also the algorithm does not perform very well when the image albeit being not very condensed, the content is laid out in a manner that it does not permit the seams from bypassing some important parts. In certain situations by tweaking the parameters, like using a higher sobel threshold or applying a blur filter these kind of limitations could be surpassed.
Source code
The code is open sourced and can be found on my Github page: