Tag: machine-learning

Porting Pigo face detection library to Webassembly (WASM)
In the last couple of occasions I wrote about the Pigo face detection library I have developed in Go. This article is another one from the series, but this time I’m focusing on WASM (Webassambly) integration. This is another key milestone in the library evolution, considering that when I started this project the library was capable only for face detection. Later on the library has been extended to support pupils/eyes localization, then facial landmark points detection and it has also been adapted to be integrated into different programming languages as a shared object library (SO). I’m pretty delighted about the great acceptance and support received during the development from the programming community, the library being featured a couple of times in https://golangweekly.com/, received 2.5k stars on the repo Github page (and still counting), getting many positive feedback on Reddit, which means it payed back the efforts.

But first what is WASM? To quote the https://webassembly.org/ homepage:

WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications.

In other words this means that compiling and porting a native application to the WASM standard will give the generated web application a speed almost equal to the native one.

Starting from v1.11 an experimental support for WASM has been included into the Go language, which has been extended on v1.12 and v1.13. As I mentioned in the previous articles Go suffers terrible in terms of native and platform agnostic webcam support and as of my knowledge currently there is no single webcam library in the Go ecosystem which is platform independent. This was the reason why, to prove the library real time face detection capabilities, I opted to lean on exporting the main function as a shared object library, but lately this proved to be inefficient in terms of real time performance, since on each frame the Python app had to transfer the pixel array to the Go app (where the face detection operation is happening) and get back the detected faces coordinates. Of course because of the two way (back and forth) communication, this process fall back considerably in terms of pure performance.

WASM to the rescue

As I mentioned above starting from v1.11 the standard Go code base now includes the syscall/js package which targets WASM. However the API has been refactored and gone trough a few iterations to became stable as of v1.13. This also means that the WASM API of v1.13 is no more compatible with the v1.11. The API became so mature that there is no need to use external libraries targeting the Javascript runtime in Go, like Gopherjs.

In order to compile for Webassambly we need to explicitly specify and set the GOOS=js and GOARCH=wasm environment variables on the building process. Running the below command will build the package and produce a .wasm executable Webassambly module file.

$ GOOS=js GOARCH=wasm go build -o lib.wasm wasm.go

This file then can be referenced in the main html file.

<br />

That’s the only thing we need to do in order to have fully functional WASM application. The hardest part is coming afterwards. When we are targeting WASM there are a few takeaways we need to keep in mind in order the implementation to run as smooth as possible:
1. To have access to a Javascript global variable in Go we have to call the js.Global() function.
2. To call a JS object method we have to use the Call() function.
3. To get or set an attribute of a JS object or Html element we can call the Get() or Set() functions.
4. Probably the trickiest part of the Go Wasm port is related to the callback functions, and there are a lot of places where we need to take care of them. One of the most important one is the canvas requestAnimationFrame method, which accepts as a second argument a callback function. Now to invoke this method in Go we need to apply to the js.FuncOf(func(this js.Value, args []js.Value) interface{}{...} function, where the second argument is the callback function argument. A very important note: the function body should be called inside a goroutine, otherwise you’ll get a deadlock runtime exception. You can check the package documentation here: https://godoc.org/syscall/js
The implementation details

Now let’s take a deep breath and roll into the implementation details. One of the key components of the face detection application is the binary cascade file parser. Since in normal circumstances, when we are fetching and parsing the binary files locally, we can rely on the os system package, this is not the case on WASM, because we do not have access to the system environment. This means that we cannot load and parse the files sitting on our local storage. We need to fetch them through the supported Javascript methods like the fetch method. In Javascript this method returns a promise with two arguments: a success and a failure. As I mentioned previously the callback functions needs to be invoked in separate goroutines and the most straightforward way to orchestrate the results when we are dealing with goroutines is to use channels. So the fetch method should return the parsed binary file as a byte array and an error in case of a failure.

<br />

The rest of the implementation does not differ in anything from the Unpack() method presented in the previous article. Once we get binary array there is nothing left over just to unpack the cascade file, transform it to the desired shape and extract the relevant information.

<br />

I won’t go into too much details about the detection algorithm itself, since I’ve discussed it in the previous articles. I will detail what’s most important in the perspective of the WASM integration. Another key importance part in the WASM integration is related to the webcam rendering operation. In Javascript we can access the webcam in the following way:

<br />

We can translate this snippet to Go (WASM) code in the following way:

<br />

This will return a canvas object (struct) over it we can call the rendering method itself. In the background this will trigger the requestAnimationFrame JS method, drawing each webcam frame to a dynamically created image element. From there we can extract the pixel values calling the getImageData HTML5 canvas method. Since this will return an object of type ImageData which values are of Uint8ClampedArray, these needs to be converted to a type supported by the Go WASM API. The only supported uint type in the Go syscall/js API is uint8, so the most suitable way is to convert the Uint8ClampedArray to Uint8Array. We can do this by calling the following method:

<br />

The code below shows the rendering method. It’s a pretty common Javascript code adapted to WASM Go, including the code responsible for the face detection, but this have been discussed in the previous articles. The most important part is the type conversion, without that we are just getting a compiler error.

<br />

And to put all together, in the main function we are just calling the webcam Render() method, once the webcam has been initialized, otherwise an alert telling that the webcam was not detected.

<br />

The end result

Voila, we are done. Here is how the Webassambly version of Pigo looks like in realtime. Pretty good, huh? That’s, all, thanks for reading and if you like my works you can follow me on Twitter or give it a star on the project Github repo: https://github.com/esimov/pigo.

Look at this! Can OpenCV beat this speed? Pigo has been ported to #webassembly ??, which gains him a huge speed improvements in terms of real time performance.

This proves that Pigo is capable running real time, which was not quite obvious until now. #golang #wasm #javascript pic.twitter.com/yeKir9K3YC

— Endre Simo (@simo_endre) November 18, 2019
January 7, 2020
Pupils/eyes localization in the Pigo face detection library
In the previous post I’ve presented a general overview of the Pigo face detection library I’m working on, some examples of how you can run it on various platforms and environment for detecting faces and also its desired scope in the Go ecosystem. Meantime the library got a huge revamp and reached a new milestone in the development live cycle, which means as of today Pigo is capable of pupils/eyes localization and even better supports facial landmark points detection. In this article I will focus on the pupils/eyes detection, following that in the next article aiming to discuss about the facial landmark points detection. So let’s get started.

Pupils/eyes localization

The implementation is based on the Eye pupil localization with an ensemble of randomized trees paper, which pretty much resembles with the method used on face detection but with few remarkable differences. I will explain them shortly. Both of the implementations are based on tree ensembles, well known for achieving impressive results, in contrary to a single tree detection method. This means that the classifier used for the pupil/eyes detection is generated based on decision trees and the training data is recursively clustered in this fashion until some termination condition is met.

Since we know the leaf structure of the binary classifier the only thing we need to do is to decompose it in order to obtain the data structure. The decomposition procedure is based on the following steps:
- Read the depth (size) of each tree and write it into the buffer array.
- Get the number of stages as 32-bit unsigned integers and write it into the buffer array.
- Obtain the scale multiplier (applied after each stage) and write it into the buffer array.
- Obtain the number of trees per stage and write it into the buffer array.
- Obtain the depth of each tree and write it into the buffer array.
- Traverse all the stages of the binary tree, traverse all the branches of each stage and read prediction from tree’s leaf nodes.
The returned element should be a struct containing all of the above information.

There is a problem though: the output of the regression trees might be noisy and unreliable in some special cases like when we are feeding some low quality video frames or images. This is the reason why the method introduces a random perturbation factor during runtime to outweigh the false positive rates on detection. Later on, after the detected face region is classified we sort the perturbations in ascendant order and retrieve the median value of the resulted vector.

The classification procedure consists in the following: traverse the trees on all the stages and use almost the same binary test applied on the face classification procedure. We have to restrict the pupils/eyes classification method exclusive to the detected face region. In the end we will obtain the pupils coordinates and a scale factor. The scale factor is less important in this case, since it’s useful only if we wish to give a different size for the detected pupils.

<br />

We are using the same method for left and right eye detection, the only difference is that for the right eye detection we flip the sign in the following formula.

<br />

To run it on static images a CLI application is bundled into the library. The difference from the face detection is that in addition we have to provide the binary classifier for pupil/eyes detection. Since the library is constructed in a modular fashion, this means that only certain sections needs to be extended like the binary parser and the classification function. Other parts of the code remains pretty much unchanged.

This is how the end result looks like:

Webcam demo

I have also created a webcam demo running in Python, since as of today there isn’t a native Go webcam library supported on all platforms. The major takeaway is that the computational (the face and pupils detection) part is executed in Go and the resulted data (the detection coordinates) are transferred to Python through a shared object library.

It sounds like an alchemy, but it’s definitely a working solution. On the Go part there are a few things we need to take care.
- The exported function should be annotated with the //export statement.
- The source must import the pseudo C package.
- An empty main function should be declared.
- The package must be a main package.
These are only the technical requirements imposed by the language in order to transfer the Go code as a shared object. Afterwards we can build the program with the following commands (in fact these will be executed from the Python code).

<br />

To communicate with the Go code from the Python code base we will use the Ctype library. This is a handy library which provides C compatible data types, and allows calling functions in DLLs or shared libraries. Using the library at first might be intimidating since it heavily relies on the C structs and types. Also accessing the Go data is possible exclusively through pointer interfaces. This means we need to be aware about the data type, the buffer length and the maximum capacity of the transferred Go slice. In fact the slice will be a pointer pointing to a data structure. In Python we have to declare a class which maps to a C type struct having the following components:

<br />

On the other hand at the Go counterpart after running the detection function and retrieving the coordinate results we need to convert them as one dimensional array, since in Go is not possible transfer a 2D array over an array pointer. The trick is to convert the 2D array (since we are dealing with multiple face detection and by default multiple pupils/eyes detection) to a 1D array in such a way to delimit the detection groups from each other. One efficient way to make this happen is to introduce as a first slice element the number of detected faces. Since we know how many faces are detected, later on in the Python part we can transform the array with the help of numpy to the desired shape.

This is how the Go code looks like. We are invoking the runtime.KeepAlive(coords) function to make sure the coords is not freed up by the garbage collector prematurely.

<br />

In the end we should obtain something like below, where the first value represent the number of the detected faces and the rest of the values are the x and y coordinates and the scale factor of the detected faces, respectively pupils.

[2 0 0 0 0 272 297 213 41 1 248 258 27 41 0 248 341 27 41 0 238 599 72 25 1 230 587 9 25 0 233 616 9 25 0]

Back to Python. We are capturing each frame from the webcam and transferring the underlying binary data to Go as a byte array. After running the face detector over the obtained binary data we are retrieving back the obtained one dimensional array. Ctypes requires to define a class which must contains a _field_ attribute. This attribute must be a list of 2-tuples, containing a field name and a field type. Since the Go slice has a len and a cap attribute we will define them as the tuple elements. Once we’ve defined the base class we also have to define the argument types. The argument type will be the base class and the return type will be a pointer type, represented as integer exactly as the Go return type.

<br />

Since in Ctypes we are dealing with pointer values, this means we do not have any information about the allocated buffer size needed by the face detector, this is the reason why we are allocating a buffer just enough to hold all the values returned by the detector. Later on we’ll trim the buffer length by emptying the null values. We can do this because we already know the number of detected faces. So the end results will be the number of detected faces multiplied by 3, since each face detection result should be extended with a pair of pupil detection.

<br />

And that’s all. Below is a video capture of running the face detector in real time. The small lagging is due to time needed to convert the Go code to a shared library, otherwise the algorithm can produce much higher frame rates.

Pigo, the Go face detection library now supports pupil/eyes localization.? There are still more to come! #golang #MachineLearning https://t.co/JTEZp9fdem pic.twitter.com/eNHydt2ZCL

— Endre Simo (@simo_endre) August 9, 2019

I hope that you enjoyed reading this article and got a better understanding of the interaction between Go and Python. In the next article I’ll present a concrete example using this feature. If you appreciate the work done on this library please show your support by staring the library Github repository here: https://github.com/esimov/pigo.
November 1, 2019
Technical overview of Pigo face detection library
Pigo’s cute logo

Pigo is a pure face detection library implemented in and for Go and it is based on Pixel Intensity Comparison-based Object detection paper (https://arxiv.org/pdf/1305.4537.pdf). Pigo has been around for more than a year, but I haven’t published a technical article about the library, so I thought it was time to fill in the gap.

Technical overview

The first and foremost question: what was the motivation and the purpose of making this library, since GoCV does exists from a quite a long time in the Go ecosystem and is trying to be the perfect toolset for everyone who is willing to combine the simplicity of the Go language with the comprehensiveness of OpenCV in everything related to computer vision, machine learning and anything between?
Making a quick search on the web for face detection with Go, almost 100% of the returned results are C bindings to external libraries. Pigo is making an exception from this, since it is purely developed in Go, having in mind the casual Go developer. The reference implementation is written in C and is available here: https://github.com/nenadmarkus/pico. There is no need to install platform dependent libraries, no need to compile and build the giant, monolithic OpenCV package.

What are the benefits of using Pigo? Just to name a few of them:
- High processing speed
- No need for image preprocessing prior detection
- No need for the computation of integral images, image pyramid, HOG pyramid or any other similar data structure
- The face detection is based on pixel intensity comparison encoded in the binary file tree structure
- Fast detection of in-plane rotated faces
New let’s see what are the similarities and what are the differences between Pigo and other detection methods. Similar to the Viola Jones face detection algorithm, Pigo is still using cascade decision trees at all reasonable scales and positions, but the cascade classifier is in binary format. The role of a classifier is to tell if a face is present in current region or not. The classifier consists of a decision tree where the leaves or internal nodes contains the results of pixel intensity comparison test in binary format. The binary test can be expressed with the following formula:

\( bintest(R, \text{$x1$, $y1$, $x2$, $y2$}) =
\begin{cases}
0, & \text{$R$[$x1$, $y1$]} \le \text{$R$[$x2$, $y2$]} \\
1, & \text{otherwise},
\end{cases}
\)

where \(R\) is an image region and (\(x_i\), \(y_i\)) represents the normalized pixel location coordinates, which means that the binary test can be easily resized if needed.

Since the cascades are encoded into a binary tree, this means that in order to run the detection method the cascade classifier needs to be unpacked. In the end the unpacking method will return a struct which has the following elements:

<br />

Due to many regions present in the image, the decision trees are organized in cascading classifiers, making each member of cascade to be evaluated in \( O(1) \) time with respect to the size of the region. An image region is considered being face if it passes all the cascade members. Since this process is limited to a relatively small number of regions, this gains high computation speed.

<br />

The above method will classify the cascade trees based on the trained data, but since many of them are in-plane faces, this means that rotated faces are not detectable. For this reason we need to introduce another method, which classifies the regions based on the rotation angle provided as input. (I’m not including the code snippet for this one, since you can check it on the source code.)

Another key aspect of the algorithm is that during the decision tree scanning, each detection is flagged with a detection score. An image region is considered being face if the detection score average is above a certain threshold (in general around 0.995). All regions below this threshold are not considered to be faces. We can achieve different ratios of true positives to false positives by varying the threshold value. Since the detector is based on pixel intensity comparison, this means that it’s also sensible to small perturbations and variations of the underlying pixel data, influenced mostly by the scaling and the rotation of the objects, resulting overlaps in detections.

The detection results without the clustering applied

This is the reason why the cascade detections are clustered together in a post processing step and the final result is produced by applying the intersection over union formula on the detected clusters.

<br />

In return this does not provide the rectangle coordinates of the detected faces as expected, but a slice of Detection struct consisting of number of rows and columns, the scale factor and the detection score. To translate from these to the standard image.Rectangle is only a matter of a simple conversion realized as in the following lines:

<br />

And this is how the final output looks like after the detection results have been clustered.

The detection results with the clustering applied

Pigo and GoCV comparison

The best way to compare the two libraries is to measure them both in terms of pure performance, detection speed but also accuracy. The most obvious way to evaluate them is to run some benchmarks with the same prerequisites. One thing to note is that the classifier is set to detect faces on the same image over and over again. This is to get an accurate idea of how well the algorithm performs.

<br />

To have a more accurate benchmark and to measure only the execution time the b.ResetTimer() is called to reset the timer so the setup is not counted towards the benchmark.

Here are the results:

<br />

For the above test we were using a sample image with a Nasa crew of 17 persons. Both of the libraries have returned exactly the same results, but Pigo was faster and also the memory allocation was way less compared to GoCV.

Use cases

To demonstrate the real time capabilities of the Pigo face detection library and also to show how easily could be integrated into let’s say a Python project I have created a couple of use cases, targeting different kinds of applications. The first one is the usual face detection example: real time, webcam based face detection, marking the detected faces with a rectangle (or circle).

Using Pigo as a shared object library

Go running in Python? Yes, that’s true: we are harvesting the Go feature to export binary files as shared objects. The Go compiler is capable of creating C-style dynamic shared libraries using build flag -buildmode=c-shared. Running the go program with the following flag will generate the shared object library which later on can be imported into the Python program. But the following conditions must be satisfied:
- The exported function should be annotated with the //export statement.
- The source must import the pseudo C package.
- An empty main function should be declared.
- The package must be a main package.
Knowing the above conditions we can execute a shell command from the Python program which in the end will call the shared object library returning the detection results. Later on we can operate with the values obtained. Below is an example of the process just described, whose source code and other examples can be found on the project examples directory: https://github.com/esimov/pigo/tree/master/examples.

<br />

Another handy application I have created is to blur out the detected faces. For static images we can lean on stackblur-go library to blur out the detected faces, otherwise if we are using as a shared library, which is getting imported into a Python program, then we can make use of OpenCV blur method.

Some consistent improvements of Pigo real time face detection capabilities paired with a face masking use case example. #golang #python #MachineLearning https://t.co/eRahkqjfao pic.twitter.com/CgDdNpLpdv

— Endre Simo (@simo_endre) March 17, 2019

I have also successfully integrated Pigo into Caire, the content aware image resizing library I have created. Thanks to Pigo it was possible to avoid the face distortions by restricting the seam removing algorithm only to the picture zones without faces. Once we’ve detected the faces it was only a piece of cake to exclude them from the processing. For more information about Caire read my other blog post, where I’ve detailed in large how the seam carving algorithm is working.

Serverless integration

Pigo has been successfully integrated into the OpenFaaS platform, easing you out from all the tedious work requested by the installation, configuration of the Go language etc., making possible to expose it as an FaaS function. This means once you have an OpenFaaS platform installed on your local machine with only two commands you have a fully working docker image where Pigo can be used as a serverless function.

The OpenFaaS user interface

The faceblur function applied over an input image

Below are some of the OpenFaaS functions i have created integrating Pigo:

https://github.com/esimov/pigo-openfaas-faceblur

https://github.com/esimov/pigo-openfaas

The docker image is also available on the docker hub, having already more then 10k+ downloads. https://cloud.docker.com/u/esimov/repository/docker/esimov/pigo-openfaas

Summary

In conclusion as you might realize Pigo is a lightweight, but full fledged face detection library, easy to use and easy to integrate into different platforms and environments, having a simple API, but well enough for making the job done.

For more computer vision, image processing and creative programming stuffs you can follow me on twitter: https://twitter.com/simo_endre
July 26, 2019
Caire – a content aware image resize library
Let’s assume you want to resize an image without content distortion but also you wish to preserve the relevant image parts. The normal image resize, but also the content cropping technique is not really suitable for this kind of task, since the first one will simply resize the image by preserving the aspect ratio and the last one will crop the image on the defined coordinate section, which might results in content loss, especially on photos with the relevant information scattered trough the image. Not even the smart cropping technique will help too much in this case.

This is what Caire, my content aware image resizing library developed in Go is trying to remedy. I’is pretty much based on the article Seam Carving for Content-Aware Image Resizing by Shai Avidan and Ariel Shamir.

The background

Let’s consider the image below (Fig.1). It’s a nice and clean picture with a wide open background. Now suppose that we want to make it smaller. We have two options: either to crop it, or to scale it. Cropping is limited since it can only remove pixels from the image periphery. Also advanced cropping features like smart cropping cannot resolve our issue, since it will remove the person from the left margin or will crop a small part from the castle. Certainly we do not want this to happen. Scaling also is not sufficient since it is not aware of the image content and typically can be applied only uniformly.

Fig.1: Sample image

Seam carving was developed typically for this kind of use cases. It works by establishing a number of seams (a connected path of low energy pixels) crossing the image from top to down or from left to right defining the importance of pixels. By successively removing or inserting seams we can reduce or enlarge the size of the image in both directions. Fig.2. Illustrates the process.

Fig.2: The seam carving method illustrated

First let’s skim through the details and summarize the important steps.
- An energy map (edge detection) is generated from the provided image.
- The algorithm tries to find the least important parts of the image taking into account the lowest energy values.
- Using a dynamic programming approach the algorithm will generate individual seams crossing the image from top to down, or from left to right (depending on the horizontal or vertical resizing) and will allocate for each seam a custom value, the least important pixels having the lowest energy cost and the most important ones having the highest cost.
- Traverse the image from the second row to the last row and compute the cumulative minimum energy for all possible connected seams for each entry.
- The minimum energy level is calculated by summing up the current pixel value with the lowest value of the neighboring pixels from the previous row.
- Traverse the image from top to bottom (or from left to right in case of vertical resizing) and compute the minimum energy level. For each pixel in a row we compute the energy of the current pixel plus the energy of one of the three possible pixels above it.
- Find the lowest cost seam from the energy matrix starting from the last row and remove it.
- Repeat the process.
Implementation

Seam carving can support several types of energy functions such as gradient magnitude, entropy, sobel filter, each of them having something in common: it values a pixel by measuring its contrast with its neighboring pixels. We will use the Sobel filter operator in our case. Typically in image processing the Sobel operator is used to detect image edges. It’s working with an energy distribution matrix to differentiate the sensitive image information from the less sensitive ones. (Fig. 3)

Fig.3: Sobel threshold applied to the original image

Once we obtained the energy distribution matrix we can advance to the next step to generate individual seams of one pixel wide. We’ll use a dynamic programming approach to store the results of sub-calculations in order to simplify calculating the more complex ones. For this purpose we define some setter and getter methods whose role are to set and get the pixel energy value.

<br />

After generating the energy map there are two important steps we need to follow:

Traverse the image from the second row to the last one and compute the cumulative minimum energy M for all possible connected seams for each entry (i, j). M is the two dimensional array of cumulative energies we are building up. The minimum energy level is calculated by summing up the current pixel value with the minimum pixel value of the neighboring pixels obtained from the previous row. This can be done via Dijkstra’s algorithm. Suppose that we have a matrix with the following values:

Original matrix

To compute the minimum cumulative energies of the second row we start with the columns from the first row and sum up with the minimum value of the neighboring cells from the second row. After the above operation is carried out for every pixel in the second row, we go to the third row and so on.

Using Dijkstra’s algorithm to calculate the minimum energy values.

<br />

Once we calculated the minimum energy values for each row, we select the last row as starting position and search for the pixel with the smallest cumulative energy value. Then we traverse up on the matrix table one row at once and again search for the minimum cumulative energy value up until the first row. The obtained values (pixels) make up the seam which should be removed.

<br />

To remove the seam is only a matter of obtaining the pixel coordinates of the seam values and checking on each iteration if the processed pixel coordinates corresponds with the seams pixel values position. We can check the accuracy of our logic by drawing the removable seams on top of the image. (Fig.4)

Fig.4: Low energy seams crossing the image

<br />

The same logic can be applied to enlarge images, only this time we compute the optimal vertical or horizontal seam (s) and duplicate the pixels of s by averaging them with their left and right neighbors (top and bottom in the horizontal case).

Fig.5: The final resized image

Prevent face deformation with face detection

For certain content such as faces when relations between features are important, automatic face detection can be used to identify the areas that needs protection. This is an important requirement since in certain situations when sensible image regions like faces (detected by the sobel filter operator) are compressed in a small area it might happen to get distorted (see Fig.6). To prevent this, once these regions are detected we can increase the pixel intensity to a very high level before running the edge detector. This way we can assure that the sobel detector will consider these regions as important ones, which also means that they will receive high energy values.

Fig.6: Resizing image with and without face detection

Below you can see the seam carving algorithm in action, first checking for human faces prior resizing the image. It’s visible from the image that the seams are trying to avoid the face zone normally marked with a rectangle. And this is done with a simple trick: if face zone is detected that rectangle is marked with a white background which makes the detector to believe that it’s an area with high energy map. For face detection it was used Pigo, another Go library developed by me.

Conclusion

Seam carving is a technique which can be used on a variety of image manipulations including aspect ratio change, image retargeting, object removal or even content amplification. Also it can be seamlessly integrated with a Convolutional Neural Network which can be trained for specific object recognition, making the perfect toolset for every content delivery solution. There are other numerous possible domains this technique could be applied or even extended like video resizing or the ability for continuous resizing in real time.

Of course as every technology has its limitations, like the case when the processed image is very condensed, in the sense that it does not contain “less” important areas, ugly artifacts might appear. Also the algorithm does not perform very well when the image albeit being not very condensed, the content is laid out in a manner that it does not permit the seams from bypassing some important parts. In certain situations by tweaking the parameters, like using a higher sobel threshold or applying a blur filter these kind of limitations could be surpassed.

Source code

The code is open sourced and can be found on my Github page:

https://github.com/esimov/caire

https://github.com/esimov/pigo
May 2, 2019