Blog

  • 2D cloth simulation in Go using the Gio GUI library

    As the title suggests, this article is about creating a desktop application for 2D cloth physics simulation using Gio, a GUI library developed in Go. To get a taste of what we are trying to build, below is a GIF of the result.

    Now, to understand the mechanism behind this kind of computer simulation, on the one hand we need a basic understanding of mathematical concepts such as the Euler equation, Newton's laws of motion and Verlet integration, but on the other hand we also need to be able to translate these theorems into computer programs.

    This article is divided into the following sections:

    • The basics of physics simulation
    • Euler equation
    • Verlet integration
    • The cloth simulation
    • A basic overview of the Gio GUI framework
    • Translating the mathematical theorems into Go code

    The basics of physics simulation

    As the name of this article suggests, we will apply physics notions like force, mass, gravity, acceleration, speed, and velocity (just to name a few) specifically to cloth physics simulation, but the theorems behind them can be applied to other objects too, and they can serve as the basis of more advanced topics like rigid body simulation. Along the way we will explore how some basic concepts like constraints and joints, central to cloth simulation, are also common notions in 2D physics engines.

    Numerical integration

    The core algorithm of the simulation is based on two concepts, known in physics as Newton's laws of motion and restoring forces. These are expressed in continuous time, but in a computer simulation we need to use discrete time steps, which means that somehow we should predict the new position of an object at each time step.
    The foundation of every cloth simulation is so-called numerical integration. There are many different techniques and implementations for physics simulation, but most of them use one of three widespread integration methods: Euler integration, Verlet integration and Runge-Kutta. The most significant difference between these methods consists in the accuracy of the movement.

    The Euler integration

    Euler integration is best suited for rigid body simulation because of its simplicity and performance: it assumes that the shape of an object never changes or deforms, and it gives a reasonably good approximation. The problem is that it's not accurate, unless we use an extremely small time step, which we have no control over.

    To calculate the movement of an object over time, we can use Euler integration in the following manner. Since each object has a mass and a force acting on it, applying Newton's law of motion gives: acceleration = force / mass. Now that we know the acceleration, we can update the object's velocity and position as follows:

    This looks good, but the problem comes when accuracy enters the discussion. As we mentioned earlier, in computer simulation we use discrete time steps, which vary across CPU cycles; in our case deltaTime is never constant. This means that the result of such a numerical integration will never be precise, and we cannot accurately predict the velocity and movement of an object. We could partially overcome this limitation by using an infinitesimally small time step, but then another problem arises: the error is cumulative, which means the longer we run the simulation the less accurate the movement will be. So, let's jump into Verlet integration.

    Verlet integration

    Verlet integration is very similar to Euler integration, only this time we get rid of the velocity component. We only need to calculate the change in position of an object (in our simulation we are using particles) by applying Newton's second law of motion: (F = ma), or force equals mass times acceleration. From this equation it follows that acceleration equals force over mass, or (a = F/m), which is a scalar quantity (while the position and velocity are vectors). So, knowing the previous position and the acceleration, we can calculate the current position at the next frame.

    The Verlet integration formula is the following:

    x(t + Δt) = 2x(t) − x(t − Δt) + a(t)Δt²

    We can translate this into the following code:
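Below is a minimal Go sketch of the Verlet step implied by the formula above, assuming a Particle type that stores its current and previous position (the names are illustrative):

```go
package main

import "fmt"

// Particle stores the current and the previous position; the velocity
// is implicit in the difference between the two.
type Particle struct {
	x, y   float64 // current position
	px, py float64 // previous position
}

// verletStep computes the next position from the current and previous
// positions plus the acceleration: x' = 2x - px + a*dt*dt.
func (p *Particle) verletStep(ax, ay, dt float64) {
	nx := 2*p.x - p.px + ax*dt*dt
	ny := 2*p.y - p.py + ay*dt*dt
	p.px, p.py = p.x, p.y
	p.x, p.y = nx, ny
}

func main() {
	// A particle moving to the right, with gravity pulling it down.
	p := &Particle{x: 1, y: 1, px: 0.9, py: 1}
	p.verletStep(0, 9.81, 0.1)
	fmt.Printf("%.3f %.3f\n", p.x, p.y)
}
```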

    where the velocity component, compared to Euler integration, is not integrated from the acceleration, but obtained implicitly from the difference between the current and the previous position. That's pretty much all there is to Verlet integration. Now, the fun part comes when we need to tweak values like gravitation, dragging force, elasticity, friction, etc. We'll come back to this later.

    Now let's move on and discuss another key component of our system: constraints.

    Constraints

    As I've mentioned earlier, the entities of our system are the particles. They act individually, without any constraints between each other. To have a complex and homogeneous system, where the components act together, we have to apply some kind of rules or constraints which stick these individual components together. In some simulations these are called sticks or springs, because they connect the particles to each other.
    In Go there are no classes, so we store the constraint components in a struct.

    A particle, besides its current position, should obviously also store its new position and its velocity in the x and y dimensions. These are mandatory, but along the way it can be extended with other parameters like friction, elasticity and dragging force, or even with information such as whether it's pinned or active. The latter is important for a tearable cloth simulation. We'll get to that point shortly.
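A minimal sketch of such structs could look as follows; the field names are assumptions based on the description above, not necessarily the exact ones used in the project:

```go
package main

import "fmt"

// Particle is an individual cloth point.
type Particle struct {
	x, y     float64 // current position
	px, py   float64 // previous position (used by the Verlet step)
	vx, vy   float64 // velocity
	pinX     bool    // a pinned particle never moves
	isActive bool    // inactive particles are skipped (torn cloth)
}

// Constraint (a stick) connects two particles and tries to keep them
// at a fixed resting distance from each other.
type Constraint struct {
	p1, p2   *Particle
	restDist float64
}

func main() {
	p1 := &Particle{x: 0, y: 0, isActive: true, pinX: true}
	p2 := &Particle{x: 10, y: 0, isActive: true}
	c := Constraint{p1: p1, p2: p2, restDist: 10}
	fmt.Println(c.restDist)
}
```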

    Now comes the fun and most challenging part: simulating the cloth physics using the Verlet integration presented earlier.

    The cloth simulation

    This is the phase where we put together all the individual pieces presented earlier. We will start by presenting the basic entity of our system: the cloth. As you might guess, the cloth is constructed from the constraints and the particles they act upon. So, let's start building up the cloth. As was the case with particles and constraints, we store the cloth-related information in a struct.

    As you can see, the Cloth struct stores the information related to both the particles and the constraints. Now let's initialize the cloth.

    All particles are connected to each other except the ones in the first row and column. Why? Because the particles are joined with sticks from left to right and from top to bottom. The spacing component defines the distance between two particles, which is calculated based on the window width and height. The pinX variable tells whether a particle should be pinned or not. If this is true, the particle won't fall down.
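The initialization just described can be sketched like this. It is a simplified version with illustrative type names; the project's actual code differs in its details:

```go
package main

import "fmt"

type Particle struct {
	x, y float64
	pinX bool
}

type Constraint struct{ p1, p2 *Particle }

type Cloth struct {
	particles   []*Particle
	constraints []*Constraint
}

// NewCloth builds a clothX×clothY grid of particles, joining each
// particle with its left and top neighbor, and pinning a few points
// in the first row.
func NewCloth(clothX, clothY int, spacing float64) *Cloth {
	cloth := &Cloth{}
	grid := make([][]*Particle, clothY)
	for y := 0; y < clothY; y++ {
		grid[y] = make([]*Particle, clothX)
		for x := 0; x < clothX; x++ {
			p := &Particle{
				x:    float64(x) * spacing,
				y:    float64(y) * spacing,
				pinX: y == 0 && x%(clothX/7) == 0, // 7 pinned points in the first row
			}
			grid[y][x] = p
			cloth.particles = append(cloth.particles, p)

			// Join each particle to its left and top neighbor; the
			// first row and column therefore have no incoming sticks.
			if x > 0 {
				cloth.constraints = append(cloth.constraints, &Constraint{grid[y][x-1], p})
			}
			if y > 0 {
				cloth.constraints = append(cloth.constraints, &Constraint{grid[y-1][x], p})
			}
		}
	}
	return cloth
}

func main() {
	c := NewCloth(70, 50, 8)
	fmt.Println(len(c.particles), len(c.constraints))
}
```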

    pinX := x % (clothX / 7) means that the cloth should have 7 pinned points in the first row along the x coordinate. We store the particles and constraints in two separate slices. Now let's see what the update function looks like. I think this is the proper time to introduce the Gio event system.

    Introducing the Gio GUI library

    Gio is a cross-platform, event-driven, immediate mode GUI library which supports all the major platforms (even WebAssembly). You can check the project page at https://gioui.org/. Now let's see how we can create a basic desktop application with Gio.
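A minimal application skeleton, based on the Gio API as it looked at the time of writing (the API has changed between releases, so treat this as a sketch rather than the project's exact code):

```go
package main

import (
	"log"
	"os"

	"gioui.org/app"
	"gioui.org/io/system"
	"gioui.org/layout"
	"gioui.org/op"
)

func main() {
	// The event loop runs in a goroutine; app.Main() blocks the
	// main thread, which some platforms require for UI handling.
	go func() {
		w := app.NewWindow(app.Title("Cloth simulation"))
		if err := loop(w); err != nil {
			log.Fatal(err)
		}
		os.Exit(0)
	}()
	app.Main()
}

func loop(w *app.Window) error {
	var ops op.Ops
	for e := range w.Events() {
		switch e := e.(type) {
		case system.DestroyEvent:
			return e.Err
		case system.FrameEvent:
			gtx := layout.NewContext(&ops, e)
			// ...update and draw the cloth here...
			e.Frame(gtx.Ops)
		}
	}
	return nil
}
```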

    A Go developer should recognize this pattern instantly. The application runs inside a goroutine which blocks until an external event is triggered, like a system close or destroy event. Now let's analyze the loop() function a little bit.

    For the sake of simplicity and understandability I have omitted many of the details of the cloth simulation and kept only the most relevant parts. You can see that there is an infinite for loop which runs until the application is closed, either by clicking the close button or by pressing the Escape key. On each window frame event the cloth is updated. Now let's analyze the Update function.
    // Update is invoked on each frame event of the Gio internal window calls.
    // It updates the cloth particles, which are the basic entities over which
    // the cloth constraints are applied and solved using Verlet integration.
    func (cloth *Cloth) Update(gtx layout.Context, delta float64) {
        for _, p := range cloth.particles {
            p.Update(gtx, delta)
        }

        for _, c := range cloth.constraints {
            if c.p1.isActive && c.p2.isActive {
                c.Update(gtx, cloth)
            }
        }

        var path clip.Path
        path.Begin(gtx.Ops)

        // For performance reasons we draw the sticks as a single clip path
        // instead of multiple clip paths. The performance improvement is
        // considerable compared to drawing each clip path separately.
        for _, c := range cloth.constraints {
            if c.p1.isActive && c.p2.isActive {
                path.MoveTo(f32.Pt(float32(c.p1.x), float32(c.p1.y)))
                path.LineTo(f32.Pt(float32(c.p2.x), float32(c.p2.y)))
                path.LineTo(f32.Pt(float32(c.p2.x), float32(c.p2.y)).Add(f32.Point{X: 1.2}))
                path.LineTo(f32.Pt(float32(c.p1.x), float32(c.p1.y)).Add(f32.Point{X: 1.2}))
                path.Close()
            }
        }
        // We are using `clip.Outline` instead of `clip.Stroke`, because the
        // performance gains are much better, but we need to draw the full
        // outline of the stroke.
        paint.FillShape(gtx.Ops, cloth.color, clip.Outline{
            Path: path.End(),
        }.Op())

        path.Begin(gtx.Ops)
        for _, c := range cloth.constraints {
            if c.p1.isActive && c.p2.isActive {
                path.MoveTo(f32.Pt(float32(c.p1.x), float32(c.p1.y)))
                path.LineTo(f32.Pt(float32(c.p2.x), float32(c.p2.y)))
                path.LineTo(f32.Pt(float32(c.p2.x), float32(c.p2.y)).Add(f32.Point{Y: 1.2}))
                path.LineTo(f32.Pt(float32(c.p1.x), float32(c.p1.y)).Add(f32.Point{Y: 1.2}))
                path.Close()
            }
        }
        paint.FillShape(gtx.Ops, cloth.color, clip.Outline{
            Path: path.End(),
        }.Op())

        // Here we are drawing the mouse focus area in a separate clip path,
        // because the color used for highlighting the selected area
        // should be different than the cloth's default color.
        for _, c := range cloth.constraints {
            if (c.p1.isActive && c.p1.highlighted) &&
                (c.p2.isActive && c.p2.highlighted) {

                c.color = color.NRGBA{R: col.R, A: col.A}

                path.Begin(gtx.Ops)
                path.MoveTo(f32.Pt(float32(c.p1.x), float32(c.p1.y)))
                path.LineTo(f32.Pt(float32(c.p2.x), float32(c.p2.y)))
                path.LineTo(f32.Pt(float32(c.p2.x), float32(c.p2.y)).Add(f32.Point{X: 1}))
                path.LineTo(f32.Pt(float32(c.p1.x), float32(c.p1.y)).Add(f32.Point{X: 1}))
                path.Close()

                path.MoveTo(f32.Pt(float32(c.p1.x), float32(c.p1.y)))
                path.LineTo(f32.Pt(float32(c.p2.x), float32(c.p2.y)))
                path.LineTo(f32.Pt(float32(c.p2.x), float32(c.p2.y)).Add(f32.Point{Y: 1}))
                path.LineTo(f32.Pt(float32(c.p1.x), float32(c.p1.y)).Add(f32.Point{Y: 1}))
                path.Close()

                paint.FillShape(gtx.Ops, c.color, clip.Outline{
                    Path: path.End(),
                }.Op())
            }
        }
    }
    The two separate for…range loops are a little bit cumbersome, but unfortunately there is a bug in the Gio path renderer which creates some unwanted contoured artifacts on the drawn lines. An issue has been filed at https://todo.sr.ht/~eliasnaur/gio/474 and hopefully it will be fixed soon.
    Updating the cloth
    There are two things happening in the Update function: on the one hand we are updating the particles and the constraints using the Verlet integration presented earlier, and on the other hand we are drawing the cloth sticks using some of Gio's drawing methods. We update a constraint only if the particles it acts upon are active. This is an efficient way to simulate tearing a cloth apart or making a hole in the cloth structure: if some segment of the cloth structure is broken or the connection between certain particles is lost, we can simply skip updating those constraints.
    // With right click we can tear up the cloth at the mouse position.
    if mouse.getRightButton() {
        if dist < float64(focusArea) {
            p.isActive = false
        }
    }

    These are some of the basic components of a cloth simulation, but the application can be extended much further: adjusting the mouse influence or the mouse dragging force, tearing the cloth apart based on the intensity applied with the mouse over a cloth region, or even adding a control panel for adjusting certain variables on the fly. And the list could continue. We are not going to cover all of the above-mentioned ideas, but you can check the source code on the project Github page: https://github.com/esimov/cloth-physics

    Final thoughts

    I think that's all I wanted to share, so it's time to sum up the conclusions in a few words. For me, making a 2D cloth simulation was a fun way to apply some of the fundamental concepts of calculus, like integration and differentiation, to a computer program. As for the Gio library, it's still under development, so it doesn't have a stable API yet, which results in broken functionality from one release to another. But that's fine, we should acknowledge that it's an ongoing project. What's cool about it is the performance: it's really fast and stable. It has never crashed unexpectedly on me, nor panicked. It has some bugs and issues, but nothing serious. Considering that Gio hasn't been tagged with a stable Semver version, it's in quite good shape.

    If you like this article and have some questions, do not hesitate to contact and follow me on Twitter. You can also follow me on Github to receive updates about what I'm working on. Also, if you want to try out the application, you can download or install it from the project Github page: https://github.com/esimov/cloth-physics and star it if you like it.

    Links and references:

    https://github.com/esimov/cloth-physics
    https://gioui.org/
    https://jonegil.github.io/gui-with-gio/ - a gentle introduction to Gio
    https://twitter.com/simo_endre

  • Porting Pigo face detection library to Webassembly (WASM)

    Pigo Wasm

    On a couple of previous occasions I have written about the Pigo face detection library I developed in Go. This article is another in the series, but this time I'm focusing on the WASM (WebAssembly) integration. This is another key milestone in the library's evolution, considering that when I started this project the library was only capable of face detection. Later the library was extended to support pupil/eye localization, then facial landmark points detection, and it has also been adapted for integration into different programming languages as a shared object (SO) library. I'm pretty delighted about the great acceptance and support received from the programming community during development: the library has been featured a couple of times in https://golangweekly.com/, received 2.5k stars on the repo Github page (and still counting), and got a lot of positive feedback on Reddit, which means the effort paid off.

    But first, what is WASM? To quote the https://webassembly.org/ homepage:

    "WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications."

    In other words, this means that compiling and porting a native application to the WASM standard gives the generated web application a speed almost equal to the native one.

    Starting with v1.11, experimental support for WASM has been included in the Go language, and it was extended in v1.12 and v1.13. As I mentioned in previous articles, Go suffers terribly in terms of native, platform-agnostic webcam support, and to my knowledge there is currently no webcam library in the Go ecosystem which is platform independent. This is why, to prove the library's real-time face detection capabilities, I opted to export the main function as a shared object library, but lately this proved to be inefficient in terms of real-time performance, since on each frame the Python app had to transfer the pixel array to the Go code (where the face detection happens) and get back the detected face coordinates. Because of this two-way (back and forth) communication, the process falls back considerably in terms of pure performance.

    WASM to the rescue

    As I mentioned above, starting with v1.11 the standard Go code base includes the syscall/js package which targets WASM. However, the API has been refactored and gone through a few iterations to become stable as of v1.13. This also means that the WASM API of v1.13 is no longer compatible with v1.11. The API became mature enough that there is no need to use external libraries targeting the Javascript runtime in Go, like Gopherjs.

    In order to compile for WebAssembly we need to explicitly set the GOOS=js and GOARCH=wasm environment variables in the build process. Running the command below will build the package and produce a .wasm WebAssembly module file.

    $ GOOS=js GOARCH=wasm go build -o lib.wasm wasm.go

    This file can then be referenced in the main HTML file.

    That's the only thing we need to do in order to have a fully functional WASM application. The hardest part comes afterwards. When we are targeting WASM there are a few takeaways we need to keep in mind for the implementation to run as smoothly as possible:

    1. To have access to a Javascript global variable in Go we have to call the js.Global() function.
    2. To call a JS object method we have to use the Call() function.
    3. To get or set an attribute of a JS object or HTML element we can call the Get() or Set() functions.
    4. Probably the trickiest part of the Go WASM port is related to callback functions, and there are a lot of places where we need to take care of them. One of the most important is the canvas requestAnimationFrame method, which accepts a callback function as an argument. To invoke this method in Go we need to use the js.FuncOf(func(this js.Value, args []js.Value) interface{} {...}) function, where args holds the arguments passed to the callback. A very important note: the function body should be called inside a goroutine, otherwise you'll get a deadlock runtime error. You can check the package documentation here: https://godoc.org/syscall/js
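Putting these takeaways together, a small hypothetical sketch could look like this; the canvas element id and the drawing code are placeholders, not the application's actual code:

```go
//go:build js && wasm

package main

import "syscall/js"

func main() {
	// js.Global() gives access to the Javascript global scope,
	// Get/Set/Call access object attributes and methods.
	doc := js.Global().Get("document")
	canvas := doc.Call("getElementById", "canvas")
	canvas.Set("width", 640)
	canvas.Set("height", 480)

	// requestAnimationFrame needs a js.FuncOf callback; the callback
	// re-registers itself to keep the render loop running.
	var renderFrame js.Func
	renderFrame = js.FuncOf(func(this js.Value, args []js.Value) interface{} {
		// ...drawing and detection code goes here...
		js.Global().Call("requestAnimationFrame", renderFrame)
		return nil
	})
	js.Global().Call("requestAnimationFrame", renderFrame)

	// Block forever so the Go runtime keeps serving the callbacks.
	select {}
}
```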

    The implementation details

    Now let's take a deep breath and dive into the implementation details. One of the key components of the face detection application is the binary cascade file parser. While in normal circumstances, when we fetch and parse the binary files locally, we can rely on the os package, this is not the case with WASM, because we do not have access to the system environment. This means that we cannot load and parse files sitting on local storage. We need to fetch them through the supported Javascript methods, like the fetch method. In Javascript this method returns a promise with two callbacks: one for success and one for failure. As I mentioned previously, the callback functions need to be invoked in separate goroutines, and the most straightforward way to orchestrate the results when dealing with goroutines is to use channels. So the fetch method should return the parsed binary file as a byte array, and an error in case of failure.
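A sketch of such a fetch helper, assuming the standard syscall/js API (the helper name and the error handling are illustrative):

```go
//go:build js && wasm

package main

import (
	"errors"
	"syscall/js"
)

type result struct {
	data []byte
	err  error
}

// fetchCascade downloads a binary file through the Javascript fetch
// method and hands the bytes back over a channel.
func fetchCascade(url string) ([]byte, error) {
	ch := make(chan result, 1)

	success := js.FuncOf(func(this js.Value, args []js.Value) interface{} {
		// args[0] is the ArrayBuffer resolved by resp.arrayBuffer().
		arr := js.Global().Get("Uint8Array").New(args[0])
		data := make([]byte, arr.Get("length").Int())
		js.CopyBytesToGo(data, arr)
		ch <- result{data: data}
		return nil
	})
	failure := js.FuncOf(func(this js.Value, args []js.Value) interface{} {
		ch <- result{err: errors.New(args[0].Get("message").String())}
		return nil
	})

	go func() {
		js.Global().Call("fetch", url).
			Call("then", js.FuncOf(func(this js.Value, args []js.Value) interface{} {
				return args[0].Call("arrayBuffer")
			})).
			Call("then", success).
			Call("catch", failure)
	}()

	res := <-ch
	return res.data, res.err
}
```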

    The rest of the implementation does not differ in any way from the Unpack() method presented in the previous article. Once we get the binary array, nothing is left but to unpack the cascade file, transform it into the desired shape and extract the relevant information.

    I won't go into too much detail about the detection algorithm itself, since I've discussed it in previous articles. I will detail what's most important from the perspective of the WASM integration. Another part of key importance in the WASM integration is the webcam rendering operation. In Javascript we can access the webcam in the following way:

    We can translate this snippet to Go (WASM) code in the following way:
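A rough sketch of accessing the webcam from Go through syscall/js; the element id and the stream handling here are illustrative assumptions, not the project's exact code:

```go
//go:build js && wasm

package main

import "syscall/js"

// startWebcam requests a video stream via getUserMedia and attaches
// it to a (hypothetical) <video> element with the id "video".
func startWebcam() {
	navigator := js.Global().Get("navigator")
	constraints := map[string]interface{}{
		"video": true,
		"audio": false,
	}

	promise := navigator.Get("mediaDevices").Call("getUserMedia", js.ValueOf(constraints))
	promise.Call("then", js.FuncOf(func(this js.Value, args []js.Value) interface{} {
		// args[0] is the resolved MediaStream.
		video := js.Global().Get("document").Call("getElementById", "video")
		video.Set("srcObject", args[0])
		video.Call("play")
		return nil
	}))
}
```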

    This will return a canvas object (struct) over which we can call the rendering method itself. In the background this triggers the requestAnimationFrame JS method, drawing each webcam frame to a dynamically created image element. From there we can extract the pixel values by calling the getImageData HTML5 canvas method. Since this returns an ImageData object whose values are a Uint8ClampedArray, these need to be converted to a type supported by the Go WASM API. The only supported uint type in the Go syscall/js API is uint8, so the most suitable way is to convert the Uint8ClampedArray to a Uint8Array. We can do this by calling the following method:
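A sketch of this conversion, assuming a 2D canvas context; the function and variable names are illustrative:

```go
//go:build js && wasm

package main

import "syscall/js"

// rgba extracts the pixel data of a canvas 2D context as a Go byte
// slice, by wrapping the Uint8ClampedArray returned by getImageData
// into a Uint8Array, which js.CopyBytesToGo understands.
func rgba(ctx js.Value, width, height int) []byte {
	imgData := ctx.Call("getImageData", 0, 0, width, height)
	uint8Arr := js.Global().Get("Uint8Array").New(imgData.Get("data"))

	pixels := make([]byte, uint8Arr.Get("length").Int())
	js.CopyBytesToGo(pixels, uint8Arr)
	return pixels
}
```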

    The code below shows the rendering method. It's pretty common Javascript code adapted to Go WASM, including the code responsible for the face detection, which has been discussed in the previous articles. The most important part is the type conversion; without it we would just get a compiler error.

    And to put it all together, in the main function we simply call the webcam Render() method once the webcam has been initialized; otherwise we show an alert telling that the webcam was not detected.


    The end result

    Voila, we are done. Here is how the WebAssembly version of Pigo looks in real time. Pretty good, huh? That's all, thanks for reading, and if you like my work you can follow me on Twitter or give a star to the project Github repo: https://github.com/esimov/pigo.

  • Pupils/eyes localization in the Pigo face detection library

    In the previous post I presented a general overview of the Pigo face detection library I'm working on, some examples of how you can run it on various platforms and environments for detecting faces, and also its intended scope in the Go ecosystem. In the meantime the library got a huge revamp and reached a new milestone in its development life cycle: as of today Pigo is capable of pupil/eye localization and, even better, supports facial landmark points detection. In this article I will focus on pupil/eye detection; in the next article I aim to discuss facial landmark points detection. So let's get started.

    Pupils/eyes localization

    The implementation is based on the Eye pupil localization with an ensemble of randomized trees paper, which closely resembles the method used for face detection, but with a few notable differences; I will explain them shortly. Both implementations are based on tree ensembles, well known for achieving impressive results, in contrast to single tree detection methods. This means that the classifier used for pupil/eye detection is generated from decision trees, and the training data is recursively clustered in this fashion until some termination condition is met.

    Since we know the leaf structure of the binary classifier, the only thing we need to do is decompose it in order to obtain the data structure. The decomposition procedure is based on the following steps:

    • Read the depth (size) of each tree and write it into the buffer array.
    • Get the number of stages as 32-bit unsigned integers and write it into the buffer array.
    • Obtain the scale multiplier (applied after each stage) and write it into the buffer array.
    • Obtain the number of trees per stage and write it into the buffer array.
    • Obtain the depth of each tree and write it into the buffer array.
    • Traverse all the stages of the binary tree, traverse all the branches of each stage and read the predictions from the trees’ leaf nodes.

    The returned element should be a struct containing all of the above information.

    There is a problem though: the output of the regression trees might be noisy and unreliable in some special cases, for example when we feed in low quality video frames or images. This is the reason why the method introduces a random perturbation factor during runtime to outweigh the false positive detection rates. Later on, after the detected face region is classified, we sort the perturbations in ascending order and retrieve the median value of the resulting vector.

    Pupil localization with perturbation

    The classification procedure consists of the following: traverse the trees in all the stages and apply almost the same binary test used in the face classification procedure. We have to restrict the pupil/eye classification method exclusively to the detected face region. In the end we obtain the pupil coordinates and a scale factor. The scale factor is less important in this case, since it's only useful if we wish to give a different size to the detected pupils.

    We are using the same method for left and right eye detection; the only difference is that for the right eye detection we flip the sign in the following formula.

    To run it on static images, a CLI application is bundled into the library. The difference from face detection is that in addition we have to provide the binary classifier for pupil/eye detection. Since the library is constructed in a modular fashion, only certain sections need to be extended, like the binary parser and the classification function. Other parts of the code remain pretty much unchanged.

    This is how the end result looks:

    Pupils/eyes detection

    Webcam demo

    I have also created a webcam demo running in Python since, as of today, there isn't a native Go webcam library supported on all platforms. The major takeaway is that the computational part (the face and pupil detection) is executed in Go, and the resulting data (the detection coordinates) is transferred to Python through a shared object library.

    It sounds like alchemy, but it's definitely a working solution. On the Go side there are a few things we need to take care of.

    • The exported function should be annotated with an //export comment.
    • The source must import the pseudo C package.
    • An empty main function should be declared.
    • The package must be a main package.

    These are only the technical requirements imposed by the language in order to expose the Go code as a shared object. Afterwards we can build the program with the following commands (in fact these will be executed from the Python code).
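A minimal sketch of the shared-object requirements listed above; the function name, signature and returned values are illustrative, not Pigo's actual API:

```go
package main

// The pseudo C package must be imported for //export to work.
import "C"

import (
	"runtime"
	"unsafe"
)

// Kept at package level so the returned pointer stays valid after the
// function returns.
var coords []int32

//export FindFaces
func FindFaces(pixels unsafe.Pointer, rows, cols C.int) uintptr {
	// ...run the detector over the pixel data here; the result below
	// is an illustrative flattened detection slice...
	coords = []int32{1, 272, 297, 213, 41}
	p := uintptr(unsafe.Pointer(&coords[0]))
	runtime.KeepAlive(coords)
	return p
}

// An empty main function is required; the package must be a main package.
func main() {}
```

Such a file would typically be compiled with the c-shared build mode, e.g. `go build -buildmode=c-shared -o pigo.so`, which produces the shared object and a matching C header.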

    To communicate with the Go code from the Python code base we will use the ctypes library. This is a handy library which provides C-compatible data types and allows calling functions in DLLs or shared libraries. Using the library might be intimidating at first, since it heavily relies on C structs and types. Also, accessing the Go data is possible exclusively through pointers. This means we need to be aware of the data type, the buffer length and the maximum capacity of the transferred Go slice. In fact, the slice will be a pointer to a data structure. In Python we have to declare a class which maps to a C struct with the following components:

    On the Go side, after running the detection function and retrieving the coordinate results, we need to convert them to a one dimensional array, since in Go it is not possible to transfer a 2D array over an array pointer. The trick is to convert the 2D array (since we are dealing with multiple face detections and, by default, multiple pupil/eye detections) to a 1D array in such a way that the detection groups stay delimited from each other. One efficient way to make this happen is to introduce the number of detected faces as the first slice element. Since we then know how many faces were detected, later on in the Python part we can transform the array with the help of numpy into the desired shape.
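The flattening trick can be sketched as follows; the detection values in the example are illustrative:

```go
package main

import "fmt"

// flatten converts the 2D detection results into a 1D slice prefixed
// with the number of detected faces, so the Python side can reshape
// it back with numpy. Each inner slice is one detection group.
func flatten(dets [][]int) []int {
	out := []int{len(dets)}
	for _, det := range dets {
		out = append(out, det...)
	}
	return out
}

func main() {
	faces := [][]int{
		{272, 297, 213, 41, 1},
		{238, 599, 72, 25, 1},
	}
	fmt.Println(flatten(faces)) // [2 272 297 213 41 1 238 599 72 25 1]
}
```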

    This is how the Go code looks. We invoke runtime.KeepAlive(coords) to make sure coords is not freed prematurely by the garbage collector.

    In the end we should obtain something like below, where the first value represents the number of detected faces and the rest of the values are the x and y coordinates and the scale factor of the detected faces and pupils, respectively.

    [2 0 0 0 0 272 297 213 41 1 248 258 27 41 0 248 341 27 41 0 238 599 72 25 1 230 587 9 25 0 233 616 9 25 0]

    Back to Python. We capture each frame from the webcam and transfer the underlying binary data to Go as a byte array. After running the face detector over the obtained binary data, we retrieve the resulting one dimensional array. Ctypes requires defining a class which must contain a _fields_ attribute. This attribute must be a list of 2-tuples, each containing a field name and a field type. Since the Go slice has a len and a cap attribute, we define them as the tuple elements. Once we've defined the base class we also have to define the argument types. The argument type will be the base class, and the return type will be a pointer type, represented as an integer, exactly like the Go return type.

    Since in ctypes we are dealing with pointer values, we have no information about the buffer size needed by the face detector; this is the reason why we allocate a buffer just big enough to hold all the values returned by the detector. Later on we trim the buffer length by dropping the null values. We can do this because we already know the number of detected faces. So the end result will be the number of detected faces multiplied by 3, since each face detection result is extended with a pair of pupil detections.

    And that's all. Below is a video capture of running the face detector in real time. The small lag is due to the time needed to convert the Go code to a shared library; otherwise the algorithm can produce much higher frame rates.

    I hope that you enjoyed reading this article and got a better understanding of the interaction between Go and Python. In the next article I'll present a concrete example using this feature. If you appreciate the work done on this library, please show your support by starring the Github repository: https://github.com/esimov/pigo.

  • Technical overview of Pigo face detection library

    Pigo Logo

    Pigo’s cute logo

    Pigo is a face detection library implemented in and for pure Go, based on the Pixel Intensity Comparison-based Object detection paper (https://arxiv.org/pdf/1305.4537.pdf). Pigo has been around for more than a year, but I haven't published a technical article about the library yet, so I thought it was time to fill the gap.

    Technical overview

    The first and foremost question: what was the motivation and purpose of making this library, since GoCV has existed for quite a long time in the Go ecosystem and is trying to be the perfect toolset for everyone who is willing to combine the simplicity of the Go language with the comprehensiveness of OpenCV in everything related to computer vision, machine learning and anything in between?

    Making a quick search on the web for face detection with Go, almost 100% of the returned results are C bindings to external libraries. Pigo is an exception, since it is developed purely in Go, with the casual Go developer in mind. The reference implementation is written in C and is available here: https://github.com/nenadmarkus/pico. There is no need to install platform dependent libraries, and no need to compile and build the giant, monolithic OpenCV package.

    What are the benefits of using Pigo? Just to name a few of them:

    • High processing speed
    • No need for image preprocessing prior to detection
    • No need for the computation of integral images, image pyramid, HOG pyramid or any other similar data structure
    • The face detection is based on pixel intensity comparison encoded in the binary file tree structure
    • Fast detection of in-plane rotated faces

    Now let’s see what the similarities and differences are between Pigo and other detection methods. Similar to the Viola-Jones face detection algorithm, Pigo still uses cascades of decision trees at all reasonable scales and positions, but the cascade classifier is stored in binary format. The role of a classifier is to tell whether a face is present in the current region or not. The classifier consists of decision trees whose internal nodes contain the results of pixel intensity comparison tests in binary format. The binary test can be expressed with the following formula:

    \( \text{bintest}(R; x_1, y_1, x_2, y_2) =
    \begin{cases}
    0, & R[x_1, y_1] \le R[x_2, y_2] \\
    1, & \text{otherwise}
    \end{cases}
    \)

    where \(R\) is an image region and (\(x_i\), \(y_i\)) represents the normalized pixel location coordinates, which means that the binary test can be easily resized if needed.

    Since the cascades are encoded into a binary tree, the cascade classifier needs to be unpacked before the detection method can run. The unpacking method returns a struct with the following elements:

    Due to the many regions present in the image, the decision trees are organized into a cascade of classifiers, so each member of the cascade can be evaluated in \( O(1) \) time with respect to the size of the region. An image region is considered a face if it passes all the cascade members. Since this process is limited to a relatively small number of regions, it yields high computation speed.

    The above method classifies the cascade trees based on the trained data, but since the training data consists mostly of in-plane (upright) faces, rotated faces are not detectable. For this reason we need to introduce another method, which classifies the regions based on a rotation angle provided as input. (I’m not including the code snippet for this one; you can check it in the source code.)

    Another key aspect of the algorithm is that during the decision tree scan, each detection is flagged with a detection score. An image region is considered a face if its average detection score is above a certain threshold (generally around 0.995). All regions below this threshold are discarded. We can achieve different ratios of true positives to false positives by varying the threshold value. Since the detector is based on pixel intensity comparison, it is also sensitive to small perturbations and variations of the underlying pixel data, influenced mostly by the scaling and rotation of the objects, resulting in overlapping detections.

    Face detection without clustering

    The detection results without the clustering applied

    This is the reason why the cascade detections are clustered together in a post-processing step, and the final result is produced by applying the intersection over union formula to the detected clusters.

    The clustering does not return the rectangle coordinates of the detected faces directly, but a slice of Detection structs consisting of the number of rows and columns, the scale factor and the detection score. Translating these into the standard image.Rectangle is only a matter of a simple conversion, realized as in the following lines:

    And this is how the final output looks after the detection results have been clustered.

    The detection results with the clustering applied

    Pigo and GoCV comparison

    The best way to compare the two libraries is to measure them both in terms of pure performance: detection speed but also accuracy. The most obvious way to evaluate them is to run some benchmarks with the same prerequisites. One thing to note is that the classifier is set to detect faces on the same image over and over again, in order to get an accurate idea of how well the algorithm performs.

    To make the benchmark more accurate and measure only the execution time, b.ResetTimer() is called to reset the timer so the setup is not counted towards the benchmark.

    Here are the results:

    For the above test we used a sample image of a NASA crew of 17 persons. Both libraries returned exactly the same results, but Pigo was faster and its memory allocation was far lower compared to GoCV.

    Use cases

    To demonstrate the real-time capabilities of the Pigo face detection library, and also to show how easily it can be integrated into, say, a Python project, I have created a couple of use cases targeting different kinds of applications. The first one is the usual face detection example: real-time, webcam-based face detection, marking the detected faces with a rectangle (or circle).

    Using Pigo as a shared object library

    Go running in Python? Yes, that’s true: we are harnessing Go’s ability to export binaries as shared objects. The Go compiler can create C-style dynamic shared libraries using the build flag -buildmode=c-shared. Building the Go program with this flag will generate the shared object library, which can later be imported into the Python program. But the following conditions must be satisfied:

    • The exported function should be annotated with the //export comment.
    • The source must import the pseudo C package.
    • An empty main function should be declared.
    • The package must be a main package.

    Knowing the above conditions, we can execute a shell command from the Python program which will call the shared object library and return the detection results; we can then operate on the obtained values. Below is an example of the process just described, whose source code and other examples can be found in the project’s examples directory: https://github.com/esimov/pigo/tree/master/examples.

    Another handy application I have created blurs out the detected faces. For static images we can lean on the stackblur-go library to blur the detected faces; otherwise, if we use Pigo as a shared library imported into a Python program, we can make use of OpenCV’s blur method.

    I have also successfully integrated Pigo into Caire, the content-aware image resizing library I created. Thanks to Pigo it was possible to avoid face distortions by restricting the seam-removal algorithm to the picture zones without faces. Once we’ve detected the faces, excluding them from processing was a piece of cake. For more information about Caire, read my other blog post, where I’ve detailed at length how the seam carving algorithm works.

    Serverless integration

    Pigo has been successfully integrated into the OpenFaaS platform, sparing you all the tedious work of installing and configuring the Go toolchain and making it possible to expose the detector as a FaaS function. This means that once you have an OpenFaaS platform installed on your local machine, with only two commands you have a fully working Docker image where Pigo can be used as a serverless function.

    OpenFaaS user interface

    The OpenFaaS user interface

    Pigo OpenFaaS result

    The faceblur function applied over an input image

    Below are some of the OpenFaaS functions I have created integrating Pigo:

    https://github.com/esimov/pigo-openfaas-faceblur

    https://github.com/esimov/pigo-openfaas

    The Docker image is also available on Docker Hub, with more than 10k downloads already. https://cloud.docker.com/u/esimov/repository/docker/esimov/pigo-openfaas

    Summary

    In conclusion, as you might have realized, Pigo is a lightweight but full-fledged face detection library, easy to use and easy to integrate into different platforms and environments, with a simple API that is good enough to get the job done.

    For more computer vision, image processing and creative programming stuff you can follow me on Twitter: https://twitter.com/simo_endre

  • Coherent Line Drawing implementation in Go (GoCV)

    Coherent Line Drawing implementation in Go (GoCV)

    Surfing the web, a beautiful, pencil-drawn-like image captured my attention. It looked like a hand-drawn image, but it also became almost evident that it was computer-generated art. I found that it was created using an algorithm known as Coherent Line Drawing.

    Coherent Line Drawing

    Introduction

    Over the last while I have been working on and off (as my limited free time permitted) on an implementation of the Coherent Line Drawing algorithm developed by Kang et al. Only now do I consider the implementation stable enough to publish, so here it is: https://github.com/esimov/colidr.

    Since my language of choice is Go, I decided to implement the original paper in Go, more precisely with GoCV, an OpenCV wrapper for Go. I opted for this choice since the implementation is heavily based on linear algebra concepts and requires working with matrices and vector spaces, things at which OpenCV notoriously excels.

    However, some of the functions required by the algorithm were missing from the GoCV codebase (at that time), like Sobel, uniformly-distributed random numbers, and vector operations like getting and setting vector values. Meanwhile, checking the latest release, I found that the core codebase has been extended with some of the missing functionality, like the Sobel threshold, bitwise operations etc.; however, there were still missing pieces required by the algorithm. For this reason I extended the GoCV codebase with the missing OpenCV functionality and included it in the project as a vendored dependency. In the future I will probably create a pull request to merge it back into the main repository.

    I won’t go into too much detail about the algorithm itself; you can find it in the original paper. I will mostly discuss the technical difficulties encountered during the implementation, but also the challenges imposed by GoCV and OpenCV.

    Why Go?

    Now let’s talk about the challenges imposed by the project. The first question that might arise is: why Go, knowing that Go is not really associated with creative coding, and is mostly used in automation, infrastructure, DevOps and web programming?

    From my first acquaintance with Go (quite a few years back) I was intrigued by the possibilities the language offers in fields like image processing, computer vision and creative coding, since these are the fields I’m mostly interested in, and almost all of my open source projects developed in Go have circulated around this topic. So this project was another attempt of mine to demonstrate that the language is well suited for these kinds of projects too.

    Go has a small package for image operations, but it is good enough for anything you need: reading an image file, obtaining and manipulating the pixel values, and finally encoding the result into an output file. Everything is concise and well structured. Although in this project we mostly rely on the OpenCV functions provided by GoCV, there are still plenty of use cases where you need to fall back on the core image library. Go does not provide a high-level, abstract function to modify the source image; you need to access the raw pixel values in order to modify them.

    Another key aspect in the language choice was that Go has an out-of-the-box command line flag parsing library, just what I needed, since I conceived this application to be executed from the terminal. Maybe in the future I will create a GUI version too.

    Technical challenges

    Going back to the technical challenges I encountered during the implementation, one of the main headaches was related to how sensitive OpenCV is to the way matrix types are declared. I banged my head against the wall many times for the simple reason that my matrices were defined as gocv.MatTypeCV32F, when they should have been defined as gocv.MatTypeCV32F+gocv.MatChannels3, since the sum of these two constants produces the matrix type value declared in the underlying OpenCV codebase. More exactly, creating a new Mat with its type set to simply MatTypeCV32F makes the underlying GoCV method call the Mat_NewWithSize C function with the matrix type as its last parameter. Exactly this kind of limitation confused me, i.e. not all of the supported OpenCV mat types had been defined in the GoCV counterpart.

    Since OpenCV is very flexible with matrix operations and won’t complain about matrix conversions from one type to another, there are some edge cases where they produce undesired results. This is something you have to consider when doing matrix operations in OpenCV: you need to be aware of the matrix type, otherwise your end results could be utterly compromised.

    However, comparing the OpenCV and GoCV matrix type tables, a lot of types are still missing from GoCV. Because of this simple “thing”, my outputs were far from what they should have been. I went round and round, back and forth, trying different solutions and comparing the code with the pseudo-algorithms and formulas provided in the original paper, to finally realize that my matrices were defined with the wrong type, or that some of the types declared in OpenCV were completely missing from the GoCV counterpart. The solution was either to extend the core codebase or to add together the two matrix type constants (as presented above) in order to produce the correct type value requested by OpenCV.

    Another elementary thing missing from GoCV is a SetVecAt method for setting or updating vector values, even though a method for retrieving the vector values does exist. My initial attempt was to modify the vector values at the byte level and encode them back into a matrix using the GoCV NewMatFromBytes method, which proved to be completely inefficient.

    The solution was to extend the core GoCV codebase with a SetVec method.

    Another thing I learned is that converting one matrix type to another does not always work as you expect. I experienced this issue when, during a debugging session, I had to convert a float64 matrix to uint8 so it could be exported as the byte array needed for the final binary encoding. It worked, but converting back to the float64 matrix requested by the rotateFlow method did not produce the desired output. (This method applies a rotation angle to the original gradient field and calculates the new angles.)

    Since Go uses strict types for variable declarations, auto-casting is not possible as in C or C++. For this reason you need to pay attention to how you convert values from one type to another. Because GoCV / OpenCV matrices defined as floats use 32-bit precision float values, we need to be cautious when we have to cast a value defined as, for example, float64 down to a 32-bit representation. This was the case with the edge tangent flow visualization.

    The examples below were produced with the wrong (left) and the correct (right) type casting.

    Flowfield wrong
    Flowfield good

    Even though the paper provided the implementation details for the ETF visualization, my first attempt at implementing it didn’t produce the correct results. Only when I printed out the results did I realize that the values were spanning beyond the range of 32-bit integers; however, OpenCV did not complain about this. The solution to this problem was to cast the index values of the for-loop iterator to float32.

    This is a takeaway you have to consider when working with OpenCV matrix types, especially in a language like Go, which is very strict about variable type definitions.

    Conclusion

    To sum up: GoCV is a welcome addition to the Go ecosystem; considering that it is under active development, many of the core OpenCV features are already implemented. However, as I mentioned above, there are still missing features which should be addressed for it to become a fully viable OpenCV wrapper for the Go ecosystem. What I have learned using OpenCV is that you have to tinker with the values: slight changes to the inputs can produce completely broken outputs, so you need to find that tiny middle road where the different equation parameters converge.

    Sample images

    Great Wave
    Great Wave
    Starry Night
    Starry Night
    Happy people
    Happy people
    Tiger
    Tiger

    Source code

    The code is open source and can be found on my Github page:

    https://github.com/esimov/colidr

    You can follow me on twitter too: https://twitter.com/simo_endre

  • Caire – a content aware image resize library

    Caire – a content aware image resize library

    Let’s assume you want to resize an image without content distortion while preserving the relevant image parts. Normal image resizing, but also the content cropping technique, is not really suitable for this kind of task: the first will simply resize the image while preserving the aspect ratio, and the second will crop the image at the defined coordinates, which might result in content loss, especially in photos with the relevant information scattered through the image. Not even the smart cropping technique helps much in this case.

    This is what Caire, my content-aware image resizing library developed in Go, tries to remedy. It is pretty much based on the paper Seam Carving for Content-Aware Image Resizing by Shai Avidan and Ariel Shamir.

    The background

    Let’s consider the image below (Fig. 1). It’s a nice and clean picture with a wide-open background. Now suppose that we want to make it smaller. We have two options: either crop it, or scale it. Cropping is limited since it can only remove pixels from the image periphery. Advanced cropping features like smart cropping cannot resolve our issue either, since they would remove the person at the left margin or crop a small part of the castle. Certainly we do not want this to happen. Scaling is also insufficient since it is not aware of the image content and typically can be applied only uniformly.

    Fig.1: Sample image

    Seam carving was developed precisely for this kind of use case. It works by establishing a number of seams (connected paths of low-energy pixels) crossing the image from top to bottom or from left to right, defining the importance of pixels. By successively removing or inserting seams we can reduce or enlarge the size of the image in both directions. Fig. 2 illustrates the process.

    Fig.2: The seam carving method illustrated

    First let’s skim through the details and summarize the important steps.

    • An energy map (edge detection) is generated from the provided image.
    • The algorithm tries to find the least important parts of the image taking into account the lowest energy values.
    • Using a dynamic programming approach, the algorithm will generate individual seams crossing the image from top to bottom or from left to right (depending on the horizontal or vertical resizing) and allocate a cost to each seam, the least important pixels having the lowest energy cost and the most important ones the highest.
    • Traverse the image from the second row to the last row and compute the cumulative minimum energy for all possible connected seams for each entry.
    • The minimum energy level is calculated by summing up the current pixel value with the lowest value of the neighboring pixels from the previous row.
    • Traverse the image from top to bottom (or from left to right in case of vertical resizing) and compute the minimum energy level. For each pixel in a row we compute the energy of the current pixel plus the energy of one of the three possible pixels above it.
    • Find the lowest cost seam from the energy matrix starting from the last row and remove it.
    • Repeat the process.

    Implementation

    Seam carving can support several types of energy functions, such as gradient magnitude, entropy or the Sobel filter, each of them having something in common: they value a pixel by measuring its contrast with its neighboring pixels. We will use the Sobel operator in our case. Typically in image processing the Sobel operator is used to detect image edges. It works with an energy distribution matrix to differentiate the sensitive image information from the less sensitive parts. (Fig. 3)

    Fig.3: Sobel threshold applied to the original image

    Once we have obtained the energy distribution matrix we can advance to the next step and generate individual seams of one pixel width. We’ll use a dynamic programming approach, storing the results of sub-calculations in order to simplify calculating the more complex ones. For this purpose we define some setter and getter methods whose role is to set and get the pixel energy values.

    After generating the energy map there are two important steps we need to follow:

    Traverse the image from the second row to the last one and compute the cumulative minimum energy M for all possible connected seams for each entry (i, j). M is the two-dimensional array of cumulative energies we are building up. The minimum energy level is calculated by summing the current pixel value with the minimum value among its neighboring pixels in the previous row. This is a classic dynamic programming computation, similar in spirit to shortest-path algorithms such as Dijkstra’s. Suppose that we have a matrix with the following values:

    Original matrix

    To compute the minimum cumulative energies of the second row, we take each entry of the second row and add to it the minimum value of its neighboring cells from the first row. After this operation is carried out for every pixel in the second row, we move to the third row, and so on.

    Calculating the minimum cumulative energy values row by row.

    Once we have calculated the minimum energy values for each row, we select the last row as the starting position and search for the pixel with the smallest cumulative energy value. Then we traverse up the matrix one row at a time, again searching for the minimum cumulative energy value among the connected neighbors, up until the first row. The obtained pixels make up the seam which should be removed.

    Removing the seam is only a matter of obtaining the pixel coordinates of the seam values and checking on each iteration whether the processed pixel coordinates correspond to the seam pixel positions. We can check the accuracy of our logic by drawing the removable seams on top of the image. (Fig. 4)

    Fig.4: Low energy seams crossing the image

    The same logic can be applied to enlarge images, only this time we compute the optimal vertical or horizontal seam s and duplicate its pixels by averaging them with their left and right neighbors (top and bottom in the horizontal case).

    Fig.5: The final resized image

    Prevent face deformation with face detection

    For certain content such as faces, where the relations between features are important, automatic face detection can be used to identify the areas that need protection. This is an important requirement since, in certain situations, when sensitive image regions like faces are compressed into a small area they might get distorted (see Fig. 6). To prevent this, once these regions are detected we can increase the pixel intensity to a very high level before running the edge detector. This way we make sure the Sobel detector considers these regions important, which also means they will receive high energy values.

    Fig.6: Resizing image with and without face detection

    Below you can see the seam carving algorithm in action, first checking for human faces prior to resizing the image. It is visible from the image that the seams try to avoid the face zone, normally marked with a rectangle. And this is done with a simple trick: if a face zone is detected, its rectangle is filled with a white background, which makes the detector believe that it is an area with high energy. For face detection, Pigo was used, another Go library developed by me.

    Conclusion

    Seam carving is a technique which can be used for a variety of image manipulations, including aspect ratio change, image retargeting, object removal and even content amplification. It can also be seamlessly integrated with a convolutional neural network trained for specific object recognition, making it a perfect toolset for content delivery solutions. There are numerous other domains where this technique could be applied or even extended, like video resizing or continuous resizing in real time.

    Of course, like every technology, it has its limitations: when the processed image is very condensed, in the sense that it does not contain “less important” areas, ugly artifacts might appear. The algorithm also does not perform very well when the image, albeit not very condensed, has its content laid out in a manner that does not permit the seams to bypass the important parts. In certain situations these limitations can be surpassed by tweaking the parameters, for example using a higher Sobel threshold or applying a blur filter.

    Source code

    The code is open sourced and can be found on my Github page:

    https://github.com/esimov/caire

    https://github.com/esimov/pigo