Face recognition technology has existed for quite some time, but until recently it was not accurate enough for most purposes.
Now it seems that face recognition is everywhere:
- you upload a photo to Facebook and it suggests who is in the picture
- your smartphone can probably recognise faces
- lots of celebrity look-a-like apps have suddenly appeared on the app stores
- police and antiterrorism units all over the world use the latest in face recognition technology
The reason why facial recognition software has recently got a lot better and a lot faster is due to the advent of deep learning: more powerful and parallelised computers, and better software design.
I’m going to talk about what’s changed.
Traditional face recognition: Eigenfaces
The first serious attempts to build a face recogniser were back in the 1980s and 90s and used something called Eigenfaces. An Eigenface is a blurry face-like image, and a face recogniser assumes that every face is made of lots of these images overlaid on top of each other pixel by pixel.
If we want to recognise an unknown face we just work out which Eigenfaces it’s likely to be composed of.
Not surprisingly the Eigenface method didn’t work very well. If you shift a face image a few pixels to the right or left, you can easily see how this method will fail, since the parts of the face won’t line up with the eigenface any more.
Next step up in complexity: facial feature points
The next generation of face recognisers would take each face image and find important points such as the corner of the mouth, or an eyebrow. The coordinates of these points are called facial feature points. One well known commercial program converts every face into 66 feature points.
To compare two faces you simply compare the coordinates (after adjusting in case one image is slightly off alignment).
Not surprisingly the facial feature coordinates method is better than the Eigenfaces method but is still suboptimal. We are throwing lots of useful information away: hair colour, eye colour, any facial structure that isn’t captured by a feature point, etc.
Deep learning approach
The last method in particular involved a human programming into a computer the definition of an “eyebrow” etc. The current generation of face recognisers throws all this out of the window.
This approach used convolutional neural networks (CNNs). This involves repeatedly walking a kind of stencil over the image and working out where subsections of the image match particular patterns.
The first time, you pick up corners and edges. After doing this five times, each time on the output of the previous run, you start to pick up parts of an eye or ear. After 30 times, you have recognised a whole face!
The neat trick is that nobody has defined the patterns that we are looking for but rather they come from training the network with millions of face images.
Of course this can be an Achilles’ heel of the CNN approach since you may have no idea exactly why a face recogniser gave a particular answer.
The obstacle you encounter if you want to develop your own CNN face recogniser is, where can you get millions of images to develop the model? Lots of people scrape celebrity images from the internet to do this.
However you can get much more images if you can get people to give you their personal photos for free!
This is the reason why Facebook, Microsoft and Google have some of the most accurate face recognisers, since they have access to the resources necessary to train the models.
The CNN approach is far from perfect and many companies will have some adjustments on top of what I described in order to compensate for its limitations, such as correcting for pose and lighting, often using a 3D mesh model of the face. The field is advancing rapidly and every year the state of the art in face recognition brings a noticeable improvement.
If you’d like to know more about this field or similar projects please get in touch.