How to improve conversions without losing customer data

You have probably had the experience of filling out a long form on a website: for example, creating an account to make a purchase, applying for a job, or renewing your car insurance.

A long form can lead to customers losing interest and taking their business elsewhere. Each additional field can result in up to 10% more customers dropping out instead of completing the form.

If your business has a form like this, one reason you may not be able to simplify it is that the data you are requesting is valuable.

There are lots of ways to address the problem, such as improving the design of the form, splitting it across multiple pages, or removing the “confirm password” field. But it appears that most fields can’t be removed without degrading the data you collect on these new customers.

However, with machine learning it’s possible to predict the values of some of these fields and remove them from the form completely, without sacrificing too much information. This way you gain more customers. To remove fields for new customers, you need a history of the information that past customers have provided.

A few examples

  • On a small ads site, you require users to upload a photo, or fill out a description of the item they’re selling. With machine learning you can suggest a price from the description, or a title from the photo, resulting in less typing for the user.
  • On a recruitment website, you can use machine learning to deduce lots of data (name, address, salary, desired role) directly from the candidate’s CV when it’s uploaded. Even salary can be predicted, although it’s not usually explicit in the CV.
  • On a car insurance website, it’s possible to retrieve make, model, car tax and insurance status from an image of the car.
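
As a rough illustration of the first example, here is a minimal sketch of suggesting a price from a description with Scikit-learn. The listings and prices are made up, the function name suggest_price is hypothetical, and the choice of TF-IDF features with ridge regression is just one simple option, not a recommendation for production use.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Toy history of past listings (invented for illustration only).
descriptions = [
    "used mountain bike in good condition",
    "road bike almost new",
    "paperback novel",
    "hardback cookbook",
    "kids bike with stabilisers",
    "children's picture book",
]
prices = [150.0, 300.0, 3.0, 8.0, 60.0, 4.0]

# Turn each description into word weights, then fit a linear model on them.
model = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
model.fit(descriptions, prices)


def suggest_price(description):
    """Return a suggested price for a new listing description."""
    return float(model.predict([description])[0])


print(suggest_price("bike for sale"))
print(suggest_price("book for sale"))
```

With a real history of thousands of listings, the suggestion can pre-fill the price field so the user only has to confirm or correct it.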

If you are interested and would like to know more please send me a message.

For an example of how data can be inferred from an unstructured text field please check out my forensic stylometry demo.

Building a face recogniser: traditional methods vs deep learning

Face recognition technology has existed for quite some time, but until recently it was not accurate enough for most purposes.
Now it seems that face recognition is everywhere:

  • you upload a photo to Facebook and it suggests who is in the picture
  • your smartphone can probably recognise faces
  • lots of celebrity look-a-like apps have suddenly appeared on the app stores
  • police and antiterrorism units all over the world use the latest in face recognition technology

Facial recognition software has recently got a lot better and a lot faster thanks to the advent of deep learning, which was made possible by more powerful and parallelised computers and better software design.
I’m going to talk about what’s changed.

Traditional face recognition: Eigenfaces
The first serious attempts to build a face recogniser were back in the 1980s and 90s and used something called Eigenfaces. An Eigenface is a blurry face-like image, and an Eigenface recogniser assumes that every face is made of lots of these images, in different proportions, overlaid on top of each other pixel by pixel.

If we want to recognise an unknown face we just work out which Eigenfaces it’s likely to be composed of.
Not surprisingly, the Eigenface method didn’t work very well. If you shift a face image a few pixels to the right or left, the parts of the face no longer line up with the Eigenfaces, so you can easily see how this method will fail.
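
To make the idea concrete, here is a minimal sketch of the Eigenface approach using NumPy. The "faces" are random 16x16 arrays standing in for real photos, the number of Eigenfaces kept (10) is arbitrary, and the function names are my own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a face database: 20 random 16x16 "images", flattened to vectors.
faces = rng.random((20, 16 * 16))

# Subtract the average face so the components describe variation around it.
mean_face = faces.mean(axis=0)
centred = faces - mean_face

# The rows of vt are the principal components, i.e. the Eigenfaces.
_, _, vt = np.linalg.svd(centred, full_matrices=False)
eigenfaces = vt[:10]


def weights(image):
    """How much of each Eigenface is present in the image."""
    return eigenfaces @ (image - mean_face)


# Store the weights of every known face once.
known = np.array([weights(face) for face in faces])


def recognise(image):
    """Index of the known face whose weights are closest to the image's."""
    distances = np.linalg.norm(known - weights(image), axis=1)
    return int(np.argmin(distances))


# A slightly noisy copy of face 3 should still be matched to face 3.
noisy = faces[3] + rng.normal(0, 0.01, faces[3].shape)
print(recognise(noisy))
```

Because the comparison is pixel by pixel, you can experiment with shifting an image (for example with np.roll) before calling recognise to see how sensitive the weights are to alignment.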

Next step up in complexity: facial feature points
The next generation of face recognisers would take each face image and find important points such as the corner of the mouth, or an eyebrow. The coordinates of these points are called facial feature points. One well-known commercial program converts every face into 66 feature points.

To compare two faces you simply compare the coordinates (after adjusting in case one image is slightly off alignment).
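
Here is a minimal sketch of that comparison, assuming each face has already been reduced to an array of (x, y) feature points. The alignment step shown is an orthogonal Procrustes fit, which is one standard way to remove differences in position, size and rotation; I am not claiming it is what any particular commercial program does, and the function names are hypothetical.

```python
import numpy as np


def normalise(points):
    """Remove position and size: centre the points and scale to unit norm."""
    centred = points - points.mean(axis=0)
    return centred / np.linalg.norm(centred)


def landmark_distance(a, b):
    """Distance between two sets of feature points after aligning b to a."""
    a, b = normalise(a), normalise(b)
    # Orthogonal Procrustes: the orthogonal matrix that best maps b onto a.
    # (It may include a reflection; a stricter version would rule that out.)
    u, _, vt = np.linalg.svd(b.T @ a)
    rotation = u @ vt
    return float(np.linalg.norm(a - b @ rotation))


rng = np.random.default_rng(1)
face = rng.random((66, 2))          # 66 made-up feature points
other_face = rng.random((66, 2))    # a different made-up face

# The same face, rotated, enlarged and moved within the picture.
theta = 0.2
turn = np.array([[np.cos(theta), -np.sin(theta)],
                 [np.sin(theta), np.cos(theta)]])
same_face_moved = 1.5 * face @ turn + np.array([10.0, -4.0])

print(landmark_distance(face, same_face_moved))  # close to zero
print(landmark_distance(face, other_face))       # clearly larger
```

A small distance suggests the same face; in practice you would pick a threshold from labelled examples.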

Not surprisingly, the facial feature coordinates method is better than the Eigenfaces method, but it is still suboptimal. We are throwing lots of useful information away: hair colour, eye colour, any facial structure that isn’t captured by a feature point, etc.

Deep learning approach

The last method in particular involved a human programming the definition of an “eyebrow”, and so on, into a computer. The current generation of face recognisers throws all this out of the window.

This approach uses convolutional neural networks (CNNs). A CNN repeatedly walks a kind of stencil over the image and works out where subsections of the image match particular patterns.

The first time, you pick up corners and edges. After doing this five times, each time on the output of the previous run, you start to pick up parts of an eye or ear. After 30 times, you have recognised a whole face!
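
To show what walking a stencil over an image means, here is a minimal NumPy sketch of a single pass. The stencil is written by hand to pick up vertical edges purely for illustration; in a real CNN its values are learned, each pass is followed by a non-linear step, and many stencils are applied in every layer.

```python
import numpy as np


def slide_stencil(image, stencil):
    """Slide the stencil over the image and record how well each patch matches."""
    h, w = stencil.shape
    rows = image.shape[0] - h + 1
    cols = image.shape[1] - w + 1
    out = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            patch = image[r:r + h, c:c + w]
            out[r, c] = np.sum(patch * stencil)
    return out


# A tiny test image: dark on the left half, bright on the right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written stencil that responds to dark-to-bright vertical edges.
vertical_edge = np.array([[-1.0, 0.0, 1.0]] * 3)

response = slide_stencil(image, vertical_edge)
print(response)
```

The response is largest in the columns where the stencil straddles the boundary between dark and bright, and zero in the flat regions. Feeding such responses into another pass is what lets later layers pick up larger structures.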

The neat trick is that nobody has defined the patterns that we are looking for; rather, they come from training the network with millions of face images.

Of course this can be an Achilles’ heel of the CNN approach since you may have no idea exactly why a face recogniser gave a particular answer.

The obstacle you encounter if you want to develop your own CNN face recogniser is where to get millions of images to train the model. Lots of people scrape celebrity images from the internet to do this.

However, you can get many more images if you can get people to give you their personal photos for free!

This is why Facebook, Microsoft and Google have some of the most accurate face recognisers: they have access to the resources necessary to train the models.

The CNN approach is far from perfect, and many companies make adjustments on top of what I have described in order to compensate for its limitations, such as correcting for pose and lighting, often using a 3D mesh model of the face. The field is advancing rapidly and every year the state of the art in face recognition brings a noticeable improvement.

If you’d like to know more about this field or similar projects please get in touch.

Predicting customer churn

One question faced by lots of companies in competitive markets is: why are our customers leaving us? What drives them to switch to a competitor? This is called ‘customer churn’.

Imagine you run a utility company. You know this about each of your customers:

  • When they signed the first contract
  • How much power they use on weekdays, weekends, etc
  • Size of household
  • Zip code / Postcode

For millions of customers you also know whether they stayed with your company, or switched to a different provider.

Ideally you’d like to identify the people who are likely to switch their supply, before they do so! Then you can offer them promotions or loyalty rewards to convince them to stay.

How can you go about this?

If you have a data scientist or statistician at your company, they can probably run an analysis and produce a detailed report, telling you that high-consumption customers in X or Y demographic are highly likely to switch supply.

It’s nice to have this report and it probably has some pretty graphs. But what I want to know is, for each of the 2 million customers in my database, what is the probability that the customer will churn?

If you build a machine learning model you can get this information. For example, customer 34534231 is 79% likely to switch to a competitor in the next month.

Surprisingly, building a model like this is very simple. I like to use Scikit-learn for this, a nice easy-to-use machine learning library in Python. It’s possible to knock up a program in a day which will connect to your database and give you this probability for any customer.
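
As a minimal sketch of what such a program boils down to, here is a Scikit-learn model that returns a churn probability for a single customer. The customer table is synthetic, the rule that generates churn (leaving within the first two years) is invented purely so the example has something to learn, and the feature names and churn_probability function are hypothetical. A real version would read the features from your database instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in for a customer table (not real data).
tenure_years = rng.uniform(0, 10, n)
weekday_kwh = rng.uniform(5, 40, n)
household_size = rng.integers(1, 6, n)
X = np.column_stack([tenure_years, weekday_kwh, household_size])

# Toy rule for illustration only: customers in their first two years churn.
churned = (tenure_years < 2).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, churned)


def churn_probability(tenure, kwh, household):
    """Estimated probability (0 to 1) that a customer with these values churns."""
    return float(model.predict_proba([[tenure, kwh, household]])[0, 1])


print(churn_probability(0.5, 20.0, 3))  # a recent customer
print(churn_probability(9.0, 20.0, 3))  # a long-standing customer
```

The key call is predict_proba, which gives a probability per customer rather than just a yes/no answer, so you can rank your whole customer base by risk.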

One problem you’ll encounter is that the data is very non-homogeneous. For example, the postcode or zip code is a kind of category, while power consumption is a continuous number. For this kind of problem I have found that the most suitable algorithms are Support Vector Machines and Random Forests, both of which are in Scikit-learn. I also have a trick of augmenting location data with demographic data for that location, which improves the accuracy of the prediction.
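
Here is a minimal sketch of handling that mix of categories and numbers, assuming pandas is available alongside Scikit-learn. All the records, the postcodes and the median_income figures are made up, and one-hot encoding the postcode is just one common way to feed a category to a Random Forest. The merge step illustrates the idea of attaching demographic data to each location.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up customer records mixing a category (postcode) with numbers.
customers = pd.DataFrame({
    "postcode": ["AB1", "AB1", "CD2", "CD2", "EF3", "EF3", "AB1", "CD2"],
    "weekday_kwh": [12.0, 30.5, 18.2, 22.1, 9.8, 15.0, 27.3, 20.0],
    "household_size": [1, 4, 2, 3, 1, 2, 5, 3],
    "churned": [1, 1, 0, 0, 0, 0, 1, 0],
})

# Hypothetical demographic lookup, one row per postcode.
demographics = pd.DataFrame({
    "postcode": ["AB1", "CD2", "EF3"],
    "median_income": [21000, 34000, 48000],
})

# Augment each customer with the demographics of their location.
data = customers.merge(demographics, on="postcode", how="left")

# One-hot encode the postcode; pass the numeric columns through unchanged.
preprocess = ColumnTransformer(
    [("postcode", OneHotEncoder(handle_unknown="ignore"), ["postcode"])],
    remainder="passthrough",
)
model = make_pipeline(
    preprocess,
    RandomForestClassifier(n_estimators=50, random_state=0),
)

features = data.drop(columns="churned")
model.fit(features, data["churned"])

# Churn probability for every customer in the table.
probabilities = model.predict_proba(features)[:, 1]
print(probabilities)
```

Setting handle_unknown="ignore" means a postcode that never appeared in the training data does not cause an error at prediction time; the model then relies on the numeric and demographic columns for that customer.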

If customer churn is an issue for your business and you’d like to anticipate it before it happens, I’d love to hear from you! Get in touch via the contact form to find out more.