
Big Data and Machine Learning to control the churn rate

The telecom (TELCO) sector is characterized by continuous, expensive investment in infrastructure and new technologies in order to survive stifling competition. The fight for each customer is relentless: large operators fight not only each other but also virtual operators, whose agility lets them adapt quickly and cheaply to new customer needs.

In this article we share a success story based on commercial campaigns fed by machine learning models, which shorten reaction time and minimize the churn rate.

One of the metrics operators fear most is the rate at which customers leave for the competition, known as the churn rate.

Even a net churn rate of zero or below can hurt, because a high flow of customer migrations generates high costs: sign-ups, cancellations and configuration changes in different systems, equipment returns, physical installations in buildings and in the customer's home carried out through contractors, etc.
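To make the point concrete, here is a small sketch with made-up figures. The customer counts and the cost per movement are purely illustrative assumptions, not data from the case described in this article. It shows that net churn can be zero while the operator still pays for every sign-up and every cancellation.

```python
def churn_summary(customers_start, lost, gained, cost_per_movement):
    """Return (net churn rate, gross churn rate, total migration cost).

    cost_per_movement is a hypothetical average cost of handling one
    sign-up or one cancellation (systems changes, equipment, installation).
    """
    net_churn_rate = (lost - gained) / customers_start
    gross_churn_rate = lost / customers_start
    migration_cost = (lost + gained) * cost_per_movement
    return net_churn_rate, gross_churn_rate, migration_cost


# Illustrative numbers only: as many customers arrive as leave.
net, gross, cost = churn_summary(
    customers_start=100_000, lost=2_000, gained=2_000, cost_per_movement=50.0
)
print(f"net churn: {net:.1%}, gross churn: {gross:.1%}, migration cost: {cost:,.0f}")
```

With these assumed figures the customer base does not shrink, yet 4,000 movements still have to be paid for.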

Fighting churn (customers leaving) is vital
Great resources are dedicated to it


Know the customer

Traditionally, business rules based on the knowledge of domain experts have been used to try to predict which customers are most likely to leave, using just a few parameters to classify users.

There is even specific terminology for certain customer behaviors: threatening (makes portability requests and then cancels them), mercenary (changes operator repeatedly within short periods), or Robinson (does not want to be bothered with commercial offers).

Periodically, using semi-manual methods, the list of customers most likely to leave is obtained. It is filtered by criteria such as billing or delinquency information, and the resulting list of customers is contacted with a loyalty offer: free handsets, discounts, etc.
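As a rough illustration of this traditional, rule-based approach, the sketch below labels customers with the three behaviors mentioned above and builds a contact list. Every field name and threshold is a hypothetical assumption made for the example, as is the choice to exclude Robinson and delinquent customers from the campaign; real expert rules would differ.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    cancelled_port_requests: int  # portability requests opened and then cancelled
    operator_changes: int         # operator switches within a recent window
    opted_out_of_offers: bool     # does not want commercial contact
    is_delinquent: bool           # has unpaid bills


def behavior_labels(c, port_threshold=2, switch_threshold=3):
    """Apply hand-written expert rules; the thresholds are illustrative."""
    labels = set()
    if c.cancelled_port_requests >= port_threshold:
        labels.add("threatening")
    if c.operator_changes >= switch_threshold:
        labels.add("mercenary")
    if c.opted_out_of_offers:
        labels.add("robinson")
    return labels


def contact_list(customers):
    """At-risk customers, minus those filtered out before the campaign."""
    selected = []
    for c in customers:
        labels = behavior_labels(c)
        at_risk = bool(labels & {"threatening", "mercenary"})
        # Assumed filters: skip customers who opted out or who are delinquent.
        if at_risk and "robinson" not in labels and not c.is_delinquent:
            selected.append(c.customer_id)
    return selected
```

Rules like these are easy to explain, but they only look at a handful of parameters, which is exactly the limitation the rest of the article addresses.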

Large TELCOs have the advantage of accumulating a huge amount of user data from various sources of information. Among others:

    • Geographical mobility data: the mobile phone's automatic connection to different radio cells reflects the areas the user moves through and when, making it possible to infer their work area, home, leisure and weekend areas, purchasing power, etc.
    • Mobile data usage: when and where the customer uses mobile data, what type of content they access, and whether they browse competitors' pages. For example, if a customer uses a lot of data during the day and the traffic almost disappears at night, yet they have no fiber contract, they are very likely connecting at night through a competitor's Wi-Fi.
    • Billing: current tariff and offers, previous tariffs and offers, time remaining until the end of the commitment period, amount still owed on handsets bought in installments, age of the contracted lines, payment problems.
    • Handset (terminal) type.
    • Devices in the home: information about the devices in the customer's home that connect through the company's router.
    • Calls to the operator's own call center or to a competitor's.
    • Complaints, and network incidents that have affected the customer.
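The competitor Wi-Fi inference from the mobile data bullet can be expressed as a simple derived feature. The following is only a sketch: the function name, the megabyte units and both thresholds are assumptions chosen for illustration, not values used in the real project.

```python
def likely_uses_competitor_wifi(
    day_mb,
    night_mb,
    has_fiber_contract,
    min_day_mb=200.0,      # assumed: what counts as "a lot" of daytime data
    max_night_ratio=0.05,  # assumed: night traffic below 5% of daytime traffic
):
    """Heuristic flag: heavy daytime mobile data, almost none at night,
    and no fiber contract with us."""
    if has_fiber_contract:
        return False  # night traffic probably goes through our own router
    if day_mb < min_day_mb:
        return False  # light user overall, no conclusion possible
    return night_mb <= max_night_ratio * day_mb


# Hypothetical daily averages for three customers.
print(likely_uses_competitor_wifi(day_mb=800, night_mb=10, has_fiber_contract=False))
print(likely_uses_competitor_wifi(day_mb=800, night_mb=10, has_fiber_contract=True))
print(likely_uses_competitor_wifi(day_mb=800, night_mb=400, has_fiber_contract=False))
```

A flag like this would be just one of the many variables describing each user.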
HAL 9000


Technologies associated with the field of Big Data and Machine Learning provide
solutions to process and analyze in real time
this massive and incessant amount of user information.


A little bit of Big Data and Machine Learning

What does Big Data technology offer compared to traditional solutions? Big Data is used when the volume, velocity and variety of the information are so high that specific technology and analytical methods are required to turn that information into value.

In short, Big Data solutions provide:

    • Horizontal scalability: storage and processing capacity grow linearly as nodes are added.
    • High availability and fault tolerance, through automatic replication of the information.
    • Massively parallel processing, with the possibility of distributing the work across hundreds or thousands of nodes.
    • Processing of large data streams in real time (data streaming).
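The idea behind horizontal scalability and parallel processing can be sketched on a single machine: split the records by key, process each partition independently, and merge the partial results. In the toy example below, Python threads merely stand in for cluster nodes, and the record layout (customer id, megabytes used) is an assumption made for illustration. A real deployment would delegate this to a distributed framework.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Assign each (customer_id, mb) record to a partition by hashing its key,
    so all records of one customer land in the same partition."""
    parts = [[] for _ in range(n_parts)]
    for customer_id, mb in records:
        index = zlib.crc32(customer_id.encode("utf-8")) % n_parts
        parts[index].append((customer_id, mb))
    return parts


def aggregate(part):
    """Work done independently on one partition: total usage per customer."""
    totals = Counter()
    for customer_id, mb in part:
        totals[customer_id] += mb
    return totals


def total_usage(records, n_workers=4):
    """Partition, process the partitions concurrently, then merge the results."""
    parts = partition(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(aggregate, parts))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # partitions never share a key, so this is a union
    return dict(merged)


print(total_usage([("ana", 120), ("luis", 30), ("ana", 80), ("eva", 5)]))
```

Because each partition is processed without looking at the others, adding workers (or nodes) increases capacity without changing the logic.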

As for Machine Learning (ML), technologies such as Apache Spark give us products that implement machine learning algorithms for parallel execution on a multi-server cluster, offering response times that are unimaginable with traditional procedures.



Success story: Minimizing Churn with new technologies


Objective: reduce the flight of customers to the competition.
"Stop the bleeding"

How? Easy. Every day we calculate each customer's probability of leaving (churn probability) from all available data sources, in order to make loyalty offers (cross-selling, handsets, discounts) to the customers most likely to leave.
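Once a churn probability is available for every customer, choosing who receives a loyalty offer is a matter of ranking. The snippet below is a minimal sketch of that selection step. The threshold and the daily contact budget are hypothetical parameters, and the scores are invented for the example.

```python
def select_campaign_targets(churn_probabilities, threshold=0.5, max_contacts=1000):
    """Return customer ids ordered from most to least likely to leave,
    keeping only those at or above the threshold and within the contact budget."""
    candidates = [
        (customer_id, p)
        for customer_id, p in churn_probabilities.items()
        if p >= threshold
    ]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [customer_id for customer_id, _ in candidates[:max_contacts]]


# Invented scores for four customers, with a budget of two contacts per day.
todays_scores = {"c-001": 0.90, "c-002": 0.20, "c-003": 0.70, "c-004": 0.95}
print(select_campaign_targets(todays_scores, threshold=0.5, max_contacts=2))
```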

Calculate churn in TELCO


Yeah, but how do we get there in time?
This is where the combination of Big Data and Machine Learning dazzles.
"Stop the bleeding in time"


Instead of being limited to a few parameters, we can use hundreds of parameters that describe each user; instead of a monthly calculation, we can run daily calculations; and business rules can be replaced by efficient Machine Learning algorithms. All this with the assurance that we can scale our parallel processing as much as we need, almost immediately and transparently.

The solution is based on a model built with machine learning. The model is trained on historical user data in which, for each user, it is known whether or not they left (churn).
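To show what training on labeled historical data means in practice, here is a deliberately tiny sketch. It fits a logistic regression by gradient descent in pure Python. This is an assumption chosen for readability, not the algorithm used in the project, which this article does not name. The two features and all the data points are invented.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=2000):
    """Batch gradient descent on the log-loss. Returns (weights, bias)."""
    n, n_features = len(rows), len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def churn_probability(model, x):
    """Score one customer with a trained model."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Invented history. Features: [share of commitment period remaining,
# complaints scaled to 0..1]. Label: 1 if the customer left, 0 otherwise.
history = [[0.9, 0.0], [0.8, 0.1], [0.7, 0.0], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
left = [0, 0, 0, 1, 1, 1]

model = train_logistic_regression(history, left)
print(round(churn_probability(model, [0.15, 0.85]), 2))  # little commitment left, many complaints
print(round(churn_probability(model, [0.80, 0.00]), 2))  # long commitment left, no complaints
```

In a real setting the same idea is applied to hundreds of variables and millions of users, which is where a cluster-based ML library becomes necessary.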

Churn model training



Once the model is built, we can use it daily to calculate each customer's churn probability. This more detailed diagram shows the process steps grouped into three phases:

Churn detection process steps



The process has three phases:

    1. Initial phase: steps 1 to 4. The data sources are analyzed, the information for each user is aggregated, different models are tested and, finally, the model to be used is selected.
    2. Production phase: steps 1, 2, 5 and 6. New data from the previous day is ingested and prepared, and the model is applied to obtain the daily predictions.
    3. Re-training phase: typically every few months, or when results begin to decline, it is advisable to re-train the model with updated data that reflects new trends (new tariffs, a new economic situation, new technologies, competitors, etc.). It may be a gentle re-training, where the model is simply retrained with the same user variables over a more recent time window (steps 3 and 4), or a strong re-training, where new user variables are analyzed and incorporated (steps 1, 2, 3 and 4).
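The decision of when and how to re-train can be written down as a small rule. The sketch below is only one possible formulation. The age limit, the tolerated drop in the quality score, and the idea of a single score are assumptions. The step numbers returned are the ones listed for each kind of re-training above.

```python
from datetime import date


def retraining_plan(
    last_trained,
    today,
    baseline_score,
    recent_score,
    new_variables_available,
    max_age_days=180,        # assumed meaning of "every few months"
    max_relative_drop=0.10,  # assumed tolerance before results count as declining
):
    """Return (kind, steps), where kind is 'none', 'gentle' or 'strong'."""
    too_old = (today - last_trained).days >= max_age_days
    declined = recent_score < baseline_score * (1.0 - max_relative_drop)
    if not (too_old or declined):
        return "none", ()
    if new_variables_available:
        return "strong", (1, 2, 3, 4)  # analyze and add new variables, then refit
    return "gentle", (3, 4)            # same variables, more recent time window


print(retraining_plan(date(2024, 1, 1), date(2024, 8, 1), 0.80, 0.79, False))
```

Here the model is older than the assumed age limit and no new variables are on offer, so the rule proposes a gentle re-training.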


Conclusions and next steps

It is a fact: the technologies associated with the combination of Big Data and Machine Learning are sufficiently mature. Here we have shown a simple use case in which we analyze multiple sources of customer data (which is no small task) to build the customer's profile with ML algorithms that let us predict their behavior (probability of churn). In the case described, the improved campaigns resulted in around 30% fewer customers lost.

Fortunately, huge investments in hardware are no longer necessary: a cluster of nodes can be rented to perform complex processing just when it is needed. In addition, numerous free software solutions are available, widely documented and proven in success stories across all sectors.

The next step is clear: Go for it!

You can count on Panel Sistemas' Big Data services and solutions, or you can continue forming your own judgment.

If you decide to go ahead, we strongly recommend our 101 Introduction to Data Science, as well as the AWS whitepaper "Getting Started With Machine Learning: Tips From Cutting Edge Experts" (PDF), which sets out guidelines for companies to take a step forward and break down the barriers to entry to this opportunity to compete by squeezing value out of data.


Would you rather keep it short and simple?

You are one mouse click away from
setting up your own Data Science Service
adapted to your needs.
You have come to the right place!

(you can say I sent you 😉)



Francisco Javier Molinero Velasco


Javier is a consultant at Panel Sistemas specializing in Big Data and Machine Learning solutions for process optimization. You can contact him via e-mail or visit his profile on LinkedIn.
