Telecom Churn Case Study

With 21 predictor variables, we need to predict whether a particular customer will switch to another telecom provider or not. In telecom terminology, this is referred to as churning and not churning, respectively.

Importing and Merging Data

Let's understand the structure of our dataframe
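A minimal sketch of the merge and inspection steps, assuming the customer data arrives as several tables that share a customer key. The table names (`churn_data`, `customer_data`, `internet_data`), the key `customerID` and the sample rows are hypothetical placeholders, not taken from this case study.

```python
import pandas as pd

# Hypothetical stand-ins for the separate source tables; the real file and
# column names are assumptions, not taken from this case study.
churn_data = pd.DataFrame({"customerID": ["A1", "B2", "C3"],
                           "tenure": [1, 34, 2],
                           "Churn": ["No", "No", "Yes"]})
customer_data = pd.DataFrame({"customerID": ["A1", "B2", "C3"],
                              "gender": ["Female", "Male", "Male"]})
internet_data = pd.DataFrame({"customerID": ["A1", "B2", "C3"],
                              "InternetService": ["DSL", "DSL", "Fiber optic"]})

# Inner-join the tables on the shared customer key so each row is one customer.
telecom = (churn_data
           .merge(customer_data, how="inner", on="customerID")
           .merge(internet_data, how="inner", on="customerID"))

# Inspect the structure: dimensions, then column dtypes and non-null counts.
print(telecom.shape)
telecom.info()
```

An inner join keeps only customers present in every table, so comparing `telecom.shape` with the row counts of the inputs is a quick check that no customers were lost in the merge.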

Data Preparation

Dummy Variable Creation

Dropping the repeated variables

Now all the variables are of integer type.
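The dummy-variable and dropping steps can be sketched with pandas as follows. The column names and values are hypothetical; the approach shown (mapping Yes/No columns to 1/0, one-hot encoding multi-level columns with `pd.get_dummies(..., drop_first=True)`, then dropping the original categorical column) is one common way to do it, not necessarily the exact code of this study.

```python
import pandas as pd

# Small hypothetical frame; column names mirror a typical telecom dataset.
telecom = pd.DataFrame({
    "PhoneService": ["Yes", "No", "Yes"],
    "Churn": ["No", "Yes", "No"],
    "Contract": ["Month-to-month", "One year", "Two year"],
})

# Map two-level Yes/No columns straight to 1/0.
for col in ["PhoneService", "Churn"]:
    telecom[col] = telecom[col].map({"Yes": 1, "No": 0})

# One-hot encode the multi-level categorical; drop_first removes one level
# so the dummies are not perfectly collinear (the "dummy variable trap").
dummies = pd.get_dummies(telecom["Contract"], prefix="Contract",
                         drop_first=True, dtype=int)
telecom = pd.concat([telecom, dummies], axis=1)

# Drop the original categorical column, which the dummies now repeat.
telecom = telecom.drop(columns=["Contract"])

print(telecom.dtypes)
```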

Checking for Outliers

From the distribution shown above, you can see that there are no outliers in the data: the values increase gradually from one percentile to the next.
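One way to produce such a percentile view is `DataFrame.describe` with extra upper percentiles. The sketch below uses made-up values for two hypothetical continuous columns; what matters is the pattern to look for, a smooth rise towards the maximum rather than a sudden jump after the 95th or 99th percentile.

```python
import pandas as pd

# Hypothetical numeric columns standing in for the continuous features.
num = pd.DataFrame({
    "tenure": [1, 5, 12, 24, 36, 48, 60, 72],
    "MonthlyCharges": [20.0, 25.5, 45.0, 56.9, 70.3, 89.1, 99.6, 118.7],
})

# Look at the upper tail: a sudden jump between the 95th/99th percentile and
# the max would hint at outliers; a gradual rise suggests there are none.
summary = num.describe(percentiles=[0.25, 0.5, 0.75, 0.90, 0.95, 0.99])
print(summary)
```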

Checking for Missing Values and Imputing Them

This means that 11/7043 ≈ 0.00156, i.e. only about 0.16% of the observations have missing values, so it is best to remove these observations from the analysis.

Now we don't have any missing values
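A minimal sketch of the check-then-drop approach, assuming (hypothetically) that a charges column was read as text with blank entries. `pd.to_numeric(..., errors="coerce")` turns unparseable strings into `NaN`, after which the missing rows can be counted and dropped.

```python
import pandas as pd

# Hypothetical frame where TotalCharges arrived as text with blank entries.
telecom = pd.DataFrame({
    "tenure": [0, 5, 12, 0],
    "TotalCharges": ["", "250.5", "1020.0", " "],
})

# Coerce to numeric: strings that cannot be parsed (blanks) become NaN.
telecom["TotalCharges"] = pd.to_numeric(telecom["TotalCharges"], errors="coerce")

# Count missing values per column and the share of rows affected.
missing = telecom.isnull().sum()
share = telecom.isnull().any(axis=1).mean()
print(missing)
print(f"{share:.2%} of rows have a missing value")

# When only a tiny fraction of rows is affected, drop them instead of imputing.
telecom = telecom.dropna()
```

In this toy frame half the rows are affected, which would normally argue for imputation; the drop is shown only to mirror the step above, where the affected share is about 0.16%.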

Feature Standardisation
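Standardisation here means rescaling each continuous feature to zero mean and unit standard deviation, z = (x - mean) / std. The sketch assumes the continuous columns are `tenure`, `MonthlyCharges` and `TotalCharges` (an assumption about this dataset) and leaves the 0/1 columns untouched.

```python
import pandas as pd

# Hypothetical continuous columns; the binary target is left as 0/1.
telecom = pd.DataFrame({
    "tenure": [1, 12, 24, 72],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "TotalCharges": [29.85, 1889.5, 108.15, 1840.75],
    "Churn": [0, 0, 1, 0],
})

cont_cols = ["tenure", "MonthlyCharges", "TotalCharges"]
# z-score: subtract the column mean and divide by the column standard deviation.
telecom[cont_cols] = (telecom[cont_cols] - telecom[cont_cols].mean()) / telecom[cont_cols].std()
print(telecom.round(3))
```

Note that pandas `std()` uses the sample standard deviation (`ddof=1`), whereas scikit-learn's `StandardScaler` uses the population version, so the two give slightly different scales on small data.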

Checking the Churn Rate

We have a churn rate of almost 27%.
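The churn rate is simply the share of customers whose target is 1. A sketch on a hypothetical 0/1 `Churn` series:

```python
import pandas as pd

# Hypothetical 0/1 target column (1 = churned).
churn = pd.Series([1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0], name="Churn")

# Churn rate = churned customers / all customers, as a percentage.
churn_rate = churn.sum() / len(churn) * 100
print(f"Churn rate: {churn_rate:.1f}%")
```

A rate well below 50% means the classes are imbalanced, which is why accuracy alone is not enough and sensitivity and specificity are examined later.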

Model Building

Let's start by splitting our data into a training set and a test set.

Splitting Data into Training and Test Sets
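A minimal sketch with scikit-learn's `train_test_split`. The 70/30 ratio, `random_state=100` and the random features are illustrative assumptions, not values stated in this case study.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical feature matrix X and 0/1 target y.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 3)), columns=["f1", "f2", "f3"])
y = pd.Series(rng.integers(0, 2, size=100), name="Churn")

# Hold out 30% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=100
)
print(X_train.shape, X_test.shape)
```

Passing `stratify=y` is an optional variation that keeps the churn proportion the same in both sets, which can help when the classes are imbalanced.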

Running Your First Training Model

Correlation Matrix

Dropping highly correlated variables.
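A sketch of finding and dropping highly correlated pairs. The helper `high_corr_pairs`, the 0.8 threshold and the rule "drop the second column of each pair" are hypothetical choices for illustration; in practice you would choose which member of a pair to drop using domain knowledge.

```python
import numpy as np
import pandas as pd

def high_corr_pairs(df, threshold=0.8):
    """Return (col_a, col_b, r) for every column pair with |r| above threshold."""
    corr = df.corr()
    cols = corr.columns
    # Walk the upper triangle only, so each pair appears once and the
    # diagonal (every column with itself, r = 1) is skipped.
    return [(cols[i], cols[j], corr.iloc[i, j])
            for i in range(len(cols))
            for j in range(i + 1, len(cols))
            if abs(corr.iloc[i, j]) > threshold]

rng = np.random.default_rng(1)
base = rng.normal(size=200)
df = pd.DataFrame({
    "a": base,
    "a_copy": base + rng.normal(scale=0.05, size=200),  # near-duplicate of "a"
    "b": rng.normal(size=200),
})

pairs = high_corr_pairs(df, threshold=0.8)
print(pairs)

# Drop the second column of each highly correlated pair.
to_drop = {b for _, b, _ in pairs}
df_reduced = df.drop(columns=sorted(to_drop))
```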

Checking the Correlation Matrix

After dropping the highly correlated variables, let's check the correlation matrix again.

Re-Running the Model

Now let's run our model again after dropping the highly correlated variables.

Feature Selection Using RFE
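Recursive feature elimination (RFE) repeatedly fits a model and removes the feature with the smallest coefficient magnitude until the requested number remain. A sketch with scikit-learn's `RFE` wrapped around `LogisticRegression`; the simulated data and `n_features_to_select=2` are assumptions for illustration, not the setting used in this study.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
X_train = pd.DataFrame(rng.normal(size=(400, 6)), columns=[f"f{i}" for i in range(6)])
# Only f0 and f1 drive this hypothetical target; f2..f5 are pure noise.
logit = 2.5 * X_train["f0"] - 2.0 * X_train["f1"]
y_train = (rng.random(400) < 1 / (1 + np.exp(-logit))).astype(int)

# Recursively drop the weakest feature until n_features_to_select remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=2)
rfe.fit(X_train, y_train)

selected = X_train.columns[rfe.support_]
print(list(zip(X_train.columns, rfe.support_, rfe.ranking_)))
print("Selected:", list(selected))
```

`support_` marks the kept features and `ranking_` gives 1 to every kept feature and higher numbers to those eliminated earlier. Because RFE ranks by coefficient size, it works best on standardised features.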

Dropping Variables with High VIF
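The variance inflation factor of feature j is VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on all the other features. The sketch computes it directly with NumPy least squares (`vif_table` is a hypothetical helper) and drops the worst feature one at a time; the cut-off of 5 is a common rule of thumb, assumed here. statsmodels also provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence`, which expects you to include a constant column yourself.

```python
import numpy as np
import pandas as pd

def vif_table(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 is from an OLS regression of
    column j on all the other columns plus an intercept."""
    values = X.to_numpy(dtype=float)
    n, k = values.shape
    vifs = {}
    for j in range(k):
        target = values[:, j]
        design = np.column_stack([np.ones(n), np.delete(values, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        centred = target - target.mean()
        r2 = 1 - (resid @ resid) / (centred @ centred)
        vifs[X.columns[j]] = 1 / (1 - r2)
    return pd.Series(vifs, name="VIF").sort_values(ascending=False)

rng = np.random.default_rng(3)
x1 = rng.normal(size=300)
X = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.1, size=300),  # nearly collinear with x1
    "x3": rng.normal(size=300),
})

# Drop the worst offender one at a time until every VIF is at most 5.
while True:
    vif = vif_table(X)
    if vif.iloc[0] <= 5:
        break
    X = X.drop(columns=vif.index[0])

print(vif.round(2))
```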

Making Predictions

Model Evaluation
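Predictions from a logistic model are probabilities, so they are turned into churn / no-churn labels with a cutoff and then compared with the actual labels. The helper `evaluate_at_cutoff` and the sample numbers below are hypothetical; the metrics follow their standard definitions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def evaluate_at_cutoff(y_true, churn_prob, cutoff=0.5):
    """Turn predicted churn probabilities into labels and compute basic metrics."""
    y_pred = (np.asarray(churn_prob) > cutoff).astype(int)
    # With labels=[0, 1] the matrix is [[TN, FP], [FN, TP]].
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # share of actual churners caught
        "specificity": tn / (tn + fp),  # share of non-churners correctly kept
    }

y_true = [0, 0, 1, 1, 0, 1, 0, 0]
churn_prob = [0.10, 0.40, 0.35, 0.80, 0.60, 0.70, 0.20, 0.05]
print(evaluate_at_cutoff(y_true, churn_prob, cutoff=0.5))
```

At a cutoff of 0.5 these toy numbers give TP = 2, TN = 4, FP = 1 and FN = 1, i.e. accuracy 0.75, sensitivity 2/3 and specificity 0.8.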

ROC Curve

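An ROC curve demonstrates several things: it shows the trade-off between sensitivity and specificity (raising sensitivity generally comes at the cost of specificity); the closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the model; and the closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the model.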
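The points of an ROC curve and the area under it can be computed with scikit-learn. The labels and probabilities below are hypothetical; plotting `fpr` on the x-axis against `tpr` on the y-axis draws the curve.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 0])
churn_prob = np.array([0.10, 0.40, 0.35, 0.80, 0.60, 0.70, 0.20, 0.05])

# roc_curve sweeps the threshold and returns the false positive rate
# (1 - specificity) and true positive rate (sensitivity) at each one.
fpr, tpr, thresholds = roc_curve(y_true, churn_prob, drop_intermediate=False)
auc = roc_auc_score(y_true, churn_prob)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")
print(f"AUC = {auc:.3f}")
```

The AUC equals the probability that a randomly chosen churner gets a higher score than a randomly chosen non-churner: here 13 of the 15 churner/non-churner pairs are ordered correctly, so AUC = 13/15 ≈ 0.867. A model on the diagonal would score 0.5.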

Finding Optimal Cutoff Point

The optimal cutoff probability is the one at which sensitivity and specificity are balanced.

From the curve above, 0.3 is the optimal cutoff probability.
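The cutoff search described above can be sketched as a sweep over candidate cutoffs, computing accuracy, sensitivity and specificity at each and picking the cutoff where sensitivity and specificity are closest. The helper `cutoff_table` and the toy data are hypothetical, so the cutoff it finds is not the 0.3 obtained from the real model.

```python
import numpy as np

def cutoff_table(y_true, churn_prob, cutoffs):
    """(cutoff, accuracy, sensitivity, specificity) for each candidate cutoff."""
    y_true = np.asarray(y_true)
    churn_prob = np.asarray(churn_prob)
    rows = []
    for c in cutoffs:
        y_pred = (churn_prob > c).astype(int)
        tp = np.sum((y_pred == 1) & (y_true == 1))
        tn = np.sum((y_pred == 0) & (y_true == 0))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        rows.append((c, (tp + tn) / len(y_true), tp / (tp + fn), tn / (tn + fp)))
    return rows

y_true = [0, 0, 1, 1, 0, 1, 0, 0]
churn_prob = [0.10, 0.40, 0.35, 0.80, 0.60, 0.70, 0.20, 0.05]
cutoffs = [round(0.1 * i, 1) for i in range(10)]
table = cutoff_table(y_true, churn_prob, cutoffs)

# Pick the cutoff where sensitivity and specificity are closest to each other
# (min keeps the first cutoff on ties).
best = min(table, key=lambda row: abs(row[2] - row[3]))
print(f"balanced cutoff = {best[0]}")
```

On these eight toy points the search lands on 0.4; plotting the three metric columns against the cutoff gives the kind of curve used above to read off the balance point.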