# Random Forest and logistic regression

Need this to be done in 24 hours

A company has a growing problem: customer defections are increasing and are now above the industry average. The churn issue is most acute in the SME division, so that division is the first priority.

Predict the customers which are most likely to churn.

Identify what is driving the churn.

Build the model and suggest actions

Data provided:

Training data: a dataset which includes features of SME customers in Jan 2016, as well as information about whether or not they had churned by Mar 2016.

The prices from 2015 for these customers.

How:

Use the trained model to “score” customers in the verification data set (provided in the file) and put them in descending order of the propensity to churn.

Classify these customers into two classes: those which you predict to churn are to be labelled "1" and the remaining customers should be labelled "0" in the result template.
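A minimal sketch of the scoring and labelling steps, using scikit-learn. The real customer files are not shown here, so the data is synthetic (`make_classification`), a hold-out split stands in for the verification data set, and the column names and the 0.5 cut-off are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the SME data (assumption: ~10% churners).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=0
)
# Hold-out split used here in place of the provided verification file.
X_train, X_verify, y_train, y_verify = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

THRESHOLD = 0.5  # hypothetical cut-off; tune it to the business trade-off

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Probability of the positive class (churn = 1) for each customer.
    proba = model.predict_proba(X_verify)[:, 1]
    scored = pd.DataFrame(
        {"customer_idx": np.arange(len(proba)), "churn_probability": proba}
    )
    # Label predicted churners "1" and everyone else "0".
    scored["churn_prediction"] = (
        scored["churn_probability"] >= THRESHOLD
    ).astype(int)
    # Rank customers in descending order of propensity to churn.
    results[name] = scored.sort_values(
        "churn_probability", ascending=False
    ).reset_index(drop=True)

print(results["random_forest"].head())
```

With churners being a minority class, a 0.5 cut-off may flag very few customers; the threshold is a choice to revisit, not a fixed rule.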

Evaluate the model using the ROC curve and the Brier score.
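As an illustration of the two metrics, the sketch below computes the ROC curve, the area under it, and the Brier score (the mean squared difference between predicted probability and actual outcome) with scikit-learn. The labels and probabilities are made-up toy values, not results from the customer data.

```python
import numpy as np
from sklearn.metrics import brier_score_loss, roc_auc_score, roc_curve

# Toy values for illustration only: 1 = churned, 0 = stayed.
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 0])
y_prob = np.array([0.10, 0.30, 0.80, 0.20, 0.60, 0.40, 0.50, 0.05])

# ROC curve: false/true positive rates across all probability thresholds.
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
auc = roc_auc_score(y_true, y_prob)

# Brier score: lower is better, 0 means perfect probabilities.
brier = brier_score_loss(y_true, y_prob)
brier_manual = np.mean((y_prob - y_true) ** 2)  # same quantity by hand

print(f"AUC = {auc:.3f}, Brier score = {brier:.4f}")
```

The ROC curve measures how well the ranking separates churners from non-churners, while the Brier score also penalises badly calibrated probabilities, so the two can disagree about which model is better.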

Further:

Would a 20% discount offered to customers predicted to churn be a good measure against churn (assuming it’s applied to all customers)?
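One way to reason about the discount question is a back-of-the-envelope expected-revenue comparison. The sketch below is only that: the function `expected_revenue`, the revenue figures, the 0.5 cut-off, and above all the assumption that a discount halves a customer's churn probability are hypothetical and would need to be replaced with estimates from the data.

```python
import numpy as np

def expected_revenue(churn_prob, revenue, discount=0.0, churn_reduction=0.0):
    """Expected revenue if a discount cuts churn probability by a relative factor.

    churn_reduction is an ASSUMED effect of the discount (0.5 = churn halves).
    """
    adjusted_prob = churn_prob * (1.0 - churn_reduction)
    return float(np.sum((1.0 - adjusted_prob) * revenue * (1.0 - discount)))

# Hypothetical model scores and annual revenue per customer.
churn_prob = np.array([0.90, 0.70, 0.20, 0.05])
revenue = np.array([1000.0, 1500.0, 800.0, 1200.0])

# Scenario A: do nothing.
baseline = expected_revenue(churn_prob, revenue)

# Scenario B: 20% discount only for customers flagged as likely churners.
flagged = churn_prob >= 0.5
targeted = expected_revenue(
    churn_prob[flagged], revenue[flagged], discount=0.2, churn_reduction=0.5
) + expected_revenue(churn_prob[~flagged], revenue[~flagged])

# Scenario C: 20% discount for every customer.
blanket = expected_revenue(churn_prob, revenue, discount=0.2, churn_reduction=0.5)

print(f"baseline={baseline:.0f} targeted={targeted:.0f} blanket={blanket:.0f}")
```

The comparison is sensitive to the assumed churn reduction, so it is worth repeating the calculation over a range of values rather than trusting a single number.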

(Table 1 describes all the data fields found in the data. You will notice that the contents of some fields are meaningless text strings. This is due to "hashing" of text fields for data privacy. While their commercial interpretation is lost as a result of the hashing, they may still have predictive power.)
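Hashed fields can still be used as ordinary categorical features, since each distinct hash is just a category label. A minimal sketch with pandas follows; the column name `sales_channel_hash` and the hash values are invented for illustration and do not come from Table 1.

```python
import pandas as pd

# Hypothetical hashed field: the strings carry no readable meaning,
# but identical strings still denote the same underlying category.
df = pd.DataFrame(
    {
        "sales_channel_hash": ["a3f9", "a3f9", "7c1e", "7c1e", "7c1e", "e08b"],
        "churn": [1, 0, 0, 0, 1, 1],
    }
)

# Churn rate per hashed level: a quick check for predictive signal.
churn_rate_by_level = df.groupby("sales_channel_hash")["churn"].mean()

# One-hot encode the hashed field so tree-based or linear models can use it.
encoded = pd.get_dummies(df, columns=["sales_channel_hash"], prefix="channel")

print(churn_rate_by_level)
print(encoded.columns.tolist())
```

For fields with many distinct hashes, one-hot encoding can produce a very wide table, in which case grouping rare levels together first is a common alternative.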

