International Journal Of Innovative Research In Computer Science .

Transcription

International Journal of Innovative Research in Computer Science & Technology (IJIRCST)ISSN: 2347-5552, Volume-8, Issue-2, March .ijircst.orgEnhanced Churn Prediction in theTelecommunication Industry Awodele Oludele, Adeniyi Ben*, Ogbonna A.C., Kuyoro S.O., Ebiesuwa SeunABSTRACT- Prediction models are usually built byapplying a supervised learning algorithm to historical data.This involves the use of data analytics system that usesreal-time integration and dynamic real time responses datato detect churn risks. Subscribe are increasinglyterminatingtheirmembership agreementwithtelecommunication companies through mobile numberportability (MNP) in order to subscribe to anothercompetitor companies.To model the Customer prediction, a Markov Chain Modelwill be used. The Markov model allows for more flexibilitythan most other potential models, and can incorporatevariables such as non-constant retention rate, which is notpossible in the simpler models. The model allows looking atindividual customer relationships as well as averages, andits probabilistic nature makes the uncertaintyapprehensible. The Markov Decision Process is alsoappealing, but since dynamic decisions along the lifetime ofthe customer will not be evaluated the Markov Chain is thesimplest model that still meets the requirements. Each statein the Markov Chain will represent a person being acustomer for one month, with an infinite number of states.The transition probability to move from one state to the nextis equivalent to a customer retaining with the operator tothe next month. A customer that has churned will beconsidered lost forever.Once the retention and churn rates are determined, thereference churn value for each customer will be computed.The churn rate will be calculated using MATLAB MonteCarlo simulations, running a large number of fictitiouscustomer-company relationship processes, and extractingthe results of the average customer.Manuscript received March 20, 2020Awodele Oludele, Computer Science Department,Babcock University, Ilishan-Remo, Nigeria.Adeniyi Ben, Computer Science Department, BabcockUniversity, Ilishan-Remo, Nigeria. (e-mail:ben.adeniyi@gmail.com)Ogbonna A.C, Computer Science Department, BabcockUniversity, Ilishan-Remo, Nigeria.Kuyoro S.O., Computer Science Department, BabcockUniversity, Ilishan-Remo, Nigeria.Ebiesuwa Seun, Computer Science Department,Babcock University, Ilishan-Remo, Nigeria.Using simulation approach gives better result thananalytical methods, since an indefinite number of statesmake matrix algebra complicated. It also allows visualizingthe distribution of the results more easily than withalgebraic calculation.To the telecom companies the result of this analysis wouldimprove the level at which they can predict customer churn,because, it will give insight on why a customer wouldchoose to leave one telecommunication industry for anothertelecommunication industry. In other words, the cost ofadvertisement and loyalty programs as well as challengesface in retaining loyal customers would be identified. Theresearcher believes that the enhanced model for churnprediction developed from this study will lead to betterretention strategy, improved telecom quality service,enhanced customer loyalty due to the improvement servicefrom applying the information from this model. The studyis also significant to researchers, behavioral scientist,business analysts as well as professionals in the computerscience domain. The study will serve as a good referencematerial for research and contribute to the growingobjective of developing enhanced model built on datamining techniques that can explain the churn behavior withmore accuracy than using single methods.KEYWORDS- Prediction Models, mobileportability, Markov Decision Process, churn ratenumberI. INTRODUCTIONChurn analysis, is useful in many businesses with manycustomers or high-value customers [4]. Customer churnanalytics are being used for a variety of reasons. In thefinancial services, consumer package goods, energy,manufacturing insurance sector, etc churn analysis is usedto measure account holder lifecycle, detect users thinking ofswitching banks; develop a support model that encouragesloyalty, measure how much revenue is at risk of being lost toother providers, measure churn for direct and downstreambuyers and predict a user’s likelihood to close a policy. Inthe telecom sector, a churn analysis will show the numberof users or accounts that cease using an organizationsproducts or services over a set time period. Churn analysisalso identifies customers who are most likely to churn. Thisidentification of valuable customers likely to churn and theexecution of proactive steps to retain customers are thecharacteristics of churn management [10]. Predictionmodels are usually built by applying a supervised learningalgorithm to historical data. This involves the use of data.Copyright 2019. Innovative Research Publication. All Rights Reserve6

Enhanced Churn Prediction in the Telecommunication Industryanalytics system that uses real-time integration anddynamic real time responses data to detect churn risks.Subscribe are increasingly terminating their membershipagreement with telecommunication companies throughmobile number portability (MNP) in order to subscribe toanother competitor companies. According to [1],telecommunication companies alone account for 30% ofchurn rate worldwide. It is cheaper to prevent churningthan to acquire, advertise or attract new customers. In orderto achieve this, telecommunication companies must be ableto manage churn effectively.Literature show that several solutions have been profferedto detect churn behavior. However, due to firm rivalry newinnovations, low switching costs, deregulation bygovernments, such solutions become ineffective overtime.Some of these solutions were hampered by the restrictionson data collection and data imbalance. Also, in most works,only one data mining method was applied and no room foradequate comparisons. Few authors have attempted tocombine techniques; however, these were only able topredict momentary churning behaviors. Hence, the need fora model that can accurately predict churn behavior. Thefocus of this work is developing an enhanced churnmanagement model by comparing five different algorithmsfor the prediction of churn behavior. The aim of this work isto develop an enhanced predictive model for churnmanagement in the telecommunication industry using datamining techniques. This would be carried out by acomparative analysis of the data mining classifieralgorithms then design an enhanced predictive model basedon the outcome of the analysis, which would beimplemented as an enhanced predictive model.II. METHODOLOGYTo carry out a comparative analysis of the existing churnmanagement models, we study the various characteristics ofexisting models based on techniques used methods of dataclassification and feature selection processes. Based on thiscomparison, this study can discover various types ofknowledge, including association, classification, clustering,prediction, sequential patterns and decision tree. Theknowledge acquired from this comparison will then beclassified into general knowledge, primitive-levelknowledge, and multilevel knowledge.The design of the enhanced predictive model will comprisethe selection of the following classification algorithms;Decision Tree, Random Forest, Neural Network. SupportVector Machine and Logistic Regression. These classifierswill be evaluated using sensitivity, accuracy, correctlyclassified instances and specificity. The algorithm with thebest result will be used to train and build the model forpredicting customer churn in telecommunications sector.Also, the open source data mining software R using Rattleas an interface will be used as the trees produced with thesoftware are less complicated and more compact than someother implementations (such as in WEKA). Theimbalanced dataset affects the performance of algorithm.Thus, additional techniques such as under-sampling will beintroduced. Furthermore, the data will be trained. The dataset will be split into a train and a test set. The train set is aset of examples used for learning. The test set is anindependent data set used to assess the performance of thelearned classifier. Training set consists of a randomhold-out sample of 70% of the total data set. The test setconsists of the other 30%. To make sure that each class isrepresented in the train and test set, stratified sampling isused. With stratified sampling a sampling fraction of eachoutcome class that is proportional to that of the totalpopulation is used. Thus, each set contains approximatelythe same percentage of samples of each target class as thecomplete set.To model the Customer prediction, a Markov Chain Modelwill be used. The Markov model allows for more flexibilitythan most other potential models, and can incorporatevariables such as non-constant retention rate, which is notpossible in the simpler models. The model allows looking atindividual customer relationships as well as averages, andits probabilistic nature makes the uncertainty apprehensible.The Markov Decision Process is also appealing, but sincedynamic decisions along the lifetime of the customer willnot be evaluated the Markov Chain is the simplest modelthat still meets the requirements. Each state in the MarkovChain will represent a person being a customer for onemonth, with an infinite number of states. The transitionprobability to move from one state to the next is equivalentto a customer retaining with the operator to the next month.A customer that has churned will be considered lost forever.Once the retention and churn rates are determined, thereference churn value for each customer will be computed.The churn rate will be calculated using MATLAB MonteCarlo simulations, running a large number of fictitiouscustomer-company relationship processes, and extractingthe results of the average customer. Using simulationapproach gives better result than analytical methods, sincean indefinite number of states make matrix algebracomplicated. It also allows visualizing the distribution ofthe results more easily than with algebraic calculation.To the telecom companies the result of this analysis wouldimprove the level at which they can predict customer churn,because, it will give insight on why a customer wouldchoose to leave one telecommunication industry for anothertelecommunication industry. In other words, the cost ofadvertisement and loyalty programs as well as challengesface in retaining loyal customers would be identified. Theresearcher believes that the enhanced model for churnprediction developed from this study will lead to betterretention strategy, improved telecom quality service,enhanced customer loyalty due to the improvement servicefrom applying the information from this model. The studyis also significant to researchers, behavioral scientist,business analysts as well as professionals in the computerscience domain. The study will serve as a good referencematerial for research and contribute to the growingobjective of developing enhanced model built on datamining techniques that can explain the churn behavior withmore accuracy than using single methods.III. REPORTThe aggregated telecom data dataset for all variables ispresented in Table 1. These variables formed theCopyright 2020. Innovative Research Publication. All Rights Reserve7

International Journal of Innovative Research in Computer Science & Technology (IJIRCST)ISSN: 2347-5552, Volume-8, Issue-2, March .ijircst.orgbenchmark upon which the prediction models forforecasting the behavior of customers was made. The dataset containing customer information and a data setcontaining contractual information was stored each day.Table 1: Information and Characteristics of the er3SeniorCitizen4Married5Dependents67tenureVAS 194614Unique ID 704335553488590111423641340249332110Series data ial ies % data formatobjectSeries data thlyRevenueTCHAvailabilityChurnTOTAL4171The data sets are aggregated on customer ID in order toobtain the customer information data set.A. Data PreprocessingThe data preprocessing phase or cleaning involve datapreparation, data balanced and normalization. The datawas cleaned from ambiguities, errors, missing data’s, noisydata, redundancies and unique values that do not contributemuch in predictive modeling. The data was furtherprepared along the explanatory variables continuum formodelling using multicollinearity. The scatter plot wasused to reveal the probable result from the various datatypes (numeric, categorical, and binary). Data imbalancedwas handled with the use of cost sensitive classifier. Theresult of these processes is shown in Table 1, Figure 1 andFigure 2Include a note with your final paper indicating that you Thedata analysed contain 18 attributes and 7043 instances. Thefeature selection adopted in this study was the forwardselection in the Best First feature selection method as wellas matrixes that highlights features that may not affect themodel. Churn the target class had two features or binariesthat are categorical (Yes) and (No).Figure 1: the Target Class (Churn Value)The correlational matrix evaluated the attributes throughthe information gain measurement procedure as per theclass value and diversifies the selection and ranking ofattributes that significantly improves the computationalefficiency and classification. The dataset contains 18conditional features, along with one unique identifier. Theoutput of the correlation, matrix is shown in Figure 2. Thefollowing display a vivid matrix using seaborn package.From the heat map a two-dimensional graphicalrepresentation of data the individual values that arecontained in a matrix are representedCopyright 2019. Innovative Research Publication. All Rights Reserve8

Enhanced Churn Prediction in the Telecommunication Industryalso showed a visual indication that the classes are balanced.Furthermore, the categorical variables are graphicallyrepresented by bar plots. The numbers inside the barsrepresent absolute frequencies of a given category in thewhole data set. From the graph a few deductions can bemade, gender seems to have little effect on churners. VAS,POSTPre paid, Internet service, social media are possibleand high attributes with high churners.Figure.2: Correlation MatrixThecorrelationmatrixshowedthatthecorrelation coefficients between the variables. The cellsshow the value of the correlation between two variables.This summary revealed that Post Pre-Paid have a valuecloser to one than the other variables indicating that it had ahigher correlation and is more a predictor of churn.Exploratory data analysis applied before modeling in thisanalysis showed that the development of more complexpredictive models may not be necessary in the given dataset.Histograms are typically plotted for numeric variablesshowed the distribution, for categorical variables andcounts of categories. The graphical description belowshowed few overlapping density plots seen when thecomparison of different numeric variables grouped bycategorical (binary) dependent variables was used. TheTwo density plots for variables showed differences betweenchurners and non-churners distinctively.Figure 4: Attributes Interactions Source; Weka ResearchersDatasetFigure 4 shows graph with input variables. It is clearlyshown that there is a separation between classes on thescatter plots. This suggests that linear methods, decisiontrees, etc will do well on this problem. It also suggested thatadvanced modeling techniques and ensembles may not beneeded. More so, the developed enhanced predictive modelmay consist of simple classifiers.B. Data ClassificationFive classification algorithm or method was used. Therewere five classification techniques used with differentfeature selections and classifiers, which include DecisionTree, Naïve Bayes, and Decision Rules, logistic regression,random forest and Neural Network. Here the objects arecategories according to their characteristics of the objects.The data was used to apply to unseen data. In order toevaluate classifiers performance for different schemes withtheir appropriate parameters, the measures of precision,recall, accuracy and F-measure, calculated from thecontents of the confusion matrix, shown below was used.Each table shows Error rate and accuracy for each model.C. Logistic Regression ClassificationThe Logistic regression was used to describe data and toexplain the relationship between one dependent binaryvariable and one or more nominal, ordinal, interval orratio-level independent variables. The output of the logisticresult is shown in Figure 5Figure 3: Graphical Attribute Distribution Source; WekaResearchers DatasetThe graphical distribution of attributes shown in Figure 3showed minor overlap but differing distributions for each ofthe class values on each of the attributes. This is a good signthat probably shows the attributes and classes can beseparated. Notwithstanding two of the attributes has somelevel of possible Gaussian-like distribution. This shows thatmore data may likely pull the distribution towards Gaussian.Some attributes such as Monthly revenue and tenure own totheir discrete and real characters may tend towardsGaussian distributions with a skew or a large number ofobservations at the upper right end of the distribution. ThisFigure 5: Logistic Classification Model ResultCopyright 2020. Innovative Research Publication. All Rights Reserve9

International Journal of Innovative Research in Computer Science & Technology (IJIRCST)ISSN: 2347-5552, Volume-8, Issue-2, March .ijircst.orgFrom Figure 6 the kappa statistics for interrater reliabilitywas 0.4432. The figure 0.4432 represents the extent inwhich the data collected are a correct representation of thevariables measured. Giving the nature of the data source,the value of 0.4432 is adjudged reliable. The root relativesquared error was 84.3%. The classifier had an accuracy of79.8%. From the confusion matrix values, 7043 data pointswas used in the analysis, out of which 5617 were correctlyclassified and 1426 were misclassified. Combining the twometrics into a single metrics from the confusion metricsusing the TP rate and FP rate for the classifier into a singlegraph with FPR values on the abscissa and the TPR valueson the ordinate the ROC curve is plotted has shown inFigure 6.5575 were correctly classified and 1468 were misclassified.Combining the two metrics into a single metrics from theconfusion metrics using the TP rate and FP rate for theclassifier into a single graph with FPR values on theabscissa and the TPR values on the ordinate an ROC curveis plotted as shown in Figure 8.Figure 8: Support Vector Machine ROC CurveFigure 6: Logistic Regression ROC CurveUsing the metrics of AUC of the curve (AUROC) thefollowing details was observed. The area under the curve(AUROC) with Given P(score(x ) score(x )), that is theprobability that the classifier will rank a randomly chosenpositive example, higher than a randomly chosen negativeexample had a value of 0.8359 showing that AUC is closerto 1.Using the metrics of AUC of the curve (AUROC) thefollowing details was observed. The area under the curve(AUROC) with Given P(score(x ) score(x )), the valuethat the probability of the classifier will rank a randomlychosen positive example, higher than a randomly chosennegative example was 0.685 showing that AUC is midwayto 1.D. Neural Network ClassificationFigure 9(a) and Figure 9(b) shows a kappa statistics withvalue 0.4141, the root relative squared error value 82.2%,and an accuracy of 80.4%. The confusion matrix in theanalysis, was 5460 correctly classified and 1583misclassified. Using the TP rate and FP rate for theclassifier an ROC is plotted and it’s shown in Figure 10Figure 7: Model Result for Support Vector MachineInterpretation: From the Figure 7 the kappa statisticsshowed an interrater reliability value of 0.4095. The figure0.4095 is adjudged reliable based on the source of the dataas a correct representation of the variables measured. Theroot relative squared error was 103%, with accuracy of79.2%. From the confusion matrix values, 7043 data points,Copyright 2019. Innovative Research Publication. All Rights ReserveFigure 9(a): Neural Network Model Analysis10

Enhanced Churn Prediction in the Telecommunication IndustryFigure 12: Decision Tree ROC CurveFigure 9(b): Neural Network PathThe area under the curve (AUROC) had a value of 0.777showing that the AUC comes closer to 1. Hence this modelshowed a higher AUCFigure 10: Neural Network ROC CurveThe area under the curve (AUROC) showed a value of0.818 showing that the AUC comes closer to 1. Hence thismodel showed a higher AUC.E. Decision Tree ClassificationFigure 11 shows a kappa statistic with a value of 0.3972,the root relative squared error value of 91.2. %, and anaccuracy of 78.2%. The confusion matrix in the analysis,was 5508 correctly classified and 1535 misclassified. Usingthe TP rate and FP rate for the classifier a ROC is plottedand it’s shown in Figure 12. More so Figure 13 shows thedecision path and the central attributesFigure 13: Decision Tree PathF. Random Forest ClassificationFigure 14 shows a kappa statistics value 0.3941, rootrelative squared 88.6%, and model accuracy of 78%. Theconfusion matrix in the analysis was 5489 correctlyclassified and 1554 misclassified. The TP rate and FP inROC plot it’s shown in Figure 15Figure 11: Decision Tree Model AnalysisCopyright 2020. Innovative Research Publication. All Rights ReserveFigure 14: Random Forest Model Result.11

International Journal of Innovative Research in Computer Science & Technology (IJIRCST)ISSN: 2347-5552, Volume-8, Issue-2, March .ijircst.orgH. Performance of the Classification Algorithms:Algorithm EvaluationFigure 15: Random Forest Model ResultThe area under the curve (AUROC) had a value of 0.804showing that the AUC comes closer to 1.G. Summary of Classification Algorithm PerformanceMetricsTable 2 shows an extract of the confusion matrices used tocompute the performance statistics – Accuracy, Sensitivity,Specificity and F-score. Due to the imbalanced nature of thedata set it is not unexpected that accuracy is high for allmodels. The highest value of sensitivity achieved was byNeural Network, while Logistic regression had the highestfor specificity (0.900) and accuracy (79.8). Sensitivitymeasures the ability of the model to catch customers who, inreality, left the company. Results on sensitivity showed thatLogistic regression could catch 83.7 % and Neural Network85.0% of such customers. Also, F-score, which combinesHit rate and Sensitivity into one measure, was highest forlogistic regression and it could mean that less sophisticatedmodel is more suitable for this business case. Thus, themodel-based model to be compared with will be logisticregression.Table 2: Summary of Algorithm Performance reLogistic Regression79.80.8370.9000.867Decision Tree78.20.8250.8930.858Neural Network77.50.8500.8430.846Random 9120.865VectorFrom Figure 16 all of the models have skill. All the modelsperformed worse. Each model has a score that was worsethan earlier performance. Decision Tree (78.29%) wassignificantly worse than logistic regression (79.87%). Also,Random Forest (78.03%) was worse than LogisticRegression as well as Neural Network (78.37%) at 5% levelof significance. The results suggest Logistic Regression(79.87%) is better than neural network, SVM, Randomforest. We can certainly infer that at 5% level ofsignificance logistic regression can predict churn to anaccuracy of 79.87%. This decision is predicated on the factthat aside having high values both logistic regression andSVM are much simpler model. To this end the LogisticRegression is selected as the model to be enhanced.Therefore, we used the Logistic Regression results as thetest base for other models.Figure 16: Comparisons of the Performance of theClassification AlgorithmsIn addition to the use of accuracy, Sensitivity, andspecificity are also used to quantify the accuracy of thepredictive models. The True Positives (TP), False Positives(FP), True Negatives (TN) and False Negatives (FN) are theTP, FP, TN and FN in the confusion matrix. To access thespecificity, ROC analysis, the ROC curve in the equationsx 1 specificity (t) and y sensitivity (t) is shown in Figure17Figure 17: Comparisons of the Performance of ClassifiersImplementation of Churn Management SystemCopyright 2019. Innovative Research Publication. All Rights Reserve12

Enhanced Churn Prediction in the Telecommunication IndustryIn this section the result of the trained logistic regressionalgorithm on unseen data is presented and the performancemetric is shown in figure 18 and 19.Figure 20: Enhanced Model ImplementationFigure 18: Model DescriptionUsing some numbers and parameters the model was furtherdescribe in relation to performance of the model on unseendata. From the study the estimated accuracy of the model onunseen data was 79.87% with a standard deviation of1.49%.Wherey Predicted Outputb0 bias or intercept termb1 coefficient for the single input value (x).x input (eg gender or VAS, or etc)That isy exp(-0.2474 b1*x) / (1 EXP(-0.2474 b1*x)y -0.2474 0.0205x1–0.312x2 e Model EquationIV.DISCUSSION OF FINDINGSFigure 19: Enhanced Model ResultBy running classification on the data using Logisticregression in WEKA we have the factors that influenceschurn. The model implemented and the parameters used isshown in Figure 19 and Figure 20. This mode and theseparameters form the target variables, management need tofocus on. This is used along with Logistic equation y e (b0 b1*x) / (1 e (b0 b1*x)Churn Data are usually noisy and imbalanced, and multipleclassifiers have their own limitations. So, the studyemployed an enhanced model design by using the Weka bigdata platform that allows for mining, processing andvisualization to achieve higher AUC. The algorithms,logistic regression, neural network, Support vector machine,decision tree and random forest were analyzed in the studyand it resulted in a high AUC. This means that, comparedto random prediction, it is beneficial for telecom providersto implement one of the approaches from this study.Nevertheless, given the value from the enhanced logisticregression model in terms of performance statistics –Accuracy, Sensitivity, Specificity. The Logistic regressionmodel better predict churn. More so, the result showed thatinternet service, types of contract entered, internet securitywere major factors that influence churn. The study usedvarious methods of classifiers like earlier researchers [6] [7][11] who used various methods (Artificial Neural Networksand Decision Trees). The findings from this study showedthat the method used were all effective and can be equallystrong to predict churn. In terms of variables that causeschurn the findings of this study agree with [15] in that manyof the variables have correlations with churn and affects it,however, internet services and types of contract affect churnthe more. In studies where classifications were not carriedout like the study by [7] that adopted a workingmethodology of Ensemble based Classifiers such asbagging, boosting and random forest, in contrast analysis toCopyright 2020. Innovative Research Publication. All Rights Reserve13

International Journal of Innovative Research in Computer Science & Technology (IJIRCST)ISSN: 2347-5552, Volume-8, Issue-2, March .ijircst.orgcommon classifiers such as; Decision Tree, Naïve BayesClassifier and Support Vector Machine. The studyconcluded that effectiveness is best with simple classifierslike SVM and logistic regression but the result from logisticregression showed that it was the best Classifier for theChurn Prediction Problem as compared to other models.Furthermore, this study corroborates [13] study thatadopted a predictive models and performance metrics andshowed that the various churn prediction methods are ofefficient performances. Also, from a qualitative approachthe study correlates the findings of [9] that, thedeterminants of customer churn were varied and firms needto put up strategies to maintaining competitive positionwithin the industry. Furthermore, experimental resultsconfirm that the prediction performance has beensignificantly improved by using a large volume of trainingdata, a large variety of features. In [18] study the quality ofservice is highly significant in tandem with, customersatisfaction, possession of superior technology, and cost ofchange and advertising.Difference between this findings and other like [16] [5] [16][21] [20] on logistic regression may be attributable to therigor on data selection and cleaning as well as the numberof classification employed. The study of [23] that adoptedthe use of data mining tools to select and classify featureswithin the selected customer churn dataset pointed out thatLogistic Model is the best method due to its accuracy usingneutral network, while this however, do not reflect theoutcome of this study and differs from [32] that studyshowed that the different data mining techniques improvethe prediction accuracy

This involves the use of data analytics system that uses real-time integration and dynamic real time responses data to detect churn risks. Subscribe are increasingly . objective of developing enhanced model built on data mining techniques that can explain the churn behavior with more accuracy than using single methods. KEYWORDS-Prediction .