Advances in Economics, Business and Management Research, volume 131
“New Silk Road: Business Cooperation and Prospective of Economic Development” (NSRBCPED 2019)

New Methods of Customer Segmentation and Individual Credit Evaluation Based on Machine Learning

Zhou Yuping
Wuhan University of Technology
Wuhan 430070, Peoples R China
842118147@qq.com

Chen Guanyu
Wuhan University of Technology
Wuhan 430070, Peoples R China
771579266@qq.com

Petra Jílková
Masaryk Institute of Advanced Studies, Czech Technical University in Prague
Prague, Czech Republic
petra.jilkova@cvut.cz

David Weisl
Masaryk Institute of Advanced Studies, Czech Technical University in Prague
Prague, Czech Republic
david.weisl@cvut.cz

Abstract: The internet has enabled a fundamental change in consumer behaviour and in consumers' understanding of e-commerce business. The main objective of this article is to present the latest trends in client segmentation associated with individual credit evaluation based on machine learning. The first part discusses the current situation and innovations in the way people pay in an omnichannel world. We describe how the absence of physical money has affected society, how it has changed customer purchasing behaviour, and what this change means for the digital economy and marketing. Against the background of the rapid development of big data and Internet technology, the traditional personal credit evaluation method of commercial banks faces a significant challenge. Based on the limitations of the existing personal credit evaluation method, the second part discusses the necessity of research on personal credit evaluation based on machine learning and then examines the comprehensive personal credit evaluation dimensions and the advanced data acquisition methods of Internet finance companies. The data are then desensitised by a dynamic desensitisation technique, abnormal values are detected by the LOF test, and missing values are supplemented by the random forest method. The important indexes are screened by the gradient boosting decision tree method, and the personal credit evaluation score is output through a scorecard model based on logistic regression. After that, the model is tested by a BP neural network, and the personal credit level is predicted. The personal credit level supports customer market segmentation.

Keywords: customer segmentation, digital payments, BP neural network, machine learning, personal credit score

I. INTRODUCTION

The digital revolution has changed the business model and brings enormous digital opportunities and challenges in today's rapidly changing environment. Data are the new economic resource for creating economic value; they bring added strength and competitive advantage. Over the past several decades, the communication technology sector and payment systems have undergone rapid development. There has been an enormous shift from the standard economy through the birth of the computer in the 60s and the introduction of the internet in the 90s to, most recently, e-commerce, fintech, Artificial Intelligence (AI), robotics and driverless cars. New technologies, especially AI, contribute to the objectives of sustainable development and lead to a significant shift in the labour market.

The digital economy requires several new skills and creates new risks, from cybersecurity breaches to the facilitation of illegal economic activities. Digital platforms are becoming increasingly important in the world economy, and some have achieved a strong market position in certain areas. According to Jirinova and Scholleova (2016), innovation activities in companies are an essential prerequisite for a competitive advantage and the long-term existence of the company. The competitive advantage of the whole economy certainly depends on the competitiveness of companies. [1] Examples of such platforms in Europe are Google, Facebook and Amazon.
In China, WeChat and Alipay (Alibaba), with their payment solutions, have captured the entire Chinese mobile payments market. Today, thanks to mobile payments, China is a society that operates almost without the physical form of money. Payments have already evolved to the point that the client can pay simply by scanning their face and entering a mobile number to verify their identity. As a result, a wealth of personal information is collected about all payers, which can be further analysed. This is illustrated by the effort to create increasingly sophisticated individual credit models.

The main objective of the article is to present the latest trends in client segmentation associated with individual credit evaluation based on machine learning in the context of the omnichannel world. First, we discuss the current situation and innovations in the way people pay in an omnichannel world. We describe how the absence of physical money has affected society, how it has changed customer purchasing behaviour, and what this change means for the digital economy and marketing. Further parts explain the relation between these new insights and client segmentation, clarify individual credit evaluation and specify the Individual Credit Base Model to improve the accuracy of credit scoring. In the last part, the theoretical and practical conclusions are summarised.

Copyright 2020 The Authors. Published by Atlantis Press SARL. This is an open access article distributed under the CC BY-NC 4.0 license.

II. LITERATURE REVIEW AND RESEARCH METHODS

A. New Methods of Customer Segmentation

The use of Internet services is growing worldwide. Customers' shopping habits have changed over the last decade, and the use of digital technology has changed the traditional concept of buying. Customers are increasingly using online content. They search social networks, communicate online, actively use the Internet to get news, make video calls, shop online, use online banking services, and more. As a result, retail brands use new strategies to respond to customer needs, provide innovation, and improve the shopping experience. This new omnichannel behaviour is no longer limited to specific segments and technologies; it is becoming widespread and mainstream across the generations. The most common tools and strategies are a responsive web page customised for each device, personalised offers, a content strategy and proactivity in finding bloggers and influencers. Alibaba is estimated to have nearly 6.72% of China's e-commerce market.

In recent years, the digital position of all EU countries has improved. The most significant progress was recorded by countries such as Finland, Denmark, the Netherlands and Sweden, which are among the world leaders in digitisation. These countries are followed by the United Kingdom, Luxembourg, Ireland, Estonia and Belgium. However, the EU needs to improve in order to compete globally and to offer payment services comparable to those of China or the USA. [2]

In recent years, the scale of China's digital economy has been growing rapidly and accounts for a continually increasing share of GDP. China's entire digital economy reached 31.3 trillion yuan in 2018, accounting for more than one third of GDP. By 2019, the overall size of China's digital economy may be close to 36 trillion yuan.
China's digital technologies, products and services are increasingly penetrating and integrating into all walks of life, enhancing the growth and efficiency of output in other industries. [3]

Fintech is an emerging financial services sector that includes third-party payment and innovative financial services or products delivered via new technology. Global companies are interested in developing new technologies for more favourable settlement and payment systems and in expanding their services internationally. All fintech innovations must be in accordance with the CSR of these financial and non-financial companies. According to Dvořáková and Quigley [4], CSR is a strategy whereby corporations take responsibility for the economic, social and environmental consequences of their business activities.

Giants such as Apple and Google are also moving into the mobile payment market with their payment platforms, Apple Pay and Google Pay. In the east, internet industry giants like Alibaba and Tencent have become providers of branchless banking services, such as Ant Financial (Alibaba) and WeBank (Tencent's online bank). These technologies have improved the quality of financial services and provided more extensive access to banking and financial services. China has become a global leader in the fintech e-commerce industry. In recent years, the traditional Chinese financial system has been transformed into a top financial system in which Alibaba is now one of the largest financial companies in the world. This is supported by the third-party payment (TPP) system, which enables the development of electronic commerce.

The largest share of internet users in Europe (97%) is found among young individuals aged 16 to 24 and people with a high level of formal education. In 2018, 83% of Europeans used the internet at least weekly and 76% daily or almost daily, compared with 81% and 72% respectively a year earlier. [2]

Profitable businesses create strategies that usually start with sophisticated customer segmentation, a set of concepts and models that lead to a profitable product offer. According to Kotler (2002) [5], e-commerce can be segmented into the following four categories: B2C (business to consumer), B2B (business to business), C2B (consumer to business) and C2C (consumer to consumer, where consumers participate in the transaction). The main variables for segmenting consumer markets are demographic, geographic, psychographic and behavioural. Business-to-consumer (B2C) is a term in electronic commerce for online transactions made between businesses and individual consumers, in which companies can sell their products directly to consumers [6]. The B2B model is a concept in which the participants in electronic commerce are often large companies. C2B is a segmentation model in which the customer contributes to the business. These, however, are the commonly used methods of grouping customers. Lu and Wu [7] came up with a segmentation method based on customers’ transaction patterns. According to Karimi-Majd and Fathian [8], data mining techniques can be used to analyse customers’ data and purchases in order to develop the product offer. Smeureanu, Ruxanda and Badea [9] defined customer segmentation in the private banking sector using machine learning techniques. According to their paper, customer segmentation in the private banking sector is an essential step for successful business development, enabling financial institutions to address their products and services to homogeneous classes of customers. The paper considers two of the most popular machine learning techniques, Neural Networks and Support Vector Machines, and describes how each of these performs in a segmentation process.

Retailers are increasingly converting their services to digital channels as they provide clear benefits for both retailers and customers.
For customers, digital channels enable faster and more convenient services and more beneficial offers, leading to a more informed decision-making process. Recent research has focused on understanding customer behaviour and preferences in a multichannel system. As a result, multi-channel retailing is increasingly crucial in offering products to customers. Schoenbachler and Gordon [10] examined the decision-making process and the implementation of a multi-channel strategy. Implementation could be simplified if companies understand what leads users to a single channel or to multiple channels, and which channels are preferred. The evolution of multi-channel thinking is omnichannel retailing. This approach takes a broader perspective on customer segmentation and on how customers are influenced and move through the buying process [11]. Following this approach, Sands, Ferraro, Campbell and Pallant [12] segmented omnichannel customers in the search, purchase and after-sales phases. Nakano and Kondo [13] grouped customers according to channel preference and further classified these segments into seven subgroups based on their media usage.

The customer journey model based on online marketing strategies and the customer's interaction with society is described by the STDC matrix [14]. Many businesses make the mistake of evaluating online marketing activities only from the perspective of the so-called “Last Click”. According to [14], the STDC framework consists of four main elements: See, Think, Do and Care. Based on this theory, we can define four customer segments. The See phase covers people who are not looking for a solution to any problem; they simply browse web pages. The goal of the marketing strategy here is to engage the audience with funny, entertaining and useful content and to support a good customer relationship with the brand. The Think phase involves customers who have a specific problem and want to solve it. The customer usually looks for specific information, a product or a service. In the Do phase, there is a group of customers who make a purchase decision and choose a specific product. They are interested in specific information: specifications of specific products, brands, types, durability, parameters and purchasing conditions. The Care phase includes customers who have purchased more than twice or who buy products and services regularly.

B. Individual Credit Evaluation System

The development of mobile payment platforms has revolutionised the analysis of customer behaviour. By interconnecting communication and payment platforms, it is possible to analyse interpersonal relationships in addition to purchasing behaviour. This has made it possible to refine the segmentation of customer target groups significantly and, at the same time, to better identify the individual buyer and his or her needs. Business activities are often associated with a lack of funds on the buyer's side.
In addition to marketing purposes, it was therefore a logical step to link the payment platform to the online lending environment. Thanks to the large amount of information on individual users, their habits and their groups of friends, it became possible to start approving loans virtually online through mobile applications.

Below in this section, a model called SESAME CREDIT, which evaluates all users of ALIBABA, China's most successful payment platform, is described and presented in detail.

TABLE I. SESAME CREDIT SCORING DIMENSIONS

Dimension (weight) | Indicators | Content
Identity trait (5%) | Basic information indicators | Age, gender, occupation, family situation, etc.
Performance capability (20%) | Payment and fund indicators | Number of credit cards, average amount of each client
User history credit (20%) | Blacklist information indicators | Record of any cheating or transaction fraud behaviour
Interpersonal connections (15%) | Interpersonal connections indicators | Friends' credit rating, contact activity number
Behavioural performance (25%) | [...] financing, payment indicators | [...]

a. Source: Own Processing Based on Global Lighting Network, 2019

The following compares the parameters of the Individual Credit Evaluation System with those of the Commercial Bank Rating Index System. The similarities lie in the scoring of identity, compliance factors and credit history.

1) China Construction Bank: gender, marriage, health, industry status, title; account in China Construction Bank; balance of savings account; business dealings with China Construction Bank.
2) China Citic Bank: gender, marriage, industry status, position, working time in the unit, title; other assets, insurance, loans.
3) China Minsheng Banking Corp. Ltd: financial assets, other assets, insurance, accounts in Minsheng Bank, whether there are bad credit records, card consumption points.

For more details, see Table II.
TABLE II. COMPARISON OF CREDIT EVALUATION DIMENSIONS (China and Czech Republic, 2019)

[Table body not recoverable from the transcription; surviving labels: User preferences, Central Group, China, Commercial Bank, Czech.]

Source: Own Processing Based on Baidu Wenku, 2019

Table II compares not only the credit evaluation systems in the Czech Republic and China but also the credit evaluation systems of traditional banks and online mobile payment platforms. The table shows that the credit systems used by banks in both countries are practically identical. At first glance, it is apparent that significant differences can only be observed with online mobile payment platforms. This is due to the technical capabilities of online mobile payment platforms, which can collect and further analyse users' data.

How the data are used is described in the following section, which explains in more detail how the personal credit score is compiled.

C. Method of Machine Learning Data Analysis

Before constructing the personal credit evaluation model, it is necessary to collect and process the data. There are four main channels for collecting personal credit evaluation data. The rich data collected are then processed and sorted out, for example by desensitisation and by replenishing missing values, in preparation for further modelling.

1) Data acquisition channel

The personal credit data collected by Internet finance companies are extensive, and the sources of credit data are diversified. [15] There are four kinds of data: necessary information, Internet data, e-commerce data and public information.

- The necessary information comes from the data provided by the user, such as the user registration information.
- The Internet data are derived from the services of Internet finance companies, such as payment, microcredit and other data.
- The e-commerce data are derived from the e-commerce platform of the Internet finance company, such as the goods traded by users, transaction amounts and other transaction service data.
- Public information comes from data provided by external partner institutions, such as the Learning Information Network, ID card systems, etc.

2) Data processing

The data collected by Internet finance companies for personal credit evaluation are so large that problems such as data loss and data exceptions arise. At the same time, attention should be paid to the protection of customer privacy and the desensitisation of sensitive information. Besides, in order to ensure the accuracy of the model, it is necessary to screen the collected variables and select the variables that have a significant influence on the final results.

Data desensitisation based on dynamic desensitisation technology

The sensitive data of the credit subject sample are desensitised by dynamic desensitisation technology, so that the sensitive data in the background database are encrypted and hidden and the sensitive information of the client is protected. Data desensitisation mainly includes four aspects:

- Identity data, such as email and mobile phone number, are de-identified.
- For time and date data, the precision is reduced.
- Text data, such as name and home address, are converted into digital data.
- Numerical data, such as amounts and numbers of transactions, are floated up or down by 5% to 10%.

Detection of abnormal values based on the LOF test method

The desensitised data need to be checked by the LOF algorithm. The same index across different sample data is regarded as one cluster, so the data are divided into n clusters (n is the number of indexes in the evaluation dimensions). For a point p in a cluster, the LOF algorithm calculates the average ratio of the local reachability density of the points in its neighbourhood N_k(p) to the local reachability density of p itself. If the ratio is less than 1, the point is regarded as a normal point; if it is more than 1, it is regarded as an abnormal point. After an abnormal value is identified, it can be deleted and regarded as a missing value, which is then processed by the random forest method.

Improving data saturation based on the random forest method

For the sample data after deleting the outliers, the missing values are supplemented by the random forest method:

- K sample data sets are randomly selected, and d indexes are randomly selected to construct the training set.
- For data of the same category in the training set, the missing values are supplemented by the force-filling method.
- The random forest model is trained on the filled training set, and the similarity matrix is computed. By analysing the degree of similarity between all the elements of the two matrices, the filling results of the missing values are determined.
- The iterations are carried out 4 to 6 times, and the final missing value processing results are output.

Screening indexes based on the gradient boosting decision tree method

After the missing values are supplemented by the random forest method, it is necessary to filter out the indicators that have an important influence on the individual credit evaluation results in order to increase the accuracy of the personal credit evaluation. [16] [17] [18]

The sample data set of each index of the credit subject is input into the model. The root node selects as its splitting variable the feature with the minimum Gini index before and after the split, for example whether the credit card is repaid on time. Subnodes repeat the same splitting procedure, selecting the next most important feature variables in turn, and build weak learner 1. The model error is minimised by iteratively building the final strong learner.

Fig. 1. Gradient boosting decision tree model (Source: Own Processing Based on Intelligent Algorithm, 2019)

According to the established strong learner, the average value of the importance of each feature variable over all weak learners is taken as the importance of that feature variable in the model.

III. RESULTS

A. Construct the scoring card model based on logistic regression, and output the personal credit score

In the process of credit evaluation of credit subjects, bank-based financial institutions pay more attention to the use of transparent models (such as the logistic regression model and the scorecard model) to evaluate the credit subject, so that the process from variable input to score output is transparent. [19] [20]
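The LOF test from the data-processing step in Section II.C can be made concrete with a minimal sketch. It assumes, as that step describes, that each index is checked on its own, so the values are one-dimensional and distinct. The function names, the sample values, `k = 2` and the cutoff of 2.0 are illustrative assumptions, not the authors' settings; a cutoff somewhat above 1 is used here because ordinary points in small samples often score slightly above 1.

```python
def k_nearest(values, i, k):
    """Return the k nearest neighbours of values[i] as (distance, index) pairs."""
    dists = sorted((abs(values[i] - v), j) for j, v in enumerate(values) if j != i)
    return dists[:k]


def lof_scores(values, k=2):
    """Local outlier factor of every value of one index (a 1-D cluster)."""
    n = len(values)
    neigh = [k_nearest(values, i, k) for i in range(n)]
    k_dist = [nb[-1][0] for nb in neigh]  # distance to the k-th neighbour

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF(p): average ratio of the neighbours' density to the density of p.
    return [sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
            for i in range(n)]


def mark_outliers_missing(values, k=2, threshold=2.0):
    """Replace flagged values with None so that they can be imputed later."""
    scores = lof_scores(values, k)
    return [None if s > threshold else v for v, s in zip(values, scores)]


# Made-up values of a single index; the last one is far from the rest.
amounts = [10.0, 10.5, 11.0, 11.5, 12.0, 50.0]
cleaned = mark_outliers_missing(amounts)
print(cleaned)
```

Values flagged in this way become missing values, which the text then hands to the random forest imputation step.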

1) The probability of default is obtained by the logistic regression model

Logistic regression is a standard method in machine learning. Through the study of the characteristic variables of the credit subject, the essential factors affecting the personal credit score are explored in order to predict the probability of default of the credit subject.

In the process of generating a personal credit score, after the top n indexes that have an essential influence on the personal credit score have been selected by the gradient boosting decision tree model, the logistic regression model is used to obtain the risk probability of the credit subject.

The probability of default by the credit subject:

$$p(y = 1 \mid x) = \frac{1}{1 + e^{-g(x)}}$$

The probability of non-default by the credit subject:

$$p(y = 0 \mid x) = \frac{1}{1 + e^{g(x)}}$$

$$g(x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_l x_l$$

Of which, $x_1, x_2, \ldots, x_l$ are the variables that have great influence on the credit score of the credit subject, selected by the gradient boosting decision tree, and $\beta_0, \beta_1, \ldots, \beta_l$ are the coefficients in the logistic regression equation. $y = 1$ or $y = 0$ indicates that the default of the credit subject occurred or did not occur, respectively.

2) Scorecard model outputs final credit score

After establishing the logistic regression model, it is necessary to use the scorecard model to convert the default probability of the credit subject into a score, which helps financial institutions to quantify and manage credit risk better and to make credit decisions more objectively.

Scorecard model

In the scoring card model, the personal credit score is expressed as a linear function of the logarithm of Odds, the ratio of the default probability of the credit subject to the non-default probability. The basic expression of the personal credit score is shown in formula (3):

$$\text{Score} = A - B \log(\text{Odds}) \tag{3}$$

Constant A is the compensation score, which is part of the basic score of the personal credit score, and constant B is called the scorecard scale.

In the logistic regression model, log(Odds) can be converted into a formula based on the linear combination of the characteristic variables of personal credit evaluation, as shown in formula (4), where $p = p(y = 1 \mid x)$:

$$\log(\text{Odds}) = \log\frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_l x_l \tag{4}$$

Therefore, formula (3) can be written as follows:

$$\text{Score} = A - B(\beta_0 + \beta_1 x_1 + \cdots + \beta_l x_l)$$

Output scoring results

When the scorecard model of the individual credit evaluation is established, the input variables $x_1, x_2, \ldots, x_l$ need to be WOE-coded to reflect the effect of each characteristic variable (the loan amount, the income level, the blacklist information, the number of credit cards, etc.) on the default probability of the credit subject. Therefore, the personal credit score can be further written as follows:

$$\text{Score} = (A - B\beta_0) - (B\beta_1 w_{11})\delta_{11} - \cdots - (B\beta_l w_{l1})\delta_{l1} - \cdots$$

Of which, $w_{ij}$ is the WOE of row j of the i-th variable (for example, if the i-th variable is the loan amount, it can be divided into j segments), $\beta_i$ is the coefficient in the logistic regression equation, and $\delta_{ij}$ indicates whether variable i takes the value of row j. $A - B\beta_0$ is the basic score of the final credit score for the credit subject.

TABLE III. SCORE CARD SCORE CALCULATIONS

[Table body not recoverable from the transcription.]

c. Source: Own Processing Based on Talking Data, 2019

This paper investigates a credit scoring system that combines a new perspective of client segmentation, based on bank credit ratings known in Europe, with the experience of new e-commerce platforms operating in China.

B. Constructing BP Neural Network of Individual Credit Base Model to Improve the Accuracy of Credit Scoring

The BP neural network is better at dealing with nonlinear, complex classification problems. Its input layer and hidden layer perform feature extraction, while the output layer outputs the final results after processing the extracted features. The personal credit evaluation indexes are extracted through the input layer and the hidden layer and, after dimension reduction, enter the output layer, which finally outputs the personal credit score.
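The scorecard conversion of Section III.A can be checked numerically with a short sketch. It assumes the sign convention Score = A - B * log(Odds), with Odds the ratio of default to non-default probability, and it treats the inputs as already WOE-coded. The constants `A` and `B`, the coefficients and the WOE values are made-up numbers for illustration; none of them comes from the paper.

```python
import math


def default_probability(woe_values, beta0, betas):
    """p(y = 1 | x) from the logistic model with linear predictor g(x)."""
    g = beta0 + sum(b * w for b, w in zip(betas, woe_values))
    return 1.0 / (1.0 + math.exp(-g))


def score_from_probability(p_default, A, B):
    """Score = A - B * log(Odds), Odds = p_default / (1 - p_default)."""
    odds = p_default / (1.0 - p_default)
    return A - B * math.log(odds)


def score_breakdown(woe_values, beta0, betas, A, B):
    """Base score (A - B * beta0) plus one transparent contribution per variable."""
    base = A - B * beta0
    parts = [-B * b * w for b, w in zip(betas, woe_values)]
    return base, parts


# Hypothetical scorecard constants and fitted coefficients.
A, B = 600.0, 50.0
beta0, betas = -2.0, [0.8, 1.1]
woe = [0.35, -0.6]  # WOE of the bins this applicant falls into

p = default_probability(woe, beta0, betas)
direct = score_from_probability(p, A, B)
base, parts = score_breakdown(woe, beta0, betas, A, B)
total = base + sum(parts)
print(round(p, 4), round(direct, 2), round(total, 2))
```

Both routes give the same score, which is the sense in which the scorecard is transparent: every variable's contribution to the final score can be read off separately.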

1) The design of the number of nodes in BP neural network

The BP neural network model of personal credit evaluation has a three-layer structure, comprising an input layer, a hidden layer and an output layer. The number of nodes in each layer has a great impact on the output personal credit score.

- The number of input layer nodes is determined by the number of the top n important indexes selected by the gradient boosting decision tree, that is, the input layer has n neurons.
- The number of neurons in the hidden layer affects the accuracy of the output results. The initial range of the number of hidden layer neurons is determined by the numbers of neurons in the input layer and the output layer, and the number of hidden layer nodes is then fixed by trial and error.
- The number of output layer neurons is determined by the output result; since the output is a single personal credit score, there is 1 output node.

2) Parameter setting of BP neural network for personal credit evaluation

Many factors influence the establishment of the BP neural network model for personal credit evaluation, such as the number of training iterations and the error accuracy, and they have a great impact on the output personal credit score. Therefore, it is necessary to set the model parameters according to the actual situation.

Training times: during BP neural network training, the information is iterated continuously, and the number of iterations is the training times of the BP neural network model. For the large sample data of personal credit evaluation, a large number of training iterations is used, namely 50000.

Error accuracy: during training, the weights of the network are adjusted by continuous backpropagation of the error. When the training error of the model falls to the maximum allowable error e, training is stopped and the credit score is output.

Transfer function: in the BP neural network of personal credit evaluation, the Sigmoid function is used as the transfer function of the hidden layer and the output layer to process the sample data of personal credit evaluation.

Learning function: in the BP neural network model of personal credit evaluation, gradient descent with momentum is selected as the learning function and adjusts the weights and thresholds of the neural network.

3) Output personal credit score results

Training of BP neural network model

After completing the basic construction of the BP neural network model of personal credit evaluation, the model is studied and trained by inputting the s [...] the output result of the credit subject's credit score with the original credit score of the credit subject, and checking the accuracy of the model. When the accuracy of the model reaches the preset range, the remaining samples can be input, and the model is further tested.

The remaining M - m samples are taken as test samples, and the sample data are input into the trained personal credit evaluation BP neural network model in the same way. The credit score of the test subject is output and compared with the actual credit score of the credit subject, and the accuracy of the evaluation model is assessed. When the accuracy of the model reaches a realistic and acceptable level, the personal credit evaluation model is put into use. For new sample data, the credit score of the credit subject is finally obtained by inputting the new sample data into the model.
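The training and testing procedure described above can be sketched as a small three-layer network with Sigmoid transfer functions, gradient descent with momentum, and a stop at a maximum allowable error. This is an illustration under stated assumptions, not the authors' implementation: the class name `BPNetwork`, the toy data, the score normalised to the range 0 to 1, the three hidden nodes, the learning rate, the momentum and the 2000 epochs (instead of the 50000 mentioned in the text) are all hypothetical choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class BPNetwork:
    """Input layer -> one Sigmoid hidden layer -> one Sigmoid output node."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # The last entry of every weight row is the threshold (bias) term.
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                    for _ in range(n_hidden)]
        self.w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        self.v_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.v_o = [0.0] * (n_hidden + 1)

    def forward(self, x):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_h] + [1.0]
        return xb, hb, sigmoid(sum(w * v for w, v in zip(self.w_o, hb)))

    def predict(self, x):
        return self.forward(x)[2]

    def train(self, samples, targets, lr=0.1, momentum=0.9,
              max_error=1e-3, max_epochs=2000):
        history = []
        for _ in range(max_epochs):
            g_h = [[0.0] * len(row) for row in self.w_h]
            g_o = [0.0] * len(self.w_o)
            sse = 0.0
            for x, t in zip(samples, targets):
                xb, hb, y = self.forward(x)
                err = y - t
                sse += err * err
                delta_o = err * y * (1.0 - y)  # error signal at the output node
                for j, hj in enumerate(hb):
                    g_o[j] += delta_o * hj
                for j in range(len(self.w_h)):
                    # Error propagated back to hidden node j.
                    delta_h = delta_o * self.w_o[j] * hb[j] * (1.0 - hb[j])
                    for i, xi in enumerate(xb):
                        g_h[j][i] += delta_h * xi
            history.append(sse / len(samples))
            if history[-1] <= max_error:  # maximum allowable error reached
                break
            # Momentum update of weights and thresholds.
            for j in range(len(self.w_o)):
                self.v_o[j] = momentum * self.v_o[j] - lr * g_o[j]
                self.w_o[j] += self.v_o[j]
            for j, row in enumerate(self.w_h):
                for i in range(len(row)):
                    self.v_h[j][i] = momentum * self.v_h[j][i] - lr * g_h[j][i]
                    row[i] += self.v_h[j][i]
        return history


# Made-up, pre-scaled indexes and normalised credit scores for M = 6 subjects.
X = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8], [0.5, 0.5], [0.7, 0.3]]
y = [0.9, 0.8, 0.2, 0.1, 0.5, 0.7]
m = 4  # the first m samples train the model, the remaining M - m test it

net = BPNetwork(n_in=2, n_hidden=3, seed=0)
history = net.train(X[:m], y[:m])
test_mse = sum((net.predict(x) - t) ** 2 for x, t in zip(X[m:], y[m:])) / (len(X) - m)
print(history[0], history[-1], test_mse)
```

Comparing the training error with `test_mse` on the held-out samples mirrors the accuracy check the text describes before the model is put into use.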
