Short‐term Railway Passenger Demand Forecast Using Improved Wasserstein .

Transcription

Received: 26 September 2020Revised: 7 January 2021IET Intelligent Transport SystemsAccepted: 20 January 2021DOI: 10.1049/itr2.12036ORIGINAL RESEARCH PAPERShort-term railway passenger demand forecast using improvedWasserstein generative adversarial nets and web search termsFenling Feng1,2Jiaqi Zhang11School of Traffic and Transportation Engineering,Central South University, No.22 Shaoshan SouthRoad Tianxin District, Changsha, Hunan 410075,China2Key laboratory of Traffic Safety on Track, Ministryof Education, No.22 Shaoshan South Road TianxinDistrict, Changsha, Hunan 410075, ChinaCorrespondenceChengguang Liu, School of Traffic and Transportation Engineering, Central South University, No. 22Shaoshan South Road Tianxin District, Changsha,Hunan 410075, China.Email: liuchengguang@csu.edu.cnFunding informationThe National Social Science Foundation of China(Grant: 18BJY169)The Ministry of Science and Technology of the People’s Republic of China (Grant: 2018YFB1201402)The Fundamental Research Funds for the Central Universities of Central South University(2018zzts167)1Chengguang Liu1Wan Li1Qiwei Jiang1AbstractAccurately predicting railway passenger demand is conducive for managers to quicklyadjust strategies. It is time-consuming and expensive to collect large-scale traffic data. Withthe digitization of railway tickets, a large amount of user data has been accumulated. Wepropose a method to predict railway passenger demand using web search terms data. Inorder to improve the prediction accuracy, we improved Wasserstein Generative Adversarial Nets (WGAN), which were good at generating and identifying data, by adding a predictor and supervised learning adversarial training to predict railway passenger demand. Theimproved WGAN could generate virtual data to expand real data, and use parallel data topredict railway passenger demand. We used search times of web search terms on differentdevices as training data to predict railway passenger demand in Beijing. The results showthat the change in demand for railway passenger lags behind the change in the data of websearch terms by one month. It is suitable for forecasting in advance. Compared with otherforecasting methods, the improved WGAN performance is better, and the mean absolutepercentage error is 1.98%. Because it can use mixed data for training and prediction, it hasstronger adaptability when data scale decreases.INTRODUCTIONThe railway plays an important role in transportation systems.Figure 1 shows the trend of railway passenger volume in China.It shows a growth trend in the long term. But in the shortterm, there is great fluctuation. Generally, January, February,July and August of each year are the peak months of passenger flow. Railway passenger demand prediction analysespassenger demand development dynamically and carries outquantitative calculations based on qualitative analysis. Thecorrect prediction for railway passenger demand has a significant effect on economic development, resource allocation,investment structure, and management of railway enterprises,and it also provides scientific basis for programming passenger transportation projects. Because of the vital role inthe basic functions of resources management, it is very significant to predict the volume of railway passenger demandaccurately [1].According to different time periods, the demand forecastingcan be divided into long-term and short-term. Long-termdemand forecasting is helpful for large passenger transportation project planning, infrastructure construction planning,and long-term investment decisions. Adjustments to trainoperation plans, passenger flow organization, and resourceallocation are more sensitive to short-term passenger demandchanges. In recent years, China railway scale has grown rapidly.However, the scale expansion has made operational problemsmore complicated. Financial costs and fixed asset depreciationchallenge railway profitability. Formulating service strategiesbased on passenger flow can improve service capabilities ofrailway operators while avoiding wastage of resources. It cannot only improve service level, but also help to save operatingcosts. Therefore, railway passenger demand forecasting hasattracted increasing attention.So far there has been some researches on the predictionmethods of passenger demand based on different perspectives,This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work isproperly cited. 2021 The Authors. IET Intelligent Transport Systems published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology432wileyonlinelibrary.com/iet-itsIET Intell. Transp. Syst. 2021;15:432–445.

FENG ET AL.FIGURE 1433The trend of railway passenger volume in Chinaand most of them are quantitative methods, consisting of timeseries model, neural network, support vectors machine (SVM)and so on. However, it is difficult to accurately forecast passenger demand and irregular fluctuations based on historicaldata, due to the volume of railway travel being affected by manyfactors. Therefore, the irregular factors become the key factorsto be predicted. The forecasting of railway passenger demandremains a significant problem because of the following reasons.1. Railway passenger flow shows obvious non-linearity, randomness and volatility. Besides conventional factors (legalholidays etc.), it may also be affected by unexpected factors, such as emergencies. For example, from January toMarch 2020, the demand for railway passengers decreasedby more than 50% year-on-year due to the COVID-19 outbreak. Forecasting methods based on historical data trendare difficult to accurately predict railway passenger demandand the irregular fluctuation.2. Machine learning methods require a large number of datasets for training and testing to improve prediction accuracy.However, in practice, it is time-consuming and expensive tocollect large-scale traffic data [2]. Data missing or corrupteddegrades its quality. There are research needs for more convenient data acquisition methods, as well as time-sensitivedata.With the rapid development of Internet technology, moreand more people use it to make a reservation, search for information and so on. The Internet has become an important partof most people’s lives. According to the Digital in 2019 Q2Global Overview published by We Are Social and Hootsuite, bythe end of April of 2019, the number of Internet users all overthe world had reached 4.4 billion, and the total population ofthe world was only 7.5 billion. Besides, based on the 44th ChinaInternet Development Statistics Report, by the end of June of2019, the number of Internet users in China had reached 854million, which accounted for 61.2% of China’s population. Lastbut not least, there were 694 million Internet users using searchengines and the popularity rate reached 81.3%. In the developedcities of China, the popularity rate of Internet access was higher,it could reach about 80% in Beijing. Internet users will generatea lot of data when they go online. These data can reflect notonly online behaviours, but also personal habits, hobbies anddemands of users. Online data are useful for predictors [3]. Ifthese data can be reasonably analysed and utilized, it will produce great benefits.To promote the study of web search behaviour, Baidu Company and Google Company have launched the Baidu index andthe Google trends, respectively. They provide search volume ofa specific search term which brings convenience to scholars.Baidu has the biggest market share in China, nearly 90%. Thesearch engine data has been applied to prediction of tourist volume [4], flu trend [5], and public transportation arrivals [6] andso on. Web search engine data can improve performance forpassenger demand prediction [7].According to the latest China Railway statistics, with thecompletion of the reform of electronic railway ticket and theimprovement of online ticket purchasing channels in China,more than 88.4% of people choose to use the Internet to bookrailway tickets. The first step to book railway tickets online isto search for relevant information via the Internet. Passengersoften use the web search engine to obtain related informationbefore travel, such as ticket booking information, departure,destination, railway station etc. The website search engine notonly provides information for the passengers, but also recordsquery process and reflects travel willingness of passengers.Therefore, the fluctuation of railway passenger demand can bereflected by the web search data and it is feasible and reasonableto predict the railway passenger demand based on web searchdata. But not all search terms are relevant to railway passengerdemands. This paper proposes a method to select relevant

FENG ET AL.434search terms for the railway passenger demand forecast fromthe search engine database. We analyse the relationship betweenweb search terms data and railway passenger demand, and tryto use search times data of web search terms to predict thenumber of railway passenger demand. The paper providesa more effective and accurate method for railway passengerdemand forecasting in the context of electronic ticketing.Although web search data can reflect the trend of these fluctuations, there is a complex non-linear relationship betweenthose data and railway passenger demand, which requires amodel with preferable fitting ability. In recent years, the neuralnetwork has been widely used because of its strong fitting ability. For example, the most common back propagation (BP) neural network is used for predicting, classification. Since 2012, thedeep learning has spread rapidly all over the world. In 2014, theappearance of generative adversarial nets (GAN) made modelimaginative [8]. The GAN training strategy is to define a gamebetween two competing networks, generator network and discriminator network. The generator network maps a source ofnoise to the input space. The discriminator network receiveseither a generated sample or a true data sample and must distinguish between these two. The generator is trained to foolthe discriminator. When the discriminator cannot determinewhether the data comes from the real dataset or the generator,the optimal state is reached. At this point, we obtain a generator model, which has learned the distribution of real data. Themodel can not only fit the existing data better, but also generatenew data based on the existing data. The generated data has thesame distribution as the real data and can continue to train themodel in turn. The data generated by GAN and real data havebeen applied to traffic flow imputation [2]. A collapse problemmay occur in the learning process of GAN, and the generatorbegins to degenerate. It always generates the same sample pointsand becomes unable to continue learning. When the generation model collapses, the discriminator will also point the similar sample points to similar directions, and the training cannotcontinue. Wasserstein GAN (WGAN) [9] efficiently addressedthese problems. However, the weights of WGAN’s focused ontwo extreme values, which will affect the gradient of the generator, and the weight clipping strategy led to gradient disappearance or explosion. Gradient penalty (WGAN-GP) does notsuffer from the same problems [10]. Most successful examplesof WGAN focus on generating data especially in images andtexts, but it cannot be directly used for passenger demand prediction. In this paper, we used the parallel data, namely the generated data and the real data, to augment the original data tohelp improve the railway passenger demand prediction models.To forecast the railway passenger demand, we formulated a special loss function and adversarial training.Based on the analysis above, this paper attempts to use websearch terms and improved WGAN-GP to predict monthly railway passenger demand. There are mainly three contributions inthis paper.1. We have proposed methods and analysis framework forselecting web search term data related to railway passengerdemand, and have analysed its relevance to railway passen-ger demand. Web search terms are suitable for forecastingbecause it is easy to collect, time-sensitive, and can reflectfluctuations in railway passenger demand. It enriches the datasources of the railway passenger demand forecast.2. The improved WGAN-GP is proposed to predict the railwaypassenger demand. As a generative model, the most directapplication of GANs is data generation. We used it not onlyfor generating data, but also for forecasting railway passenger demand. The typical GAN consists of a generator and adiscriminator. We added a predictor to the model. The special loss function and adversarial training under supervisedlearning were designed to predict railway passenger demand.In the numerical experiment, the proposed model outperformed other approaches.3. We used three datasets from different sources for numericalexperiments. The prediction performance of the models isthe best when all devices data was used. We used the generated data and the real data to augment the original data totrain the model in turn. We analysed the impact of data setreduction on the prediction performance of the improvedWGAN-GP. The improved WGAN-GP has stronger adaptability when the data scale decreases.The remainder of the paper is organized as follows. Section 2 reviews relevant literatures; Section 3 briefly describes themodel; Section 4 describes how to select web search terms; Section 5 reports and discusses data analysis, data processing andexperiment results; Section 6 presents conclusions.22.1LITERATURE REVIEWResearch on web search dataWith the popularity of the Internet and mobile phones, Internettechnology is widely used in transportation industry. Currently,as an important platform for information dissemination, theInternet has been applied by almost all of the transportationcompanies, which publish product information and providetickets and accommodation booking. The majority of travellersalso get relevant information through the Internet. Beforegetting information, travellers would use search engines, suchas Google and Baidu, to find them. The records of search termswere conserved by search engine companies. The study of websearch data began in Ginsberg [11] by studying the main publichealth problem of seasonal influenza and the method of tracking the disease by analysing a large number of Google searchqueries was obtained. Besides, search engine data had beenwidely used in ranking universities [12], gathering public opinions [13, 14], predicting stock market volumes [15], predictingflu trend [5, 16], predicting academic fame [12], and forecastingtourist volume [2, 17], tourist arrivals [5, 18, 19] and publictransportation arrivals [6]. Search engine data were also usedto forecast general economic indicators such as unemploymentrates [20, 21], and tourist consumptions [22]. Furthermore,search engine data has also been applied to other specific consumption categories, like box-office revenue [23] and organic

FENG ET AL.foods market [24]. These studies indicate that the web searchterms have relationships with social behaviour because it canreflect public demand, including railway passenger demand.Although web search term data has been used in the predictionof many industries, there is still research gap in the field of railway passenger demand prediction. The previous studies couldbe divided into two directions, in which one direction is the analysis of the time series of railway passenger demand [25–27] andthe other direction is to forecast the railway passenger demandusing the economic or industrial indexes [28], such as thenumber of people, the number of tourists and railway operatingmileage. The disadvantage of the first direction is that it wouldbe no longer applicable when the actual situation changes. Thedisadvantage of the second direction is that its informationcollection is cumbersome and not timely. Thus it is not timelyto predict railway passenger demand using traditional data.2.2Research on the traffic demand forecastmethodsRailway passenger demand is affected by various randomfactors and it shows obvious non-linearity, randomness, andvolatility, which increase predictive difficulty to a certain extent.There are many methods used to forecast traffic demand, suchas auto-regressive integrated moving average (ARIMA) [29–31],neural network [1, 32–34], non-parametric regression [35, 36],Kalman filtering model [37], Gaussian maximum likelihood[38], gray model [39], and trend analysis method [40]. ARIMAis suitable for fitting and forecasting periodic time series. Therailway passenger demand is affected by many factors, and it isdifficult to predict passenger demand based on historical data.The parameters of this method are subject to subjective factors,which would affect the predictive results. Non-parametricregression performs well only on the condition of large numbers of samples and it shows poor performance when a smallnumber of samples is used. Kalman filtering model is adaptedfor linear fitting and the relationship between search terms andthe railway passenger demand is complicated, so the Kalmanfiltering model is unsuitable for predicting railway passengerdemand. Similarly, Gaussian maximum likelihood is also unsuitable for predicting railway passenger demand, which does notfit Gaussian distribution. Also these models are not suitablefor fitting the complex non-linear relationship between searchterms and the railway passenger demand. The gray systemtheory is simple and has fast operating speed, which couldgive a good performance for forecasting, but it is not idealfor dynamic systems, such as railway passenger demand. Thetrend analysis method is applied in relatively stable systems.The railway passenger demand may be affected by accidentalfactors, such as important festivals or holidays. Once thesehappen, the trend analysis method performs poorly. Others likechaotic time series and gray predictions require stable data.Among these techniques, neural networks have been frequently adopted as the modelling approach because of its adaptability, non-linearity and arbitrary function mapping capability[41].However, neural networks also have many disadvantages,435such as easy to fall into local minimum and poor generalization ability. To this end, many methods have been proposed toimprove this situation. In recent years, the usage of deep learning models [42–44], deep belief network [45, 46] and deep autoencoder [47–49] have changed these weaknesses to a certainextent. These models were mainly used for image recognition,classification task and so on. When a large number of trainingsamples are used, they can effectively find the distributed characteristics of samples, but when the number of samples is less,their generalization ability cannot be shown well.2.3Research on generative adversarial netsWith the development of neural networks, generative adversarial nets (GAN) [8] were proposed, which was used to generate image originally. The main purpose of this model is to trainthe generator which could produce realistic-looking images. Inthe adversarial process, images that the generator generated andreal images are both discriminated by discriminator, which contributes a better discriminative ability. Recently, GAN is usedfor traffic flow prediction [50] and traffic flow imputation [2].Conditional GAN (CGAN) [51], a derivative form of GAN,can perform the conditioning by feeding extra informationinto both discriminator and generator as an additional inputlayer. But its performance is not necessarily better than nonconditional GAN [51]. CGAN solves some special problems,which must consider extra condition information such as taxipassenger demand prediction [52]. GAN has many drawbacks,which include liberal parameters, training, lack of diversity ofthe samples generated, difficulty in indicating the training process when the value function of generator and discriminatorused and so on. Given these disadvantages, new methods areput forward to solve these problems. Deep convolution GAN(DCGAN) [53] uses convolutional layers instead of full-link layers, which greatly improves the stability of GAN training andthe quality of generated results. But it does not solve the problem fundamentally, which still needs to carefully balance thetraining process of the generator and the discriminator. Wasserstein GAN (WGAN) [9] efficiently addresses these problems.WGAN solves training instability completely and the traininglevel of the generator and the discriminator is no longer neededto balance carefully. It addresses collapse mode and ensures thediversity of the generated samples. Last but not least, WGANdoes not need to design the net architectures elaborately and thesimplest multiple layers network could obtain well performance.However, later studies found that the weights of WGAN’s discriminator focus on two extreme values, which would affect thegradient of the generator, and the weight clipping strategy ledto gradient disappear or explosion. WGAN-GP [10, 54, 55] isproposed to solve these two problems and it obtains a more stable training process and better performance for generating pictures. Most successful examples of WGAN-GP focus on generating data especially in images and texts, but it has not beenused for demand forecasting. Better generated data would trainbetter discriminator, and stronger discrimination ability meantbetter predictive ability.

FENG ET AL.436In summary, the web search terms have relationships withsocial behaviour because it can reflect public demand, includingrailway passenger demand. Using web search terms data for railway passenger transport demand forecasting could overcomedifficulties in delayed data fluctuation and high data collectioncosts. But there is a complex non-linear relationship betweenthose data and railway passenger demand, which requires amodel with preferable fitting ability. Neural network modelshave powerful fitting capabilities, but when the number of samples is less, their generalization ability cannot be shown well.WGAN-GP helps solve the problem. It can generate virtual datato expand the real data set and use parallel data to predict railwaypassenger transport demand. This paper attempts to use websearch terms and improve WGAN-GP to predict monthly railway passenger demand. Considering that Baidu search enginehas a market share of nearly 90% in China, this paper uses theBaidu index to find the data of search terms related to the railway passenger demand for predicting purposes and attempts touse the advantages of WGAN-GP for predicting railway passenger demand in Beijing. However, GAN is proposed as unsupervised learning [8], which means that it cannot be used to predict.To forecast the railway passenger demand, we formulate specialloss function and adversarial training under supervised learning.In this study, the generator and the discriminator use multilayerneural networks, and the hidden layers of the generator usethe rectified linear unit (ReLU) activation function. To makediscriminator own better discriminative and prediction ability,all activation functions of discriminator use sigmoid function.3METHODOLOGYThe training procedure of typical GAN corresponds to a minimax two-player game. GAN is a framework to improve the generative model through an adversarial process, in which two models are trained simultaneously: a generator G that captures thedata distribution, and a discriminator D estimates that a sample comes from the training data rather than a generator. Thetraining procedure for G maximizes the probability of making amistake of D. In the case where G and D are defined by multilayer perceptrons, GAN is proposed for generating and discriminating data, not for prediction. And it is unsupervised learning,which means that it cannot be used to predict. We improvedGAN by adding a predictor P and supervised learning adversarial training to predict the railway passenger demand.Note that the two neural networks, i.e. generator G and discriminator D, could be formulated in any type, in this paper,G and D use six layers neural networks. The purpose of G isto output data that have the same distribution with real websearch terms data x. An input noise variable z is defined, thena mapping to the generated web search terms data space asx ′ G𝜃 (z ) is represented, where G𝜃 is a differentiable functionrepresented by G with generator parameters 𝜃. At present, thedata distribution of x ′ is far away from that of x. Different fromtypical GAN, the purpose of D is not only to discriminate thatthe input data come from the real data or the generated data,but also to predict the railway passenger demand. The first fiveFIGURE 2Layout for the improved WGAN-GPlayers of D are used to discriminate the generated data or realdata, and the last layer is added to predict the demand of railwaypassengers. Therefore, the improved WGAN-GP consists of G,D, and predictor P. The layout for the improved WGAN-GP isshown in Figure 2. y is the label of x. y represents the real railway passenger demand. Then Dw (x) and Dw (x ′ ) are defined,where Dw represents D with critic parameters w. Dw (x) represents the fitted value of railway passenger demand based onreal web search terms data. Dw (x ′ ) represents the fitted value ofrailway passenger demand based on generated web search termsdata.The hidden layers of G use the ReLU activation function,ReLU as the following Equation (1).ReLU (hi ) max(0, wi j hi b j )(1)where hi represents the input of the ith layer, wi j is the weightfrom the ith layer to j th layer, b j is the bias of the j th layer.The output of G uses the linear function Equation (2).Linear(hi ) wi j hi b j(2)To make D get better discriminative ability, all activationfunctions of D use sigmoid function Equation (3). The output

FENG ET AL.437ALGORITHM 1 Improved WGAN-GP. We use values of 𝜆 0.1,𝜇 100, nD 5, nG 15000, 𝛼D 0.00002, 𝛼G 0.000001, 𝛽1 0.5,𝛽2 0.9of D uses the linear function Equation (2).Sigmoid(hi ) 11 e (wi j hi b j )(3)The numbers of neurons in the hidden layer of D and G areobtained by the following Equation (4):nodehidden 2 nodeinput 1(4)Require: Initial critic parameters w0 , initial generator parameters 𝜃0 .While 𝜃0 has not converged or n nG dofor t 1, , nD dowhere nodeinput represents the number of neurons in the inputlayer of D.The discriminator D and the predictor P were pre-trainedusing train sets with an un-trained generator G for certain iterations. To update the weights and bias of G and D, we needto define the loss functions of them. According to WGAN-GP[10], the loss function of D is as the following Equations (5) and(6).2̂ ‖LD E[Dw (x)] E[Dw (x ′ )] 𝜆E[‖‖ x̂ Dw (x)‖ p 1] (5)x̂ 𝜀x (1 𝜀)x ′Require: The gradient penalty coefficient 𝜆, the penalty coefficient ofsupervised learning 𝜇, the number of critic iterations per generatoriteration nD , the number of generator iteration nG , the batch size m,Adam hyperparameters 𝛼D , 𝛼G , 𝛽1 , 𝛽2 , the labels of real data y.for i 1, , m doSamplex from the real data, variable z from noise data, a randomnumber 𝜀 U [0, 1].x ′ G𝜃 (z )x̂ 𝜀x (1 𝜀)x ′i′LD Dw (x ′ ) Dw (x) 𝜆(‖ x̂ Dw (x)‖̂ 2 1)2 𝜇(y Dw (x))2end forw Adam( w1m mi 1L′iD , w, 𝛼D , 𝛽1 , 𝛽2 )end forSample a batch of variables z from noise data.1 m𝜃 Adam( 𝜃i 1 Dw (G𝜃 (z )), 𝜃, 𝛼, 𝛽1 , 𝛽2 )(6)mend whilewhere E[ ] represents expectation value, 𝜆 represents the gradient penalty coefficient, x̂ is a probability distribution Px̂ , whichis sampled uniformly along straight lines between pairs of pointssampled from the distribution Px and Px ′ , and it can be got bŷ p 1]2Equation (2). 𝜀 is a random value of 0 to 1. [‖ x̂ Dw (x)‖represents that the gradient of the D is limited to about 1, whichprevents the gradient of D from disappearing or exploding.The closer the data distribution of x ′ to that of x is, thesmaller the value of LD is. The more different the data distribution of x ′ from that of x is, the larger the value of LD is. It isimportant to note that the distribution of noise data z conformsto the standard normal distribution.G makes every effort to bring LD down. Considering thatE[Dw (x)] in Equation (5) is not related to G , the loss functionof G is as the following Equation (7).LG E[Dw (x ′ )](7)Since it is unsupervised learning, which means that it cannot be used to predict. To make D gain the predictive ability, supervised learning needs to be added in Equation (5).This paper proposes to change the loss function of D toEquation (8).LD ′ LD 𝜇 E[y Dw (x)]2(8)where 𝜇 represents the penalty coefficient of supervised learning, y is the label of x. Since this article uses improved WGANGP for prediction, the purpose of 𝜇 here is to make it focuson supervised learning when calculating the gradient of D. Itshould be noted that when the input data of D is x, the targetFIGURE 3Training and predicting procedure flow chartoutput is y, and when the input data of D is x ′ , the target outputis Dw (x ′ ) .The research uses the Adam optimization algorithm toupdate the weights and biases of improved WGAN-GP toimprove the training performance. The detailed steps of thetraining-improved WGAN-GP are as follows:The training and predicting procedure for the conditionalgenerator G, the conditional discriminator D, and the predictorP are illustrated in Figure 3.

FENG ET AL.438Step 4. To predict future railway passenger demand, analyse the leading and lagging relationship between websearch term data and railway passenger demand.FIGURE 4The selection steps of the web search termsIn this model, the predictor P is trained not only withreal data, but also with generated data x’. After the improvedWGAN-GP was well trained, we put the test sets i

According to different time periods, the demand forecasting can be divided into long-term and short-term. Long-term demand forecasting is helpful for large passenger transporta-tion project planning, infrastructure construction planning, and long-term investment decisions. Adjustments to train operation plans, passenger flow organization, and .