Customs Revenue Prediction Using Ensemble Methods (Statistical Modeling vs Machine Learning)

Transcription

Customs revenue prediction using ensemble methods (statistical modeling vs machine learning)
Jordan Simonov and Zoran Gligorov
PICARD 2020, 23-26 November 2020

1 Introduction
2 Statistical modelling
3 Machine learning
4 Ensemble learning
5 Conclusion
6 References

Introduction

Forecasting problem

- Finance ministries usually project the annual and monthly revenue collection targets that are expected to be met by customs administrations
- Modern cash management requires accurate short-term forecasts not only on a monthly basis, but also on a weekly or daily basis
- Forecasting daily collection of customs revenues
- Possible approaches?

Table: Statistical modelling vs machine learning

Statistical modelling:
- Formalisation of relationships between variables in the form of mathematical equations.
- Required to assume the shape of the model curve prior to performing model fitting to the data (e.g. linear, polynomial).
- Predicts the output with 85% accuracy at a 90% confidence level.
- Various diagnostics of parameters are performed, such as p-value.
- Data will be split into 70%/30% to create training and testing data. The model is developed on training data and tested on testing data.
- Models can be developed on a single dataset (training data), as diagnostics are performed at both overall accuracy and individual variable level.
- Mostly used for research purposes.
- From the school of statistics and mathematics.

Machine learning:
- Algorithm that can learn from the data without rule-based programming.
- Does not need to assume the underlying shape, as machine learning algorithms can learn complex patterns automatically, based on the provided data.
- Predicts the output with 85% accuracy.
- Does not perform statistical diagnostic significance tests.
- Data will be split into 50%/25%/25% to create training, validation, and testing data. Models are developed on training data, hyperparameters are tuned on validation data, and the models are evaluated against test data.
- Need to be trained on two datasets (training and validation data), to ensure two-point validation.
- Apt for implementation in a production environment.
- From the school of computer science.

Source: Adapted from Pratap (2017, p. 43)
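The two split schemes in the table can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `chronological_split` and the placeholder series are hypothetical, and keeping the blocks in time order (no shuffling) is an assumption that is common for time series but is not stated on the slide.

```python
def chronological_split(series, fractions=(0.5, 0.25, 0.25)):
    """Split an ordered series into consecutive blocks without shuffling.

    `fractions` must sum to 1. The default mirrors the 50%/25%/25%
    train/validation/test scheme; pass (0.7, 0.3) for a train/test split.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    n = len(series)
    parts, start = [], 0
    for i, frac in enumerate(fractions):
        # The last block takes whatever remains, so no observation is dropped.
        end = n if i == len(fractions) - 1 else start + int(round(n * frac))
        parts.append(series[start:end])
        start = end
    return parts


# Placeholder data standing in for a daily revenue series (hypothetical).
daily = list(range(100))

# Machine learning style: training, validation, and test blocks.
train, valid, test = chronological_split(daily)

# Statistical modelling style: training and test blocks only.
train_sm, test_sm = chronological_split(daily, (0.7, 0.3))
```

Because the blocks are consecutive, the test block always contains the most recent observations, so the evaluation resembles forecasting days the model has not yet seen.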

Figure: Daily customs duties collection in the period 2017-2020, in MKD denars (y-axis: millions of denars; x-axis: year)

Figure: Customs duties daily collection in the period 2017-2020, in MKD denars (y-axis: millions of denars; x-axis: days of week)

Statistical modelling

- Exponential smoothing (ETS)
- Auto-regressive integrated moving average (ARIMA)
- Forecasting with decomposition (STL)
- Trigonometric exponential smoothing state space model with Box-Cox transformation, ARMA errors (TBATS)

Figure (four panels; y-axis: millions of denars; x-axis: years): Forecasts from ETS(A,N,N); Forecast from ARIMA(3,1,0)(2,1,0)[7]; Forecasts from STL + ETS(A,N,N); Forecasts from TBATS(1, {3,2}, -, {<7,3>, <365.25,7>})
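The ETS(A,N,N) label denotes additive errors with no trend and no seasonality, whose point forecasts coincide with simple exponential smoothing. The sketch below illustrates that recursion only. It is not the authors' R workflow: the smoothing parameter `alpha` is supplied by hand rather than estimated, the initial level is assumed to be the first observation, and the input values are hypothetical.

```python
def simple_exponential_smoothing(series, alpha, horizon):
    """Point forecasts from simple exponential smoothing.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}
    Assumption: the initial level is the first observation, whereas a
    fitted ETS model would estimate both alpha and the initial level.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must not be empty")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    # With no trend or seasonal component, every future step gets the
    # same value: the last smoothed level.
    return [level] * horizon


# Hypothetical daily values, in millions of denars.
history = [10.0, 12.0, 11.0, 13.0]
forecast = simple_exponential_smoothing(history, alpha=0.5, horizon=3)
```

The flat forecast is the reason a model of this kind cannot follow a weekly pattern on its own, which is what the seasonal ARIMA, STL, and TBATS variants are meant to capture.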

Table: Evaluating forecast accuracy (training set and test set)

Machine learning

- Linear regression (LM)
- Classification and regression trees (CART)
- Conditional inference tree (CTREE)
- eXtreme Gradient Boosting (XGBoost)
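Regression-type learners such as these need the daily series recast as a table of features and a target. The slides do not say which features were used, so the sketch below is purely an assumption: it builds lagged values and a day-of-week indicator, and the names `make_supervised`, `lag_1`, `lag_7`, and `weekday` are hypothetical.

```python
from datetime import date, timedelta


def make_supervised(start, values, lags=(1, 7)):
    """Turn a daily series into (features, target) rows.

    `start` is the calendar date of values[0]. Rows begin at the first
    day for which every requested lag is available.
    """
    rows = []
    max_lag = max(lags)
    for t in range(max_lag, len(values)):
        day = start + timedelta(days=t)
        features = {"weekday": day.weekday()}  # 0 = Monday ... 6 = Sunday
        for lag in lags:
            features[f"lag_{lag}"] = values[t - lag]
        rows.append((features, values[t]))
    return rows


# Placeholder series starting on Monday, 2 January 2017 (hypothetical values).
start = date(2017, 1, 2)
values = [float(v) for v in range(10)]
rows = make_supervised(start, values)
```

Each row can then be passed to any of the listed learners. The lag of 7 days is one simple way to expose a weekly pattern to a model that has no built-in notion of seasonality.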

Figure (four panels; y-axis: millions of denars; x-axis: years): panels include Forecast from LM, Forecast from CTREE, and Forecast from GBM

Table: Evaluating forecast accuracy (training set and test set)

Ensemble learning

Figure: Evaluating forecast accuracy. MAE and RMSE by model, shown separately for machine learning (LM, CART, CTREE, XGBoost, Ensemble ML) and statistical modelling (ETS, ARIMA, STL, TBATS, Ensemble SM)
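The two accuracy measures named in the figure, mean absolute error (MAE) and root mean squared error (RMSE), can be computed as in the following minimal sketch. The input numbers are hypothetical and are not taken from the presentation.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: the average size of the forecast errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: penalises large errors more than MAE."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical daily collections and forecasts, in millions of denars.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]

print("MAE:", mae(actual, predicted))
print("RMSE:", rmse(actual, predicted))
```

Both measures are in the units of the series. RMSE is never smaller than MAE, and the gap widens when a few days are forecast much worse than the rest.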

- By using statistical modelling and machine learning, we tested 10 different models in the R ecosystem
- With statistical modelling, the ensemble technique reduced the RMSE from 12.99 to 4.92; with machine learning, it reduced the error from 4.86 to 3.53
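The slides do not state how the member forecasts were combined. As an illustration only, the sketch below assumes the simplest scheme, an equal-weight average, and uses hypothetical toy forecasts to show how errors in opposite directions can cancel. It also computes the relative reductions implied by the error figures reported above. The names `average_ensemble` and `relative_reduction` are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def average_ensemble(forecasts):
    """Equal-weight average of several equally long forecast lists."""
    return [sum(step) / len(step) for step in zip(*forecasts)]


def relative_reduction(before, after):
    """Share of the original error removed, as a fraction of `before`."""
    return (before - after) / before


# Hypothetical toy data: one model forecasts too high, the other too low.
actual = [30.0, 32.0, 28.0, 35.0]
model_a = [33.0, 35.0, 31.0, 38.0]
model_b = [28.0, 30.0, 26.0, 33.0]
ensemble = average_ensemble([model_a, model_b])

for name, forecast in [("A", model_a), ("B", model_b), ("ensemble", ensemble)]:
    print(name, rmse(actual, forecast))

# Relative reductions implied by the reported figures.
sm_reduction = relative_reduction(12.99, 4.92)  # statistical modelling
ml_reduction = relative_reduction(4.86, 3.53)   # machine learning
print(round(sm_reduction * 100, 1), round(ml_reduction * 100, 1))
```

An equal-weight average is not guaranteed to beat the best individual member; it helps most when the members err in different directions, as in this toy case.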

Conclusion

For both approaches, the ensemble technique improved prediction accuracy compared to the individual models. Because it reduces the forecasting error, we conclude that the ensemble technique is a game changer and an important addition to every forecaster's toolbox. For this reason, we recommend using this technique for forecasting purposes with statistical modelling or machine learning. The choice is yours!

References

- Pratap, D. (2017). Statistics for machine learning. Packt Publishing Ltd.
- Nielsen, A. (2019). Practical time series analysis: Prediction with statistics and machine learning. O'Reilly Media, USA.
- Hyndman, R. J., & Athanasopoulos, G. (2016). Forecasting: Principles and practice. Monash University, Australia.
- Lewis, N. D. (2017). Neural Networks for Time-Series Forecasting with R.
