Applied Econometrics Using MATLAB

Applied Econometrics using MATLAB
James P. LeSage
Department of Economics
University of Toledo
CIRCULATED FOR REVIEW
October, 1998

Preface

This text describes a set of MATLAB functions that implement a host of econometric estimation methods. Toolboxes are the name given by the MathWorks to related sets of MATLAB functions aimed at solving a particular class of problems. Toolboxes of functions useful in signal processing, optimization, statistics, finance and a host of other areas are available from the MathWorks as add-ons to the standard MATLAB software distribution. I use the term Econometrics Toolbox to refer to the collection of function libraries described in this book.

The intended audience is faculty and students using statistical methods, whether they are engaged in econometric analysis or more general regression modeling. The MATLAB functions described in this book have been used in my own research as well as teaching both undergraduate and graduate econometrics courses. Researchers currently using Gauss, RATS, TSP, or SAS/IML for econometric programming might find switching to MATLAB advantageous. MATLAB software has always had excellent numerical algorithms, and has recently been extended to include: sparse matrix algorithms, very good graphical capabilities, and a complete set of object oriented and graphical user-interface programming tools. MATLAB software is available on a wide variety of computing platforms including mainframe, Intel, Apple, and Linux or Unix workstations.

When contemplating a change in software, there is always the initial investment in developing a set of basic routines and functions to support econometric analysis. It is my hope that the routines in the Econometrics Toolbox provide a relatively complete set of basic econometric analysis tools. The toolbox also includes a number of functions to mimic those available in Gauss, which should make converting existing Gauss functions and applications easier. For those involved in vector autoregressive modeling, a complete set of estimation and forecasting routines is available that implements a wider variety of these estimation methods than RATS software. For example, Bayesian Markov Chain Monte Carlo (MCMC) estimation of VAR models that robustify against outliers and accommodate heteroscedastic disturbances has been implemented. In addition, the estimation functions for error correction models (ECM) carry out Johansen’s tests to determine the number of cointegrating relations, which are automatically incorporated in the model. In the area of vector autoregressive forecasting, routines are available for VAR and ECM methods that automatically handle data transformations (e.g. differencing, seasonal differences, growth rates). This allows users to work with variables in raw levels form. The forecasting functions carry out needed transformations for estimation and return forecasted values in level form. Comparison of forecast accuracy from a wide variety of vector autoregressive, error correction and other methods is quite simple. Users can avoid the difficult task of unraveling transformed forecasted values from alternative estimation methods and proceed directly to forecast accuracy comparisons.

The collection of around 300 functions and demonstration programs is organized into libraries that are described in each chapter of the book. Many faculty use MATLAB or Gauss software for research in econometric analysis, but the functions written to support research are often suitable for only a single problem. This is because time and energy (both of which are in short supply) are involved in writing more generally applicable functions. The functions described in this book are intended to be re-usable in any number of applications. Some of the functions implement relatively new Markov Chain Monte Carlo (MCMC) estimation methods, making these accessible to undergraduate and graduate students with absolutely no programming involved on the students’ part. Many of the automated features available in the vector autoregressive, error correction, and forecasting functions arose from my own experience in dealing with students using these functions.
It seemed a shame to waste valuable class time on implementation details when these can be handled by well-written functions that take care of the details.

A consistent design was implemented that provides documentation, example programs, and functions to produce printed as well as graphical presentation of estimation results for all of the econometric functions. This was accomplished using the “structure variables” introduced in MATLAB Version 5. Information from econometric estimation is encapsulated into a single variable that contains “fields” for individual parameters and statistics related to the econometric results. A thoughtful design by the MathWorks allows these structure variables to contain scalar, vector, matrix, string, and even multi-dimensional matrices as fields. This allows the econometric functions to return a single structure that contains all estimation results. These structures can be passed to other functions that can intelligently decipher the information and provide a printed or graphical presentation of the results.

The Econometrics Toolbox should allow faculty to use MATLAB in undergraduate and graduate level econometrics courses with absolutely no programming on the part of students or faculty. An added benefit to using MATLAB and the Econometrics Toolbox is that faculty have the option of implementing methods that best reflect the material in their courses as well as their own research interests. It should be easy to implement a host of ideas and methods by: drawing on existing functions in the toolbox, extending these functions, or operating on the results from these functions. As there is an expectation that users are likely to extend the toolbox, examples of how to accomplish this are provided at the outset in the first chapter. Another way to extend the toolbox is to download MATLAB functions that are available on Internet sites. (In fact, some of the routines in the toolbox originally came from the Internet.) I would urge you to re-write the documentation for these functions in a format consistent with the other functions in the toolbox and return the results from the function in a “structure variable”. A detailed example of how to do this is provided in the first chapter.

In addition to providing a set of econometric estimation routines and documentation, the book has another goal. Programming approaches as well as design decisions are discussed in the book. This discussion should make it easier to use the toolbox functions intelligently, and facilitate creating new functions that fit into the overall design, and work well with existing toolbox routines. This text can be read as a manual for simply using the existing functions in the toolbox, which is how students tend to approach the book. It can also be seen as providing programming and design approaches that will help implement extensions for research and teaching of econometrics. This is how I would think faculty would approach the text.
Some faculty in Ph.D. programs expect their graduate students to engage in econometric problem solving that requires programming, and certainly this text would eliminate the burden of spending valuable course time on computer programming and implementation details. Students in Ph.D. programs receive the added benefit that functions implemented for dissertation work can be easily transported to another institution, since MATLAB is available for almost any conceivable hardware/operating system environment.

Finally, there are obviously omissions, bugs and perhaps programming errors in the Econometrics Toolbox. This would likely be the case with any such endeavor. I would be grateful if users would notify me when they encounter problems. It would also be helpful if users who produce generally useful functions that extend the toolbox would submit them for inclusion. Much of the econometric code I encounter on the internet is simply too specific to a single research problem to be generally useful in other applications. If econometric researchers are serious about their newly proposed estimation methods, they should take the time to craft a generally useful MATLAB function that others could use in applied research. Inclusion in the Econometrics Toolbox would also have the benefit of introducing the method to faculty teaching econometrics and their students.

The latest version of the Econometrics Toolbox functions can be found on the Internet at: http://www.econ.utoledo.edu under the MATLAB gallery icon. Instructions for installing these functions are in an Appendix to this text along with a listing of the functions in the library and a brief description of each.

Contents

1 Introduction
2 Regression using MATLAB
    2.1 Design of the regression library
    2.2 The ols function
    2.3 Selecting a least-squares algorithm
    2.4 Using the results structure
    2.5 Performance profiling the regression toolbox
    2.6 Using the regression library
        2.6.1 A Monte Carlo experiment
        2.6.2 Dealing with serial correlation
        2.6.3 Implementing statistical tests
    2.7 Chapter summary
    Chapter 2 Appendix
3 Utility Functions
    3.1 Calendar function utilities
    3.2 Printing and plotting matrices
    3.3 Data transformation utilities
    3.4 Gauss functions
    3.5 Wrapper functions
    3.6 Chapter summary
    Chapter 3 Appendix
4 Regression Diagnostics
    4.1 Collinearity diagnostics and procedures
    4.2 Outlier diagnostics and procedures
    4.3 Chapter summary
    Chapter 4 Appendix
5 VAR and Error Correction Models
    5.1 VAR models
    5.2 Error correction models
    5.3 Bayesian variants
        5.3.1 Theil-Goldberger estimation of these models
    5.4 Forecasting the models
    5.5 Chapter summary
    Chapter 5 Appendix
6 Markov Chain Monte Carlo Models
    6.1 The Bayesian Regression Model
    6.2 The Gibbs Sampler
        6.2.1 Monitoring convergence of the sampler
        6.2.2 Autocorrelation estimates
        6.2.3 Raftery-Lewis diagnostics
        6.2.4 Geweke diagnostics
    6.3 A heteroscedastic linear model
    6.4 Gibbs sampling functions
    6.5 Metropolis sampling
    6.6 Functions in the Gibbs sampling library
    6.7 Chapter summary
    Chapter 6 Appendix
7 Limited Dependent Variable Models
    7.1 Logit and probit regressions
    7.2 Gibbs sampling logit/probit models
        7.2.1 The probit_g function
    7.3 Tobit models
    7.4 Gibbs sampling Tobit models
    7.5 Chapter summary
    Chapter 7 Appendix
8 Simultaneous Equation Models
    8.1 Two-stage least-squares models
    8.2 Three-stage least-squares models
    8.3 Seemingly unrelated regression models
    8.4 Chapter summary
    Chapter 8 Appendix
9 Distribution functions library
    9.1 The pdf, cdf, inv and rnd functions
    9.2 The specialized functions
    9.3 Chapter summary
    Chapter 9 Appendix
10 Optimization functions library
    10.1 Simplex optimization
        10.1.1 Univariate simplex optimization
        10.1.2 Multivariate simplex optimization
    10.2 EM algorithms for optimization
    10.3 Multivariate gradient optimization
    10.4 Chapter summary
    Chapter 10 Appendix
11 Handling sparse matrices
    11.1 Computational savings with sparse matrices
    11.2 Estimation using sparse matrix algorithms
    11.3 Gibbs sampling and sparse matrices
    11.4 Chapter summary
    Chapter 11 Appendix
References
Appendix

List of Examples

2.1 Demonstrate regression using the ols() function
2.2 Least-squares timing information
2.3 Profiling the ols() function
2.4 Using the ols() function for Monte Carlo
2.5 Generate a model with serial correlation
2.6 Cochrane-Orcutt iteration
2.7 Maximum likelihood estimation
2.8 Wald’s F-test
2.9 LM specification test
3.1 Using the cal() function
3.2 Using the tsdate() function
3.3 Using cal() and tsdates() functions
3.4 Reading and printing time-series
3.5 Using the lprint() function
3.6 Using the tsprint() function
3.7 Various tsprint() formats
3.8 Truncating time-series and the cal() function
3.9 Common errors using the cal() function
3.10 Using the tsplot() function
3.11 Finding time-series turning points with fturns()
3.12 Seasonal differencing with sdiff() function
3.13 Annual growth rates using growthr() function
3.14 Seasonal dummy variables using sdummy() function
3.15 Least-absolute deviations using lad() function
4.1 Collinearity experiment
4.2 Using the bkw() function
4.3 Using the ridge() function
4.4 Using the rtrace() function
4.5 Using the theil() function
4.6 Using the dfbeta() function
4.7 Using the robust() function
4.8 Using the pairs() function
5.1 Using the var() function
5.2 Using the pgranger() function
5.3 VAR with deterministic variables
5.4 Using the lrratio() function
5.5 Using the adf() and cadf() functions
5.6 Using the johansen() function
5.7 Estimating error correction models
5.8 Estimating BVAR models
5.9 Using bvar() with general weights
5.10 Estimating RECM models
5.11 Forecasting VAR models
5.12 Forecasting multiple related models
5.13 A forecast accuracy experiment
6.1 A simple Gibbs sampler
6.2 Using the coda() function
6.3 Using the raftery() function
6.4 Geweke’s convergence diagnostics
6.5 Using the momentg() function
6.6 Heteroscedastic Gibbs sampler
6.7 Using the ar_g() function
6.8 Metropolis within Gibbs sampling
6.9 Bayesian model averaging with the bma_g() function
7.1 Logit and probit regression functions
7.2 Demonstrate Albert-Chib latent variable
7.3 Gibbs vs. maximum likelihood logit
7.4 Gibbs vs. maximum likelihood probit
7.5 Heteroscedastic probit model
7.6 Tobit regression function
7.7 Gibbs sampling tobit estimation
8.1 Two-stage least-squares
8.2 Monte Carlo study of ols() vs. tsls()
8.3 Three-stage least-squares
8.4 Using the sur() function
9.1 Beta distribution function example
9.2 Random Wishart draws
9.3 Left- and right-truncated normal draws
9.4 Rejection sampling of truncated normal draws
10.1 Simplex maximum likelihood for Box-Cox model
10.2 EM estimation of switching regime model
10.3 Maximum likelihood estimation of the Tobit model
10.4 Using the solvopt() function
11.1 Using sparse matrix functions
11.2 Solving for rho using the far() function
11.3 Gibbs sampling with sparse matrices

List of Figures

2.1 Histogram of $\hat{\beta}$ outcomes
3.1 Output from tsplot() function
3.2 Graph of turning point events
4.1 Ridge trace plot
4.2 Dfbeta plots
4.3 Dffits plots
4.4 Pairwise scatter plots
5.1 Prior means and precision for important variables
5.2 Prior means and precision for unimportant variables
6.1 Prior Vi distributions for various values of r
6.2 Mean of Vi draws
6.3 Distribution of 1 2
6.4 In W as a function of
7.1 Cumulative distribution functions compared
7.2 Actual y vs. mean of latent y-draws
7.3 Posterior mean of vi draws with outliers
9.1 Beta distribution demonstration program plots
9.2 Histograms of truncated normal distributions
9.3 Contaminated normal versus a standard normal distribution
10.1 Plots from switching regime regression
11.1 Sparsity structure of W from Pace and Berry

List of Tables

2.1 Timing (in seconds) for Cholesky and QR Least-squares
2.2 Digits of accuracy for Cholesky vs. QR decomposition
4.1 Variance-decomposition proportions table
4.2 BKW collinearity diagnostics example
4.3 Ridge Regression for our Monte Carlo example
6.1 A Comparison of FAR Estimators

Chapter 1
Introduction

The Econometrics Toolbox contains around 50 functions that implement econometric estimation procedures, 20 functions to carry out diagnostic and statistical testing procedures, and 150 support and miscellaneous utility functions. In addition, there are around 100 demonstration functions covering all of the econometric estimation methods as well as the diagnostic and testing procedures and many of the utility functions. Any attempt to describe this amount of code must be organized.

Chapter 2 describes the design philosophy and mechanics of implementation using least-squares regression. Because regression is widely understood by econometricians and others working with statistics, the reader should be free to concentrate on the ‘big picture’. Once you understand the way that the Econometrics Toolbox functions encapsulate estimation and other results in the new MATLAB Version 5 ‘structure variables’, you’re on the way to successfully using the toolbox.

Despite the large (and always growing) number of functions in the toolbox, you may not find exactly what you’re looking for. From the outset, examples are provided that illustrate how to incorporate your own functions in the toolbox in a well-documented, consistent manner. Your functions should be capable of using the existing printing and graphing facilities to provide printed and graphical display of your results.

Chapters 3 through 10 focus more directly on description of functions according to their econometric purpose. These chapters can be read as merely a software documentation manual, and many beginning students approach the text in this manner. Another approach to the text is for those interested in MATLAB programming to accomplish research tasks. Chapters 3 through 10 describe various design challenges regarding the task of passing information to functions and returning results. These are used to illustrate alternative design and programming approaches. Additionally, some estimation procedures provide an opportunity to demonstrate coding that solves problems likely to arise in other estimation tasks. Approaching the text from this viewpoint, you should gain some familiarity with a host of alternative coding tricks, and the text should serve as a reference to the functions that contain these code fragments. When you encounter a similar situation, simply examine (or re-use) the code fragments modified to suit your particular problem.

Chapter 3 presents a library of utility functions that are used by many other functions in the Econometrics Toolbox. For example, all printing of results from econometric estimation and testing procedures is carried out by a single function that prints matrices in a specified decimal or integer format with optional column and row labels. Many of the examples throughout the text also rely on this function named mprint.

Regression diagnostic procedures are the topic of Chapter 4. Diagnostics for collinearity and influential observations from texts like Belsley, Kuh and Welsch (1980) and Cook and Weisberg (1982) are discussed and illustrated.

Chapter 5 turns attention to vector autoregressive and error correction models, as well as forecasting. Because we can craft our own functions in MATLAB, we’re not restricted to a limited set of Bayesian priors as in the case of RATS software. It is also possible to ease the tasks involved with cointegration testing for error correction models. These tests and model formation can be carried out in a single function with virtually no intervention on the part of users. A similar situation exists in the area of forecasting. Users can be spared the complications that arise from data transformations prior to estimation that require reverse transformations to the forecasted values.

A recent method that has received a great deal of attention in the statistics literature, Markov Chain Monte Carlo, or MCMC, is covered in Chapter 6. Econometric estimation procedures are also beginning to draw on this approach, and functions are crafted to implement these methods. Additional functions were devised to provide convergence diagnostics (that are an integral part of the method) as well as presentation of printed and graphical results. These functions communicate with each other via the MATLAB structure variables.

Chapter 7 takes up logit, probit and tobit estimation from both a maximum likelihood and an MCMC perspective. Recent MCMC approaches to limited dependent variable models hold promise for dealing with non-constant variance and outliers, and these are demonstrated in this chapter.
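The chapter descriptions above repeatedly mention structure variables as the way toolbox functions hand results to one another. A minimal sketch of that idea follows, using only built-in MATLAB commands. The field names (meth, beta, resid, nobs, nvar, sige) and the data are assumptions chosen for illustration; they are not taken from the Econometrics Toolbox, and the least-squares step is a plain backslash solve rather than any toolbox routine.

```matlab
% Illustrative sketch only: the field names below are hypothetical and
% are not taken from the Econometrics Toolbox.
n = 100;
t = (1:n)';
x = [ones(n,1) t/n];                 % intercept plus one regressor
y = x*[1; 2] + 0.1*sin(t);           % deterministic disturbance, so runs repeat exactly

% "Estimation" step: pack every result into one structure variable.
results.meth  = 'least-squares';     % string field
results.beta  = x\y;                 % vector field (least-squares solve)
results.resid = y - x*results.beta;  % vector field of residuals
results.nobs  = n;                   % scalar field
results.nvar  = size(x,2);           % scalar field
results.sige  = (results.resid'*results.resid)/(n - results.nvar);

% "Reporting" step: code that receives only the structure can discover
% which fields it holds and print a summary of each one.
names = fieldnames(results);
for i = 1:length(names)
    value = getfield(results, names{i});
    fprintf('%-6s %s [%d x %d]\n', names{i}, class(value), ...
            size(value,1), size(value,2));
end
fprintf('coefficient estimate: %8.4f\n', results.beta);
```

The point of the design is that the reporting step never needs to know how many results the estimation step produced; adding a field to the structure does not change any calling code.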

Simultaneous equation systems are the subject of Chapter 8, where we face the challenge of encapsulating input variables for a system of equations when calling our toolbox functions. Although MATLAB allows for variables that are global in scope, no such variables are used in any Econometrics Toolbox functions. We solve the challenges using MATLAB structure variables.

Chapter 9 describes a host of functions for calculating probability densities, cumulative densities, quantiles, and random deviates from twelve frequently used statistical distributions. Other more special purpose functions dealing with statistical distributions are also described.

The subject of optimization is taken up in Chapter 10 where we demonstrate maximum likelihood estimation. Alternative approaches and functions are described that provide a consistent interface for solving these types of problems.

The final chapter discusses and illustrates the use of MATLAB sparse matrix functions. These are useful for solving problems involving large matrices that contain a large proportion of zeros. MATLAB has a host of algorithms that can be used to operate on this type of matrix in an intelligent way that conserves both time and computer memory.

Readers who have used RATS, TSP or SAS should feel very comfortable using the Econometrics Toolbox functions and MATLAB. The procedure for producing econometric estimates is very similar to these other software programs. First, the data files are “loaded” and any needed data transformations to create variables for the model are implemented. Next, an estimation procedure is called to operate on the model variables and a command is issued to print or graph the results.

The text assumes the reader is familiar with basic MATLAB commands introduced in the software manual, Using MATLAB Version 5 or The Student Edition of MATLAB, Version 5 User’s Guide by Hanselman and Littlefield (1997). Gauss users should have little trouble understanding this text, perhaps without reading the MATLAB introductory manuals, as the syntax of MATLAB and Gauss is very similar.

All of the functions in the Econometrics Toolbox have been tested using MATLAB Version 5.2 on Apple, Intel/Windows and Sun Microsystems computing platforms. The functions also work with Versions 5.0 and 5.1, but some printed output that relies on a new string justification option in Version 5.2 may not appear as nicely as it does in Version 5.2.

The text contains 71 example programs that come with the Econometrics Toolbox files. Many of these examples generate random data samples, so the results you see will not exactly match those presented in the text. In addition to the example files, there are demonstration files for all of the econometric estimation and testing functions and most of the utility functions.

To conserve on space in the text, the printed output was often edited to eliminate ‘white space’ that appears in the MATLAB command window. In some cases output that was too wide for the text pages was also altered to eliminate white space.

As you study the code in the Econometrics Toolbox functions, you will see a large amount of repeated code. There is a trade-off between writing functions to eliminate re-use of identical code fragments and clarity. Functions after all hide code, making it more difficult to understand the operations that are actually taking place without keeping a mental note of what the sub-functions are doing. A decision was made to simply repeat code fragments for clarity. This also has the virtue of making the functions more self-contained since they do not rely on a host of small functions.

Another issue is error checking inside the functions. An almost endless amount of error checking code could be written to test for a host of possible mistakes in the user input arguments to the functions. The Econometrics Toolbox takes a diligent approach to error checking for functions like least-squares that are apt to be the first used by students. All functions check for the correct number of input arguments, and in most cases test for structure variables where they are needed as inputs. My experience has been that after students master the basic usage format, there is less need for extensive error checking. Users can easily interpret error messages from the function to mean that input arguments are not correct. A check of the input arguments in the function documentation against the user’s MATLAB command file will usually suffice to correct the mistake and eliminate the error message. If your students (or you) find that certain errors are typical, it should be fairly straightforward to add error checking code and a message that is specific to this type of usage problem. Another point in this regard is that my experience shows certain types of errors are unlikely. For example, it is seldom the case that users will enter matrix and vector arguments that have a different number of observations (or rows). This is partly due to the way that MATLAB works. Given this, many of the functions do not check for this type of error, despite the fact that we could add code to carry out this type of check.
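To make the discussion of error checking concrete, here is a rough sketch of what an argument-count check and an added row-count check might look like. The function name ls_sketch, its messages, and its result fields are hypothetical and chosen only for illustration; this is not code from the Econometrics Toolbox, and the estimation step is a plain backslash solve.

```matlab
function results = ls_sketch(y, x)
% LS_SKETCH  Hypothetical least-squares helper used only to illustrate
% input checking; it is not a function from the Econometrics Toolbox.
%   results = ls_sketch(y, x) returns a structure with fields
%   beta (coefficients), nobs (rows of x) and nvar (columns of x).

% Check that both input arguments were supplied.
if nargin ~= 2
    error('ls_sketch: wrong number of input arguments');
end

% Optional extra check: y and x must have the same number of rows.
[nobs, nvar] = size(x);
if size(y,1) ~= nobs
    error('ls_sketch: y and x must have the same number of observations');
end

results.beta = x\y;      % least-squares coefficients
results.nobs = nobs;
results.nvar = nvar;
end
```

Saved as ls_sketch.m, a call such as ls_sketch([1;2;3], [ones(3,1) (1:3)']) returns a results structure, whereas passing a y with a different number of rows than x stops with the second message. Whether the second check is worth its extra lines is the judgment call described above.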

Chapter 2
Regression using MATLAB

This chapter describes the design and implementation of a regression function library. Toolboxes are the name given by the MathWorks to related sets of MATLAB functions aimed at solving a particular class of problems. Toolboxes of functions useful in signal processing, optimization, statistics, finance and a host of other areas are available from the MathWorks as add-ons to the standard MATLAB distribution. We will reserve the term Econometrics Toolbox to refer to the collection of function libraries discussed in each chapter of the text. Many of the function libraries rely on a common utility function library and on other function libraries. Taken together, these constitute the Econometrics Toolbox described in this book.

All econometric esti
