Performance Comparison Of Automated Valuation Models


BY J. WAYNE MOORE

This article is based on a presentation given September 21, 2005, at the IAAO 71st Annual International Conference on Assessment Administration in Anchorage, Alaska.

Author’s Note: The work reported in this paper was completed for academic credit in partial fulfillment of degree requirements in a doctoral program. It was done independently and personally, without sponsorship by any organization, commercial or otherwise. It was undertaken for the sole purpose of contributing to the body of available knowledge on CAMA techniques used by assessors throughout the world.

Because it is necessary for assessors to prepare value estimates for huge numbers of properties, all by a specific point in time each year, a process called computer-assisted mass appraisal (CAMA), which uses automated valuation models (AVMs), has evolved during the past 35 years to handle the logistic challenge presented by this task. Six CAMA methodologies presently exist for determining the assessed value of residential properties for local property taxation. The first method is the direct sales comparison approach, which is widely used by fee appraisers to produce mortgage appraisals for home purchases. This method is employed less frequently by assessors for the mass appraisal process, but it is widely used to both challenge and defend individual property assessments. A second method, multiple regression analysis (MRA) using software such as SPSS, is a statistical extension of direct sales comparison. This method has emerged in the past 30 years as the power of the computer has become available to assessors. The third method is adaptive estimation procedure (AEP), also called feedback, which has its roots in numerical analysis and has also been available for about 30 years. The fourth and most commonly used method is the cost approach that relies upon local market analysis to provide an estimate of depreciation from all causes. The fifth is a hybrid approach referred to in this research as the transportable cost-specified market (TCM) approach. These five methods are used to varying degrees by local property assessors throughout the world. A sixth method exists, based upon artificial neural networks, but it is not widely used.

J. Wayne Moore, in his 32-year career, has been involved directly or indirectly in implementing AVM-based CAMA systems in more than 300 assessing jurisdictions in North America. He holds an undergraduate degree in Economics from the University of Delaware and a Master of Science degree in Systems Engineering from Southern Methodist University. He is currently enrolled in a doctoral program at Northcentral University. Mr. Moore was the founder of ProVal Corporation and serves as Application Architect at Manatron, Inc.

Journal of Property Tax Assessment & Administration, Volume 3, Issue 1

Assessing professionals have presented numerous case study reports on AVM methodology, but a research study and controlled experiment has not been reported that statistically compares the results of the main AVM methodologies when applied to the same jurisdiction. A competition among vendors to select a computer-assisted mass appraisal system which was conducted by the Board of County Commissioners of the County of Allegheny, Pennsylvania, in 1976–1977 has been reported and described (Carbone, Ivory, and Longini 1980). In 1988, Richard Ward and Lorraine Steiner presented a paper describing a comparison of feedback and nonlinear regression. At the time of their research, nonlinear regression was just beginning to appear and the stated purpose of the study was “to clarify for assessors some of the issues raised by these techniques with the hope that the comparison of these techniques will contribute to assessor education in the CAMA area.” (Ward and Steiner 1988, 43) Consistent with its educational purpose, the paper provided an overview of software available at the time and summarized statistical results from four different tests, but its main purpose was description of new CAMA techniques rather than performance comparison.

Charles Calhoun discussed the lack of independent testing of AVMs in his article in Housing Finance International in which he reported on property valuation methods and data in the United States.

While there is increasing competition among various commercial models, independent evaluations are practically nonexistent given the proprietary nature of the data and models. Whether market forces will ultimately identify the most successful methodologies depends in part on the ability of consumers of these models to undertake their own validations. (Calhoun 2001, 21, n. 38)

The controlled experiment reported here fills the research gap that Calhoun describes.
The research used valid market sales transactions, including property descriptive data, for the five years from 1999 through 2003 to estimate the 2004 selling prices of an existing residential property population in an actual jurisdiction using different CAMA methods as treatments. The model specification, calibration, and value estimation work was done blindly by nine independent CAMA practitioners without knowledge of the source of the data, the actual 2004 sale prices, or even the names of other participants. Six of the participants used a generic AVM market model specification approach with the software tools of their choice. The three other participants used a pre-specified transportable AVM. For comparison, valuations using the cost approach were prepared by the author. Standard IAAO statistical quality measures (IAAO 1999, 41–44) were applied to actual 2004 sales and compared to calculated values of properties from the same population for each participant to determine if differences of any statistical significance existed between the methods. Neither direct sales comparison nor artificial neural networks were considered in this research.

Literature Review

The first attempts at using multiple regression analysis (MRA) to estimate property market value occurred around 1970 (Gloudemans and Miller 1976). Prior to that time, the most widely used method was the traditional cost approach, done primarily by hand with minimal market analysis.

The formal description of the adaptive estimation procedure (AEP), also called feedback, first appeared in the late 1970s (Carbone, Ivory, and Longini 1980). Carbone’s PhD dissertation provided a rigorous academic definition of the technique (Carbone 1976). This procedure tests and systematically adjusts model coefficients, converging upon the set of coefficients that minimize an error term (IAAO 2003, 12). Schultz makes a case for the use of feedback in his 2001 Assessment Journal article.

There have been numerous conference papers, case studies, and journal articles on the application of both MRA and AEP in the past 20 years. Typical of these is a 1995 paper describing the Denver County, Colorado, revaluation using multiple regression analysis presented by the jurisdiction’s chief appraiser, Ben White (1995). White’s paper provides an informative discussion of the revaluation process used throughout North America.

The fundamentals of the cost approach have been well documented for more than 70 years in books such as The Valuation of Property (Bonbright 1937). The traditional cost approach is by definition not a market approach, even though in theory all three approaches to value (cost, market, and income) should yield similar final values. The cost approach, with locally developed depreciation schedules, or with depreciation individually determined by appraisers, is widely used by assessors.

Cost theoretically sets an upper limit on market value (assuming reasonable supply and time factors) and it is generally acknowledged that the main difficulty in using the cost approach is estimation of depreciation from all causes (physical, functional, and economic) and the rapidly changing dynamics of the real estate market (Clapp 1977). Nevertheless, a number of states such as Alabama, Illinois, Indiana, Iowa, Nevada, and Michigan publish a state cost manual with a depreciation schedule and require or encourage its use by assessors in their respective states.

Variations of the hybrid technique referred to in this research as the transportable cost-specified market (TCM) approach have been the subject of numerous papers by assessment professionals.
As early as 1966, Franklin Graham, Assessor of the City of Wisconsin Dells, Wisconsin, published an article that proposed a new approach, beginning his paper by stating, “This method is a combination of the cost approach and the market data approach.” (Graham 1966, 42) An article 14 years later, after the introduction of MRA into the assessment process, discussed a simplifying base home approach that was hinted at in Graham’s article (Gloudemans 1981).

In 1986, Eckert published a paper suggesting methods for calibrating the cost model to market that provided insight into the TCM approach. “Much of the process of determining depreciation and fine tuning for location factors in the cost model can be done with the aid of linear and non-linear multiple regression, or feedback.” (Eckert 1986, 14) In 1991, Ireland presented a paper on transportability of a market-calibrated cost model based upon the Illinois cost manual (Ireland and Adams 1991). Ward provided a demonstration on the use of feedback to calibrate cost models at the 1993 IAAO Annual Conference on Assessment Administration (Ward 1993). This author presented a paper at the 1995 IAAO annual conference on a market-correlated stratified cost approach that defined a hybrid, engineered cost model incorporating market factors (Moore 1995). This hybrid TCM model is now widely used. At the Integrating GIS and CAMA 2005 Conference, Gloudemans and Nelson presented a paper describing “how the District [of Columbia] used SPSS’s ‘Nonlinear’ MRA procedure to calibrate their cost structure using sales data in what can be called a fully ‘market calibrated cost model.’” (Gloudemans and Nelson 2005, 2 [Abstract])

As the technology for using computer-assisted mass appraisal matured, statistical standards were introduced to measure the quality of CAMA-produced values. An excellent example of these improvements is described in Thomas Hamilton’s 1997 dissertation submitted at the University of Wisconsin.
His work addresses the technical aspects of how sales samples may not properly represent the property population leading to value estimation problems. His paper presents his findings on how market value estimates can be improved by using a newly defined least squares estimation technique with distance metrics as weighting factors (Hamilton 1997). The dissertation confirms the advancements made since 1970 and the continuing research being done to improve the CAMA-based assessment process.

IAAO recently published a comprehensive standard on automated valuation models, which contains useful descriptive information about CAMA models and the automated appraisal process:

An automated valuation model (AVM) is a mathematically based computer software program that produces an estimate of market value based on market analysis of location, market conditions, and real estate characteristics from information that was previously and separately collected. The distinguishing feature of an AVM is that it is an estimate of market value produced through mathematical modeling. Credibility of an AVM is dependent on the data used and the skills of the modeler producing the AVM. The development of an AVM is an exercise in the application of mass appraisal principles and techniques, in which data are analyzed for a sample of properties to develop a model that can be applied to similar properties of the same type in the same market area. AVMs are characterized by the use and application of statistical and mathematical techniques. This distinguishes them from traditional appraisal methods in which an appraiser physically inspects properties and relies more on experience and judgment to analyze real estate data and develop an estimate of market value. Provided that the analysis is sound and consistent with accepted appraisal theory, an advantage to AVMs is the objectivity and efficiency of the resulting value estimates.
(IAAO 2003, 5–6)

Even though a large body of literature exists on the subject of mass appraisal and the importance of accuracy in the application of CAMA AVM models, there was not a single paper that reported on the proposed topic of this research: the evaluation of the relative performance of the primary CAMA methodologies used throughout the world.

Method

The primary purpose of this controlled experiment was to compare the performance of the automated valuation models used in computer-assisted mass appraisal. It was not intended to be educational in the use of the techniques themselves, as was Ward and Steiner’s 1988 research. Since equitable property taxation depends upon having underlying value assessments that are as accurate as possible, an important question to answer is whether any one of the methods produces statistically more accurate results than the others when applied under the same conditions. Professional appraisers must perform their work in conformance with the Uniform Standards of Professional Appraisal Practice (USPAP). In particular, mass appraisal work must be conducted according to Standard 6 (Appraisal Foundation 2003). The quality of assessment work is measured in terms of uniform treatment of every property to ensure the highest degree of equity and fairness for individual property owners. Most state oversight organizations, such as the Oregon Department of Revenue, have established standards for measuring assessment quality and performance (Oregon Department of Revenue 2004).

The widely accepted measure of quality in the tax assessment field is the coefficient of dispersion (COD) about the median of assessment/sale ratios of a sales sample.
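As a minimal sketch of this statistic: given assessment/sale ratios and their median, the COD is conventionally the average absolute deviation of the ratios from the median, expressed as a percentage of the median. The function names and the sample figures below are illustrative assumptions, not data or code from the study.

```python
from statistics import median

def sale_ratios(assessed, sale_prices):
    """Return assessment/sale (A/S) ratios, one per sold parcel."""
    if len(assessed) != len(sale_prices):
        raise ValueError("assessed and sale_prices must be the same length")
    return [a / s for a, s in zip(assessed, sale_prices)]

def coefficient_of_dispersion(ratios):
    """COD: mean absolute deviation from the median ratio, as a percent of the median."""
    if not ratios:
        raise ValueError("need at least one ratio")
    med = median(ratios)
    avg_abs_dev = sum(abs(r - med) for r in ratios) / len(ratios)
    return 100.0 * avg_abs_dev / med

# Hypothetical figures, for illustration only (not from the study).
assessed = [90_000, 100_000, 110_000, 120_000]
sales = [100_000, 100_000, 100_000, 100_000]

ratios = sale_ratios(assessed, sales)
cod = coefficient_of_dispersion(ratios)
print(f"median ratio = {median(ratios):.3f}, COD = {cod:.2f}")
```

A lower COD means the ratios cluster more tightly around the median, which is the sense in which the article treats it as a measure of assessment uniformity.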
Gloudemans has done extensive research into the COD statistic and his 2001 paper provides a useful discussion of confidence intervals for the coefficient of dispersion (Gloudemans 2001).

To have assessments that exhibit uniformity, the practitioner wants the “scatter” of individual assessments (A) compared to their actual sale transaction amounts (S) when they subsequently sell in the market (the A/S ratios) to approximate a normal distribution about the median of the A/S ratios for the entire sales sample and to be as small as possible, as measured by the COD. Therefore, the test statistic for AVM performance used for the four mass appraisal methodologies applied in this research is the COD mean difference.

The null hypothesis is stated as:

$H_0: \mu_{COD_{MRA}} = \mu_{COD_{AEP}} = \mu_{COD_{TCM}} = \mu_{COD_{COST}}$,

where $H_0$ = the null hypothesis; $\mu_{COD_{MRA}}$ = the population mean coefficient for the multiple regression analysis (MRA); $\mu_{COD_{AEP}}$ = the population mean coefficient for the adaptive estimation procedure (AEP); $\mu_{COD_{TCM}}$ = the population mean coefficient for the transportable cost-specified market (TCM) approach; and $\mu_{COD_{COST}}$ = the population mean coefficient for the cost approach (COST).

The null hypothesis is that the $\mu_{COD}$ values will all be the same, that is, not significantly influenced by the choice of method. The research hypothesis is that the selection of method will cause the $\mu_{COD}$ values to not all be the same, with methods producing a significantly different COD mean at the 0.05 level. The research hypothesis is stated as:

$H_a$: $\mu_{COD_{MRA}}$, $\mu_{COD_{AEP}}$, $\mu_{COD_{TCM}}$, and $\mu_{COD_{COST}}$ are not all equal.

The research hypothesis further states that when properly applied by knowledgeable appraisers, the four CAMA methods analyzed in this experiment yield value results with some COD mean differences that are statistically significant at the 0.05 level.

To measure the predictive accuracy of the four different treatments (automated valuation modeling methods), all tests were conducted using the same population and the same random sample drawn from that population. Some records that either had missing data or did not belong in a test of single family residences, such as duplexes and vacant properties, were eliminated prior to distribution to participants.
The population, obtained from a Midwestern assessing jurisdiction, included 22,785 existing single family residential properties with their descriptive characteristics, representing 52 distinct neighborhoods, which was a subset of randomly drawn neighborhoods from the entire jurisdiction. A “neighborhood” is a market area with homogeneous properties and similar economic influences. Neighborhood serves as a location variable for the jurisdiction. (See the Oregon Sales Ratio Manual [Oregon Department of Revenue 2004] for a more detailed description of sales sampling, sale validity, and market areas.)

The test sample consisted of the 1,299 properties in the population that sold in 2004. These sales had been screened by the assessing staff to verify that they were arm’s-length market transactions. This differs somewhat from generally accepted model-testing methodology in that a portion of the model-building sales sample (1999–2003 sales) was not set aside for testing but 2004 sales were used instead. For example, in the Allegheny County test, 3,306 sale parcels were selected from the years 1974, 1975, and 1976 with 25% (779) placed in the “set aside” control group for testing, leaving 2,527 for the experimental model-building group (Carbone, Ivory, and Longini 1980, 164). Ward used a total of 700 sale parcels from 1985 and 1986, with 500 parcels for model development and a control sample of 200 from the same years for model testing (Ward and Steiner 1988, 45).

The justification for using the following year’s valid market sales as the control group was that it more closely resembled the reality faced by assessors each year. Also, it could possibly uncover instability in the models when attempting to predict future sale prices, rather than predicting the sale prices of a control group drawn from the model-building sample.
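The sample design described above, building models on prior-year sales and testing on the following year's sales, can be sketched as a split on sale date. The record layout, key names, and sample rows below are hypothetical; only the 'mm/dd/yyyy' date format and the 1999–2003 and 2004 year boundaries are taken from the article.

```python
from datetime import datetime

def split_by_sale_year(sales, build_years=range(1999, 2004), test_year=2004):
    """Partition sale records into model-building and test groups by sale year."""
    build, test = [], []
    for rec in sales:
        year = datetime.strptime(rec["sale_date"], "%m/%d/%Y").year
        if year in build_years:
            build.append(rec)      # available for model specification and calibration
        elif year == test_year:
            test.append(rec)       # held back; prices to be predicted blindly
        # sales outside both windows are ignored
    return build, test

# Hypothetical records, for illustration only (not from the study).
sales = [
    {"parcel": 1, "sale_date": "06/15/1999", "sale_amt": 120_000},
    {"parcel": 2, "sale_date": "11/02/2003", "sale_amt": 150_000},
    {"parcel": 3, "sale_date": "03/20/2004", "sale_amt": 155_000},
    {"parcel": 4, "sale_date": "09/09/1998", "sale_amt": 98_000},
]

build, test = split_by_sale_year(sales)
print(len(build), "model-building sales;", len(test), "test sales")
```

Unlike a random set-aside, the test group in this design contains no sale from the model-building period, so the comparison measures how well a model predicts later prices.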
The decision to test on the following year’s sales was influenced in part by Hamilton’s research and the desire to consider a “worst case” scenario in sales sample selection.

In summary, from a population of 22,785 parcels from the period 1999–2003, a total of 5,546 jurisdiction-validated sales, with characteristics as they were at the time of the sale, were available for use in model development. Each modeler was free to use as many or as few of these historical sales as desired. Once their models were constructed, they were used to blindly estimate the selling prices of the 1,299 jurisdiction-validated 2004 sales. All 1,299 sales were used for testing the resultant value predictions, that is, no outliers were eliminated. None of the participants had information on current or prior assessed values for any of the parcels including the 5,546 available for model building. They did not know the jurisdiction from which the data had been extracted, and they did not know who the other participants were.

An observation was defined as the ratio produced by dividing the predicted sale price by the actual price for each of the 1,299 sold properties in the population. The test statistic was defined as the coefficient of dispersion (COD) obtained from the observations of one participant in the experiment, that is, the average percentage deviation about the median ratios of the observations for that participant. The randomness of the sample was ensured by the random activity characteristic of the real estate market. Although Hamilton (1997) states that a sales sample created through random market activity may not be fully representative of the population for various reasons, this was not considered a factor in the current research because it was assumed that any population representation errors would impact all the participants equally and not affect the relative difference of the CODs of the participants and the test outcome.

The assessed values set by the jurisdiction on December 31, 2003, for the sold 2004 properties were included as a TCM participant since they had been established prior to the actual sale dates of the 2004 test parcels.
After reviewing the initial research report, one participant suggested that including the jurisdiction’s assessed values as a TCM participant may not be valid, so the assessed values were removed from one set of results.

Since the cost approach involves careful application of the costing procedure to the property characteristic data in a cookbook-like process without any modeling activity (the model and coefficients are pre-specified), the cost estimates for the experiment were calculated by the author using two different AVMs based upon Marshall & Swift cost data (2003). One AVM was based upon Section A of the September 2003 Marshall & Swift Residential Cost Handbook, implemented using a large Microsoft Excel spreadsheet. The notes and assumptions used for this spreadsheet implementation are summarized in Appendix A. The other was developed using the ProVal software cost approach with a mass appraisal costing AVM that uses floor level calculations created from Section B (Segregated Cost) and Section C (Unit-in-Place Cost) of the same September 2003 Marshall & Swift Residential Cost Handbook. Neither cost-based value prediction method used any market adjustments for location, house style, or other such factors.

The Experiment

The first phase consisted of recruiting highly qualified participants for the experiment. Potential differences in modeling skill among participants represented an area of uncertainty. As the IAAO Standard on Automated Valuation Models states in its discussion of MRA model specification and calibration, “The availability of data will influence the specification of the model and may indicate the need for revisions in the specification and/or limit the usefulness of the resulting value estimates” (IAAO 2003, 8), and “No one software package is deemed superior to another, as success using MRA is a combination of modeling skills and software familiarity.” (IAAO 2003, 12) Therefore, only qualified, experienced modelers were invited to participate. Among them were contributors and reviewers of the IAAO AVM standard (2003, 2). The practitioners who participated in the research were as follows. Their home states are provided in parentheses.

Fred Barker (Oregon)
Russ Beaudoin (Vermont)
Sue Cunningham (Virginia)
Bob Gloudemans (Arizona)
Richard Horn (Iowa)
Michael Ireland (Illinois)
Ron Schultz (Florida)
Russ Thimgan (Arizona)
Michael Whitted/Char Cuthbertson as a team (Florida/Indiana)

In discussing potential time commitments, it was agreed that no participant should spend more than 24 hours on the research project.

The second phase of the experiment involved extracting and organizing data files for distribution to participants. The six AEP and MRA model-building participants were provided with the 40 data items listed in table 1 for the 5,546 sales. These were extracted from the jurisdiction’s SQL Server market database and placed into Microsoft Excel spreadsheets. A spreadsheet with the same layout but without sales information was provided for the 1,299 sale parcels from 2004 that comprised the control group to be valued for the test. Each AEP and MRA model builder was encouraged to use their preferred modeling software.

All participants were supplied with the jurisdiction’s established land values as of December 31, 2003 (table 1, field 36), and were instructed to use them as a “given.” No data was provided for computing new land values. Correct land values are a prerequisite for the cost approach, whereas land is not as important in the market approach since it is based on total property value.

For the participants using the transportable cost-specified market (TCM) methodology, a backup of the SQL Server database used for the ProVal software cost approach calculation was supplied with all 2004 sales information removed, all assessment information removed, and jurisdictional identity removed. Although the test was blind for all participants, the three who used TCM started from an existing model specification since they did have the cost approach AVM that was used to produce one set of cost-based predictions.
Their task was to use the same sales information from 2003 and earlier that was available to the AEP and MRA modelers and add two market variables: the neighborhood number (table 1, field 3) as a variable for location, and the house type code (table 1, field 17) as a variable for house type or style. They then were to use the standard analysis tools available within the software product to calibrate the cost approach values to the market using only these two additional variables. They did not use AEP or MRA tools for market calibration, but had a transportable version of these tools been available, their results probably could have been improved.

Table 1. Parcel variables

No. | Field | Description
1 | […] | Parcel identifier, numeric, ranging from 16 to 52100. (Note: Parcel identifiers in the parcel population range from 3881 to 91011462 and do not have the same PINs as the historical sales data sample.)
2 | […] | Property class; all are residential, single family class 510
3 | […] | Neighborhood number, 3-digit numeric, range 108 to 579 (52 total)
4 | […]ict | Tax district number, 6-digit numeric
5 | SaleDate | Sale date in a single date field with the format ‘mm/dd/yyyy’ (total 5,546)
6 | SaleAmt | Sale amount; range 17,400–1,823,000; median 139,900; mean 168,274
7 | s1 | Sale validity code for state reporting
8 | s2 | Sale validity code for arm’s-length market transaction, ‘V’ = valid
9 | Acres | Parcel acreage where available
10 | TLA | Total finished living area square feet
11 | […] | Finished living area square feet, basement
12 | […] | Finished living area square feet, 1st floor
13 | […] | Finished living area square feet, full 2nd floor
14 | […] | Finished living area square feet, partial upper floor such as half story
15 | […] | Finished living area square feet, lower level of split or bi-level (split foyer)
16 | […] | Story height as a single numeric field; 100 = 1 story, 150 = 1½ story, etc.
17 | H Type | House type code, numeric, where 12 = old 1 or 1.5 story, 22 = older 2 story, 42 = newer 1 story, 52 = newer 1.5 story, 62 = newer 2 story, 71 = split foyer bi-level, 80 = split level
18 | B SF | Basement square feet (no basement = 0)
19 | F Baths | Number of full baths
20 | H Baths | Number of half baths
21 | Tot Fix | Number of total plumbing fixtures
22 | AttGar SF | Attached garage size in square feet (no attached garage = 0)
23 | Gar Cap | Attached garage car capacity (not always available)
24 | DetG SF | Detached garage size in square feet (no detached garage = 0)
25 | […] | Central air-conditioning (Y or N)
26 | […] | Number of fireplaces
27 | […] | Year constructed
28 | […] | Effective year built; proxy for effective age
29 | […] | Condition: 94% AV, 1% EX, 1.5% F, 2% G, 1% VG, 0.1% P
30 | […] | Quality grade, numeric, ranging from 25 to 95 with 45 = avg, 25 = poor
31 | […] | Extra features flag, where 1 = yes
32 | […] | Free form description of extra features
33 | […] | Amount of value assigned to the extra features by the appraisal office
34 | […]chSF | Total square feet of porch area
35 | WdDkSF | Total square feet of wood deck area
36 | Land Cost | Estimated market land value placed on the lot by the appraisal office prior to time of sale
37 | RoofMat | Roof cover material code
38 | AtticSF | Total square feet of attic area
39 | AtticFinSF | Finished living area square feet in the attic
40 | Ext Cov | Exterior cover material code

To summarize, the six AEP and MRA participants had to build (specify) predictive models using their respective analytical tools and then calibrate (fit) them to the time-trended sales sample from 2003 and earlier, using their own trending technique and judgment as to the age of the sales that should reasonably be used. They then applied their respective models to the 1,299 properties in the test group to estimate 2004 selling prices. The three TCM participants had to use a cost-specified AVM as their starting point and then apply two additional market variables before using the standard analysis tools in the software, including its sales trending capability, to estimate selling prices for the 1,299 properties in the 2004 test group. Two sets of cost calculation results for the 1,299 properties in the test group were furnished by the author based upon two different cost AVM model specifications using Marshall & Swift cost data from September 2003. Finally, in order to have one other interesting perspective, the jurisdiction’s statistics for the 2004 test group were included using their actual assessed values as of December 31, 2003. Based upon the jurisdiction’s CAMA methodology, it would be considered a TCM participant. (The jurisdiction’s figures were later removed from one result set at the suggestion of a participant.)

Thus, 12 distinct sets of 1,299 selling price predictions drawn from 15,588 individual observations were available for analysis. This process of estimating the 2004 selling prices of the test group, as performed by all participants, simulates the annual revaluation process that assessors must follow in order to establish assessed values for use in property taxation as of January 1 (or other statutory tax lien date) each year.

Phase 3 of the experiment involved processing each of the 12 distinct sets of 1,299 selling price predictions through exactly the same sales analysis process. Each set of values was extracted from its return source (Excel spreadsheet, text file, or SQL Server database backup) and placed in a standard import format for sales analysis. Prior to the sales analysis processing, the 1,299 test group was carefully reviewed one last time to ensure that no problems existed with the data. The only potential problem found was that six of the properties had sold twice in 2004. Since the jurisdiction had marked these as valid sales, it was determined that both sales should be included, resulting in 1,305 actual ratios being calculated for each test group. The median A/S ratio, price related differential (PRD), and coefficient of dispersion (COD) for each o
