
Using Geographic-attribute Weighted Regression for CAMA Modeling

BY J. WAYNE MOORE, PH.D., AND JOSHUA MYERS

The paper on which this article is based was presented September 1, 2010, at the International Association of Assessing Officers 76th Annual International Conference on Assessment Administration in Orlando, Florida. It expands upon the paper presented in Little Rock, Arkansas, on March 9, 2010, at the 2010 GIS/CAMA Technologies Conference sponsored by IAAO and the Urban and Regional Information Systems Association (URISA).

This article evaluates the new automated valuation model (AVM), geographically weighted regression (GWR), which incorporates a geospatial factor, the parcel's x-y coordinates obtained from the geographic information system (GIS), in the model specification. It also introduces and evaluates a variant of GWR called geographic-attribute weighted regression (GAWR), which incorporates the parcel's similarity to surrounding parcels as well as its location. The study used an existing research dataset (Moore 2006) to compare value estimates obtained through incorporation of parcel x-y coordinates in the AVM model specification with estimates produced by commonly used AVMs that do not include a geospatial factor. The findings were extended and validated by applying the new methodology to datasets from the City of Norfolk and Fairfax County, Virginia. The study was conducted using a rigorous experimental design with statistical hypothesis testing, which is frequently missing in papers reporting comparisons of results produced by various AVMs.

In the past two decades, much attention has been focused upon integrating geographic information systems and computer-assisted mass appraisal (CAMA). An important question deserving research attention is the relative importance of each parcel's x-y coordinates in the specification of the AVM.
When x-y coordinates are not available, a neighborhood adjustment factor variable must apply a value uniformly across each specifically delineated neighborhood, potentially resulting in sharp changes in value estimates at neighborhood boundaries and less uniformity. Theoretically, a model that incorporates the exact location of a given parcel should produce more accurate value estimates than a model that does not, but would a statistically significant difference exist in the real world? This research offers empirical evidence in answering the question.

Research seeks to answer questions, but in the process, new questions are discovered that also need answers. Recently published and potentially applicable literature in other disciplines has been studied and evaluated from the perspective of potential application in property appraisal. Much work has been done outside of the appraisal field by scholars interested in predicting, estimating, and forecasting data points, including home transaction prices; their research may have direct application to the appraisal process. One such methodology, geographically weighted regression, is the primary subject of this research.

J. Wayne Moore, Ph.D., is an independent researcher. For more than three decades, he developed property appraisal software solutions as well as participated either directly or indirectly in the implementation of CAMA systems in more than 300 assessment jurisdictions. Joshua Myers is Real Estate CAMA Modeler Analyst with the City of Norfolk, Virginia. He holds a master of science in statistics from the University of Virginia.

Journal of Property Tax Assessment & Administration, Volume 7, Issue 3

Research Questions and Definitions

Assessors have a variety of tools available for performing their duty of accurately and uniformly estimating the market value of properties within their jurisdictions. The accuracy with which they estimate market value directly affects tax equity among property owners. Because assessors have a variety of market value estimating methodologies from which to choose, as well as limited resources available for performing their duties, they need access to objective, independent information and comparative analyses of the relative performance of the available automated valuation models to ascertain best practice. The research questions developed to address this need were:

1. To what extent do measures of equity/accuracy differ among available methodologies for estimating the market value of single-family homes?
2. Does the use of parcel coordinates from GIS in the market value estimating model specification improve the measure of equity/accuracy by a statistically significant amount?
3. Does the addition of attribute weighting improve the measure of equity/accuracy of the market value estimating model by a statistically significant amount over standard GWR?
Key Metrics

Following are descriptions of the key measurements evaluated in this research.

Measure of equity is an objective statistic designed to provide an indication of property tax equity and assessment accuracy. The measures for this research were coefficient of dispersion (COD) for horizontal equity and quintile mean ratio (QMR) for vertical equity.

Horizontal inequities are differences in effective tax rates among properties having similar market values in the same or similar neighborhoods based upon the relationship between actual tax dollar amount levied upon each property to its market value. Horizontal inequities are primarily measured by the coefficient of dispersion.

Vertical inequities are differences in effective tax rates between groups of properties based upon their relative value ranges. For example, if higher-priced properties as a group have a different effective tax rate than lower-priced properties as a group, the condition of vertical inequity exists. Vertical inequity will influence the value of the COD, but additional measures are required to obtain a more precise evaluation.

Coefficient of dispersion (COD) is the average absolute deviation of calculated sale ratios from their median expressed as a percentage of the median (IAAO 1997, 26). Larger COD values indicate diminished uniformity.

Quintile mean ratio (QMR) is the average of the appraised value to sale price ratios in each one-fifth grouping of the ratios being investigated after the ratios have been sorted from lowest sale price to highest sale price and divided into five equal sale price groups.
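As a minimal sketch of the two measures just defined, the following Python computes a COD from a list of ratios and the five QMRs from paired appraised values and sale prices. The function names and the toy numbers are illustrative assumptions, not part of the study, and the sketch assumes the number of sales divides evenly by five.

```python
from statistics import mean, median

def cod(ratios):
    """Average absolute deviation from the median ratio, as a percentage of the median."""
    med = median(ratios)
    return 100.0 * mean([abs(r - med) for r in ratios]) / med

def quintile_mean_ratios(appraised, sale_prices):
    """Mean appraised-to-sale ratio in each of five equal-count groups ordered by sale price."""
    pairs = sorted(zip(sale_prices, appraised))  # lowest to highest sale price
    size = len(pairs) // 5                       # assumes the count is a multiple of 5
    qmrs = []
    for q in range(5):
        group = pairs[q * size:(q + 1) * size]
        qmrs.append(mean([av / sp for sp, av in group]))
    return qmrs

# Hypothetical toy data (not from the study)
sale_prices = [100, 100, 200, 200, 300, 300, 400, 400, 500, 500]
appraised = [95, 110, 190, 210, 290, 330, 380, 420, 480, 520]
ratios = [av / sp for av, sp in zip(appraised, sale_prices)]

print(round(cod(ratios), 2))
print([round(q, 3) for q in quintile_mean_ratios(appraised, sale_prices)])
```

A lower value from `cod` indicates tighter clustering of ratios around the median, and roughly equal values across the five QMRs suggest that low-priced and high-priced sales are appraised at similar levels.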

Vertical equity index (VEI) is the absolute value of the difference between the highest and lowest of the five quintile mean ratios within a study group divided by the mean of the five QMRs, and then multiplied by 100 (Moore 2008). Lower VEI values indicate better vertical equity.

Hedonic Models, Econometric Models, AVMs, and GWR

The term model as it relates to appraisal is defined in the Glossary of Property Appraisal and Assessment (IAAO 1997) as "a representation (in words or an equation) that explains the relationship between value or estimated sale price and variables representing factors of supply and demand" (p. 88). In a recent article in the Journal of Property Tax Assessment & Administration, the relationships that exist between the economic supply and demand functions and the general assessment model were explained in detail (Moore 2009).

Court (1939) has been credited with originating the idea of the hedonic price estimating model, but the concept actually originated earlier in the form of scientific appraising in the work of Zangerle (1924); Pollock and Scholz (1926); and Prouty, Collins, and Prouty (1930). Without using the term hedonic, Jensen (1931) defined a constructive market value as "one constructed synthetically by taking all the factors affecting value into account so that it shall approximate as closely as possible what the market value would be could one be ascertained" (p. 450). The scientific appraisal methodology and constructive market value that Jensen described amounted to what is now known as the cost method.

Hedonic models decompose the price of an item into separate components that determine the total price when summed. A familiar application is the cost method. This use is among the more complex, but effectiveness is not synonymous with complexity. All models used in the appraisal of real estate are hedonic, the scholarly term for this broad category of models. Another term found in the literature is econometric model.
According to Kennedy (1998), a generally accepted definition does not exist. He offered, however, that "econometricians are theoretical statisticians, applying their skills to the development of statistical techniques appropriate to the empirical problems characterizing the science of economics" (p. 1). Many models that are based on the generalized linear model in statistics, such as various regression models used in appraisal, are types of econometric models, which also are part of the broader classification of hedonic models.

Automated valuation models (AVMs) comprise a narrower set of econometric models used specifically to estimate property values and selling prices. Computer-assisted mass appraisal (CAMA) is the term applied to computer software that incorporates AVMs and is used by assessors to assist in managing and performing their property valuation duties. The more common AVMs used in CAMA systems are the traditional cost method, comparable sales method, multiple regression analysis, adaptive estimating procedure (also referred to as feedback), and the transportable cost-specified market (also called market-calibrated cost).

An earlier study (Moore 2006) evaluated the performance of four of these AVMs without considering spatially dependent variables beyond the neighborhood variable. That study recommended future research to evaluate the impact of adding a precise geospatial factor to the same characteristics dataset to determine whether a statistically significant improvement in equity would result. The current study evaluates a new AVM, geographically weighted regression, which incorporates a parcel's x-y coordinates obtained from GIS. It also introduces and tests a potential variant of GWR, geographic-attribute weighted regression, which incorporates both a parcel's x-y coordinates and its similarity to surrounding parcels.

Literature Review and Description of GWR

Comparing Performance of AVMs

This research is an extension of the research reported by Moore (2006) and uses the same data as that study. The purpose of the 2006 study was to compare the performance of the primary automated valuation models used in computer-assisted mass appraisal. The models were tested in a controlled experiment in which nine experienced modelers with access to the same sales dataset constructed models to predict the next year's sales prices. From a population of 22,785 parcels, a total of 5,546 jurisdiction-validated sales from the period 1999–2003, including characteristics as they were at the time of the sale, were made available to participants for use in model development. Each modeler was free to choose as many or as few of the historical sales as desired and to use their favorite software.

Once constructed, their models were used to blindly estimate the selling prices of the 1,299 jurisdiction-validated 2004 sales as an out-of-sample test. All 1,299 sales were included in testing the resultant value predictions; that is, no outliers were eliminated. None of the participants had information on current or prior assessed values for any of the parcels, including the 5,546 available for model building. They did not know the jurisdiction from which the data were extracted, and they did not know the identity of the other participants.

The process of estimating the 2004 selling prices as the test group, instead of using a portion of the sales held out from the 1999–2003 time period, simulated the annual revaluation process that assessors must follow to establish assessed values for use in property taxation as of the statutory tax lien date each year. Thus, the test would allow a realistic evaluation of the predictive power of the AVMs.
The 2006 study tested four automated valuation model types most commonly used in mass appraisal: adaptive estimation procedure (AEP), multiple regression analysis including non-linear regression (MRA), the traditional cost method (COST), and a hybrid transportable cost-specified market method (TCM). The dependent variable was the COD that resulted from applying each AVM to predict the selling prices of the same set of 1,299 out-of-sample properties in the 2004 test group. A one-way analysis of variance (ANOVA) was conducted to evaluate the null hypothesis that no differences in market value estimating accuracy existed among these major AVM methods and to analyze the relationship between AVM type chosen and the resulting COD.

The study results provided clear statistical evidence to support what most CAMA practitioners already believed to be true: a market-calibrated AVM will predict selling prices more accurately than a purely cost-based AVM. The three market-based AVMs (AEP, MRA, and TCM) produced statistically equivalent mean CODs near 10, whereas the purely cost-based AVM produced a mean COD close to 15.

The research also provided a baseline for pursuit of additional research questions, including the question considered by this research: Does the addition of a parcel's x-y coordinates improve the value estimates of an automated valuation model by a statistically significant amount? This question was tested in the current study using a new, enhanced form of geographically weighted regression (geographic-attribute weighted regression) as well as a standard GWR model.

Multiple Regression Analysis

In the context of automated valuation models, the common regression techniques are linear and non-linear multiple regression analysis with neighborhood adjustments done by use of dummy variables. Multiple regression analysis was first used for property valuation purposes by C. G. Haas in 1922 (Haas 1922). The computational demands of this method were high, but support for the method increased after World War II when the power of the computer started to be realized (Gipe 1975).

These common regression techniques may be fundamentally flawed, however, because they fail to adequately take into account spatial autocorrelation, the coincidence of selling price with location, and spatial heterogeneity, the changing value of property attributes across a study region (Yu, Wei, and Wu 2007). Spatial autocorrelation violates the fundamental regression assumption of independence of observations, and spatial heterogeneity violates the assumption that the housing market acts in relative equilibrium.

Neighborhood dummy variables do account for some of the effects of location, but their use can create a boundary value problem. The result is that two properties on opposite sides of the boundary between two neighborhoods may be given very different adjustments. In actuality, in most jurisdictions, values of property characteristics vary smoothly across space and are not constant even throughout an entire neighborhood. In MRA, AEP, and TCM, the process of making adjustments by neighborhood is a somewhat crude method of accounting for spatial effects, but it is often all that local assessors have to use. What is really needed is a smoother way to account for the effects of location throughout a jurisdiction and to follow the first law of geography, which states that things that are closer together tend to be more alike than things that are farther apart (Tobler 1970). With the advent of GIS, it seems appropriate to directly incorporate location when using regression to estimate property value.

Geographically Weighted Regression

Geographically weighted regression is a nonparametric regression modeling technique that incorporates the use of GIS parcel centroid coordinates (Brunsdon, Fotheringham, and Charlton 1996; McMillen 1996).
At its heart, geographically weighted regression is a special case of the locally weighted regression (LWR) model (McMillen and Redfearn 2010). Locally weighted regression was introduced in a paper by Cleveland (1979), and the method was further developed by Cleveland and Devlin (1988). GWR essentially takes a method that previously was only applied to a variable space and applies it to a geographic space. Use of GWR has grown over the past 10 years, and it has been implemented in nearly every major field of academic research (Matthews 2007).

There are two different ways to use GWR. One is as an exploratory tool to understand the varying tastes and preferences for different property attributes across a jurisdiction. This type of analysis is more useful to researchers than to property assessors. Another way is as a statistical technique to better estimate the value of a subject property with a given set of attributes by better taking into account the effects of location. This is the application of GWR in the assessment context and is the impetus for this research.

Use of GWR for the purpose of property value estimation has already been studied extensively using housing data from throughout the United States and Canada. For example, Yu, Wei, and Wu (2007) worked with data from Milwaukee, Wisconsin; Paez, Long, and Farber (2008) took data from Toronto, Ontario; Des Rosiers and Theriault (2008) used data from Montreal, Quebec; and Borst and McCluskey (2008) applied data from Sarasota County, Florida, Fairfax County, Virginia, and Catawba County, North Carolina. Yet, to our knowledge, there has not been an implementation of the model inside the walls of an actual appraisal jurisdiction. Also, there has not yet been a controlled experiment comparing the GWR model to other models commonly used in the assessment community. This article reports the first such comparison of the GWR model to the other AVMs.

MRA is the foundation of GWR, but unlike MRA, which has one model-wide set of regression coefficients, GWR produces a different set of regression coefficients for every property by running a series of weighted least squares regressions. The weighting is determined by the distance of the subject property from its nearest neighbors. Therefore, GWR is essentially the combination of many small weighted MRAs that are performed around each subject property. This process makes each set of regression coefficients in GWR a function of location. The following model formula for GWR looks similar to the one for MRA:

$$Y_i = \beta_0(x_i, y_i) + \sum_{h=1}^{p} \beta_h(x_i, y_i)\, X_{ih} + \varepsilon_i,$$

where $p$ is the number of independent variables, $(x_i, y_i)$ denotes the coordinates of the $i$-th subject property, $\beta_h(x_i, y_i)$ is the regression coefficient for the $h$-th independent variable of the $i$-th subject property, $Y_i$ is the sale price for the $i$-th subject property, $X_{ih}$ is the $h$-th independent variable for the $i$-th subject property, and $\varepsilon_i$ is the random error term for the $i$-th subject property with distribution $N(0, \sigma^2 I)$ (Brunsdon, Fotheringham, and Charlton 2002). Here, the local regression coefficients for the $i$-th subject property are estimated as:

$$\hat{\beta}_i = (X^T W_i X)^{-1} X^T W_i Y,$$

where $X$ is the matrix of independent variables, $W_i$ is the diagonal weights matrix for the $i$-th subject property, and $Y$ is the vector of sale prices.

The diagonal weights matrix in each weighted least squares regression is determined by a choice of weight function in which nearby properties are afforded more weight than properties that are farther away. This choice of weighting is supported by the first law of geography (Tobler 1970). Here, the weight function is applied to all sale properties in a certain sliding neighborhood around each subject property. The size of the sliding neighborhood is called the bandwidth.
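The local estimator above amounts to one weighted least squares fit per subject property. The following NumPy sketch solves a single such fit for a given vector of weights; the function name, the explicit intercept column, and the toy numbers are assumptions made for illustration, and the weights are taken as given rather than derived from a weight function.

```python
import numpy as np

def local_coefficients(X, y, w):
    """Weighted least squares for one subject property.

    X : (n, p) array of independent variables for the n sale properties
    y : (n,) array of sale prices
    w : (n,) array holding the diagonal of the weights matrix W_i
    Returns the p + 1 local coefficients (intercept first).
    """
    Xd = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column for beta_0
    XtW = Xd.T * w                              # same as Xd.T @ np.diag(w), without building diag(w)
    return np.linalg.solve(XtW @ Xd, XtW @ y)   # solves (X^T W X) beta = X^T W y

# Hypothetical toy data: one attribute, prices exactly linear in it
X = np.array([[10.0], [15.0], [20.0], [25.0]])
y = 50.0 + 8.0 * X[:, 0]
w = np.array([1.0, 0.8, 0.3, 0.1])  # nearer sales weighted more heavily
beta = local_coefficients(X, y, w)
print(beta)
```

Running this fit once per subject property, each time with that property's own weights, yields the location-specific coefficient sets that distinguish GWR from a single model-wide MRA. Solving the normal equations with `np.linalg.solve` avoids forming an explicit matrix inverse.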
The literature contains two types of bandwidths, fixed and adaptive (Brunsdon, Fotheringham, and Charlton 2002). A fixed bandwidth model, effectively, includes in each local regression all of the sale properties within a fixed distance from each subject property. An adaptive bandwidth model includes in each local regression a specific number of sale properties around each subject property by allowing the size of the sliding neighborhood to vary. A fixed bandwidth model potentially could have more difficulty than an adaptive bandwidth model in dealing with irregularly spaced data, like home sales, because the fixed bandwidth model could have some local regressions that are primarily based on only a small number of sale properties (Brunsdon, Fotheringham, and Charlton 2002). To address this concern, only adaptive bandwidth models are used in this research. One common choice of adaptive weighting scheme, and the one used in this research, is the bi-square function given as:

$$w_{ij} = \begin{cases} \left(1 - (d_{ij}/b)^2\right)^2, & d_{ij} \le b \\ 0, & \text{otherwise} \end{cases}$$

where $d_{ij}$ is geographic distance between the $i$-th subject property and its $j$-th neighboring sale property, and $b$ is equal to the bandwidth (Brunsdon, Fotheringham, and Charlton 2002).

Spatial-attribute Weighting Function

The standard set of possible weighting functions for GWR takes into account only the geographical distance between properties, not the similarity of their attributes. It makes sense, though, that the similarity of the sale property to the subject property would also be important. A modification of the previously given weight function to account for differences in attributes could be stated as follows:

$$w_{ij} = \begin{cases} \left(\left(1 - (d_{ij}/b)^2\right) f(\tau)\right)^2, & d_{ij} \le b \\ 0, & \text{otherwise} \end{cases} \qquad f(\tau) = e^{-\left|1 - (A_j/A_i)\right|}$$

where $f(\tau)$ is an exponential function that changes the weight according to the difference ($\tau$) between attributes of the $i$-th subject property, $A_i$, and its $j$-th neighboring sale property, $A_j$ (Shi, Zhang, and Liu 2006). This weighting function is now transformed into a spatial-attribute weighting function for use in the series of locally weighted least squares regressions that comprise the nonparametric GWR model. We named this method geographic-attribute weighted regression (GAWR) to differentiate it from GWR models containing only a distance-based weighting function.

The principal difference between GWR and GAWR can best be illustrated through a practical example using three properties: a subject property, a smaller sale property two doors to the left of the subject property, and an exact replica of the subject property that sold two doors to the right. Under GWR, the smaller sale property would be given about the same weight as the exact replica sale property; however, under GAWR, the replica would be given decidedly more weight than the smaller property. One of the research questions investigated was whether GAWR's performance is statistically significantly better than GWR's in the estimation of property value.

The sales comparison method is the preferred method of assessment for single-family residential properties (IAAO 2008). GAWR is very similar to the classic sales comparison method. In sales comparison, an appraiser looks for similar sale properties in the vicinity of the subject property, not just any property nearby. The same is true of GAWR. This capability adds a level of appeal to GAWR that is absent in many other AVM methodologies.
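The three-property illustration can be worked numerically. The sketch below is hedged on several assumptions: the attribute factor is taken to be exp(-|1 - A_j/A_i|), a single attribute (living area) stands in for A, the adaptive bandwidth is set to the distance of the k-th nearest sale, and all function names, distances, and areas are hypothetical.

```python
import math

def adaptive_bandwidth(distances, k):
    """Assumed convention: bandwidth b is the distance to the k-th nearest sale."""
    return sorted(distances)[k - 1]

def bisquare_weights(distances, b):
    """Distance-only bi-square weights, as used in standard GWR."""
    return [(1 - (d / b) ** 2) ** 2 if d <= b else 0.0 for d in distances]

def gawr_weights(distances, attrs, subject_attr, b):
    """Bi-square weights scaled by an attribute-similarity factor before squaring."""
    weights = []
    for d, a in zip(distances, attrs):
        if d <= b:
            f = math.exp(-abs(1 - a / subject_attr))  # assumed form of f(tau); equals 1 for a replica
            weights.append(((1 - (d / b) ** 2) * f) ** 2)
        else:
            weights.append(0.0)
    return weights

# Hypothetical example: subject has 2,000 sq ft of living area.
# Sale 0 is smaller (1,200 sq ft); sale 1 is an exact replica. Both are equally distant.
distances = [100.0, 100.0]
areas = [1200.0, 2000.0]
b = 1000.0  # bandwidth in the same distance units

gwr = bisquare_weights(distances, b)
gawr = gawr_weights(distances, areas, 2000.0, b)
print(gwr)
print(gawr)
```

Under the distance-only weights the two sales are indistinguishable, whereas the attribute factor leaves the replica's weight unchanged and reduces the weight of the smaller sale. In a full model the bandwidth would come from `adaptive_bandwidth` applied to each subject property's distances rather than from a fixed number.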
Methodology

This methodology section contains a restatement of the three research questions, the statement of hypotheses developed for separately testing horizontal and vertical equity, and a brief description of the research design. This section also explains the methodology used for random selection and assignment of sale parcels to the AVM test groups and contains a description of how the research was conducted.

Research Questions

Determining best practice when comparing estimating performance of available automated valuation models can be difficult for assessors without the availability of objective, independent information and comparative analyses. To address this problem, three questions were developed for this research:

1. To what extent do measures of equity/accuracy differ among available methodologies for estimating the market value of single-family homes?
2. Does inclusion of parcel coordinates from GIS in the market value estimating model specification improve the measure of equity/accuracy by a statistically significant amount?
3. Does the addition of attribute weighting to a geographically weighted regression improve the measure of equity/accuracy of the market value estimating model by a statistically significant amount over standard GWR?

To seek answers to these research questions, seven test groups were considered, each representing the result set of a different market value estimating methodology. The methodologies tested were (1) adaptive estimating procedure (AEP), (2) traditional cost method (COST), (3) geographic-attribute weighted regression (GAWR), (4) geographically weighted regression (GWR), (5) multiple regression analysis (MRA), (6) GAWR without using replacement cost new (NoRCN), and (7) transportable cost-specified market (TCM).

Statement of Hypotheses

Testing for horizontal equity and vertical equity requires two different models using the same data sample. The necessary operational variables for measuring both forms of equity require computation of the appraised value (AV) to sale price (SP) ratio (A/S). The COD is the average absolute deviation of all the ratios (A/S) in a sample from the median ratio. Multiple COD samples were selected and assigned for each test group, and the mean COD ($M_{COD}$) of each group was used to test horizontal equity. The same randomly selected parcels were used for each test group, but the calculated selling price estimates for each group depended upon the AVM applied.

The relative level of horizontal equity in each test group would be indicated by the magnitude of the test group's mean COD, with lower CODs indicating better equity. However, the mere fact that one test group exhibits a lower mean COD than another is not sufficient evidence that its equity is better. Only the existence of a statistically significant difference in the mean CODs among test groups would provide such evidence because the magnitude of the variance of the sample and group ratios also must be considered when attempting to draw a conclusion from the test group COD means. For that reason, an ANOVA model was selected for hypothesis testing. No statistically significant difference in the mean CODs among the test groups would imply equal value estimating performance and accuracy, even if the test groups had somewhat different mean CODs, as might be expected. Without an analysis of the variances involved, a somewhat lower or higher mean COD alone would not provide sufficient evidence for comparisons between group performances.
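To make concrete why group means alone are not enough, the following sketch computes the one-way ANOVA F statistic, which compares the variation between group means with the variation within groups. It is a plain-Python illustration only: the function name and the toy values are assumptions, and no p-value is computed here.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one sample per group."""
    k = len(groups)                              # number of groups (for example, AVM types)
    n = sum(len(g) for g in groups)              # total number of observations
    grand_mean = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical toy values standing in for the CODs of three test groups
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f_stat, df_b, df_w)
```

The F statistic is then compared with the F distribution on (df_between, df_within) degrees of freedom at the chosen significance level. A statistics package such as SciPy, through `scipy.stats.f_oneway`, returns the statistic together with its p-value.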
Vertical equity is the construct used to reflect the uniformity of mean ratios ($M_{A/S}$) among different levels of parcel sale prices. Vertical equity was measured and analyzed by using five equal-count sale price range levels called quintiles. The ranges were determined by ranking from lowest to highest the sale prices from within the same sample that was used for testing horizontal equity and then dividing the sales into five groups with an equal number of sales in each. The quintile mean ratio (QMR), the average of the appraised value to sale price ratios ($M_{A/S}$) for each quintile, was calculated to test vertical equity. To imply vertical equity, all five groups should have approximately equal QMRs. Thus, the test for vertical equity was a comparison of the QMRs of the five quintile ranges within each AVM test group. For any test group, vertical equity would be achieved by the absence of statistically significant differences in the QMRs among its sample quintile ranges, indicating uniformity among vertical value strata.

Hypothesis for Horizontal Equity

Horizontal equity addresses the question of whether all homes are burdened with similar effective tax liabilities (actual taxes paid as a percentage of market value). Hence, an AVM's performance accuracy in estimating market value has a direct bearing on horizontal equity. The null hypothesis for testing horizontal equity performance among the value-estimating methodologies (AVMs) evaluated in this study was:

$H1_0$: There are no differences among $M_{COD}$ based upon the AVM chosen by the assessor, where $M_{COD}$ is the mean COD of each AVM test group.

The alternative hypothesis was:

$H1_a$: Differences exist in at least two $M_{COD}$ based upon the AVM chosen by the assessor.

Hypothesis for Vertical Equity

Vertical equity seeks uniformity in effective tax rates (actual taxes paid as a percentage of market value) among groups of properties based upon their relative value. If higher-priced properties as a group have different effective tax rates than lower-priced properties as a group, then the condition of vertical inequity exists; property tax equity demands that all homes have the same effective tax rate. The performance accuracy in estimating market value for each AVM also has a direct bearing on vertical equity. The null hypothesis for testing vertical equity performance among the value-estimating methodologies (AVMs) evaluated in this study was:

$H2_0$: There are no differences in the quintile mean ratios (QMRs) within each AVM type available for use by the assessor.

The alternative hypothesis was:

$H2_a$: There are differences in the QMRs within at least one AVM type available for use by the assessor.

Table 1 contains a summary of the characteristics of the hypotheses and models used for testing AVM performance.

Description of Research Design

A completely randomized design was utilized in the research for examining COD means as a test of horizontal equity and quintile mean ratios as a test of vertical equity. To test the horizontal equity hypothesis, an ANOVA model was constructed. If testing should indicate that, at the pre-specified significance level of α = .01, the COD means were not different, then no evidence would exist to reject the null hypothesis, indicating equal AVM performance. If testing should indicate significant main effects among the COD means for the selected α = .01 significance level, then evidence would exist that the null hypothesis should be rejected and further analysis would be required for pair-wise comparisons among the AVMs.

Vertical equity was tested by further stratification of the appraised value to sale price ratios (A/S) of each test group into quintiles based on sale price.
An analysis of variance was conducted using the quintile mean ratios of the five sale price range levels for each AVM type. This approach was a departure from some of the vertical equity testing literature (Allen and Dare 2002; Cornia and Slade 2005; Sirmans, Diskin, and Friday 1995) that relied upon various regression techniques for examining vertical equity. Since there was no consensus among the cited references as to which regression model performed best for testing vertical equity, the ANOVA test that had been developed for use in recent research by Moore (2008) was used in this study as an alternative method for evaluating vertical equity. To provide a basis for comparison, the price-related differential (PRD), the widely accepted standard measure of vertical equity (IAAO 2003), was computed along with the vertical equity index (VEI), which was developed for use in a recent study by Moore (2008). This step was prompted by the work of Jensen (2009), which reported that heterogeneous variance in sale prices can reduce the reliability of the PRD.

Table 1. Summary of model descriptions and hypotheses for testing AVM performance

| Item | Horizontal equity | Vertical equity |
|---|---|---|
| Model type | One-way ANOVA | One-way ANOVA |
| Independent variables | AVM type | Quintile within AVM type |
| Replications per cell | 31 | 155 |
| Dependent variable | $M_{COD}$ | QMR |
| General null hypothesis | $M_{COD}$s are not different | QMRs are not different |
| General alternate hypothesis | Not all $M_{COD}$s are the same | Not all QMRs are the same |

Power Analysis for Horizontal Equity Testing

The total sample size required to provide adequate power at α = .01 was estimated through an a priori power analysis for horizontal equity. The power analysis indicated that for a one-way ANOVA study, a cell sample size of 13 was necessary to achieve 99% power to detect differences among the COD means using an F test with α = .01 as the significance level. Hence, a sample size of at least 13 mean CODs for each AVM was estimated as required for the study of horizontal equity.

Power Analysis for Vertical Equity Testing

The total sample size required to provide adequate power at α = .01 was also estimated through a priori power analysis for vertical equity. Meaningful evaluation of vertical equity can only occur among quintiles of a single AVM. Hence, a single-factorial power analysis was of particular interest because each AVM would need to be evaluated independently. Using a calculated effect size for analysis in G*Power (Buchner, Erdfelder, and Faul 1997), with equal size samples in five quintile groups, resulted in a total required sample size estimate of 655 at α = .01 to achieve 99% power. Thus, G*Power confirmed that the planned quintile cell size of 131 (655/5) or greater, as estimated by the power analysis, fulfilled the requirement for 99% power at α = .01 for vertical equity hypothesis testing.

Total Sample Size

The COD means (that is, the average of calculated CODs for the multiple samples in each AVM type) that were used for testing horizontal equity were second order statistics, and each COD itself was calculated from a first order sample of parcels. Based on the a priori power analysis and other practical considerations, a decision was made to create 31 groups, each containing 25 parcels randomly selected and randomly assigned to COD groups from the result set of each AVM type being tested.
The mean COD of these 31 CODs was calculated for each AVM test group to test for horizontal equity. Each COD was computed from 25 A/S ratios, which required that 775 sale parcels be randomly selected and assigned from the value-estimating result population of each AVM test group.

The practical considerations driving the decision on sample size were: (1) a sample size of 30 or more is widely used because simulation studies involving the central limit theorem have shown that the mean of samples of 30 or more from almost any type of distribution is approximately normal, (2) the sample size had to be divisible into 30 or more COD groups for horizontal equity testing and exactly five (quintile) groups for vertical equity testing while maintaining a balanced cell size for both tests, and (3) based on personal empirical experience with the calculation of medians and CODs for residential properties, as well as the precepts of the central limit theorem, 25 randomly selected parcel A/S ratios were deemed sufficient to derive one COD about each median appraised value to sale price ratio. Thus, a sample size of 31 CODs for each AVM would require 775 parcels (31 × 25) for the test of horizontal equity, which comfortably exceeded the minimum of 13 CODs per AVM indicated by the power analysis for 99% power. At the same time, 775 parcel sale prices would provide a balanced cell size of 155 observations for vertical equity testing of the same AVM, which exceeds the minimum cell size of 131 estimated using power analysis.

Using the ANOVA F test requires assumptions of normally distributed populations and equal variance among populations, which are rarely satisfied completely. Those assumptions were fully discussed and analyzed by Moore (2008) with respect to the use of ANOVA for horizontal and vertical equity testing with the same sample sizes as used in this study.
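The sampling scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually run in the study: the function names, the synthetic A/S ratios, and the two hypothetical AVMs are invented for the example, and only the F statistic is computed (the p-value and any pair-wise follow-up tests are left out).

```python
import random
import statistics


def cod(ratios):
    """Coefficient of dispersion: mean absolute deviation from the
    median ratio, expressed as a percentage of the median."""
    med = statistics.median(ratios)
    return 100.0 * statistics.mean(abs(r - med) for r in ratios) / med


def cod_groups(ratios, n_groups=31, group_size=25, seed=0):
    """Randomly select n_groups * group_size ratios, assign them to
    equal-size groups, and return one COD per group."""
    rng = random.Random(seed)
    picked = rng.sample(ratios, n_groups * group_size)
    return [cod(picked[i * group_size:(i + 1) * group_size])
            for i in range(n_groups)]


def quintile_groups(sales):
    """Split (sale_price, ratio) pairs into five equal cells by sale price
    and return the ratios in each cell, lowest-priced quintile first."""
    ordered = sorted(sales)
    size = len(ordered) // 5
    return [[ratio for _, ratio in ordered[q * size:(q + 1) * size]]
            for q in range(5)]


def anova_f(groups):
    """One-way ANOVA F statistic (between-group over within-group mean square)."""
    values = [v for g in groups for v in g]
    grand = statistics.mean(values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (len(groups) - 1)) / (ss_within / (len(values) - len(groups)))


# Synthetic A/S ratios for two hypothetical AVMs (illustrative data only).
rng = random.Random(42)
avm_a = [rng.gauss(1.0, 0.08) for _ in range(1299)]
avm_b = [rng.gauss(1.0, 0.15) for _ in range(1299)]

# Horizontal equity: 31 CODs per AVM, then an F test across AVM types.
cods_a = cod_groups(avm_a, seed=1)
cods_b = cod_groups(avm_b, seed=2)
f_horizontal = anova_f([cods_a, cods_b])

# Vertical equity: 775 sampled sales of one AVM split into five cells of 155.
sample = [(rng.uniform(50_000, 400_000), r) for r in rng.sample(avm_a, 775)]
cells = quintile_groups(sample)
qmrs = [statistics.mean(cell) for cell in cells]
f_vertical = anova_f(cells)
```

Under this reading, each AVM contributes 31 second-order observations (CODs) to the horizontal equity test, while the vertical equity test for a single AVM compares the 155 individual ratios in each of its five sale-price quintiles.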

Selection of Subject Parcels

This research is an extension of the research reported by Moore (2006) and uses the same data as that study. However, the methodology of the current study differs from the earlier study in that parcel centroid coordinates have been added, two additional market value estimating methods that incorporate parcel centroid coordinates in their model specification have been introduced, and replacement cost new (RCN) has been incorporated as an additional variable. This section reviews how sample data were selected and manipulated.

The theoretical population for the study is all single-family homes in North America. The study population comprised single-family homes in an undisclosed Midwestern assessing jurisdiction. The sampling frame consisted of those homes that transferred ownership in valid arm's-length transactions during the years 2001, 2002, 2003, and 2004. This sampling frame was identified and isolated through computer processing by Moore with permission of the jurisdiction.

To measure the predictive power of the seven AVM methods, all tests were conducted using the same population and the same random sample of sale parcels drawn from that population. The population included 22,785 existing single-family residential properties with their descriptive characteristics, representing 52 distinct neighborhoods that were a subset of randomly drawn neighborhoods from the entire jurisdiction. In the earlier study, the five years 1999–2003, containing 5,546 jurisdiction-validated sales with characteristics as they were at the time of the sale, were available for use by the participants for model specification and calibration. For this study, only 2001–2003 sales were used for model specification because parcel x-y coordinates were not available for the earlier years.
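The calibration window described above can be expressed as a simple filter on the sale records. The sketch below is illustrative only: it assumes records are dictionaries keyed by the `SaleDate` (mm/dd/yyyy) and `s2` (V = valid) fields listed in figure 1, treats 2004 sales as the held-out test year, and uses hypothetical function names and toy records.

```python
from datetime import datetime


def sale_year(record):
    """Year of sale parsed from the mm/dd/yyyy SaleDate field."""
    return datetime.strptime(record["SaleDate"], "%m/%d/%Y").year


def split_sales(records, calibration_years=(2001, 2002, 2003), test_year=2004):
    """Keep valid arm's-length sales only, then separate the calibration
    years from the out-of-sample test year."""
    valid = [r for r in records if r["s2"] == "V"]
    calibration = [r for r in valid if sale_year(r) in calibration_years]
    test = [r for r in valid if sale_year(r) == test_year]
    return calibration, test


# Toy records (hypothetical values) showing which sales land where.
records = [
    {"ParcelNo": 1, "SaleDate": "06/15/2000", "s2": "V"},  # before coordinates were available
    {"ParcelNo": 2, "SaleDate": "03/02/2002", "s2": "V"},  # calibration
    {"ParcelNo": 3, "SaleDate": "11/20/2003", "s2": "X"},  # not a valid sale
    {"ParcelNo": 4, "SaleDate": "07/09/2004", "s2": "V"},  # out-of-sample test
]
calibration, test = split_sales(records)
```

Separating the groups by sale year, rather than by random holdout, keeps every test sale later in time than every calibration sale.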
To test the predictive power of models, a different sample than the one used for model specification and calibration is recommended (Clapp and O'Connor 2008). Therefore, characteristics of 1,299 out-of-sample validated sales from the population that had sold in 2004 were provided to participants without the actual selling prices, which were known only to Moore. These sales had been screened by the assessing office staff to verify that they were arm's-length market transactions.

Figure 1 contains a list of the parcel characteristic variables that were available for use in development and testing of the predictive models. For this study, variables 41–45 (parcel x-y coordinates, the Era and Style transforms, and RCN) were appended to the dataset. For the years 2001–2003, there were 3,872 validated sales containing parcel coordinates for use in model specification and calibration. The jurisdiction's established land values as of December 31, 2003, were supplied as part of the characteristic data, and participants were instructed to use them as a given. No data were provided for computing new land values. Once the participants' models were specified, they were used to blindly estimate the selling prices of the 1,299 out-of-sample validated sale transactions in 2004.

In the current study, Joshua Myers performed the model specification and calibration for the two spatially enhanced AVMs (GWR and GAWR). He was given the same information about the dataset and out-of-sample sale prices as participants in the prior study.

Model Specification

To summarize, the nine participants in the original study (Moore 2006) had to build (specify) predictive models using their respective analytical tools and then calibrate (fit) them to the time-trended sales sample from 2003 and earlier, using their own time-trending technique and judgment as to what sales should be used for model specification and calibration.
They then applied their respective models to the 1,299 out-of-sample properties in the 2004 test group to estimate selling prices. Cost calculation results for the 1,299 properties in the test group were furnished by Moore using Marshall & Swift's residential cost book data (Marshall & Swift 2003). The single best predictive result set for each AVM type from the original study (Moore 2006) was used to randomly select and assign 775 observations of 2004 parcel selling price estimates for the current study.

Using the following year's out-of-sample valid sales for testing differs from the usual model-testing methodology that sets aside a portion of the model-building sales sample for testing. The justification for using the following year's valid market sales as the test group was that it more closely resembles the reality faced by assessors each year. Also, this approach could possibly uncover instability in the models. This decision was influenced in part by the desire to consider a worst-case scenario for testing the predictive power of models.

Figure 1. Parcel characteristic variables

No.  Field name   Description
1    ParcelNo     Parcel identifier, numeric
2    Class        Property class; all are residential, single-family class 510
3    Neigh        Neighborhood number, 3-digit numeric, range 108 to 579 (52 total)
4    District     Tax district number, 6-digit numeric
5    SaleDate     Sale date in a single date field with the format mm/dd/yyyy (total = 5,546)
6    SaleAmt      Sale amount; range 17,400–1,823,000; median 139,900; mean 168,274
7    s1           Sale validity code for state reporting
8    s2           Sale validity code for arm's-length market transaction, V = valid
9    Acres        Parcel acreage where available
10   TLA_SF       Total finished living area square feet
11   FinSFB       Finished living area square feet, basement
12   FinSF1       Finished living area square feet, 1st floor
13   FinSF2       Finished living area square feet, full 2nd floor
14   FinSFUp      Finished living area square feet, partial upper floor such as half story
15   FinSFLL      Finished living area square feet, lower level of split- or bi-level (split foyer)
16   Stories      Story height as a single numeric field; 100 = 1 story, 150 = 1½ story, and so on
17   H_Type       House type code
18   B_SF         Basement square feet (no basement = 0)
19   F_Baths      Number of full baths
20   H_Baths      Number of half baths
21   Tot_Fix      Number of total plumbing fixtures
22   AttGar_SF    Attached garage size in square feet (no attached garage = 0)
23   Gar_Cap      Attached garage car capacity (not always available)
24   DetG_SF      Detached garage size in square feet (no detached garage = 0)
25   C_Air        Central air-conditioning (Y or N)
26   FP           Number of fireplaces
27   Year         Year constructed
28   EffYear      Effective year built; proxy for effective age
29   Cond         Condition: 94% = AV, 1% = EX, 1.5% = F, 2% = G, 1% = VG, 0.1% = P
30   Grade        Quality grade, numeric, ranging from 25 to 95 with 45 = avg, 25 = poor
31   Extra        Extra features flag, where 1 = yes
32   ExtraDesc    Free-form description of extra features
33   ExtraAmt     Amount of value assigned to the extra features by the appraisal office
34   PorchSF      Total square feet of porch area
35   WdDkSF       Total square feet of wood deck area
36   Land         Estimated market land value placed on the lot by the appraisal office prior to time of sale
37   RoofMat      Roof cover material code
38   AtticSF      Total square feet of attic area
39   AtticFinSF   Finished living area square feet in attic
40   Ext_Cov      Exterior cover material code
41   Latitude     y-coordinate (adjusted by a constant to keep the real data location confidential)
42   Longitude    x-coordinate (adjusted by a constant to keep the real data location confidential)
43   Era          Transform of year built into five groups
44   Style        Transform of house type code into five groups
45   RCN          Replacement cost new

Modeling Procedures for Geographical-attribute Weighted Regression

To specify and calibrate the two new GWR and GAWR models utilizing parcel coordinates from GIS, it was necessary to determine three elements during the model-building phase: (1) the set of variables to include from the list of 45 (figure 1), (2) the number of years of sales history to use, and (3) the type of spatial-attribute weight function.

A forward step-wise procedure was employed to determine the best set of variables to include in the model. At each step in this procedure, leave-one-out cross-validation was used on the sales-history dataset to determine the COD and optimum bandwidth for each of the variable options. The variable that most lowered the COD at each step was added to the model. At the end of the process, taking into account the principle of parsimony and the quality of the coefficients, the model with the best COD was chosen and its corresponding optimum bandwidth was recorded. An adaptive bandwidth was used in this study.
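To make the roles of the bandwidth and the spatial-attribute weight function concrete, the following Python sketch implements one reading of the GAWR weight that this section goes on to present. Several points are assumptions rather than details confirmed by the article: each attribute term is taken to be exp(-|1 - ratio|), the adaptive bandwidth is taken to be the distance from the subject to its k-th nearest sale (a common convention in the GWR literature), and the function names are hypothetical. The dictionary keys `Grade`, `Land`, and `TLA_SF` follow the field names in figure 1.

```python
import math


def adaptive_bandwidth(subject_xy, sale_xys, k):
    """Assumed adaptive rule: the bandwidth for a subject is the distance
    to its k-th nearest sale, so dense areas get a tighter kernel."""
    distances = sorted(math.dist(subject_xy, xy) for xy in sale_xys)
    return distances[k - 1]


def gawr_weight(d, b, subject, sale):
    """Weight given to one sale when valuing one subject property.

    d is the subject-to-sale distance and b the bandwidth. The spatial part
    is the kernel 1 - (d/b)^2; each attribute part is assumed to be
    exp(-|1 - sale/subject|), which equals 1 for identical attributes and
    shrinks as the sale differs from the subject in either direction.
    """
    if d > b:
        return 0.0
    kernel = 1.0 - (d / b) ** 2
    similarity = 1.0
    for key in ("TLA_SF", "Grade", "Land"):
        similarity *= math.exp(-abs(1.0 - sale[key] / subject[key]))
    return (kernel * similarity) ** 2


# Hypothetical subject and sales, for illustration only.
subject = {"TLA_SF": 1500, "Grade": 45, "Land": 30000}
twin = dict(subject)                      # identical attributes
larger = {**subject, "TLA_SF": 3000}      # twice the living area

b = adaptive_bandwidth((0.0, 0.0), [(3.0, 4.0), (1.0, 0.0), (0.0, 2.0)], k=2)
w_twin_near = gawr_weight(0.0, 100.0, subject, twin)
w_twin_mid = gawr_weight(50.0, 100.0, subject, twin)
w_larger_near = gawr_weight(0.0, 100.0, subject, larger)
w_outside = gawr_weight(150.0, 100.0, subject, twin)
```

With the attribute terms removed, the expression reduces to the bisquare kernel (1 - (d/b)^2)^2, which depends on location alone; the attribute terms then discount nearby sales that are dissimilar to the subject.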
Because the sales-history dataset stretched several years into the past, it had to be determined which set of years yielded the best results. Therefore, results were computed using various sets of years, and the optimum historical time period was selected. Also, the different choices for the spatial-attribute weight function had to be compared and the function that produced the lowest cross-validation COD identified. The full GAWR weight function used in this research was:

\[
w_{ij} =
\begin{cases}
\left[ \left( 1 - (d_{ij}/b)^{2} \right)
e^{-\left| 1 - \mathrm{TLA}_j/\mathrm{TLA}_i \right|}\,
e^{-\left| 1 - \mathrm{Grade}_j/\mathrm{Grade}_i \right|}\,
e^{-\left| 1 - \mathrm{Land}_j/\mathrm{Land}_i \right|} \right]^{2}, & d_{ij} \le b \\
0, & \text{otherwise}
\end{cases}
\]

where d_ij is the distance between the i-th subject property and the j-th sale property, b is the bandwidth, Grade_i, Land_i, and TLA_i are the grade, land value, and total living area, respectively, for the i-th subject property, and Grade_j, Land_j, and TLA_j are the grade, land value, and total living area, respectively, for the j-th sale property.

It was observed during this research that the set of variables selected to build a jurisdiction-wide multiple regression model cannot be applied as is in the context of GWR and GAWR and produce the same optimal results. Oftentimes, the best multiple regression model variable set performed decidedly worse when applied to GWR and GAWR. A step-by-step process of model building needs to be undertaken for these models as well. Additionally, it was observed that transformations of the sale price dependent variable that were efficacious in the multiple regression context can be detrimental in the context of GWR/GAWR. The localization employed in GWR and GAWR seems to produce effects that are more linear and thus are not in need of transformation.

Once all these processes and steps had been completed, the GWR and GAWR models were used to estimate the value of the subject out-of-sample properties.

Inclusion of RCN and the Variable Set

Replacement cost new is the basis of the cost method of property valuation. It represents the current cost to construct a functionally equivalent replacement of a structure. Many property attributes are used in the determination of the RCN, especially the total living area and the construction quality grade. The cost method has one major advantage over many other valuation methods: it is easy to understand. Regression, on the other hand, is a mystery to many taxpayers and oftentimes is difficult for appraisers to explain. This aspect may be hindering the widespread adoption of regression in appraisal jurisdictions.

In an effort to increase the explanatory power of GWR and GAWR and break down barriers to their adoption, RCN was included as one of the variables that were tested during the model-building process. Surprisingly, the addition of RCN proved advantageous in reducing the COD, and so it was included in the final model for both the research dataset and the Norfolk dataset. This outcome is significant. It permits the results of GWR and GAWR models to be interpreted as a market adjustment to the construction cost, a concept that is easier for the average person to grasp, and should allow for easier adoption.

However, inclusion of RCN may seem counterintuitive to some. After all, cost assigns one number to an improvement and does not take into account the relative differences among its various attributes. A large regression model may seem more appropriate because many factors are taken into account directly. This research demonstrated that such thinking was incorrect, at least in the context of GWR/GAWR. RCN works as a variable because it constitutes a certain index of the property, a broad measure of what features a property contains, and, more importantly, a good measure of how a certain property compares to nearby properties. RCN may not be a very accurate means of estimating property value by itself, but it has great worth in the GAWR context.
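The model-building test in which RCN competed with other candidate variables on cross-validated COD can be illustrated with a small forward step-wise sketch. It is deliberately simplified and should be read as an assumption-laden illustration: ordinary least squares stands in for GWR/GAWR, no bandwidth is optimized, the parsimony and coefficient-quality judgments are reduced to a "stop when COD no longer improves" rule, and the data, column names (`rcn`, `garage_area`, `junk`), and function names are all invented.

```python
import random
import statistics


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, prices):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(X, prices)) for i in range(p)]
    return solve(xtx, xty)


def cod(ratios):
    """Coefficient of dispersion about the median ratio, in percent."""
    med = statistics.median(ratios)
    return 100.0 * statistics.mean(abs(r - med) for r in ratios) / med


def loo_cod(sales, variables):
    """Leave-one-out cross-validation: refit without each sale, estimate it,
    and return the COD of the estimate-to-price ratios."""
    ratios = []
    for i, held_out in enumerate(sales):
        train = sales[:i] + sales[i + 1:]
        beta = fit_ols([[s[v] for v in variables] for s in train],
                       [s["price"] for s in train])
        estimate = beta[0] + sum(c * held_out[v] for c, v in zip(beta[1:], variables))
        ratios.append(estimate / held_out["price"])
    return cod(ratios)


def forward_stepwise(sales, candidates):
    """Add, one at a time, the variable that most lowers the LOO COD."""
    chosen, best = [], float("inf")
    remaining = list(candidates)
    while remaining:
        scores = {v: loo_cod(sales, chosen + [v]) for v in remaining}
        winner = min(scores, key=scores.get)
        if scores[winner] >= best:
            break  # no candidate improves the COD
        chosen.append(winner)
        remaining.remove(winner)
        best = scores[winner]
    return chosen, best


# Synthetic sales in $1,000s (illustrative only): price is driven by RCN and
# garage area; "junk" is pure noise.
rng = random.Random(1)
sales = []
for _ in range(60):
    rcn, garage = rng.uniform(100, 300), rng.uniform(0, 6)
    sales.append({"rcn": rcn, "garage_area": garage, "junk": rng.random(),
                  "price": 20 + 1.1 * rcn + 5 * garage + rng.gauss(0, 3)})

chosen, best_cod = forward_stepwise(sales, ["junk", "garage_area", "rcn"])
```

In this toy setting a single cost-like index enters first because it summarizes most of what drives price, which mirrors the role the article attributes to RCN.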
As a variable, RCN improves results, makes it possible to achieve a more parsimonious model specification, reduces the likelihood of the over-fitting hazard mentioned by Clapp and O'Connor (2008), and provides a way to better explain a seemingly mysterious regression process.

Table 2 lists the sets of variables chosen for the research data and for the Norfolk data. All of the models included an intercept. Even though the intercept can be difficult to interpret, for reasons beyond the scope of this article, it has an important role in improving the accuracy of the estimates. The land value estimate for the property was determined by an external process and provided as a data input for modeling. The Norfolk model included the land estimate as an offset. An offset is simply a value that is taken as a given and not set as a variable in the model. Also included as an offset in the Norfolk model was the total RCNLD (Reproduction Cost New Less Depreciation) for other improvements, such as pools, carports, piers, and utility sheds. The Norfolk dataset contained RCNLD instead of RCN because of the way individual appraisers finalize values there.

Table 2. Variables selected for inclusion in GWR and GAWR models

Research data                                      Norfolk data
Intercept                                          Intercept
RCN for the dwelling and garage                    RCNLD for the dwelling and garage
Pre-determined land market value                   Squared reverse month of sale (Time)
Total garage area (sum of attached and             Land value as an offset
  detached garage area)
Reverse month of sale (Time)                       RCNLD for other outlying improvements
                                                     as an offset
Total other area (sum of attic and
  basement area)
Total living area
Neighborhood indicator variable (coded 1 if
  the sale property is in the same
  neighborhood as the subject property,
  and 0 otherwise)

For the GAWR