An Alternative Hedonic Residential Property Price Index for Indonesia Using Big Data: The Case of Jakarta*


Arief Noor Rachman 1

Abstract

Monitoring property price dynamics is a necessary task for central banks in maintaining financial stability. With the recent rapid growth of digital technology, big data has the potential to become a new source for producing official property statistics. In this paper, we use big data to develop an alternative residential property price index (RPPI) for the secondary market (existing houses) as a challenger to Bank Indonesia's existing survey-based RPPI. The dataset consists of residential property advertisement listings from Indonesia's major property web portals from January 2016 to September 2018. As a prototype, this study focuses on the capital city, Jakarta, and stratifies the sample into five districts. Given the availability of property characteristics data, we employ hedonic methods as a quality-mix adjustment to calculate robust asking price indexes. The regression outputs are promising: all characteristics variables (lot size, building size, number of bedrooms, and number of bathrooms) are statistically significant and stable, and the models have high explanatory power. The composite index for Jakarta is compiled using mortgage data as weights. The new composite index lies below the existing appraisal-based RPPI from mid-2017 to mid-2018, but it moves in the same direction and converges with it at the end of the sample horizon. Future development will extend the index coverage to other large cities. Data quality, however, remains the main challenge in using big data from the internet as a source of official statistics.

Keywords: Residential property prices index, big data, time dummy hedonic method.

1. Introduction

Most central banks have the main task of stabilizing the economy and sustaining the momentum of economic growth through monetary policy. The asset price channel is one of the monetary policy transmission mechanisms and explains how monetary policy can affect general wealth in the economy. As a measure of asset price inflation, property prices are an important leading indicator of the economy's dynamics, since investment in the property sector is long-term in nature. Property statistics can provide an early sign of movements in the economic cycle. Rising property prices often lead into an expansionary phase (boom), whereas falling property prices indicate a contractionary phase (bust) (Eurostat, 2013).

* This paper was presented to the International Conference on Real Estate Statistics, February 20-22, 2019, Eurostat, Luxembourg. I thank Niall O'Hanlon (IMF) for valuable technical assistance, and Annisa Cynthia and Arinda Dwi Okfantia (BI) for helpful inputs. All views expressed are those of the author and do not necessarily represent the views of Bank Indonesia.
1 Analyst at Statistics Department, Bank Indonesia; email: arief_nr@bi.go.id

Recent financial crises have been widely linked to the dynamics of the property market. The US housing market crash was the main cause of the financial crisis in 2007. Because financial system stability, or macro-prudential policy, is also part of a central bank's responsibility, monitoring residential property prices is crucial. Since 1999, Bank Indonesia (BI) has conducted a residential property price survey of the major property developers in 16 major cities. BI also publishes a residential property price index (RPPI) for the primary housing market (newly built houses), calculated using the chain index method. To capture price dynamics in the secondary market (existing houses), BI has conducted a quarterly residential property price survey for the secondary market in ten major cities using the appraisal method since 2011.

According to Eurostat (2013), calculating an RPPI is difficult because houses are sold infrequently and are heterogeneous in their characteristics, such as location, size, and facilities. This can distort price measurement, since differences in characteristics across houses are hard to control for, especially when transaction data are available only at limited frequency. A quality-mix adjustment is therefore needed to separate quality changes from pure price changes and to avoid misleading interpretation of the index. Silver (2016) identified several methods for quality-mix adjustment, such as hedonic methods, repeat sales, and the sales price appraisal ratio (SPAR). The hedonic method is generally considered preferable to repeat sales because it can exploit the relevant property characteristics data through regression. Furthermore, a hedonic regression of house prices on their characteristics decomposes the overall price and provides an estimated marginal value for each characteristic.
Digital technology has recently grown very fast and has created a very large amount of digital data, known as Big Data. These data are stored and can potentially be utilized. Big data offers several benefits to official institutions, such as producing new indicators, bridging the time lags in the availability of existing official statistics, and providing an advanced source of data for producing official statistics. However, dealing with big data also poses challenges: (1) data quality concerns 2; (2) legal access to the data sources, considering that big data is typically owned by private entities; and (3) advanced skill and technology requirements (Daas et al., 2014 and Hammer et al., 2017). These advantages and disadvantages have prompted statistical institutions to be careful when adopting big data as a new source of official statistics. As Hammer et al. (2017) note, big data offers opportunities, challenges, and implications for official statistics that compilers and users of statistics need to be aware of when they start to incorporate big data into their work plans.

In the context of property price statistics, big data could improve RPPI compilation and challenge the existing official RPPI. Property advertisement listings on websites offer an immediate, inexpensive, and very large alternative data source for the RPPI. According to Eurostat (2013), most data from advertising agencies or websites are sellers' asking prices. Timeliness is the main benefit of indexes based on asking prices. Their major weakness is that asking prices can differ from actual transaction prices, which may lead to misleading estimates of average house prices. Nevertheless, an RPPI based on asking prices is still a feasible solution for monitoring purposes, especially in the absence of declared transaction data such as administrative data from a land registry or property tax records.
The decentralization of property tax administration to local governments makes it more difficult to collect property transaction data, because of the broader coverage and unstandardized property tax records.

2 We have to deal with a large amount of data and cleanse it by reducing the volume of the data without losing too much information.

In this paper, we develop an alternative RPPI as a prototype, using big data from property advertisement web portals. We employ the hedonic method to calculate robust asking price indexes for secondary market property (used houses) in five districts of Jakarta, adopting a rolling-window time dummy approach. The composite index for Jakarta is then compiled using the individual mortgage collateral values in each district as weights.

This paper is organized as follows. The first section discusses the background of the study. Section two explains the data and the methodology used. The results are discussed in section three. Lastly, the conclusion and further work are presented in section four.

2. Data and Methodology

2.1. Data Sources

To develop the alternative RPPI using big data, we collect monthly data from the two largest property advertisement web portals in Indonesia, which together hold more than fifty percent of total market share. Bank Indonesia secured the data acquisition through non-disclosure agreements (NDA) with the two web portals. Big data preparation and extraction from the web portals' servers are processed using virtual machines and Hadoop software, covering approximately 2.2 million ads every month. For RPPI compilation, we include only the first instance of a listing for each property at a unique price in each month. The data are individual listings with details on asking price, property type, lot size, dwelling size, number of bedrooms, number of bathrooms, address, and additional characteristics recorded as free text (such as garage, gated property, swimming pool). For simplicity, we leave the free-text characteristics aside in this study.

2.2. Data Treatment

Data cleansing is a crucial step in dealing with big data because of the data quality concerns mentioned above among the big data challenges.
Because we obtain data from individual listings on web portals, the data have several issues: (1) human error in data entry; (2) unstandardized address data due to the free-text data field; and (3) duplicate advertisements, which arise mainly because one property can be advertised by more than one seller within a single portal as well as across portals, and because advertisements are re-posted after their expiration date. For data preparation, we removed duplicated and corrupted data in the system.

The next data preparation steps are statistical edits based on the assumption of normally distributed data. We removed spurious price values using a median absolute deviation (MAD) test on price per unit of property size. The same method was applied to building size. We removed observations with a lot size greater than 600 square meters, with more than 10 bedrooms, or with more than 8 bathrooms, based on the normal distribution assumption. Finally, we ran a preliminary regression to identify outliers using the Cook's Distance method. This method identifies outliers based on the combination of each observation's leverage and residual values, with the following formula:

$$D_i = \frac{\sum_{j=1}^{n} \left(\hat{Y}_j - \hat{Y}_{j(i)}\right)^2}{(p + 1)\,\hat{\sigma}^2}$$

where $D_i$ is the Cook's Distance value of observation $i$, $\hat{Y}_j$ is the fitted value of the regression, $\hat{Y}_{j(i)}$ is the fitted value obtained when observation $i$ is removed from the regression, $\hat{\sigma}^2$ is the error variance of the regression, and $p$ is the number of predictors. We remove observations with a value of $D$ greater than 4 divided by the number of observations, a conventional threshold (O'Hanlon, 2011). Table 1 presents the data records for each district. On average we removed around 30 percent of the total data each year.

Table 1. Data Records

Columns per year: Raw = raw data; Edits = after edits*; Cook's = after Cook's Distance; % = share of raw data retained.

                   2016                              2017                              2018 (9 months)
City               Raw     Edits   Cook's  %         Raw     Edits   Cook's  %         Raw     Edits   Cook's  %
West Jakarta       40141   32868   31184   77.7      38477   31264   29831   77.5      23743   20114   19153   80.7
Central Jakarta     7357    4946    4662   63.4       7650    5046    4762   62.2       4673    3516    3331   71.3
South Jakarta      59922   41541   39626   66.1      66294   43575   41647   62.8      37877   29619   28231   74.5
East Jakarta       30625   23421   22180   72.4      36237   28921   27426   75.7      30272   20645   19509   64.4
North Jakarta      20370   15474   14640   71.9      18498   14795   14026   75.8      14163   10998   10370   73.2

* Removed: ads repeated each month, duplicate ads, number of bedrooms equal to 0 or more than 10, number of bathrooms equal to 0 or more than 8, and lot size more than 600 sqm.

2.3. Methodology

Hedonic methods are known to overcome the quality-mix problem. They estimate property prices from property characteristics and use all the information available. There are three main approaches within the hedonic methods: 1) time dummy variables, 2) characteristics prices, and 3) hedonic price imputation. We employ the time dummy variables approach because of its simplicity, since the price index can be derived directly from the estimated time dummy regression coefficients.
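As a minimal sketch of that last point, the snippet below converts time dummy coefficients into index levels, assuming a log-price dependent variable so that each coefficient is a log price difference from the base period. The function name `time_dummy_index` and the coefficient values are hypothetical and serve only as illustration.

```python
import math

def time_dummy_index(deltas, base=100.0):
    """Turn time dummy coefficients into index levels.

    `deltas` holds the coefficients for periods 2..T; the base period has
    no dummy, so its index level equals `base`.
    """
    return [base] + [base * math.exp(d) for d in deltas]

# Hypothetical coefficients for the three months after the base month.
index = time_dummy_index([0.010, -0.020, 0.035])
print([round(v, 2) for v in index])
```

A negative coefficient places the index below the base level and a positive one above it, with no further model output required.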
We use a semi-log regression model with pooled time dummy variables, since house prices are not normally distributed in levels (their distribution is positively skewed). The basic semi-log hedonic model is:

$$\ln p_n^t = \beta_0 + \sum_{\tau=1}^{T} \delta^{\tau} D_n^{\tau} + \sum_{k=1}^{K} \beta_k z_{nk}^{t} + \varepsilon_n^t$$

where $p_n^t$ is the price of property $n$ at time $t$, $z_{nk}^t$ is characteristic $k$ of property $n$ at time $t$, $D_n^{\tau}$ is a time dummy equal to 1 if property $n$ is observed in period $\tau$ and 0 otherwise, $\beta_0$ and $\beta_k$ are the intercept and the house characteristics parameters, and $\delta^{\tau}$ are the time dummy coefficients.

Our hedonic model has building size, lot size, number of bedrooms, and number of bathrooms as characteristics variables. The number of bedrooms and the number of bathrooms are treated as dummy variables. We have three dummy variables for the number of bedrooms: one and two bedrooms, three bedrooms, and more than four bedrooms. Four bedrooms is used as the reference category because it is the most frequent number of bedrooms. We also have three dummies for the number of bathrooms, with three bathrooms as the reference. For RPPI compilation, we follow the methodology of the Japan Residential Property Price Index (JRPPI) 3, using a rolling window technique to compute the RPPI from time dummy hedonic regressions.

3 Land Economy and Construction Industries Bureau (2016), Methodology of JRPPI: Japan Residential Property Price Index, Ministry of Land, Infrastructure, Transport and Tourism.

The estimated time dummy coefficients ($\delta_r^t$, for window $r$ and period $t$) are arranged in a table as follows:

Table 2. Rolling windows time dummy coefficients compilation

$$
\begin{array}{c|cccccccc}
r \backslash t & 1 & 2 & 3 & \cdots & \tau & \tau+1 & \cdots & T \\
\hline
1 & & \delta_1^{2} & \delta_1^{3} & \cdots & \delta_1^{\tau} & & & \\
2 & & & \delta_2^{3} & \cdots & \delta_2^{\tau} & \delta_2^{\tau+1} & & \\
3 & & & & \cdots & \delta_3^{\tau} & \delta_3^{\tau+1} & \cdots & \\
\vdots & & & & & & & & \\
T-\tau+1 & & & & & & & \cdots & \delta_{T-\tau+1}^{T}
\end{array}
$$

The index can be obtained by:

$$\frac{p^{\tau+1}}{p^{1}} = \exp\left(\delta_1^{\tau}\right) \cdot \frac{\exp\left(\delta_2^{\tau+1}\right)}{\exp\left(\delta_2^{\tau}\right)}$$

Suppose the base period is the first period. The price change between period 1 and period $\tau + 1$ is then calculated from the time dummy coefficient of the last period in the first window (time $\tau$) and the time dummy coefficients of the last and second-to-last periods in the second window (time $\tau + 1$ and time $\tau$). All remaining index values are obtained by applying the same calculation to every subsequent window.

We stratify the residential property index by region. There are five districts which are parts of Greater Jakarta: West Jakarta, Central Jakarta, South Jakarta, East Jakarta, and North Jakarta. We estimate house prices on property characteristics in each region, then compile the district indexes into a Total Jakarta index using individual mortgage values as weights. Mortgage data are used as a proxy for property transaction values in the absence of a more representative measure of the property market structure, such as tax revenues from property transfer transactions.

3. Result

3.1. Regression Result

Our data indicate that house prices in levels are not normally distributed. Hence, we transform the house price variable into logarithmic form before running the semi-log hedonic regression models. We run 12-month rolling window regressions from January 2016 to September 2018 for the five regions, for a total of 110 regressions. Each model is estimated in two stages: the first regression identifies outliers using Cook's Distance, and the second regression, run without the outliers, produces the index. Table 3 below presents a sample result of the 12-month rolling window regression from January 2016 to December 2016 for North Jakarta and South Jakarta.
These results show relatively high explanatory power, as indicated by the adjusted R-squared. Given the limited set of house characteristics available, such high explanatory power probably implies a relatively homogeneous housing market in the districts.
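The outlier screening that precedes each index regression (the MAD test of Section 2.2 and the first-stage Cook's Distance rule) can be sketched as follows. This is a simplified illustration, not the production code: it uses a single predictor so that leverage has a closed form, the MAD cutoff of 3 scaled MADs is an assumption (the paper does not state one), and all function names and data values are hypothetical.

```python
from statistics import median

def mad_keep(values, cutoff=3.0):
    """Return True for values within `cutoff` scaled MADs of the median.

    The cutoff and the 1.4826 normal-consistency scaling are assumptions.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    limit = cutoff * 1.4826 * mad
    return [abs(v - med) <= limit for v in values]

def cooks_distance(x, y):
    """Cook's Distance for a one-predictor OLS fit of y on x (p = 1)."""
    n, p = len(x), 1
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(e * e for e in resid) / (n - p - 1)    # error variance
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]  # leverage h_ii
    # Closed form equivalent to summing squared changes in fitted values.
    return [e * e * h / ((p + 1) * sigma2 * (1 - h) ** 2)
            for e, h in zip(resid, lev)]

# Hypothetical prices per square metre; the last value mimics an entry error.
price_per_sqm = [18.0, 20.0, 21.0, 19.5, 22.0, 20.5, 210.0]
kept = mad_keep(price_per_sqm)

# Hypothetical lot sizes and log prices; the last listing is mispriced.
lot = [60, 80, 100, 120, 140, 160, 180, 200]
log_price = [20.5 + 0.004 * s for s in lot]
log_price[-1] += 1.5
d = cooks_distance(lot, log_price)
flagged = [i for i, di in enumerate(d) if di > 4 / len(d)]  # 4/n rule
print(kept, flagged)
```

In the two-stage procedure, the flagged observations would be dropped and the hedonic regression re-estimated on the remaining sample. With several predictors the same quantities come from the hat matrix of the full design.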

Table 3. Hedonic Regression Result of North Jakarta and South Jakarta

Dependent variable: Ln Price

                           North Jakarta                  South Jakarta
Independent variables      Estimates   Robust SE          Estimates   Robust SE
Intercept                  21.2900     0.00851  ***       20.8540     0.00980  ***
Building size               0.0010     0.00002  ***        0.0023     0.00002  ***
Lot size                    0.0045     0.00003  ***        0.0031     0.00002  ***
Dum_# of bedroom 1-2       -0.2153     0.00824  ***       -0.1042     0.00507  ***
Dum_# of bedroom 3         -0.0440     0.00456  ***       -0.4461     0.00983  ***
Dum_# of bedroom >4        -0.0400     0.00536  ***       -0.0929     0.00545  ***
Dum_# of bathroom 1        -0.2225     0.00879  ***       -0.2965     0.01033  ***
Dum_# of bathroom 2        -0.1732     0.00478  ***       -0.1389     0.00564  ***
Dum_# of bathroom >3        0.0082     0.00492  *         -0.0095     0.00502  *
Dum_period 2016:2           0.0007     0.00867            -0.0132     0.01039
Dum_period 2016:3           0.0006     0.00888             0.0164     0.01023
Dum_period 2016:4           0.0034     0.00913            -0.0175     0.01053  *
Dum_period 2016:5          -0.0203     0.00765  ***        0.0003     0.00921
Dum_period 2016:6          -0.0123     0.00971             0.0390     0.01094  ***
Dum_period 2016:7          -0.0132     0.00936             0.0015     0.01085
Dum_period 2016:8           0.0022     0.00954             0.0148     0.01070
Dum_period 2016:9          -0.0143     0.01008             0.0307     0.01086  ***
Dum_period 2016:10         -0.0169     0.00829  **         0.0268     0.00974  ***
Dum_period 2016:11         -0.0307     0.00916  ***       -0.0158     0.00991
Dum_period 2016:12         -0.0435     0.01056  ***       -0.0030     0.01052
Adjusted R-squared          0.863                          0.783
F-statistics                4866                           7507
Number of observations      14,640                         39,626

Notes: *** significance at 1%, ** significance at 5% and * significance at 10%.

All characteristics variables in the model are generally statistically significant, stable, and in line with a priori expectations over time 4. On average, building size and lot size have a positive impact on house prices, while the number of bedrooms and the number of bathrooms give mixed results. It seems that, other characteristics being equal, having more than four bedrooms may be less preferable because the additional bedrooms reduce the living space of the house. The same argument also applies to the number of bathrooms.
For North Jakarta, the result also implies that a house with an additional 10 square meters of lot size is about 4.5 percent more expensive, if other variables are kept constant. The regression for South Jakarta, which has a larger number of observations, shows similar findings. The explanatory power drops slightly compared with North Jakarta but remains relatively high at 0.78. All house characteristics are significant; building size has about twice the impact found for North Jakarta, while the impact of lot size is slightly smaller.

4 We compared this regression result with the regression window one year ahead (January 2017 to December 2017) and found relatively consistent signs and coefficients for each explanatory variable.
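The arithmetic behind this reading of the semi-log coefficients can be checked with a short sketch. The helper `pct_effect` is a hypothetical name; the coefficients are those reported for lot size and building size in Table 3. The exact effect is exp(beta * delta) - 1, which for small values is close to the beta * delta approximation used in the text (about 4.6 percent versus 4.5 percent for 10 square meters of lot in North Jakarta).

```python
import math

def pct_effect(beta, delta=1.0):
    """Exact percentage price change for a change `delta` in a regressor."""
    return 100.0 * (math.exp(beta * delta) - 1.0)

# Coefficients from Table 3; a 10 square meter change in each case.
north_lot = pct_effect(0.0045, 10)
south_lot = pct_effect(0.0031, 10)
north_building = pct_effect(0.0010, 10)
south_building = pct_effect(0.0023, 10)
print(round(north_lot, 2), round(south_lot, 2))
```

The same function applies to the dummy coefficients with `delta=1`, in which case the result is the percentage difference relative to the reference category.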

A Breusch-Pagan test was applied to detect heteroscedasticity, and the result showed that heteroscedasticity was present in our model. Since heteroscedasticity affects only the standard errors and leaves the coefficients unbiased, we calculated robust standard errors so that the t-statistics remain valid.

3.2. Indexes Result

Adopting the rolling window method, we calculated the RPPI for each of the five districts from the estimated time dummy regression coefficients. The monthly indexes suffered from short-term volatility, so we applied a 3-month moving average to smooth the series 5. We compare the new index with the existing appraisal-based RPPI. The existing quarterly index is expanded to a monthly index simply by assigning the quarterly value to each month within the quarter.

Figure 1. Comparison of new index for secondary market houses in North Jakarta, 3-month moving average of new index, and existing appraisals-based index.

[Chart: North Jakarta RPPI (Jan 2016 = 100). Series: hedonic RPPI; 3-month MA; existing secondary-market RPPI (appraisal-based).]

Figure 1 shows the index series for North Jakarta. The hedonic index generally declined from 2016 to early 2017 before bouncing back to its initial level in early 2018. The existing RPPI shows an increasing trend over the sample horizon, with faster acceleration from the second half of 2017 to early 2018. In contrast to North Jakarta, the hedonic index for South Jakarta shows a rapidly increasing trend from the last quarter of 2017 to the third quarter of 2018, exceeding the relatively constant upward trend of the existing RPPI (Figure 2).

5 The Central Statistics Office (CSO) of Ireland publishes its National House Price Index using the same technique to smooth out short-term volatility. See O'Hanlon (2011).
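The rolling-window compilation used for these district indexes (Section 2.3) can be sketched as follows. This is a minimal illustration under stated assumptions: each window's coefficients are given as a list with the window's first period as reference (coefficient 0), the first window supplies the index up to its last period, and each later window contributes only the change between its last two periods, as in the formula for the price change between period 1 and period tau + 1. Function names and the toy coefficients are hypothetical.

```python
import math

def rolling_windows(n_periods, width):
    """(first, last) period of every full window, periods numbered from 1."""
    return [(start, start + width - 1)
            for start in range(1, n_periods - width + 2)]

def chain_index(window_deltas, base=100.0):
    """Chain window-specific time dummy coefficients into one index series.

    window_deltas[r] lists the coefficients of window r in period order,
    with 0.0 for the window's own reference (first) period.
    """
    index = [base * math.exp(d) for d in window_deltas[0]]
    for deltas in window_deltas[1:]:
        # Only the latest movement of each later window is used.
        index.append(index[-1] * math.exp(deltas[-1] - deltas[-2]))
    return index

# 33 months (Jan 2016 to Sep 2018) with 12-month windows, five districts.
n_regressions = len(rolling_windows(33, 12)) * 5

# Hypothetical coefficients for three overlapping 3-month windows.
toy = [[0.0, 0.010, 0.020], [0.0, 0.012, 0.030], [0.0, 0.015, 0.011]]
series = chain_index(toy)
print(n_regressions, [round(v, 2) for v in series])
```

The window count reproduces the 110 regressions reported in Section 3.1 (22 windows for each of the five districts).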

Figure 2. Comparison of new index for secondary market houses in South Jakarta, 3-month moving average of new index, and existing appraisals-based index.

[Chart: South Jakarta RPPI (Jan 2016 = 100). Series: hedonic RPPI; 3-month MA; existing secondary-market RPPI (appraisal-based).]

Figure 3. Comparison of new index for secondary market houses in Central Jakarta, 3-month moving average of new index, and existing appraisals-based index.

[Chart: Central Jakarta RPPI (Jan 2016 = 100). Series: hedonic RPPI; 3-month MA; existing secondary-market RPPI (appraisal-based).]

The index series for Central Jakarta shows a different pattern (Figure 3). The series is highly volatile, and even the smoothed series remains volatile. This is probably due to the small number of observations available for estimation: Central Jakarta accounts for only around 4% of the total number of observations for all Jakarta.
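The two series transformations behind these charts, the 3-month moving average and the expansion of the quarterly appraisal-based index to monthly frequency, can be sketched as below. The paper does not say whether the moving average is trailing or centered; a trailing average is assumed here, and the function names and index values are hypothetical.

```python
def moving_average(series, window=3):
    """Trailing moving average; the first window-1 entries are None."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(series[i - window + 1:i + 1]) / window)
    return out

def quarterly_to_monthly(quarterly):
    """Repeat each quarterly value for the three months of its quarter."""
    return [value for value in quarterly for _ in range(3)]

# Hypothetical monthly hedonic index and quarterly appraisal-based index.
smoothed = moving_average([100.0, 103.0, 101.0, 105.0])
monthly_existing = quarterly_to_monthly([100.0, 102.0])
print(smoothed, monthly_existing)
```

With few observations per month, as in Central Jakarta, a three-term average removes only part of the month-to-month noise, which is consistent with the volatility that remains in the smoothed series.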

Figure 4 presents the composite index for all Jakarta. The new index gives a promising result and an improvement in terms of volatility. It shows a smooth, increasing trend over the sample horizon and tends to converge with the existing index at the end. Both indexes tend to move in the same direction, with the new index lying below the existing index from mid-2017 to mid-2018. Both indexes accelerate slowly from 2016 to the third quarter of 2017, which may reflect the lower national economic growth from 2016 to mid-2017.

Figure 4. Comparison of new index for secondary market houses in all Jakarta, 3-month moving average of new index, existing appraisals-based index, and existing index for primary market.

[Chart: Comparison of Aggregate Indexes for Jakarta (Jan 2016 = 100). Series: hedonic RPPI Jakarta; 3-month MA; existing secondary-market RPPI (appraisal-based); existing primary-market RPPI (developer survey-based).]

4. Conclusion

Using the time dummy hedonic method, we have computed an alternative residential property price index for the secondary market in five districts of Jakarta, based on property advertisement listings on the web. The hedonic indexes show promising results and have the potential to become the official RPPI in the future. The regression outputs represent robust baseline models for index compilation. Our web listing observations appear to be relatively homogeneous, as indicated by the high explanatory power obtained with a limited set of characteristics variables. Smoothing may be a better option for the published index in order to reduce short-term volatility.

For further development, we will keep these baseline models and extend the coverage to other large cities in Indonesia. This extension will depend on the suitability of the listing data and on the relative importance of each city according to the national property market share derived from mortgage data.
We need to keep the new index representative of current market conditions by regularly reviewing model performance and updating the weights. In the future, we may need to enhance the models by including a more granular spatial adjustment and other characteristics (such as the age of the property).
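The mortgage-weighted aggregation behind the composite index (Section 2.3), and the effect of updating the weights, can be sketched as follows. The paper states that individual mortgage values are used as weights but does not give the aggregation formula; a weighted arithmetic mean is assumed here, and the district index levels and weights are hypothetical.

```python
def composite_index(district_index, weights):
    """Weighted arithmetic mean of district indexes (weights need not sum to 1)."""
    total = sum(weights[d] for d in district_index)
    return sum(district_index[d] * weights[d] for d in district_index) / total

# Hypothetical index levels and mortgage-value weights for one month.
index = {"West": 102.0, "Central": 98.0, "South": 105.0,
         "East": 101.0, "North": 97.0}
weights = {"West": 25.0, "Central": 5.0, "South": 35.0,
           "East": 20.0, "North": 15.0}
jakarta = composite_index(index, weights)

# Shifting weight from South to North lowers the composite in this example.
updated = dict(weights, South=30.0, North=20.0)
jakarta_updated = composite_index(index, updated)
print(round(jakarta, 2), round(jakarta_updated, 2))
```

Because the districts move differently, the composite is sensitive to the weights, which is one reason for reviewing them regularly.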

References

Daas, Piet J.H., M. Puts, M. Tennekes, and A. Priem (2014), Big Data as a Data Source for Official Statistics: Experiences at Statistics Netherlands, Statistics Canada Symposium 2014, Ottawa.

Diewert, Erwin (2003), Hedonic Regression: A Consumer Theory Approach, NBER Volume: Scanner Data and Prices Indexes, University of Chicago Press, Chicago.

Eurostat (2013), Handbook on Residential Property Prices Indices (RPPIs), Luxembourg: Publication Office of the European Union.

Hammer, Cornelia L., D. C. Kostroch, G. Quirós, and STA Internal Group (2017), Big Data: Potential, Challenges, and Statistical Implications, IMF Staff Discussion Note, SDN/17/06, IMF, Washington DC.

Hill, Robert J. (2011), Hedonic Price Indexes for Housing, OECD Statistics Working Paper No. 36, OECD, Paris.

Hill, Robert J. (2013), Hedonic Price Indexes for Residential Housing: A Survey, Evaluation and Taxonomy, Journal of Economic Surveys, Vol. 27(5), pp. 879-914, John Wiley & Sons Ltd.

Hülagu, Timur, E. Kizilkaya, A. G. Özbekler, and P. Tunar (2015), A Hedonic House Price Index for Turkey, Turkish Statistical Institute and European Real Estate Society 22nd Annual Conference, Istanbul.

Land Economy and Construction Industries Bureau (2016), Methodology of JRPPI: Japan Residential Property Price Index, Ministry of Land, Infrastructure, Transport and Tourism, Tokyo.

Li, Wenzheng, M. Prud'homme, and K. Yu (2006), Studies in Hedonic Resale Housing Price Indexes, Canadian Economic Association 40th Annual Meetings, Concordia University, Montréal.

Marsden, Joel (2015), House Prices in London: an Economic Analysis of London's Housing Market, GLA Economics Working Paper 72, Greater London Authority, London.

O'Hanlon, Niall (2011), Constructing a National House Price Index for Ireland, Journal of the Statistical and Social Inquiry Society of Ireland, Vol. XL, pp. 167-196, Dublin.

Radermacher, Walter J. (2018), Official Statistics in the Era of Big Data: Opportunities and Threats, International Journal of Data Science and Analytics, Vol. 6 (3), pp. 225-231, Springer International Publishing.

Silver, Mick (2016), How to Better Measure Hedonic Residential Property Price Indexes, IMF Working Paper, WP/16/213, IMF, Washington DC.

Zwick, Markus (2017), Introduction to Big Data in Official Statistics, Presentation, Institute for Research and Development in Official Statistics, Federal Statistics Office Germany, https://www.ec.europa.eu/eurostat/cros/system/files/2017_04_19_big_data1_zwick.pdf

Appendix

1. Data Distribution Sample (North Jakarta)

Figure 5. Variables Histogram

2. Data Preparation (Conducted by Big Data Unit)

Figure 6. Big Data for RPPI Workflow
[Workflow diagram. Platforms shown: Virtual Machine*, Hadoop. Steps: (1) Pre-Processing: remove HTML tags, column mapping; (2) Remove duplicates; (3) Processing: data edits, statistical trimming; (4) Regression & Analysis: run hedonic method, produce index, analysis.]

Data Source
- Property online ads from the 2 biggest property websites, with approximately 9000 ads per month (Jakarta).
- Data available since 2015; listings only include the first instance of a sales listing for each property at a unique price.
- Data attributes:
  Title
  Status of property: sell/rent
  Type of property (house/apartment/villa/condotel/condominium)
  Advertising time: starting and end date
  Property price
  Land and building size
  Number of bedrooms and bathrooms
  Address
  Property description
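The pre-processing and duplicate-removal steps in Figure 6 could be sketched in Python roughly as follows. This is a minimal illustration and not the Big Data Unit's code: the common column names, the sample records, and the helper names `strip_html`, `map_columns`, and `drop_duplicates` are all hypothetical. It relies on the standard-library `html.parser.HTMLParser` and uses the duplicate rule given under Duplicates Removal (same price, land area, building area, bathrooms, bedrooms, and city).

```python
from html.parser import HTMLParser

# Fields the text uses to decide that two advertisements are the same property.
DUP_KEY = ("price", "land_area", "building_area", "bathrooms", "bedrooms", "city")


class _TextOnly(HTMLParser):
    """Keeps text nodes and discards every HTML tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw):
    """Remove HTML tags from an ad title or description and tidy whitespace."""
    parser = _TextOnly()
    parser.feed(raw)
    parser.close()  # flush any text still buffered by the parser
    return " ".join(" ".join(parser.parts).split())


def map_columns(line, delimiter, columns):
    """Split one raw portal record and label its fields with common names.

    `columns` lists the common-schema name of each field in portal order;
    the names used below are hypothetical.
    """
    return dict(zip(columns, line.split(delimiter)))


def drop_duplicates(ads):
    """Keep the first ad for each distinct combination of the DUP_KEY fields."""
    seen = set()
    unique = []
    for ad in ads:
        key = tuple(ad[field] for field in DUP_KEY)
        if key not in seen:
            seen.add(key)
            unique.append(ad)
    return unique


# Illustrative records only (not taken from either portal).
cols = ["title", "price", "land_area", "building_area", "bathrooms", "bedrooms", "city"]
raw_tab = "<b>Rumah</b> dijual\t1500000000\t90\t120\t2\t3\tJakarta Barat"
raw_tilde = "Rumah dijual cepat~1500000000~90~120~2~3~Jakarta Barat"

ads = [map_columns(raw_tab, "\t", cols), map_columns(raw_tilde, "~", cols)]
for ad in ads:
    ad["title"] = strip_html(ad["title"])

unique_ads = drop_duplicates(ads)
```

Keeping the first occurrence of each key matches the note that listings include only the first instance of a property at a unique price; a production pipeline would also need to normalize numeric fields before comparing them.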

Pre-Processing
- Cleansing
Cleansing removes irrelevant characters, such as HTML tags, from the advertisement title and description. The HTML tags are removed in Python using one of its libraries, HTMLParser.
- City Mapping
City mapping generalizes the addresses in the data to the city level. This step is needed because some portals do not provide city/district data for each advertisement. City mapping uses the list of cities/districts and sub-districts obtained from Statistics Indonesia (BPS). If an address cannot be found in the sub-district list, it is mapped using the Google Maps Geocoding API.
- Column Mapping
Data from different portals have different formats and column structures. For example, data from Rumah123 consist of 21 attribute columns delimited by | (bar) or ~ (tilde), whereas data from Urbanindo consist of 13 attribute columns delimited by tabs. Column mapping is therefore needed to standardize the column structure, column names, and delimiter across portals, so that the data can be compiled and processed together.
- Duplicates Removal
The data to be processed still contain many duplicates, both intra-portal and inter-portal. Intra-portal duplicates arise when the same property is advertised by different agents or when an agent reposts the same advertisement. Inter-portal duplicates arise when the same property is advertised on different portals. Advertisements are considered the same if their price, land area, building area, number of bathrooms, number of bedrooms, and city are identical.
- Odd Data Removal
In addition to duplicate advertisements, the data contain odd records. The criteria identified so far for treating a record as odd data are:
a. Missing values: the advertisement does not provide its land area and its building area.
b. Unusual price: for instance, land priced at Rp 0, or a 100 m² plot priced at Rp 23 trillion (23T).
c. Unusual building or land area: for instance, a land area of 0 m² or a building area of 10,000 m².

d. Unusual price due to location: for example, a house offered at Rp 50 million in Jakarta Pusat.
e. Unusual ratio of land area to building area: for example, a house with a building area of 300 m² and a land area of 30 m², or a house with a building area of 30 m² and a land area of 1,000 m².

3. Regression Output

Table 4. Regression Output: Stability Over Time Check (North Jakarta Sample)
Dependent variable: Ln Price

Independent variables         2016:1-2016:12   2017:1-2017:12
                              Coefficients     Coefficients
Intercept                     21.2900 ***      21.2900 ***
Building size                  0.0010 ***       0.0011 ***
Lot size                       0.0045 ***       0.0041 ***
Dum_# of bedroom 1-2          -0.2153 ***      -0.0310 ***
Dum_# of bedroom 3            -0.0440 ***      -0.2185 ***
Dum_# of bedroom >4           -0.0400 ***      -0.0326 ***
Dum_# of bathroom 1           -0.2225 ***      -0.1997 ***
Dum_# of bathroom 2           -0.1732 ***      -0.1853 ***
Dum_# of bathroom >3           0.0082 *         0.0038
Dum_period 2016 (2017):2       0.0007           0.0147
Dum_period 2016 (2017):3       0.0006           0.0025
Dum_period 2016 (2017):4       0.0034          -0.0077
Dum_period 2016 (2017):5      -0.0203 ***      -0.0012
Dum_period 2016 (2017):6      -0.0123          -0.0049
Dum_period 2016 (2017):7      -0.0132           0.0167 *
Dum_period 2016 (2017):8       0.0022           0.0244 ***
Dum_period 2016 (2017):9      -0.0143          -0.0061
Dum_period 2016 (2017):10     -0.0169 **        0.0009
Dum_period 2016 (2017):11     -0.0307 ***      -0.0023
Dum_period 2016 (2017):12     -0.0435 ***      -0.0195 *
Adjusted R-squared             0.863            0.862
F-statistics                   4866             4067
Number of observations         14,640           14,026
Notes: *** significance at 1%, ** significance at 5% and * significance at 10%.

Table 5. Testing for Heteroscedasticity
Breusch-Pagan Test for Heteroscedasticity

District            BP        df    p-value
North Jakarta       204.1     19    < 2.2e-16
South Jakarta       366.96    19    < 2.2e-16
Central Jakarta     109.83    19    8.574e-15
West Jakarta        412.43    19    < 2.2e-16
East Jakarta        135.36    19    < 2.2e-16
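Because the dependent variable in Table 4 is the log of price, each coefficient reads approximately as a proportional change in price. The short sketch below applies the exact conversion, 100 x (exp(b) - 1), to three North Jakarta 2016 coefficients from Table 4. The helper name `pct_effect` is illustrative, and reading each dummy relative to its omitted reference category is the usual interpretation of such a model rather than something stated in this appendix.

```python
import math


def pct_effect(coef):
    """Percentage change in price implied by a coefficient in a log-price model."""
    return (math.exp(coef) - 1.0) * 100.0


# Coefficients copied from the 2016 column of Table 4 (North Jakarta).
lot_size = pct_effect(0.0045)        # per additional square metre of land
building_size = pct_effect(0.0010)   # per additional square metre of building
one_bathroom = pct_effect(-0.2225)   # one bathroom vs. the omitted category

print(f"lot size:      {lot_size:+.2f}% per m2")
print(f"building size: {building_size:+.2f}% per m2")
print(f"one bathroom:  {one_bathroom:+.2f}%")
```

For small coefficients the exact conversion and the coefficient itself nearly coincide; for larger ones, such as the bathroom dummy, the gap becomes noticeable.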

Table 6. Regression Output for Three Other Districts
Dependent variable: Ln Price

                         Central Jakarta        West Jakarta           East Jakarta
Independent variables    Estimates Robust SE    Estimates Robust SE    Estimates Robust SE
Intercept                21.0800  0.02999 ***   20.9310  0.00682 ***   20.5700  0.00906 ***
Building size             0.0014  0.00006 ***    0.0012  0.00002 ***    0.0013  0.00003 ***
Lot size                  0.0038  0.00007 ***    0.0044  0.00003 ***    0.0033  0.00003 ***
Dum_# of bedroom 1-2     -0.0604  0.01616 ***   -0.0382  0.00351 ***   -0.0666  0.00482 ***
Dum_# of bedroom 3       -0.1352  0.02586 ***   -0.2171  0.00525 ***   -0.2564  0.00773 ***
Dum_# of bedroom >4      -0.1415  0.01474 ***   -0.0375  0.00483 ***   -0.0250  0.00590 ***
Dum_# of bathroom 1      -0.4678  0.02789 ***   -0.0968  0.00597 ***   -0.3393  0.00799 ***
Dum_# of bathroom 2      -0.1743  0.01564 ***   -0.0974  0.00336 ***   -0.1387  0.00456 ***
Dum_# of bathroom >3      0.0084  0.01505        0.0194  0.00436 ***    0.0293  0.00534 ***
Dum_period 2016:2         0.0185  0.03000       -0.0096  0.00676        0.0001  0.00895
Dum_period 2016:3         0.0752  0.02998 **    -0.0191  0.00667 ***    0.0414  0.00890 ***
Dum_period 2016:4         0.0275  0.02962       -0.0253  0.00668 ***    0.0064  0.00898
Dum_period 2016:5        -0.0306  0.02658       -0.0295  0.00578 ***    0.0326  0.00779 ***
Dum_period 2016:6         0.0164  0.03036       -0.0202  0.00706 ***    0.0383  0.00947 ***
Dum_period 2016:7         0.0816  0.03128 ***   -0.0103  0.00703        0.0202  0.00944 **
Dum_period 2016:8         0.0966  0.03310 ***   -0.0146  0.00693 **     0.0590  0.00910 ***
Dum_period 2016:9         0.0503  0.03384       -0.0405  0.00741 ***    0.0705  0.00950 ***
Dum_period 2016:10        0.0577  0.02918 **    -0.0164  0.00656 **     0.0396  0.00839 ***
Dum_period 2016:11        0.0066  0.03123       -0.0259  0.00687 ***    0.0320  0.00890 ***
Dum_period 2016:12        0.0766  0.03284 **    -0.0098  0.00742        0.0606  0.00942 ***
Adjusted R-squared        0.731                  0.813                  0.835
F-statistics              667                    7138                   5891
Number of observations    4,662                  31,184                 22,180
Notes: *** significance at 1%, ** significance at 5% and * significance at 10%.
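The significance markers in Table 6 can be related to the reported estimates and robust standard errors with a simple two-sided test. The sketch below assumes a normal approximation, which is reasonable given the sample sizes in the table; the paper's own test distribution is not stated in this appendix, and the function name `stars` is illustrative. For the three Central Jakarta rows used here, the computed markers agree with those printed in Table 6.

```python
from statistics import NormalDist


def stars(estimate, robust_se):
    """Return the significance marker implied by a two-sided z-test."""
    z = abs(estimate / robust_se)
    p_value = 2 * (1 - NormalDist().cdf(z))
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""


# (estimate, robust SE) pairs copied from the Central Jakarta column of Table 6.
central = {
    "2016:3": (0.0752, 0.02998),
    "2016:7": (0.0816, 0.03128),
    "2016:9": (0.0503, 0.03384),
}
markers = {period: stars(b, se) for period, (b, se) in central.items()}
```

Robust standard errors are the relevant denominator here because Table 5 rejects homoscedasticity in every district.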

4. Hedonic RPPI Result

Figure 7. Comparison of new index for secondary market houses in West Jakarta, 3-month moving average of new index, and existing appraisals-based index.
[Line chart: West Jakarta RPPI (Jan 2016=100). Series: Hedonic RPPI; 3-month MA; existing secondary-market RPPI (appraisal-based).]

Figure 8. Comparison of new index for secondary market houses in East Jakarta, 3-month moving average of new index, and existing appraisals-based index.
[Line chart: East Jakarta RPPI (Jan 2016=100). Series: Hedonic RPPI; 3-month MA; existing secondary-market RPPI (appraisal-based).]

Table 7. Hedonic RPPI Indexes (Smoothed 3-month moving average)

Period      West   Central     South      East     North   Jakarta
         Jakarta   Jakarta   Jakarta   Jakarta   Jakarta
Jan-16    100.00    100.00    100.00    100.00    100.00    100.00
Feb-16     99.05    101.87     98.69    100.01    100.07     99.59
Mar-16     99.05    103.23    100.11    101.41    100.04    100.30
Apr-16     98.22    104.16     99.53    101.63    100.16    100.06
May-16     97.57    102.53     99.98    102.73     99.46     99.76
Jun-16     97.53    100.48    100.76    102.62     99.04     99.64
Jul-16     98.02    102.38    101.39    103.09     98.49    100.10
Aug-16     98.51    106.77    101.87    104.00     99.23    101.15
Sep-16     97.85    107.93    101.59    105.14     99.16    101.10
Oct-16     97.65    107.08    102.44    105.81     99.04    101.24
Nov-16     97.28    103.92    101.42    104.87     97.96    100.12
Dec-16     98.28    104.85    100.28    104.51     97.02     99.86
Jan-17     97.98    106.53     99.44    105.90     96.44     99.68
Feb-17     98.35    110.81    101.09    107.25     96.73    100.99
Mar-17     99.38    110.48    100.51    107.90     96.98    101.13
Apr-17    100.44    108.46     99.64    107.23     96.66    100.74
May-17    101.75    108.29     98.24    108.40     96.83    100.73
Jun-17    100.99    108.82     98.10    110.27     96.99    100.74
Jul-17    100.73    106.74     98.94    111.26     98.28    101.07
Aug-17     99.47    101.42     99.00    110.75     98.24    100.08
Sep-17    100.73    102.27     98.50    109.43     98.27    100.25
Oct-17    101.27    104.38     98.58    110.05     97.91    100.62
Nov-17    101.88    108.94    100.41    110.09     98.23    101.99
Dec-17     99.79    104.86    102.49    110.24     98.94    101.82
Jan-18    100.06    100.66    104.01    111.83     99.99    102.23
Feb-18    100.58     95.81    104.17    113.17     99.97    101.92
Mar-18    101.08     94.99    105.87    116.46     99.86    102.70
Apr-18     99.52     96.37    107.81    117.16    100.00    103.18
May-18     98.99     97.42    108.71    117.55    100.47    103.60
Jun-18    100.41     98.46    109.67    118.06    100.87    104.52
Jul-18    101.27    101.26    110.33    117.41     99.41    104.89
Aug-18    102.29    102.76    110.64    118.07     98.99    105.37
Sep-18    101.42    101.14    111.98    115.89     99.01    105.24
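The 2016 values in the North Jakarta column of Table 7 appear consistent with the period-dummy coefficients in Table 4. The sketch below rests on three assumptions inferred from the published numbers rather than stated in this appendix: the unsmoothed index is 100 x exp(coefficient) with January 2016 as the base, the smoothing is a trailing 3-month average, and the first two months are left unsmoothed. The function names are illustrative. Under those assumptions the result stays within about 0.01 index points of the published column, a gap that is compatible with the coefficients being rounded to four decimals.

```python
import math

# Period-dummy coefficients for 2016:2 to 2016:12, copied from the 2016 column
# of Table 4 (North Jakarta). January 2016 is the omitted base period.
DELTAS_2016 = [0.0007, 0.0006, 0.0034, -0.0203, -0.0123, -0.0132,
               0.0022, -0.0143, -0.0169, -0.0307, -0.0435]

# North Jakarta column of Table 7, Jan-16 to Dec-16.
TABLE_7_NORTH = [100.00, 100.07, 100.04, 100.16, 99.46, 99.04,
                 98.49, 99.23, 99.16, 99.04, 97.96, 97.02]


def raw_index(deltas, base=100.0):
    """Time-dummy index: base period first, then base * exp(delta_t)."""
    return [base] + [base * math.exp(d) for d in deltas]


def smooth(index, window=3):
    """Trailing moving average; periods without a full window stay unsmoothed."""
    out = []
    for t, value in enumerate(index):
        if t + 1 < window:
            out.append(value)
        else:
            out.append(sum(index[t + 1 - window:t + 1]) / window)
    return out


smoothed = smooth(raw_index(DELTAS_2016))
largest_gap = max(abs(s - p) for s, p in zip(smoothed, TABLE_7_NORTH))
print(f"largest gap vs. Table 7: {largest_gap:.4f} index points")
```

The same check cannot be extended to 2017 from Table 4 alone, because the 2017 coefficients are measured against January 2017 and linking them to the January 2016 base requires information not shown in this appendix.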