PropTech for Proactive Pricing of Houses in Classified Advertisements in the Indian Real Estate Market Sayan Putatunda

Similar documents
Housing Price Prediction Using Search Engine Query Data. Qian Dong Research Institute of Statistical Sciences of NBS Oct. 29, 2014

Land Registry. Issues in current system

86M 4.2% Executive Summary. Valuation Whitepaper. The purposes of this paper are threefold: At a Glance. Median absolute prediction error (MdAPE)

A. K. Alexandridis University of Kent. D. Karlis Athens University of Economics and Business. D. Papastamos Eurobank Property Services S.A.

What s Next for Commercial Real Estate Leveraging Technology and Local Analytics to Grow Your Commercial Real Estate Business

Coachella Valley Median Detached Home Price Mar Mar 2018

86 years in the making Caspar G Haas 1922 Sales Prices as a Basis for Estimating Farmland Value

REDSTONE. Regression Fundamentals.

Hennepin County Economic Analysis Executive Summary

Information Quality - A Critical Success Factor How to make it all right!

*Predicted median absolute deviation of a CASA value estimate from the sale price

MULTIPLE CHALLENGES REAL ESTATE APPRAISAL INDUSTRY FACES QUALITY CONTROL. Issues. Solution. By, James Molloy MAI, FRICS, CRE

EXPLANATION OF MARKET MODELING IN THE CURRENT KANSAS CAMA SYSTEM

Relationship of age and market value of office buildings in Tirana City

On the Relationship between Track Geometry Defects and Development of Internal Rail Defects

Cadastre and the crowd: a brief encounter or a long-lasting marriage?

Is there a conspicuous consumption effect in Bucharest housing market?

Analyzing Ventilation Effects of Different Apartment Styles by CFD

Recent Trends in Legal Tech 21 March 2018

Journal of Babylon University/Engineering Sciences/ No.(5)/ Vol.(25): 2017

SANDAKAN PUBLIC HALL MANAGEMENT SYSTEM GRACE YAIT LINGGOU FACULTY OF COMPUTING AND INFORMATICS UNIVERSITI MALAYSIA SABAH

State of the MLS. John Chandler. President

Making housing finance markets work for the poor A perspective on the role of big data. Illana Melzer

jennifer conway Artificial Intelligence + Machine Learning: Current Applications in Real Estate

The Improved Net Rate Analysis

BUILDER SURVEY REPORT

PROPINSIGHT A Detailed Property Analysis Report

Technical Description of the Freddie Mac House Price Index

AVM Validation. Evaluating AVM performance

How Did Foreclosures Affect Property Values in Georgia School Districts?

Using rules for assessing and improving data quality: A case study for the Norwegian State of Estate report

Chapter 13. The Market Approach to Value

Over the past several years, home value estimates have been an issue of

Aalborg Universitet. CLIMA proceedings of the 12th REHVA World Congress volume 7 Heiselberg, Per Kvols. Publication date: 2016

Mass appraisal Educational offerings and Designation Requirements. designations provide a portable measurement of your capabilities

Certified Corporate Financial Planning & Analysis Professional (Cert FP&A): Preparation Course Part 1

The Market Watch Monthly Housing Report. Coachella Valley Median Detached Home Price Dec Dec 2016

PROPINSIGHT A Detailed Property Analysis Report

How Do We Live Skender Kosumi

Performance of the Private Rental Market in Northern Ireland

THE IMPACT OF RESIDENTIAL REAL ESTATE MARKET BY PROPERTY TAX Zhanshe Yang 1, a, Jing Shan 2,b

The Relationship Between Micro Spatial Conditions and Behaviour Problems in Housing Areas: A Case Study of Vandalism

Press Release. Commercial Real Estate - Digital Opportunities in a Shifting Industry

New Trends in Leasing Accounting

Mobile and Tablet. Ready

FEBRUARY Published March 25, 2016

Cook County Assessor s Office: 2019 North Triad Assessment. Norwood Park Residential Assessment Narrative March 11, 2019

ParcelMap BC. Compiling a Parcel Fabric for the Province of British Columbia. WENDY AMY and ELLEN STYNER

TABLE OF CONTENTS CHAPTER TITLE PAGE DECLARATION DEDICATION ACKNOWLEDGEMENT ABSTRACT ABSTRAK

Goods and Services Tax and Mortgage Costs of Australian Credit Unions

Application of Finite Difference Method to Develop Land Value Map

Cadastral Information System of Sofia

If you want even more information, look for the advanced training, which includes more use cases and demonstrates CU s full functionality.

Figure 1. The chart showing how the effort and cost of the design changes are affected as the project progresses (Anon.) Simulation tools are a key co

The accounting treatment of goodwill as stipulated by IFRS 3

BUILD-OUT ANALYSIS GRANTHAM, NEW HAMPSHIRE

Understanding the rentrestructuring. housing association target rents

Assessment Quality: Sales Ratio Analysis Update for Residential Properties in Indiana

Past & Present Adjustments & Parcel Count Section... 13

The Accuracy of Automated Valuation Models

LydianCoin Blockchain

Definitions ad valorem tax Adaptive Estimation Procedure (AEP) - additive model - adjustments - algorithm - amenities appraisal appraisal schedules

Introduction. Bruce Munneke, S.A.M.A. Washington County Assessor. 3 P a g e

The Desert Housing Report. Coachella Valley Median Detached Home Price March March 2019 $392,000 $415,000

Property valuation is an integral part of the housing industry

Ownership Data in Cadastral Information System of Sofia (CIS Sofia) from the Available Cadastral Map

RESO Standards Adoption Report

Uniform Residential Appraisal Report (URAR) Model Appraisal

MAAO Sales Ratio Committee 2013 Fall Conference Seminar

Property2chain Whitepaper

Real Estate Transaction Method And System

Benefit Transfer and Visitor Use Estimating Toolkit for Wildlife Recreation, Species and Habitat

G. Schennach / Chair Commission 7. Common Vision Conference 5-6 Oct 2017, Vienna, Austria

COLLABORATIVE DATA. We are living in a data rich age yet we lack the infrastructure to share this networked assets and support citizen science.

Scores for Valuation Reports: Appraisal Score & BPO Score. White Paper. White Paper APRIL 2012

Modelling a hedonic index for commercial properties in Berlin

PROPINSIGHT A Detailed Property Analysis Report

RAPID ANALYTICS INTERACTIVE SCENARIO EXPLORER (RAISE) A tool for analysing and visualising land valuation in different development scenarios

Symposium on: Impact Of Globalization On Indian Architecture Today INDO-WINDOW Vivek.V. Shankar Vivek Shankar Design Partnership

A Critical Study on Loans and Advances of Selected Public Sector Banks for Real Estate Development in India

Accounting Of Intangible Assets Indian as- 26

Design idea on planning skill training system of real estate development projects in colleges and universities

Can the coinsurance effect explain the diversification discount?

WHAT TO WATCH IN 2018 FOR THE HOUSING MARKET & PROPERTY MANAGEMENT INDUSTRY

Economy. Denmark Market Report Q Weak economic growth. Annual real GDP growth

The effect of atrium façade design on daylighting in atrium and its adjoining spaces

Collateral Underwriter Overview. National Association of REALTORS January 23, 2015

Housing affordability in England and Wales: 2018

Determinants of residential property valuation

The Proposal of Cadastral Value Determination Based on Artificial Intelligence

The Impact of Using. Market-Value to Replacement-Cost. Ratios on Housing Insurance in Toledo Neighborhoods

Coachella Valley Median Detached Home Price May May 2018

NEW TECHNOLOGIES & THEIR REAL ESTATE IMPACTS Dutch Treat Breakfast Meeting February 12, 2018 Taylor Mammen, Managing Director

Individual Property Report. Cambooya Toowoomba, QLD 4358, Australia

Can Big Data Increase our knowledge of the rental market? 1

Welcome Session warm welcome by BEV, celebrating 200 years of the Austrian cadastre CVC2017 supported by 7 organizations: WPLA EuroGeographics PCC EUL

Rambutan Road Report by Justin, HP: Property Summary Sheet Latest Avg PSF: $897 psf (Apr 10)

Seattle Housing Market Overview January 2019

Development of e-land Administration in Sweden

The Development of a Performance Assessment Model for Cadastral Survey Systems

Transcription:

PropTech for Proactive Pricing of Houses in Classified Advertisements in the Indian Real Estate Market Sayan Putatunda Member, IEEE Abstract Property Technology (PropTech) is the next big thing that is going to disrupt the real estate market. Nowadays, we see applications of Machine Learning (ML) and Artificial Intelligence (AI) in almost all the domains but for a long time the real estate industry was quite slow in adopting data science and machine learning for problem solving and improving their processes. However, things are changing quite fast as we see a lot of adoption of AI and ML in the US and European real estate markets. But the Indian real estate market has to catch-up a lot. This paper proposes a machine learning approach for solving the house price prediction problem in the classified advertisements. This study focuses on the Indian real estate market. We apply advanced machine learning algorithms such as Random forest, Gradient boosting and Artificial neural networks on a real world dataset and compare the performance of these methods. We find that the Random forest method is the best performer in terms of prediction accuracy. Keywords: PropTech, Machine learning, Real Estate, Housing Price, AI, Data science 1. Introduction Property Technology (PropTech) is the next big thing to disrupt the real estate market. PropTech espouses the use of technology to facilitate the management and operations of real estate assets [1]. The assets here mean either buildings or cities. PropTech can be seen as a part of digital transformation in the real estate industry and it focuses on both the technology and mentality changes of the people involved (including the

customers) in this industry [2]. PropTech can also lead to new functionalities such as more transparency, which were not possible earlier [3]. Big data, data analytics, machine learning, blockchain and sensors form a part of PropTech as well [3]. Most of the applications of PropTech documented in the literature are mainly in construction and real estate. One interesting use case is the Rental platforms and space as a service that matched supply and demand and work towards the demand for flexibility [3]. Other applications include Sales Platforms and webshops, digitization of administrative management and smart buildings [3]. Please refer to [3] for a detailed overview of various applications of PropTech. We already see a lot of investments flowing in the PropTech startups in US and European real estate markets [3]. Some of the most innovative PropTech firms across the globe are Atlant, Bowery, Buildium, Flip, Foyr, hom, Huthunt, No Agent, Opendoor, Open Listings, PeerStreet, Purplebricks, Ravti, Reposit, RealtyShares, Riley, Squarefoot and VR Listing [4]. As far as India is concerned, there is still lot of catch up to be done. One interesting startup that is using PropTech in the Indian context is Gharvalue. According to their website [5], they have built India s first user driven Automated Valuation Model (UAVM), which can help in performing realistic valuation of a property that can help the customers. As mentioned in their website, GharValue's UAVM will deliver an online housing value estimate based on proprietary models built on housing data, economic data, neighborhood data, amenities, parking, distances from transportation hubs, malls, schools and myriad other variables. [5]. There are some companies such as unissu.com that are connecting property with PropTech industries [6].

In this paper, we will work on the problem of proactive pricing of houses listed in classified website advertisements in the Indian real estate market as explained in details in Section 2. The rest of this paper is structured as follows. In the next section we will discuss the problem statement and a brief overview of the literature. Then in Section 3 we will discuss the dataset, which is followed by the Experimental results in the next section. Finally, Section 5 concludes the paper. 2. Problem Statement & Related Work In India, there are multiple real estate classified websites where properties are listed for sell/buy/rent purposes such as 99acres [7], housing [8], commonfloor [9], magicbricks [10] and more. However, in each of these websites we can see lot of inconsistencies in terms of pricing of an apartment and there are some cases when similar apartments are priced differently and thus there is lot of in-transparency. Sometimes the consumers may feel the pricing is not justified for a particular listed apartment but there no way to confirm that either. Proper and justified prices of properties can bring in a lot of transparency and trust back to the real estate industry, which is very important as for most consumers especially in India the transaction prices are quite high and addressing this issue will help both the customers and the real estate industry in the long run. We propose to use machine learning and artificial intelligence techniques to develop an algorithm that can predict housing prices based on certain input features. The business application of this algorithm is that classified websites can directly use this algorithm to predict prices of new properties that are going to be listed by taking some input variables and predicting the correct and justified price i.e. avoid taking

price inputs from customers and thus not letting any error creeping in the system. This study on proactive pricing of houses in the Indian context has never been reported earlier in the literature to the best of our knowledge. However, the problem of house price prediction is quite old and there have been many studies and competitions addressing the same including the classic Boston housing price challenge on Kaggle [11]. As far as housing price prediction in Indian context is concerned, [12] used machine learning techniques such as XGBoost for predicting housing prices in Bengaluru, India. MachineHack [13] conducted an hackathon on predicting housing prices in Bengaluru in 2018. The problem statement was to predict the price of houses in Bengaluru given 9 features such as area type, availability, location, price, size, society, total square foot, number of bathrooms and bedrooms. In fact, the experimental design of this study is motivated by [12] and [13]. Moreover, there have been other studies for house price prediction in other cities of India such as Mumbai [14]. 3. Data When someone visits a classified website such as commonfloor [9] to check apartments listed for sell, some of the information that are displayed are the Project name, region, type of apartment (2BHK, 3BHK and more), super built up area of the apartment, carpet area, number of bedrooms & bathrooms, Price of the apartment, parking availability (yes/no), direction facing, listed by (broker/individual), furnishing status (fully furnished/semi furnished/not furnished), facilities (gym, swimming pool, wifi, 24 hr electricity backup, etc.) and more information. In this paper, we have taken publicly available data manually from the website [9] (accessed on 25 th March, 2019), which in turn collates the listings data from various publicly available sources. The

total number of observations in the dataset is 66. 4. Experimental Results Our goal is to solve the regression problem where the target variable is the price and the independent variables are number of bedrooms, number of bathrooms, super built up area, carpet area, furnishing status, floor type, direction facing, type of apartment, region and facilities. We perform one-hot encoding on facilities i.e. create categorical variables for each facilities. For example the feature vector for Apartment 1 with gym, swimming pool, wifi but no cafeteria would be [1 1 1 0] whereas the feature vector for Apartment 2 with gym, no swimming pool, no wifi and no cafeteria would be [1 0 0 0]. The number of independent variables is 56. We divide the data into training and test in the ration of 4:1. While building the model on the training dataset, we use 5 fold cross validation for all the methods. All the experiments were run using the opensource software R and on a system with configurations 4 GB RAM Mac ios 1.6 GHz Intel Core i5. We then apply advanced machine learning algorithms such as Random forests and Gradient boosting [15]. In Random forest we use ntrees=5000 and refer to it as RF for the rest of this paper. For Gradient boosting ntrees=10000 and learning rate= 0.01 and we will refer to this method as GBM for the rest of this paper. We have also used Artificial neural networks [16] with different hyper-parameters. We use one design of neural network with 2 hidden layers and 100 hidden nodes in each, which we will refer as ANN1 for the rest of this paper. In ANN2, we use 3 hidden layers with 100 hidden nodes in each and finally in ANN3, we use 3 hidden layers with 200 hidden nodes in each. The activation function used is Rectifier. The evaluation metrics used for comparing the different methods are mean percentage error and median

percentage error on the test dataset. The reason for taking median as it is considered as a robust measure that is not affected by outliers. Table 1 shows the model results. We can clearly see that the RF i.e. Random forest method is the best performer in terms of prediction accuracy. Table 1: Model Results on the Test dataset Method Mean Percentage Error Median Percentage Error RF 27.48% 16.25% GBM 41.62% 32.77% ANN1 76.15% 65.43% ANN2 70.51% 52.56% ANN3 58.03% 44.36% Figure 1: Variable Importance Chart Figure 1 shows the variable importance plot for the RF method. We can see that the

furnishing status, carpet area, super built up area, type of apartment, region, floor type, number of bathrooms and bedrooms are the most important variables. 5. Conclusion In this paper, we have discussed the concept of PropTech and its applications. We worked on the use-case of proactive pricing of houses in classified advertisements in the Indian Real Estate market. We used advanced machine learning and artificial intelligence techniques to develop an algorithm that can predict housing prices based on certain input features. The business application of this algorithm is that classified websites can directly use this algorithm to predict prices of new properties that are going to be listed by taking some input variables and predicting the correct and justified price i.e. avoid taking price inputs from customers and thus keeping transparency in the system. The proposed methods can be used for other applications of PropTech as well as mentioned in Section 1. Since this was a proof of concept (POC), so the models were implemented on a smaller dataset. However, the error margins can be reduced further if we use much larger datasets, which we aim to work on in the future. References [1] Baum, A. 2017. PropTech 3.0: the future of real estate, University of Oxford Saïd Business School. [2] V. Lecamus, PropTech: What is it and how to address the new wave of real estate startups?, 2017. [Online]. Available: https://medium.com/@vincentlecamus/proptech-what-is-it-andhow-to-address-the-new-wave-of-real-estate-startups-ae9bb52fb128 [3] Doelin and Sante, ING Economics Department, PropTech: What is it and how to address the new wave of real estate startups?, 2018. [Online]. Available:

https://think.ing.com/uploads/reports/ing_ebz_proptech- Technlogy_in_the_real_estate_sector-June_2018_tcm162-148619.pdf [Accessed: 25-Mar- 2019]. [4] Go Weekly, The 20 most innovative companies in Real Estate (or PropTech), 2018. [Online]. Available: https://medium.com/go-weekly-blog/the-20-most-innovative-companies-in-realestate-or-proptech-2e0242b80e32 [5] Gharvalue, 2015-16. [Online]. Available: http://gharvalue.com/pages/avm [Accessed: 25-Mar- 2019]. [6] Unissu. [Online]. Available: https://www.unissu.com [7] 99acres. [Online]. Available: https://www.99acres.com [8] Housing. [Online]. Available: https://housing.com [9] Commonfloor. [Online]. Available: https://www.commonfloor.com/bangalore-property/forsale [10] Magicbricks. [Online]. Available: https://www.magicbricks.com [11] M. Risdal, Predicting House Prices Playground Competition: Winning Kernels, 2017. [Online]. Available: http://blog.kaggle.com/2017/03/29/predicting-house-prices-playgroundcompetition-winning-kernels/ [12] A. Nair, How To Use XGBoost To Predict Housing Prices In Bengaluru: A Practical Guide, 2019. [Online]. Available: https://www.analyticsindiamag.com/how-to-use-xgboost-to-predicthousing-prices-in-bengaluru-a-practical-guide/ [13] MachineHack, Predicting House Prices In Bengaluru, 2018. [Online]. Available: https://www.machinehack.com/course/predicting-house-prices-in-bengaluru/ [Accessed: 25- Mar-2019]. [14] Gandhi, S. (2012). Economics of Affordable Housing in Indian Cities: The Case of Mumbai. Environment and Urbanization Asia. [15] James, Witten, Hastie and Tibshirani, 2013. An Introduction to Statistical Learning with Applications in R, Springer-Verlag New York. [16] Bishop, 1995. Neural Networks for Pattern Recognition, Oxford University Press.