In the current project, we have two main types of datasets:
1) Land data: data about properties, structures, and their characteristics.
2) People data: data about people, their demographics, and their relationship to different areas.
Each type is further divided into several subcategories.

1) Land Data

Raw data sources: public FTP sites.

Data types based on content: we have three types of data about parcels in Cuyahoga County:
1. Property data: this comes from the county CAMA system. It is a current snapshot of basic ownership and land use information (GeoJSON or CSV), stored as a collection in MongoDB. Metadata: see Neha's email for the attribute sets listed below.
2. Sales data: this comes from Charlie Post in the Urban College and from the County FTP site. It gives more detailed information on the conditions of sales from 1976 to near the present (I think through 2014/2015); the more recent data is on the FTP site (GeoJSON or CSV).
3. Characteristics data: this comes from tax assessments every six years. I do have documentation for it, but it is spotty and not very descriptive. It contains detailed information on structures, floor by floor, for the entire county: number of elevators, bathrooms, commercial/residential square footage, etc. Very detailed. (Not cleaned.)

Data Description for Each Data Type

1. Property Data (Land data)

Description: less frequently updated data.

Heta and Neha have access to the JSON of the property data. It is around a gigabyte. I have it on a thumb drive and can drop it off if you give me a building/office number. Are you on campus? I'll be here until ~3 today. If not, I can get it to you some other time.

The raw data comes from the Cuyahoga County GIS Department FTP (I will forward the login info); it is an ESRI Shapefile. Visualization APIs: Leaflet.js (used), Google Maps API, Mapbox, D3.

GeoJSON is a format for encoding a variety of geographic data structures. For example:

    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [125.6, 10.1]
      },
      "properties": {
        "name": "Dinagat Islands"
      }
    }

GeoJSON supports the following geometry types: Point, LineString, Polygon, MultiPoint, MultiLineString, and MultiPolygon. Geometric objects with additional properties are Feature objects. Sets of features are contained in FeatureCollection objects.

Metadata of Property data (fields):
PARCELPIN - unique identifier for property parcels
TRANSFER_D - date of the last transfer of ownership
SALES_AMOU - sale amount at the last transfer of ownership
PAR_ADDR - address of parcel
PAR_PREDIR - parcel address (same as above)
PAR_STREET - parcel address (same as above)
PAR_SUFFIX - parcel address (same as above)
PAR_CITY - parcel address (same as above)
PAR_ZIP - parcel address (same as above)
PAR_ADDR_A - parcel address (same as above)
PAR_UNIT - parcel address (same as above)
MAIL_NAME - mailing address of current owner
MAIL_ADDR_ - owner mailing address (same as above)
MAIL_CITY - owner mailing address (same as above)
MAIL_STATE - owner mailing address (same as above)
MAIL_ZIP - owner mailing address (same as above)
MAIL_COUNT - owner mailing address (same as above)
MAIL_UNIT - owner mailing address (same as above)
TAX_LUC - land use code for TAXABLE land uses
TAX_LUC_DE - land use description for TAXABLE land uses
GCERT3 - assessed value of land and structure
TOTAL_COM_ - total square feet of commercial space under roof
TOTAL_ACRE - total acreage of parcel
SiteCat1** - simplified land use categorization, major category
SiteCat2** - simplified land use categorization, minor category
Units2 - estimated living/dwelling units
PARCL_OWN3 - cleaned owner name
PAREN2* - parent parcel number
EXT_LUC - land use code for TAX EXEMPT land uses
EXT_LUC_DE - land use description for TAX EXEMPT land uses

* PARCELPIN is unique, but some properties are merged or split over time; PAREN2 tracks that. If a parcel is not part of a larger grouping, it has a unique PAREN2 value. If a parcel has been joined to other parcels, it shares a PAREN2 value with them. Parcels that share a PAREN2 value are effectively one parcel, but they are listed independently to help track their past status.
** Properties are grouped into SiteCat1 and SiteCat2 based on their EXT_LUC and TAX_LUC values. The breakdown is listed in the file codes_for_dev.csv.

Raw data sources: the Property Data comes from the portal/FTP in the form of CAMA files. You can also get the file from their data portal (http://data-cuyahoga.opendata.arcgis.com/datasets?t=property%20and%20use).
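The PAREN2 rule above can be applied directly to a GeoJSON export of the property data. The following is a minimal sketch, assuming the parcels are loaded as a FeatureCollection whose feature properties include PARCELPIN and PAREN2. The sample features and parcel numbers are made up, and real parcel geometries would usually be Polygons rather than Points. It groups parcels by PAREN2 and lists the groups that are effectively one parcel.

```python
import json
from collections import defaultdict

# Made-up FeatureCollection for illustration only. Real parcel features are
# usually Polygons and carry many more properties (TAX_LUC, GCERT3, ...).
# GeoJSON coordinates are [longitude, latitude].
SAMPLE = """
{"type": "FeatureCollection",
 "features": [
  {"type": "Feature",
   "geometry": {"type": "Point", "coordinates": [-81.69, 41.50]},
   "properties": {"PARCELPIN": "101-01-001", "PAREN2": "101-01-001"}},
  {"type": "Feature",
   "geometry": {"type": "Point", "coordinates": [-81.68, 41.50]},
   "properties": {"PARCELPIN": "101-01-002", "PAREN2": "101-01-002"}},
  {"type": "Feature",
   "geometry": {"type": "Point", "coordinates": [-81.68, 41.51]},
   "properties": {"PARCELPIN": "101-01-003", "PAREN2": "101-01-002"}}
 ]}
"""


def group_by_parent(collection: dict) -> dict:
    """Map each PAREN2 value to the list of PARCELPINs that share it."""
    groups = defaultdict(list)
    for feature in collection["features"]:
        props = feature["properties"]
        groups[props["PAREN2"]].append(props["PARCELPIN"])
    return dict(groups)


def joined_parcels(groups: dict) -> dict:
    """Keep only PAREN2 values shared by more than one parcel (joined parcels)."""
    return {parent: pins for parent, pins in groups.items() if len(pins) > 1}


if __name__ == "__main__":
    groups = group_by_parent(json.loads(SAMPLE))
    print(len(groups), "logical parcels")
    print(joined_parcels(groups))
```

With the sample data, the three listed parcels collapse into two logical parcels, because 101-01-002 and 101-01-003 share a PAREN2 value.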
I just checked the data portal, and for some reason they've split the property data into Cleveland and non-Cleveland files. Shapefiles have a ~2 GB size limit, so maybe they exceeded it. They rarely update the OpenData site, but it currently has the most up-to-date version. The FTP really is the best resource and is a treasure trove of related tax and infrastructure data. They have a REST API, but I think you need to request access. They generally don't like to share their toys, and I wouldn't really know what to do with access anyway, so I never tried.

There really isn't much documentation of the data available from the County. Most of what we know comes from trial and error and from asking the County specific questions about specific variables.

We clean the ownership fields (parcel_own and deeded_own) using OpenRefine. This removes a number of spelling errors and multiple names for the same owner (for example, CITY OF CLEVELAND versus CLEVELAND CITY OF). We also estimate the number of household units and add a land use categorization based on the tax codes in the tax_luc and ext_luc fields. I use an R script for some of the cleaning and to generate our variables; I will package it up with the supporting CSV files.

2. Sales Data
This came on a CD, and I can put it on the USB drive. It's a bit messy; I also have a cleaner version that I will include as a CSV.

3. Characteristic Data
We haven't touched this in a while. I need to find where I have it saved, and I will get it to you as well, with the available documentation.

Land data is organized mainly around the land use of each property, and land is divided into parcels. Parcels are classified mainly by taxable land use: Residential, Commercial, Industrial, etc. Each taxable land use category has subcategories that further refine the activities that take place there: single-family residential, commercial food service, heavy manufacturing, etc. All parcels contain ownership data.
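Since every parcel carries ownership data, the owner-name cleaning matters across the whole dataset. OpenRefine's clustering can merge variants such as CITY OF CLEVELAND and CLEVELAND CITY OF because they produce the same key once case, punctuation and word order are ignored. The sketch below approximates OpenRefine's "fingerprint" key-collision method in Python; it is not the project's R script or OpenRefine itself, and the owner names are made up. It maps each raw name to the most common spelling among the names that share its key.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def fingerprint(name: str) -> str:
    """Approximation of OpenRefine's 'fingerprint' key: ignore case,
    punctuation, accents, duplicate words and word order."""
    s = unicodedata.normalize("NFKD", name.strip().lower())
    s = s.encode("ascii", "ignore").decode("ascii")  # strip accents
    s = re.sub(r"[^\w\s]", "", s)                     # strip punctuation
    return " ".join(sorted(set(s.split())))


def canonicalize(names: list) -> dict:
    """Map each raw owner name to the most common spelling among the
    names sharing its fingerprint (ties go to the first one seen)."""
    clusters = defaultdict(list)
    for name in names:
        clusters[fingerprint(name)].append(name)
    mapping = {}
    for variants in clusters.values():
        canonical = Counter(variants).most_common(1)[0][0]
        for variant in variants:
            mapping[variant] = canonical
    return mapping


if __name__ == "__main__":
    owners = ["CITY OF CLEVELAND", "CLEVELAND CITY OF",
              "City of Cleveland.", "SMITH, JOHN"]
    for raw, clean in canonicalize(owners).items():
        print(f"{raw!r} -> {clean!r}")
```

Because the key ignores word order, it can also merge genuinely different owners whose names use the same words. OpenRefine shows clusters for review before merging, and a scripted version would benefit from a similar manual check.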
Residential and Residential Mixed-Use parcels (apartments above shops, for example) contain estimates of living units (also called dwelling units). Commercial and Residential/Commercial Mixed-Use parcels (apartments above shops, as well as buildings with a variety of uses on different floors), as well as Industrial parcels, contain information on square footage (the usable space under roof, not just the size of the parcel of land). There is also data on transactions: prior and current ownership, conveyance values, conveyance instruments, dates, etc.
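Because the land use category decides which size measure a parcel carries, a cleaning script has to look up the category first and then read the matching field. The sketch below assumes a lookup table in the spirit of codes_for_dev.csv; its column names, codes and category labels (including a single "Mixed-Use" category) are hypothetical, since the real file's layout isn't described here. TAX_LUC is used when present, and tax-exempt parcels fall back to EXT_LUC.

```python
import csv
import io

# Hypothetical stand-in for codes_for_dev.csv: the column names, codes and
# category labels below are invented for illustration.
CODES_CSV = """luc,SiteCat1,SiteCat2
R01,Residential,Single-family
M01,Mixed-Use,Apartments over retail
C01,Commercial,Food service
I01,Industrial,Heavy manufacturing
E01,Exempt,Government
"""

# Which size measure each (hypothetical) SiteCat1 category carries.
UNIT_CATEGORIES = {"Residential", "Mixed-Use"}
SQFT_CATEGORIES = {"Commercial", "Industrial", "Mixed-Use"}


def load_codes(text: str) -> dict:
    """Build a {land use code: (SiteCat1, SiteCat2)} lookup table."""
    return {row["luc"]: (row["SiteCat1"], row["SiteCat2"])
            for row in csv.DictReader(io.StringIO(text))}


def categorize(parcel: dict, codes: dict) -> tuple:
    """Use TAX_LUC when present; tax-exempt parcels fall back to EXT_LUC."""
    code = parcel.get("TAX_LUC") or parcel.get("EXT_LUC")
    return codes.get(code, ("Unknown", "Unknown"))


def size_measures(parcel: dict, site_cat1: str) -> dict:
    """Dwelling units (Units2) for residential uses, square footage under
    roof (TOTAL_COM_) for commercial/industrial uses, both for mixed use."""
    measures = {}
    if site_cat1 in UNIT_CATEGORIES:
        measures["units"] = parcel.get("Units2")
    if site_cat1 in SQFT_CATEGORIES:
        measures["sq_ft"] = parcel.get("TOTAL_COM_")
    return measures


if __name__ == "__main__":
    codes = load_codes(CODES_CSV)
    parcel = {"PARCELPIN": "101-01-004", "TAX_LUC": "M01",
              "Units2": 4, "TOTAL_COM_": 2500}
    cat1, cat2 = categorize(parcel, codes)
    print(cat1, cat2, size_measures(parcel, cat1))
```

Keeping the category-to-measure rules in two small sets makes it easy to adjust them once the real SiteCat1 labels are confirmed.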
2) People Data

Raw data sources: LODES data can be downloaded from https://lehd.ces.census.gov/data/lodes/. Metadata descriptions are at https://lehd.ces.census.gov/data/ (see the section "LEHD Origin-Destination Employment Statistics (LODES)"). For an example of the file layout, see the directory listing "Index of /data/lodes/lodes5/ak/od".

Data description: the people data is mainly LODES (LEHD Origin-Destination Employment Statistics) data. These datasets are RAC (Residence Area Characteristics), WAC (Workplace Area Characteristics), and OD (Origin-Destination). For people living and working in Cleveland, they describe the demographics of the working population:
RAC: jobs data for residents of a particular area, regardless of whether the resident works in the same area or a different one. This includes demographic data on age, income, education level, race, ethnicity, industry of work, and biological sex.
WAC: jobs data for workers in a particular area, regardless of whether the worker lives in the same area or a different one. This includes demographic data on age, income, education level, race, ethnicity, industry of work, and biological sex.
OD: combines the residence and workplace views for workers in Cleveland. It is a matrix of the relationship between workers' home blocks and their work blocks, with reduced demographic detail.

For the MVP, we are focusing mainly on the Cuyahoga County dataset (~1.3 GB), which consists of the LODES and parcel data. The data is expected to grow beyond ~3 GB.

How to access the data:
1. The Property Data comes from the portal/FTP in the form of CAMA files.
2. The Sales Data comes from the Fiscal Officer FTP, and older historical data comes from Charlie Post at the Urban College.
3. The Characteristics Data comes from Charlie Post via someone at the County, and also from tax assessments every six years.
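The OD matrix described in the People Data section can be aggregated from block pairs to larger areas. The sketch below assumes OD rows with the LODES columns w_geocode (work block), h_geocode (home block) and S000 (total jobs); the rows themselves are made up, and real OD files contain further columns. It keeps pairs where both blocks fall in Cuyahoga County (FIPS 39035) and sums jobs by home tract and work tract, relying on the fact that the first 11 digits of a 15-digit census block code identify the state, county and tract.

```python
import csv
import io
from collections import Counter

CUYAHOGA = "39035"  # state FIPS 39 (Ohio) + county FIPS 035

# Made-up rows in the LODES OD layout; real files also carry SA01..SI03
# and createdate columns.
SAMPLE_OD = """w_geocode,h_geocode,S000
390351071011000,390351011021001,3
390351071011001,390351011021002,2
390351071011000,390930701001000,4
390351034001000,390351011021001,1
"""


def tract(geocode: str) -> str:
    """A 15-digit block code starts with state (2) + county (3) + tract (6)."""
    return geocode[:11]


def tract_flows(text: str, county: str = CUYAHOGA) -> Counter:
    """Sum S000 jobs by (home tract, work tract) for pairs inside one county.
    Geocodes stay strings so leading zeros in other states' codes survive."""
    flows = Counter()
    for row in csv.DictReader(io.StringIO(text)):
        home, work = row["h_geocode"], row["w_geocode"]
        if home.startswith(county) and work.startswith(county):
            flows[(tract(home), tract(work))] += int(row["S000"])
    return flows


if __name__ == "__main__":
    for (home, work), jobs in tract_flows(SAMPLE_OD).items():
        print(f"home {home} -> work {work}: {jobs} jobs")
```

In the sample, the row whose home block is in county 39093 is dropped, and the two rows that share a home tract and a work tract are summed into one flow.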