
Big Data and Public Data: Understanding the Source

6 March 2018
CT Data Collaborative

Michelle Riordan-Nold, Executive Director at the CT Data Collaborative 

Often it feels as if we are almost drowning in data, since we now have so much available to us right at our fingertips. Think about all the data that feed and power something like the Waze app, for example, where data are crowd-sourced by many, many users. Zillow is another example of a service that is both using and producing huge amounts of data. Zillow collects publicly available data from hundreds of sources but also extracts data from property owners themselves. 

In many areas, data come from a variety of sources, and housing is no different. Teasing out the different methodologies and collection methods behind each source can be a challenge. Recently, a municipal employee was looking at housing data and was curious about the source for ‘housing values.’ The value listed for her town was different from what the town assessor reports as the median home value. Most of the housing data we provide come from the U.S. Census American Community Survey (ACS) 5-year estimates. But the real question is: how are those data ascertained? As it turns out, the question that survey respondents are asked on the ACS is “How much do you think this house and lot, apartment, or mobile home (and lot, if owned) would sell for if it were for sale?”

That raised alarms in my mind about the validity of the value. People notoriously overestimate the value of their home. Given the difficulty of estimating the value of your own house, I was interested in validating the ACS figures against other sources of data. I compared Zillow data to Census data, and in all but four towns in Connecticut, the Census values were higher than the Zillow values.
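
For readers who want to try a similar town-by-town check, here is a minimal sketch in Python. It is not the code behind the comparison described above. The town names and dollar figures are made-up placeholders rather than actual Census or Zillow values, and the function name `compare_sources` is hypothetical.

```python
# Hypothetical median home values by town.
# These numbers are illustrative placeholders, NOT real Census or Zillow data.
census_median = {"Town A": 250_000, "Town B": 310_000, "Town C": 180_000}
zillow_median = {"Town A": 235_000, "Town B": 325_000, "Town C": 171_000}


def compare_sources(census, zillow):
    """Return Census-minus-Zillow differences for towns present in both sources."""
    rows = {}
    for town in sorted(census.keys() & zillow.keys()):
        diff = census[town] - zillow[town]
        rows[town] = {
            "difference": diff,
            # Percent difference relative to the Zillow value, one decimal place.
            "pct_difference": round(100 * diff / zillow[town], 1),
        }
    return rows


comparison = compare_sources(census_median, zillow_median)

# Towns where the self-reported (Census) value exceeds the Zillow value.
census_higher = [town for town, row in comparison.items() if row["difference"] > 0]

for town, row in comparison.items():
    print(town, row["difference"], f'{row["pct_difference"]}%')
print("Census higher in:", census_higher)
```

Only towns that appear in both sources are compared, so a town missing from one source is left out of the result instead of being treated as a zero.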

Zillow data are not perfect either. Zillow uses a proprietary database that consists of information culled from prior home sales, county records, tax assessments, real estate listings, mortgage information, geographic information system data, and input from homeowners. The company then assigns Zestimates to 110 million homes in the U.S., which it updates daily. These values are all part of the database used to calculate the median value by home type, price tier, and region. However, nuances in the market can affect the sale of a home and are not reflected in the publicly recorded sale price. In any case, the Zillow data are similar to the Census data in that both are estimates and a starting point for discussion.

So yes, although we are inundated with data, from an analysis perspective it’s important to use a variety of sources and recognize each source’s strengths and weaknesses. I look forward to participating on the panel at the March 19th IForum and discussing the many sources of housing data. On March 19th, CTData will also be releasing a housing portal that provides data from Zillow, the Census, and also links to the Partnership’s Housing Data Profiles, in hopes of providing users with a way to compare sources and understand housing data a little better.
