HomeSense Process Book

Basic Info

Project title - Owen Reinhart and Josh Sidelsky - 488365, 498648 - Github Link

Background and Motivation

We are graduating soon, and where to live is an important question. It will be one of our first long-term, potentially permanent choices and will significantly affect the next stages of our lives, so it would be incredibly useful to leverage data and visualizations to inform our decision.

Related Work

We were inspired by Niche, a website that offers a similar service for comparing colleges and neighborhoods. Our project is more visual and map-based.

We were also inspired to use a map by the Chicago bikes Leaflet studio.

Questions

The overall question is: What are the best places to live in the US?

a. This evolved into: What are the best places for college graduates/younger people to live? It evolved because we added age factors that favored younger ages and gave rent factors significant weight.

Some sub-questions that are a part of that overarching question are:

What are the cities with the highest and lowest crime in the country?
What are the best places for renters and home-buyers in the country?
What cities have good commutes and working environments?
What are the most educated cities in the country?
What are the cities with the highest incomes?
What are the cities with the youngest ages? → This developed into cities with the ages closest to ours.

Data

The data for this project is collected from the 2022 American Community Survey from the US Census and the 2019 Uniform Crime Report from the FBI. The data is broken down by Metropolitan Statistical Area (MSA), a unit that delineates metropolitan areas across the US. The data was used to create indices to compare cities/MSAs. Below is a list and description of the data and the index each dataset informed.

Crime Data and Index
- Crime data was sourced from the FBI Uniform Crime Report 2019, specifically covering the categories "Violent crime," "Murder and nonnegligent manslaughter," "Rape," "Robbery," "Aggravated assault," "Property crime," "Burglary," "Larceny-theft," and "Motor vehicle theft." This data is broken down by Metropolitan Statistical Area (MSA). Unlike the census data, not every MSA has crime data, so we had to remove some cities. Apparently, this is because some police departments, cities, and states have different rules and laws for reporting crimes and did not report to the FBI.
- The Crime Index is a sum of all the crimes reported by the Uniform Crime Report.
Education Data and Index
- Data about the population's Highest Degree Attained is obtained from the American Community Survey 2022. It is broken down by MSA.
- The Education Index is a ranking of the percentage of high school graduates in the MSA.
Income Data and Index
- Mean income data is obtained from the American Community Survey 2022 and is broken down by MSA.
- The Income Index is a ranking purely of mean income.
Cost of Living Data and Index
- Median and Mean Gross Rent by Bedroom data is collected from the American Community Survey 2022, also broken down by MSA. This dataset includes information about monthly housing costs, monthly housing costs as a percentage of income, and real estate taxes.
- Value of Houses with Mortgages data is from the American Community Survey 2022 and is also broken down by MSA. This dataset includes information on monthly housing costs, monthly housing costs as a percentage of income, and real estate taxes.
- The Cost of Living Index is a combination of Median Rent, Median Rent for 1 Bedroom, Mean Income, and Mean House Value, where higher rents and house values are disfavored while higher incomes are favored. This ensures that all housing costs are considered relative to incomes. The 1 Bedroom rent is specifically included because a one-bedroom rental is a likely first home after college.
Age Data and Index
- Number of people by age is obtained from the American Community Survey 2022 and is broken down by MSA.
- The Age Index is a combination of rankings for people of similar ages to ours (18-24), people aged 24+, and the average age of the MSA. It disfavors MSAs with more people aged 24+ and a higher average age, and favors those with more people aged 18-24.
Worklife Data and Index
- Commute time, commute type, employment, and unemployment data are obtained from the American Community Survey 2022 and are broken down by MSA.
- The Worklife Index is a combination of commute times, incomes, unemployment and employment rates, and the percentage of people who walk to work.
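
The combined indices above can be read as "rank each factor, then rank the combination." The following is a minimal sketch of that idea, not our exact processing code. It assumes that rank 1 is the most favorable, that ties share an average rank, and that factors are weighted equally; the function names, the placeholder MSA names, and the toy values are all hypothetical.

```python
from statistics import mean

def average_ranks(values, higher_is_better):
    """Rank values 1..n (1 = most favorable); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=higher_is_better)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # average of the 1-based positions in the tie group
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def composite_index(rows, factors):
    """rows: list of dicts; factors: {column: higher_is_better}. Returns one final rank per row."""
    per_factor = [average_ranks([r[col] for r in rows], hib) for col, hib in factors.items()]
    # Average each row's factor ranks (equal weights assumed), then rank those averages.
    scores = [mean(col_ranks[i] for col_ranks in per_factor) for i in range(len(rows))]
    return average_ranks(scores, higher_is_better=False)

# Toy, made-up values for three placeholder MSAs (not real Census figures).
rows = [
    {"Key": "MSA A", "Median Rent": 1000, "Mean Income": 80000},
    {"Key": "MSA B", "Median Rent": 1500, "Mean Income": 80000},
    {"Key": "MSA C", "Median Rent": 1200, "Mean Income": 60000},
]
# Higher rent is disfavored, higher income is favored.
factors = {"Median Rent": False, "Mean Income": True}
col_ranks = composite_index(rows, factors)
```

In this toy example MSA A ends up ranked first because it has the lowest rent and ties for the highest income. Ranking each factor before combining keeps dollar amounts and percentages on a comparable scale.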

Exploratory Data Analysis and Design Evolution

We started by deciding which factors were important for us to visualize and explore, and settled on income, cost of living, and crime as the most important. We then explored the Census data for more sources that could improve the search, and decided to add age, education, and worklife. While exploring the Census data we also found the Metropolitan Statistical Area convention, which we used to group the different geographic areas. This was convenient because it is fairly granular and standard across a range of sources. We also considered non-census data from the Department of Agriculture (such as data about farmers markets) and from private sources (about housing costs). However, that data was often not broken down by MSA or had a lot of missing values, or the Census had a good substitute. In our exploratory stage we considered having state-based maps, as you can see below; however, we found state-level maps neither germane nor as informative as a user would likely want. Therefore, we ended up with an MSA-only map.

Designs:

Design 1:

This design has a very simple layout, with checkboxes along the top of the map view for the different categories and a sidebar for search.

Design 2:

This version features the option to switch between a grid view and a combined view of the maps so you can compare the maps across categories.

Design 3:

This version features a sidebar on the left that can be used for both search and filtering, depending on which mode is selected. Filters are easily checked on or off, and users can use sliders to control their search and see results listed below.

Design 4:

Final Design:

Above is the final design with all the MSAs. The three main visualizations are the filters, the search function, and the report card.

Implementation

Data Implementation:

Once we had the tabular data about rent, income, crime, etc. from the Census and FBI, we processed it with Python to create DataFrames keyed by MSA with the relevant properties. We then took a Shapefile of all the MSAs and converted it to GeoJSON using ArcGIS (Shapefiles are typically used in Geographic Information Systems mapping software). Next, we merged the DataFrames with the GeoJSON in a geopandas GeoDataFrame. We also separated out some auxiliary data that is displayed on the report card but not used for the filters. The remaining data in the GeoDataFrame was exported to GeoJSON; below is a further explanation of the underlying GeoJSON and JSON.
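
To illustrate the merge step, here is a minimal sketch that joins per-MSA rank rows onto GeoJSON features by name. It deliberately uses only the Python standard library instead of geopandas so that it runs anywhere, so it is not the code we actually ran. The function name, the assumption that the shapes carry the MSA name in a "Key" property, and the placeholder MSA names and values are hypothetical. Features without a matching row are dropped, mirroring how MSAs without crime data had to be removed.

```python
import json

def attach_ranks(feature_collection, rank_rows, key="Key"):
    """Copy rank columns onto matching GeoJSON features; drop features with no match."""
    by_key = {row[key]: row for row in rank_rows}
    merged = []
    for feature in feature_collection["features"]:
        name = feature["properties"].get(key)
        if name in by_key:
            # Build new dicts so the input collection is left unmodified.
            props = {**feature["properties"], **by_key[name]}
            merged.append({**feature, "properties": props})
    return {**feature_collection, "features": merged}

# Placeholder shapes (geometry omitted) and made-up rank rows.
shapes = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"Key": "MSA A"}, "geometry": None},
        {"type": "Feature", "properties": {"Key": "MSA B"}, "geometry": None},
    ],
}
ranks = [{"Key": "MSA A", "Crime Rank": 2.0, "Education Rank": 1.0}]

merged = attach_ranks(shapes, ranks)
geojson_text = json.dumps(merged)  # the string that would be written to the .geojson file
```

This behaves like an inner join on the MSA name: only "MSA A" survives, because "MSA B" has no rank row.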

To speed up load times, we have two underlying sources of data. The first is the GeoJSON, roughly sketched below:

        {
            "type": "FeatureCollection",
            "crs": {
              "type": "name",
              "properties": { "name": "urn:ogc:def:crs:EPSG::4269" }
            },
            "features": [
              {
                "type": "Feature",
                "properties": {
                    "Key": "Abilene, TX Metro Area",
                    "Mean Income Rank": 142.0, 
                    "AgeRank": 169.0, 
                    "Education Rank": 168.0, 
                    "Overall CoL Rank": 159.0, 
                    "Worklife Rank": 29.0, 
                    "Crime Rank": 89.0
                },
                "geometry": { ... } // coordinates omitted
              },
              ... // next MSA
            ]
          }

The second source is the auxiliary data JSON. It contains other data about each MSA that is not a rank but appears on the report card. Below is an example of the data:

        {
            "Key": "Walla Walla, WA Metro Area",
            "State": "WA",
            "Total Pop": 61890.0,
            "MeanIncome": "86,907",
            "High school graduate (includes equivalency)": 0.1440458879,
            "Renter Cost Of Living Rank": 116.5,
            "Owner Cost Of Living Rank": 122.0,
            "Mean travel time to work (minutes)": 17.9,
            "Violent crime": 145.0,
            "Area Codes": 316
          }
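
The split between the two sources can be sketched as follows. This is an illustration rather than our actual export code: it assumes each MSA starts as one full record, that the rank columns are exactly the six shown in the GeoJSON sketch, and that the auxiliary data is indexed by "Key" for report card lookups. The function name and the placeholder record with made-up values are hypothetical.

```python
RANK_COLUMNS = {
    "Mean Income Rank", "AgeRank", "Education Rank",
    "Overall CoL Rank", "Worklife Rank", "Crime Rank",
}

def split_sources(records, rank_columns=RANK_COLUMNS, key="Key"):
    """Return (rank_rows, aux_by_key) built from full per-MSA records."""
    rank_rows, aux_by_key = [], {}
    for rec in records:
        # Rank rows feed the map filters; they keep the key plus rank columns only.
        rank_rows.append({k: v for k, v in rec.items() if k == key or k in rank_columns})
        # Everything that is not a rank goes to the report card lookup.
        aux_by_key[rec[key]] = {k: v for k, v in rec.items() if k not in rank_columns}
    return rank_rows, aux_by_key

# One placeholder record with made-up values.
records = [
    {"Key": "Example Metro Area", "State": "XX", "Total Pop": 1000.0,
     "Crime Rank": 5.0, "Worklife Rank": 7.0},
]
rank_rows, aux_by_key = split_sources(records)
card = aux_by_key["Example Metro Area"]  # what a report card would read
```

Keeping the report card fields out of the GeoJSON means the map layer carries only the few numbers the filters need.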

Map Implementation:

The map is implemented with D3, Leaflet, and jQuery. Most of the map data visualization and handling is done by the WorldMap object (see the object definition in map.js). Data is loaded in main.js with the loadData function and then passed to a function that instantiates the WorldMap object. The search function is also in main.js; it takes the user input, filters out the MSAs that do not meet the criteria, and then updates the WorldMap object. card.js handles the report card. sidebar.js and modal.js handle the JavaScript for the filter and search areas (though the filtering itself is done in main.js).
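
The filtering logic can be sketched as below. The real implementation is JavaScript in main.js; this Python version only illustrates the idea and keeps one language for the examples in this book. It assumes that a lower rank number is better and that the user's criteria are per-category rank cutoffs; the function name and the cutoff values are hypothetical, while the property names and the Abilene ranks come from the GeoJSON sketch.

```python
def filter_msas(features, max_rank):
    """Keep features whose rank is at or below the cutoff for every selected category."""
    kept = []
    for feature in features:
        props = feature["properties"]
        # A feature passes only if it has a rank for each category and meets each cutoff.
        if all(props.get(col) is not None and props[col] <= limit
               for col, limit in max_rank.items()):
            kept.append(feature)
    return kept

features = [
    {"type": "Feature",
     "properties": {"Key": "Abilene, TX Metro Area", "Worklife Rank": 29.0, "Crime Rank": 89.0}},
]

# Hypothetical cutoffs: Abilene passes both, so it stays on the map.
loose = filter_msas(features, {"Crime Rank": 100, "Worklife Rank": 50})
# A stricter crime cutoff removes it.
strict = filter_msas(features, {"Crime Rank": 50})
```

The filtered list is what would then be handed back to the map object to redraw.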

Evaluation

We learned a lot about the differences between metro areas around the country. It was really interesting to see how widely cities vary and which results met our expectations and which did not. For example, the oldest cities are in Arizona and Florida (two states known for people retiring there), and some of the highest-income MSAs are in Connecticut, California, and DC. Overall, it was interesting simply to explore the map and see different cities and regions of the US and how varied their factors were. Our visualization is fairly successful: it was nice to share it with some people we know, and after our presentation one person said they were going to use it. The biggest improvements would be more search functionality and more data or more granular data. MSAs, while fairly small relative to other statistical units, are still pretty big areas. In the end, though, we definitely answered our questions well with the visualization.