The impact of gentrification in NYC over time

By: Emme Wiederhold, David Huang, and David Lie-Tjauw

Link to Repository See Visualization

Overview and Motivation

Our initial motivation for this project came from our teammate David Lie-Tjauw, who spent last summer living in the heart of NYC. During his time there, he experienced NYC's problems with sky-high rent, gentrified neighborhoods, and bias towards the wealthy. This spurred him to bring his insights back to campus.

We believe that gentrification is an issue that can impact everybody, not just those who live in dense cities. Not only can it lead to higher costs of living, but it also has the power to negatively impact those who come from lower incomes and marginalized communities. In this project, we are specifically focusing on NYC because of their vast amount of open, high-quality data that they have collected over the years. It’s no debate that gentrification is transforming New York City everyday. The question, however, is whether or not gentrification is hurting or improving the lives of the city’s 8 million inhabitants. This question has been the source of endless debate as city officials try to decide on the best urban legislation for their constituents. To add productive commentary to this debate, we aim to show the impact of gentrification based on different key data points, such as income, demographics, poverty, and population. This project will serve to educate the common public about gentrification and hopefully inspire them to understand the impact of gentrification in their own communities.

Related Work

One project that inspired our visualization was the work from the NYU Furman Center, whose aim is to advance research and debate on housing, neighborhoods, and urban policy. They created CoreData.NYC, which is a platform that makes housing and neighborhood data from NYC availavble to the public. Particularly, the CoreData.nyc website allows users to not only view and download datasets, but they also provide a visual mapping of NYC that colors the different subboroughs different shades of colors depending on the values of whatever attributes (ex. median income, population, etc.) a user is looking at. Drawing from this visualization, we felt that the website might even provide TOO MUCH information to the common everyday visitor and that this could be simplified more. This also motivated us to figure out how to use a map of NYC to illustrate which subboroughs were experiencing more gentrification compared to others. We later came up with the idea to calculate "gentrification scores" to show this.

Another project we drew inspiration from was a visualization made by the City of Los Angeles called "Los Angeles Index of Neighborhood Change". Using publicly collected city data, the project colors the neighborhoods of LA different shades of color, depending on how much neighborhood change is occurring. Naturally, we wanted to learn more, so we contacted Alex Pudlin of the City of LA who sent us the team's whitepaper that describes how they aggregated gentrification data to color encode a map of LA. According to the whitepaper, the "scores" are an aggregate of six demographic measures indicative of gentrification. We used this as inspiration for calculating "gentrification scores" for our visualization.

The Questions Our Project Aims to Answer

Questions before our project

New questions we considered throughout our project:

What we want to accomplish

Benefits

Data

Our data comes from the NYU Furman Center, which "advances research and debate on housing, neighborhoods, and urban policy". They created CoreData.nyc to make housing and neighborhood data available to the public. Below, you will find the specific datasets we downloaded:

In reality, there are many contributing factors for gentrification, and there is not usually any one sole reason for why a neighborhood has become gentrified. That being said, we spent time researching gentrification and its effects in order to identify "indicators" of gentrification that are often associated with the rise of gentrification. These are our main sources:

Data Processing

Even though most of our data comes from the same source, different pieces of data are found in separate files from each other throughout CoreData.nyc. Data processing will be implemented by individually parsing the separate files and combining the relevant data into one monolithic data source in order to make it easier for us to work with the data in D3. This is done using python.

In our project, we encode the map of NYC with different shades of color, where darker colors are meant to indicate more gentrification happening compared to lighter colored subboroughs. These colors are chosen based off the "gentrification score" of whatever attribute(s) are selected. In order to calculate the gentrification score for an attribute, we first normalize the values of each attribute for each year so that its values would fall within [0,1]. We normalized values according to this equation:

where X is the value for a specific year being normalized, X_min is the smallest attribute value throughout all years and subboroughs, X_max is the largest attribute value throughout all years and subboroughs, and X' is the attribute's normalized value for a specific year. To determine the gentrification score from a single attribute, we then sum up all the attribute's normalized values throughout all available years. When two attributes are selected, we simply sum up the gentrification scores of both attributes. This value is then applied to our color scale so that lower gentrification scores for a subborough will cause its coloring to be lighter than another subborough with a higher gentrification score.

To see our cleaned data, click here.

To see our jupyter notebook used to parse and clean the data, click here

Exploratory Data Analysis

Initially, we used NYU Furman Center's CoreData.nyc website to look at the available data about NYC. Like we described earlier, the website shows a map of NYC and colors the different subboroughs different shades of color depending on what data we are looking at. This largely motivated us to also incorporate an interactive map of NYC in our visualization, as it provided the right functionality needed for users to analyze data from specific subboroughs.

Visualization Design

Must-Have Features:

Optional Features:

Initial Designs

Project Schedule

Milestone One

So far, we have been able to stick to our original 3-panel design. Users can select a gentrification attribute from the sidepanel and then select a specific NYC neighborhood in order to display a timeseries datachart of whatever data is selected. Currently, our timeseries chart just shows the income data for each subborough, but we plan on dynamically changing this data in the next milestone. One design idea we are currently experimenting with is to generate a "gentrification score" from multiple selected attributes and apply this score to a color scale that can illustrate which neighborhoods in NYC are exhibiting the highest amounts of gentrification. In the picture above, for example, a gentrification score is generated for Median Household Income and Poverty Rate. We are still researching ways in which we can make a reliable gentrification score that makes the most sense to users, so currently we are just averaging the values of whatever metrics are selected. One feature that we weren't yet able to implement was the timeline feature that allows users to explore NYC data across multiple years. We hope to have this feature in the next iteration.

Milestone Two

By now, we have implemented all of our core goals we stated that we would achieve at the beginning of this project. Our visualization consists of four main features that work together to help users look at trends in gentrification. Users begin by selecting up to two gentrification attributes from the two dropdown menus at the top of our visualization. After selecting up to two subboroughs from the GeoJSON map below, data from 2005-2016 will be displayed in bar charts to the right of the map. Users can change the year for the data displayed using a slider control. For the bar chart visualization, the top two quadrants will represent the data for a specific subborough A, while the bottom two quadrants will represent the other selected Subborough B.

The purpose of the GeoJSON map is not just to allow users to select specififc subboroughs they want to investigate, but also to help users see how some subboroughs might be more gentrified than others. We do this by normalizing the dataset from NYU and then adding these normalized values together to generate a "gentrification score", where a higher score means that there is more gentrification in an area. For example, if a user has selected the attributes "Median Household Income" and "Median Rent" for the year 2009, then our visual will get the appropriate normalized values for each attribute and add them together. The resulting sum will then be applied to a color scale and used to color the each subborough on the map. Darker colors will mean higher sums and more gentrification happening. It is worth clarifiying that not all attributes in our visualization are positively correlated to a rise in gentrification. Consider Housing Units, for example, where a lower supply of housing in a given area can actually cause increased gentrification because of rising rent in response. In cases like these, we simply take the complement of the attribute's normalized value.

The idea for normalizing values for data to compute a "gentrification score" of sorts is largely attributed to a visulalization project from the City of Los Angeles that also attempts to take different gentrification data and use it to color encode a map of LA. The name of the visual is: Los Angeles Index of Neighborhood Change:

Naturally, we wanted to learn more, so we contacted Alex Pudlin of the Los Angeles Innovation Team who sent us the team's whitepaper that describes how they aggregated gentrification data to color encode a map of LA. According to the whitepaper, the "scores" are an aggregate of six demographic measures indicative of gentrification, similar to how our visual's gentrification attributes. The measures are then standardized and combined using weights that reflect the proportion of each measure that is statistically significant. In our case, we did not use weighting as we were constrained in time and knowledge about how to accurately weight certain attributes.

Originally, we had wanted to create a timeseries line chart that displays the data for multiple attributes. We were unable to follow through with this idea; however, as different attributes had different units of measure (ex. dollars vs housing units). This led to inconsistent y-axes being constructed. A potential solution we explored was a timeseries chart with two y-axes, each of different measure:

This idea did not work however because this could lead to users identifying faulty trends in data that never existed in the first place. Our alternative solution, which ended up being better, is the 4-quadrant bar chart you see now. Users are still able to compare multiple attributes across different subboroughs, all while addressing the issue of different attributes having different units of measure. Based on the above screenshot, users can easily compare the Median Household Income of Starrett City and Borough Park, while at the same time also being able to look at the data concerning the population of both subboroughs. Because we keep the axes for each attribute seperate, users will not accidentially mis-identify trends in our data, compared to our previous line chart idea.

Implementation Breakdown

Select Gentrification Attributes

Originally, we wanted to have checkboxes that users could select multiple attributes from. However, this idea presented many design issues. During milestone 2, we determined that using line charts wouldn't be feasible as it could mislead users and could only support up to two different y-axes. With out new bar chart design, we still could only support up to 2 attrbutes at a time while maintaining a simple design. This prompted us to implement two drop down menus instead where users can select up to two attributes at a time. Users also have the option to select "No attribute selected" if they only want to analyze on attribute at a time.

Map of NYC

There are three main functions for our Map of NYC feature. The first function can be accessed through the year-slider, which makes the coloring of the map to display attribute data from different years. The second function is the coloring of the map itself. By calculating gentrification scores, we are able to color encode the map with different shades of coloring. This is meant to show which subboroughs are possibly more gentrified than others. The last function is that users can select up to 2 subboroughs from this map which will then generate up to 2 bar chart visuals that show the data of the selected attributes.

Bar Charts

Originally, we had wanted to create a timeseries linechart that overlayed different lines for different selected attributes. But after consulting with Prof. Ottley, we determined that this would not work as different attributes had different units. Additionally, we chose not to implement a line chart with two y-axes because this could potentially be misleading to the user. This inspired our 2x2 bar chart visualization where the top half and bottom half represent data for different subboroughs. The left-column and right-column represent different attributes selected. As users hover over certain bars, they will be shown the specific value associated with that bar. We believe this bar chart visual accomplishes our goal of not only allowing users to compare subboruoughs and analyze specific data from specific years, but it also does so in a clean and non-confusing manner.

User Studies

User Tasks

Our user studies were split into four sections:

  1. Spend the first 5 minutes having the user explore the website without giving them any context of background info whatsoever. During this time, do not give the user any hints even if they ask.
  2. Ask the following questions:
    • Does subborough X become more or less gentrified over time base on attribute Y?
    • In Year X, does subborough A or subborough B have more attribute Y?
    • In Year X, based on attribute A and attribute B, is subborough A or subborough B more gentrified?
  3. After the user has finished using the visualization, ask the following questions:
    • What do you like about this visualization?
    • What do you not like about this visualization? And how would you fix it?
    • What questions do you have about this visual?
    • Is it clear how the subboroughs are colored?
  4. Record observations and offer solutions to any pain points participants may have

Feedback

Overall feedback was positive, but here were some criticisms our users had:

Refining visualization based on feedback

In response to our feedback, we made the following improvements:

Evaluation

Answering Questions

  1. How has gentrification changed in NYC over time?
  2. How can we visualize gentrification in NYC?
  3. Does gentrification have a positive or negative impact on NYC over time?
  4. How did the 2008 recession affect NYC?
  5. What attributes might best reflect the presence of gentrification in a subborough?
    • This question was considered very similar to: What are some metrics that might have been impacted by gentrification?
    • To understand how Gentrification is affecting New York City, we have selected 6 attributes that are strong indicators of Gentrification. Below, is our justification for each attribute:

      • Housing Units: A limited housing supply for a growing population often results in rising rent. In turn this affects low-income households who are the least likely to keep up with the rising costs of living in the city.
      • Mean Travel Time: Long commute times could be reasoning for people to move from the suburbs to the city in order to shorten the time they spend everyday going to and from work.
      • Median Household Income: Higher household income is a consequence of having more affluent people moving into a specific area. This causes consumer preferences for an area to shift upwards, inviting more businesses that cater to the wealthy.
      • Median Rent: Higher rent is a leading cause for low-income families to move out of an area.
      • Population: Growing populations are an overall indicator of gentrification because there is likely to be a reason for why large amounts of people would want to move into an already densely-packed area like NYC.
      • Racial Diversity Index: The RDI is the probability that two randomly chosen people in a given geographic area will be of a different race. Decreasing levels of diversity in a neighborhood have historically been associated with a rise in gentrification.

Reflection on visualization

One of the biggest takeaways of this project has been learning about how complicated gentrification is. Given this, we feel that our visualization excels in breaking this down through our simple UI that allows users to instantly view and analyze data without any excessive clicking or scrolling. We also provide visual stimuli by coloring the map to show trends in possible gentrification that can’t be gleaned from rows of numerical data.

One thing that we could have improved

An area of concern is subborough selection. Users are able to click and choose up to two subboroughs using the map. Unfortunately, because the users must use the map to select the subborough, they are unable to keep one subborough fixed while changing another. This behavior is due to limitations of the map visualization. To keep our project visually appealing, we decided to sacrifice a bit of usability to keep the display from getting cluttered up by listing. Alternatives to using the map could've been a dropdown menu, or a list of buttons. However, listing all of New York City's 55 subborough's on the visualization could overwhelm the user and worsen the UI more than the map.

What did you learn about your data?

Our biggest lesson was that sometimes data isn't meant to offer definitive yes/no answers. Many times it's there to help you understand the story behind it. This was evident when we realized that no amount of data could really tell us a yes/no answer on whether gentrification was having a positive or negative effect on NYC. What we did end up doing, however, was learn that each set of data we collected would somehow influence or influence another set of data. This then spurred us to see our visualization as a way to systematically illustrate these relationships between different attributes.

Screencast