Our initial motivation for this project came from our teammate David Lie-Tjauw, who spent last summer living in the heart of NYC. During his time there, he experienced NYC's problems with sky-high rent, gentrified neighborhoods, and bias towards the wealthy. This spurred him to bring his insights back to campus.
We believe that gentrification is an issue that can impact everybody, not just those who live in dense cities. Not only can it lead to higher costs of living, but it can also harm people with lower incomes and those from marginalized communities. In this project, we focus specifically on NYC because of the vast amount of open, high-quality data the city has collected over the years. There is no debate that gentrification is transforming New York City every day. The question, however, is whether gentrification is hurting or improving the lives of the city's 8 million inhabitants. This question has been the source of endless debate as city officials try to decide on the best urban legislation for their constituents. To add productive commentary to this debate, we aim to show the impact of gentrification based on key data points such as income, demographics, poverty, and population. This project will serve to educate the general public about gentrification and hopefully inspire them to understand its impact in their own communities.
One project that inspired our visualization was the work of the NYU Furman Center, whose aim is to advance research and debate on housing, neighborhoods, and urban policy. They created CoreData.nyc, a platform that makes housing and neighborhood data from NYC available to the public. In particular, the CoreData.nyc website not only allows users to view and download datasets, but also provides a map of NYC that colors the subboroughs in different shades depending on the values of whichever attribute (e.g., median income, population) a user is looking at. Looking at this visualization, we felt that the website might provide too much information for the everyday visitor and that it could be simplified. This also motivated us to figure out how to use a map of NYC to illustrate which subboroughs were experiencing more gentrification than others. We later came up with the idea of calculating "gentrification scores" to show this.
Another project we drew inspiration from was a visualization made by the City of Los Angeles called "Los Angeles Index of Neighborhood Change". Using publicly collected city data, the project colors the neighborhoods of LA different shades of color, depending on how much neighborhood change is occurring. Naturally, we wanted to learn more, so we contacted Alex Pudlin of the City of LA who sent us the team's whitepaper that describes how they aggregated gentrification data to color encode a map of LA. According to the whitepaper, the "scores" are an aggregate of six demographic measures indicative of gentrification. We used this as inspiration for calculating "gentrification scores" for our visualization.
In reality, many factors contribute to gentrification, and there is rarely a single reason why a neighborhood has become gentrified. That being said, we spent time researching gentrification and its effects in order to identify "indicators" that are often associated with its rise. These are our main sources:
Even though most of our data comes from the same source, different pieces of data are found in separate files throughout CoreData.nyc. We process the data by parsing each file individually and combining the relevant data into one monolithic data source, which makes the data easier to work with in D3. This is done using Python.
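The merging step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: the column names (subborough, year, median_household_income, median_rent), the inline sample rows, and the helper merge_attribute_files are all hypothetical stand-ins for the real CoreData.nyc exports.

```python
import csv
import io

# Hypothetical stand-ins for two separate CoreData.nyc exports; the real
# files, column names, and values may differ.
INCOME_CSV = """subborough,year,median_household_income
Borough Park,2005,40000
Borough Park,2006,41000
Starrett City,2005,35000
"""

RENT_CSV = """subborough,year,median_rent
Borough Park,2005,1000
Borough Park,2006,1050
Starrett City,2005,900
"""


def merge_attribute_files(sources):
    """Combine per-attribute CSV texts into one table keyed by (subborough, year)."""
    merged = {}
    for text in sources:
        for row in csv.DictReader(io.StringIO(text)):
            key = (row["subborough"], int(row["year"]))
            record = merged.setdefault(key, {"subborough": key[0], "year": key[1]})
            # Copy every attribute column from this file into the shared record.
            for column, value in row.items():
                if column not in ("subborough", "year"):
                    record[column] = float(value)
    # Sort so the output order is stable regardless of input order.
    return [merged[key] for key in sorted(merged)]


combined = merge_attribute_files([INCOME_CSV, RENT_CSV])
```

Each record in combined then carries every attribute for one subborough and year, which is a convenient shape to export as a single file for D3.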
In our project, we encode the map of NYC with different shades of color, where darker colors indicate more gentrification than in lighter-colored subboroughs. These colors are chosen based on the "gentrification score" of whichever attribute(s) are selected. To calculate the gentrification score for an attribute, we first normalize the values of each attribute for each year so that they fall within [0,1]. We normalized values according to this equation:
X' = (X - X_min) / (X_max - X_min)
where X is the value for a specific year being normalized, X_min is the smallest attribute value throughout all years and subboroughs, X_max is the largest attribute value throughout all years and subboroughs, and X' is the attribute's normalized value for a specific year.
To determine the gentrification score from a single attribute, we then sum all of the attribute's normalized values across all available years. When two attributes are selected, we simply sum the gentrification scores of both attributes. This value is then applied to our color scale so that a subborough with a lower gentrification score is colored lighter than a subborough with a higher gentrification score.
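The normalization and summation described above could look something like the following sketch. The function names, the dictionary layout keyed by (subborough, year), and the sample numbers are assumptions made for illustration; they are not taken from the project's notebook.

```python
def min_max_normalize(values_by_key):
    """Scale every value into [0, 1] using the min and max over all
    years and subboroughs: X' = (X - X_min) / (X_max - X_min)."""
    lo = min(values_by_key.values())
    hi = max(values_by_key.values())
    span = hi - lo
    if span == 0:
        # All values identical: avoid dividing by zero.
        return {key: 0.0 for key in values_by_key}
    return {key: (value - lo) / span for key, value in values_by_key.items()}


def attribute_scores(values_by_key):
    """Sum each subborough's normalized values across all available years."""
    scores = {}
    for (subborough, _year), norm in min_max_normalize(values_by_key).items():
        scores[subborough] = scores.get(subborough, 0.0) + norm
    return scores


# Made-up values keyed by (subborough, year), for illustration only.
rent = {("A", 2005): 1000, ("A", 2006): 1200, ("B", 2005): 800, ("B", 2006): 900}
income = {("A", 2005): 45000, ("A", 2006): 50000, ("B", 2005): 30000, ("B", 2006): 35000}

rent_scores = attribute_scores(rent)
income_scores = attribute_scores(income)

# With two attributes selected, the per-attribute scores are simply added.
combined_scores = {s: rent_scores[s] + income_scores[s] for s in rent_scores}
```

In this toy data, subborough A ends up with the higher combined score, so it would be drawn in the darker shade.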
To see our cleaned data, click here.
To see the Jupyter notebook we used to parse and clean the data, click here.
Initially, we used the NYU Furman Center's CoreData.nyc website to look at the available data about NYC. As we described earlier, the website shows a map of NYC and colors the subboroughs in different shades depending on the data being viewed. This largely motivated us to incorporate an interactive map of NYC in our own visualization, as it provided the functionality users need to analyze data from specific subboroughs.
Must-Have Features:
Optional Features:
So far, we have been able to stick to our original 3-panel design. Users can select a gentrification attribute from the side panel and then select a specific NYC neighborhood in order to display a time-series chart of whatever data is selected. Currently, our time-series chart only shows the income data for each subborough, but we plan to change this data dynamically in the next milestone. One design idea we are currently experimenting with is to generate a "gentrification score" from multiple selected attributes and apply this score to a color scale that illustrates which neighborhoods in NYC are exhibiting the most gentrification. In the picture above, for example, a gentrification score is generated for Median Household Income and Poverty Rate. We are still researching ways to build a reliable gentrification score that makes sense to users, so for now we are simply averaging the values of whichever metrics are selected. One feature that we have not yet been able to implement is the timeline, which would allow users to explore NYC data across multiple years. We hope to have this feature in the next iteration.
By now, we have implemented all of the core goals we set at the beginning of this project. Our visualization consists of four main features that work together to help users look at trends in gentrification. Users begin by selecting up to two gentrification attributes from the two dropdown menus at the top of our visualization. After they select up to two subboroughs from the GeoJSON map below, data from 2005-2016 is displayed in bar charts to the right of the map. Users can change the year of the displayed data using a slider control. In the bar chart visualization, the top two quadrants represent the data for one selected subborough (Subborough A), while the bottom two quadrants represent the other selected subborough (Subborough B).
The purpose of the GeoJSON map is not just to allow users to select the specific subboroughs they want to investigate, but also to help users see how some subboroughs might be more gentrified than others. We do this by normalizing the dataset from NYU and then adding these normalized values together to generate a "gentrification score", where a higher score means that there is more gentrification in an area. For example, if a user has selected the attributes "Median Household Income" and "Median Rent" for the year 2009, then our visual will get the appropriate normalized values for each attribute and add them together. The resulting sum is then applied to a color scale and used to color each subborough on the map. Darker colors mean higher sums and more gentrification. It is worth clarifying that not all attributes in our visualization are positively correlated with a rise in gentrification. Consider Housing Units, for example, where a lower supply of housing in a given area can actually increase gentrification because rents rise in response. In cases like these, we simply take the complement of the attribute's normalized value.
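As a rough sketch of the per-year scoring and the complement rule described above, consider the following. The INVERSE_ATTRIBUTES set, the year_score and lightness helpers, the sample normalized values, and the particular lightness range are all hypothetical; the actual visualization applies its score to a D3 color scale whose details are not shown here.

```python
# Attributes assumed to move opposite to gentrification (illustrative only).
INVERSE_ATTRIBUTES = {"Housing Units"}


def year_score(normalized, selected, subborough, year):
    """Add the normalized values of the selected attributes for one year,
    taking the complement (1 - x) for inversely related attributes."""
    total = 0.0
    for attribute in selected:
        value = normalized[attribute][(subborough, year)]
        if attribute in INVERSE_ATTRIBUTES:
            value = 1.0 - value
        total += value
    return total


def lightness(score, max_score):
    """Map a score to a lightness percentage: a higher score gives a darker color."""
    return round(90 - 60 * (score / max_score))


# Made-up, already-normalized values keyed by (subborough, year).
normalized = {
    "Median Rent": {("A", 2009): 0.75, ("B", 2009): 0.25},
    "Housing Units": {("A", 2009): 0.25, ("B", 2009): 0.75},
}
selected = ["Median Rent", "Housing Units"]

score_a = year_score(normalized, selected, "A", 2009)
score_b = year_score(normalized, selected, "B", 2009)

# Each attribute contributes at most 1, so the maximum score is len(selected).
shade_a = lightness(score_a, len(selected))
shade_b = lightness(score_b, len(selected))
```

Here subborough A has high rent and few housing units, so both terms push its score up and it receives the darker shade.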
The idea of normalizing values to compute a "gentrification score" is largely attributable to a visualization project from the City of Los Angeles that also takes different gentrification data and uses it to color encode a map of LA. The name of the visual is the Los Angeles Index of Neighborhood Change.
Naturally, we wanted to learn more, so we contacted Alex Pudlin of the Los Angeles Innovation Team, who sent us the team's whitepaper describing how they aggregated gentrification data to color encode a map of LA. According to the whitepaper, the "scores" are an aggregate of six demographic measures indicative of gentrification, similar to our visual's gentrification attributes. The measures are then standardized and combined using weights that reflect the proportion of each measure that is statistically significant. In our case, we did not use weighting, as we were constrained in time and in knowledge of how to weight certain attributes accurately.
Originally, we had wanted to create a time-series line chart that displays the data for multiple attributes. We were unable to follow through with this idea, however, as different attributes had different units of measure (e.g., dollars vs. housing units), which led to inconsistent y-axes. A potential solution we explored was a time-series chart with two y-axes, each with a different unit of measure:
This idea did not work, however, because it could lead users to identify trends in the data that never existed in the first place. Our alternative solution, which ended up being better, is the 4-quadrant bar chart you see now. Users are still able to compare multiple attributes across different subboroughs, and the design addresses the issue of different attributes having different units of measure. Based on the above screenshot, users can easily compare the Median Household Income of Starrett City and Borough Park while also looking at the population data for both subboroughs. Because we keep the axes for each attribute separate, users will not accidentally misidentify trends in our data, as they could have with our previous line chart idea.
Originally, we wanted to have checkboxes from which users could select multiple attributes. However, this idea presented many design issues. During milestone 2, we determined that using line charts wouldn't be feasible, as they could mislead users and could only support up to two different y-axes. With our new bar chart design, we still could only support up to 2 attributes at a time while maintaining a simple design. This prompted us to implement two dropdown menus instead, where users can select up to two attributes at a time. Users also have the option to select "No attribute selected" if they only want to analyze one attribute at a time.
There are three main functions for our Map of NYC feature. The first function is accessed through the year slider, which updates the coloring of the map to reflect attribute data from different years. The second function is the coloring of the map itself. By calculating gentrification scores, we are able to color encode the map with different shades. This is meant to show which subboroughs are possibly more gentrified than others. The last function is subborough selection: users can select up to 2 subboroughs from the map, which generates up to 2 bar chart visuals showing the data for the selected attributes.
Originally, we had wanted to create a time-series line chart that overlaid different lines for different selected attributes. But after consulting with Prof. Ottley, we determined that this would not work, as different attributes had different units. Additionally, we chose not to implement a line chart with two y-axes because it could mislead the user. This inspired our 2x2 bar chart visualization, where the top half and bottom half represent data for different subboroughs and the left and right columns represent the different attributes selected. As users hover over a bar, they are shown the specific value associated with that bar. We believe this bar chart visual accomplishes our goal of allowing users to compare subboroughs and analyze specific data from specific years, and that it does so in a clean and non-confusing manner.
Overall feedback was positive, but here were some criticisms our users had:
In response to our feedback, we made the following improvements:
Overall, we learned that a negative impact of gentrification may be accompanied by a positive impact as well. Take median household income, for example. In some subboroughs, like Chelsea/Clinton/Midtown and Brooklyn Heights/Fort Greene, we saw Median Household Income increase overall during 2005-2016. On one hand, this can be seen as a "good" impact in that residents of these neighborhoods tend to be making more money now than before. However, in these same subboroughs we also saw evidence of rising rent over the same time interval, which indicates a higher cost of living for residents.
Though we would have liked to have a definitive answer to this question, it is now clear that such an answer doesn't exist. We've learned that the best we can do is to shine a light on what's actually happening, so that the users of our visual learn that positive impacts of gentrification can be accompanied by negative ones, and vice versa.
To understand how gentrification is affecting New York City, we have selected 6 attributes that are strong indicators of gentrification. Below is our justification for each attribute:
One of the biggest takeaways of this project has been learning about how complicated gentrification is. Given this, we feel that our visualization excels in breaking this down through our simple UI that allows users to instantly view and analyze data without any excessive clicking or scrolling. We also provide visual stimuli by coloring the map to show trends in possible gentrification that can’t be gleaned from rows of numerical data.
An area of concern is subborough selection. Users are able to click and choose up to two subboroughs using the map. Unfortunately, because users must use the map to select subboroughs, they are unable to keep one subborough fixed while changing the other. This behavior is due to limitations of the map visualization. To keep our project visually appealing, we decided to sacrifice a bit of usability to keep the display from being cluttered by a long list. Alternatives to the map could have been a dropdown menu or a list of buttons. However, listing all 55 of New York City's subboroughs in the visualization could overwhelm the user and hurt the UI more than the map does.
Our biggest lesson was that sometimes data isn't meant to offer definitive yes/no answers. Many times it's there to help you understand the story behind it. This was evident when we realized that no amount of data could really give us a yes/no answer on whether gentrification was having a positive or negative effect on NYC. What we did end up learning, however, was that each set of data we collected would somehow influence, or be influenced by, another set of data. This then spurred us to see our visualization as a way to systematically illustrate these relationships between different attributes.