Results

Summary

Our summary visualizations show that people in the United States tweeted the most over the weekend. The number of tweets fluctuated during the weekdays and was lowest on Thursday. This makes sense, since most events took place on the weekend and people posted about them right afterwards, over the weekend and in some cases at the beginning of the following week.
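
A per-weekday count like the one described above can be sketched in a few lines of Python. This is only an illustration: the timestamps are made-up samples, and the function name `tweets_per_weekday` is hypothetical rather than taken from our scripts.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical sample timestamps (ISO 8601); real input would be the streamed tweets.
timestamps = [
    "2017-04-13T09:15:00",  # Thursday
    "2017-04-15T11:00:00",  # Saturday
    "2017-04-15T18:30:00",  # Saturday
    "2017-04-16T08:45:00",  # Sunday
]

def tweets_per_weekday(stamps):
    """Count how many tweets fall on each weekday name."""
    return Counter(
        DAY_NAMES[datetime.fromisoformat(stamp).weekday()] for stamp in stamps
    )

counts = tweets_per_weekday(timestamps)
busiest_day = counts.most_common(1)[0][0]
```

Sorting the resulting counter is enough to see which days are busiest and which are quietest.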

About 75% of the tweets have neutral sentiment, and positive tweets were more prevalent than negative ones, which is wonderful to see given how often social media is described as toxic. However, we found no consistent pattern by day, such as weekends being more positive. We believe the sentiment of the tweets depends more on the events taking place than on the day of the week.
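
To make the split into positive, neutral and negative concrete, here is a minimal sketch. It assumes each tweet already has a polarity score between -1 and 1 and that a small band around zero counts as neutral. The 0.05 threshold, the sample scores and the function names are all assumptions for illustration, not the settings used in our analysis.

```python
from collections import Counter

def label(score, threshold=0.05):
    """Turn a polarity score into a label (threshold is an assumed value)."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

def sentiment_shares(scores):
    """Return the fraction of scores that fall under each label."""
    names = ("positive", "neutral", "negative")
    if not scores:
        return {name: 0.0 for name in names}
    counts = Counter(label(score) for score in scores)
    return {name: counts[name] / len(scores) for name in names}

# Hypothetical scores: six neutral, one positive, one negative.
scores = [0.0, 0.0, 0.0, 0.6, -0.4, 0.0, 0.02, 0.0]
shares = sentiment_shares(scores)
```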

Our detailed state-by-state analysis shows that most tweets come from California and New York. Florida ranks third and Texas fourth in number of tweets.

On most days, the average sentiment of each state falls between neutral and positive. Only a few states changed over the week, and Wyoming changed the most: it was the most neutral state in the United States on April 12 and became the most positive state on April 13. The most used hashtag on April 13 was '#jacksonhole', and we found that a special music event featuring Mozart's first masterpiece took place in Jackson Hole that day.
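
The per-state, per-day comparison above comes down to a grouped average. The sketch below shows one way to compute it. The rows are invented sample values, the score scale is assumed to run from -1 to 1, and both function names are hypothetical.

```python
from collections import defaultdict

def mean_sentiment_by_state_day(rows):
    """Average the sentiment scores for every (state, day) pair."""
    sums = defaultdict(lambda: [0.0, 0])  # (state, day) -> [running total, count]
    for state, day, score in rows:
        acc = sums[(state, day)]
        acc[0] += score
        acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}

def most_positive_state(means, day):
    """Return the state with the highest average sentiment on a given day."""
    candidates = {state: m for (state, d), m in means.items() if d == day}
    return max(candidates, key=candidates.get)

# Hypothetical rows: (state, day, sentiment score).
rows = [
    ("WY", "2017-04-12", 0.0),
    ("CA", "2017-04-12", 0.25),
    ("WY", "2017-04-13", 0.75),
    ("WY", "2017-04-13", 0.25),
    ("CA", "2017-04-13", 0.25),
]
means = mean_sentiment_by_state_day(rows)
top_state = most_positive_state(means, "2017-04-13")
```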

On April 14 and April 16, the states had more positive tweets than on other days. On Friday, April 14, '#goodfriday' was one of the most used hashtags, which helps explain the more positive tweets that day. On April 16, the most popular topics were '#happyeaster' and '#easter'.

Hypothesis Answers

Q1: How many people are tweeting in different states depending on day?
As we speculated, people all over the United States tweeted the most during the weekend, with the highest numbers coming from California and New York, followed by Florida and then Texas.

Q2: What are the trending topics on Twitter?
'#traffic' was the most popular topic on weekdays and disappeared over the weekend, when people seemed more focused on other occasions, as hashtags such as '#easter' show. People also posted more '#photos' over the weekend, most likely pictures taken at those special occasions.

Q3: Are there sentiment value differences based on different states in the United States?
The sentiment values vary slightly across states, but they depend mainly on the events happening in each area.

Process Book


Overview and Motivation

Social networks have changed the way people communicate with each other. Today the world is a lot more connected than it was before. People are also more open to voicing their opinions on issues that interest them, whether those relate to sports, careers, politics, art or other topics.

Social networking websites like Twitter and Facebook are great resources for learning about the preferences and interests of people in different demographics. The data collected from these websites is mostly raw text, but from its context we can infer the sentiments, opinions and locations of the people posting. Our goal for the project is to gather, group and analyze such text data and provide meaningful insights in the form of visualizations.

For our project we decided to work with Twitter data collected over more than a week. Tweets are complex and have many different attributes. It is interesting to visualize such data and look for patterns in how, where and when people tweet.

Related Work

We worked with a lot of numerical data during the class homework assignments, but never with text data. We felt that using text data for our project would give us the opportunity to think outside the box and come up with visualizations that draw meaningful insights from the underlying dataset. Also, since social media is so widely used, we felt that text data would give us a larger target audience. Our main goal is to develop a tool that can be used by people of all age groups.

Visualizations are often used to understand social media data. For example, the tweet visualization app at https://www.csc2.ncsu.edu/faculty/healey/tweet_viz/tweet_app/ gives a sense of people’s opinions on current events. We would also like to create visualizations that help people of all age groups make sense of the data, connect it with events, and see the bigger picture.

Questions

1. How many people are tweeting in different states depending on day?
2. What are the trending topics on Twitter?
3. Are there sentiment value differences based on different states in the United States?

With visualizations of Twitter data, we can get a summary of what is happening throughout the country, and the state-by-state sentiment analysis shows how Twitter users feel about it. We can also learn at which times of day Twitter users are most and least active in different states.

These visualizations can greatly benefit companies, which can learn when most Twitter users are active and what people are interested in, and so decide to whom, when and where to advertise their products. They can also inform anyone who wants to learn what is happening on Twitter within a minute.

Data

The Twitter data comes from a project for the cse530s class. A script programmatically streams tweets from the Twitter Streaming APIs. Approximately 1 million tweets were downloaded over more than a week. We are streaming more tweets and hope to discover more interesting results. To process the data, we implemented scripts to extract hashtags from the tweets. To find meaningful features, we also used a database and SQL queries to filter and aggregate the tweets.
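
The two processing steps, hashtag extraction followed by SQL filtering and aggregation, can be sketched as follows. This is a minimal illustration rather than our actual pipeline: the regular expression, the table layout, the sample tweets and the use of an in-memory SQLite database are all assumptions.

```python
import re
import sqlite3

# Assumed pattern: a '#' followed by letters, digits or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")

def extract_hashtags(text):
    """Return the lowercased hashtags found in a tweet's text."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]

# Hypothetical sample tweets: (state, day, text).
tweets = [
    ("CA", "2017-04-14", "Stuck again #traffic"),
    ("CA", "2017-04-16", "Brunch with family #HappyEaster #easter"),
    ("NY", "2017-04-16", "Egg hunt in the park #easter"),
]

# An in-memory SQLite database stands in for the project database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hashtags (state TEXT, day TEXT, tag TEXT)")
for state, day, text in tweets:
    conn.executemany(
        "INSERT INTO hashtags VALUES (?, ?, ?)",
        [(state, day, tag) for tag in extract_hashtags(text)],
    )

# Filter to one day, then aggregate the hashtag counts.
rows = conn.execute(
    "SELECT tag, COUNT(*) AS n FROM hashtags "
    "WHERE day = ? GROUP BY tag ORDER BY n DESC, tag",
    ("2017-04-16",),
).fetchall()
```

Storing one row per hashtag keeps the later queries simple, since counting by state, day or tag is then a single GROUP BY.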

Links:
Twitter Streaming API: https://dev.twitter.com/streaming/overview
NHGIS: https://www.nhgis.org/

Source: We plan to use the same streaming script to collect data over a longer period because we want to visualize how the tweets change over time. We also need county and state boundary data for the GIS visualization of the tweets. This boundary data also comes from the cse530s class and was downloaded from NHGIS.

Exploratory Data Analysis

Initially we decided to use a D3 Datamap with bubbles, where the size of each bubble represents the number of tweets and the color of each state represents the sentiment for that state. We also decided to provide a timeline feature that lets the user filter the map by date. Later we replaced the date slider with a themeriver, which displays how the number of tweets for a particular hashtag changes over time. This helps in identifying peaks in the number of tweets for particular hashtags. For example, there is a peak in the themeriver on Easter day, 2017-04-16.

The user can interact with the visualization through the map, the bubbles, the themeriver and the search box. We also added a histogram that displays the number of positive, negative and neutral tweets when the user hovers over a particular state. The map, the themeriver, the word clouds and the bar plots are linked to each other, which gives the user more freedom to narrow down the visualization based on their preferences.

Design Evolution

We kept the design layout described in the proposal. We originally planned a dashboard with three pages: the first providing a summary of the project, the second the map visualization, and the third the word clouds. Later we reduced the number of pages to two by placing the map and the word cloud on the same page. We removed the separate word cloud page because binding the word cloud to the map lets the user interact with both at the same time.

We originally designed a slider to present a timeline for the map, but later replaced it with a themeriver because the slider did not convey much information. The themeriver presents the number of tweets for the most used hashtags over the week.
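
A themeriver needs one series of daily counts per hashtag, with zeros on the days a hashtag does not appear. The sketch below shows one way to prepare such layers. The function name `themeriver_layers`, the `top_k` parameter and the sample records are hypothetical, not our actual preprocessing code.

```python
from collections import Counter, defaultdict

def themeriver_layers(records, top_k=2):
    """Build zero-filled daily counts for the top_k most used hashtags.

    records is a list of (day, hashtag) pairs.
    Returns (days, layers), where layers maps hashtag -> counts aligned with days.
    """
    totals = Counter(tag for _, tag in records)
    top_tags = [tag for tag, _ in totals.most_common(top_k)]
    days = sorted({day for day, _ in records})
    per_day = defaultdict(Counter)
    for day, tag in records:
        per_day[day][tag] += 1
    # A Counter returns 0 for missing keys, which fills the gaps in each layer.
    layers = {tag: [per_day[day][tag] for day in days] for tag in top_tags}
    return days, layers

# Hypothetical sample records.
records = [
    ("2017-04-14", "traffic"), ("2017-04-14", "traffic"),
    ("2017-04-14", "goodfriday"), ("2017-04-15", "photos"),
    ("2017-04-16", "easter"), ("2017-04-16", "easter"), ("2017-04-16", "easter"),
]
days, layers = themeriver_layers(records)
```

In this sample the '#easter' layer is flat until 2017-04-16 and then peaks, which is the kind of shape the themeriver makes visible.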

Implementation

We managed to implement all the features discussed in the first presentation. We made a few changes along the way, mainly to improve the user experience. We used D3, HTML, JavaScript, CSS and Python to build our dashboard. In the Datamap, the color of each state represents the sentiment for that state and the size of each bubble represents the number of tweets. When the user hovers over a bubble, a histogram is generated that displays the number of positive, negative and neutral tweets, along with a word cloud for that state. The user can also search for a particular hashtag in the search box. Our initial challenge was to reduce the response time required to generate the dynamic visualizations, since a fast response helps the user stay engaged with the dashboard. Later we also decided to implement a uniform color scheme for the whole dashboard, which includes the summary page, the map, the themeriver and the histogram. We also added a results page and a process book to document our findings and what we learned from the project.
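
The two visual encodings, bubble size for tweet count and state color for sentiment, can be expressed as simple mapping functions. The sketch below is an assumption about how such mappings might look, not the code behind our dashboard: it uses a square-root scale so that bubble area grows with the count, and it interpolates from gray toward green or red. The maximum radius, the colors and the function names are all hypothetical.

```python
import math

def bubble_radius(count, max_count, max_radius=40.0):
    """Scale the radius with the square root of the count, so area tracks count."""
    if max_count <= 0:
        return 0.0
    return max_radius * math.sqrt(count / max_count)

def sentiment_color(score):
    """Map a score in [-1, 1] to a hex color: red, through gray, to green."""
    score = max(-1.0, min(1.0, score))  # clamp out-of-range scores
    neutral = (200, 200, 200)
    end = (0, 160, 0) if score >= 0 else (200, 0, 0)
    weight = abs(score)
    rgb = tuple(round(n + (e - n) * weight) for n, e in zip(neutral, end))
    return "#{:02x}{:02x}{:02x}".format(*rgb)

radius = bubble_radius(25, 100)
neutral_color = sentiment_color(0.0)
```

A linear radius would exaggerate large states, because the visible area grows with the square of the radius. The square-root scale avoids that.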

Evaluation

From the data we were able to analyze the sentiments of people residing in different states. We found that people tend to tweet a lot during weekends. We managed to answer all our questions. We were able to get the number of tweets from each state and categorize the tweets as positive, neutral or negative. We were also able to identify the top trending topics from the word clouds. Our visualization gives the user the freedom to interact with the map, word clouds, histogram and themeriver at the same time. The user can also customize their search using the search box. We could further improve our visualization by collecting data over several months, which would help us learn how the sentiment and nature of tweets change across seasons.

Contact


Wint Yee Hnin

Yongzheng Huang

Krushnaraj Kamtekar