Name | Email | ID |
---|---|---|
Elijah Pena | elijah.pena@wustl.edu | 455281 |
Alex Eaton | a.eaton@wustl.edu | 456944 |
Yile Che | yileche@wustl.edu | 473924 |
Covid-19 has devastated the world, but not all places and people have been affected equally. With this project, we plan to analyze which demographics and locations have been hit hardest by the virus. The website can also serve as a source showing the effects over time of implementing PPE and quarantine requirements. Using this visualization, governing bodies could decide whether implementing these mandates would be best for their community.
For milestone 1, we have finished the basic functionality of our world heatmap over time. We have a sliding time bar that starts at January 1st, 2020 and ends at November 18th, 2020. As you scroll through time, you can see a heatmap of coronavirus cases in each country. Our color scale goes from white to red, but it still needs work because the colors can be unclear. For example, with our current color scale most countries appear completely white even though they may have a sizeable number of cases. Countries colored black have no data for the given date. We have also narrowed down the datasets we are using. We have decided not to make a visualization focused solely on the United States, so we've crossed out the U.S. Covid-19 dataset we had found. In addition, we've decided not to use the demographic dataset, both because its extremely large size causes performance issues and because we aren't focusing only on the U.S.
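
One possible way to address the mostly-white color scale described above is to map case counts logarithmically instead of linearly, so that countries with thousands of cases are still visibly tinted when the maximum is in the millions. The sketch below is written in Python purely for illustration; the function name `case_color` and its parameters are assumptions, not part of our code.

```python
import math


def case_color(cases, max_cases):
    """Map a case count to a hex color between white and red on a log scale.

    `cases` is None when a country has no data for the selected date;
    that case maps to black, matching the convention described above.
    """
    if cases is None:
        return "#000000"
    if max_cases <= 0 or cases <= 0:
        return "#ffffff"
    # log1p keeps small counts distinguishable from zero and from each other.
    t = math.log1p(cases) / math.log1p(max_cases)
    t = min(max(t, 0.0), 1.0)
    # Red channel stays at full intensity; green and blue fade out as t grows.
    fade = round(255 * (1 - t))
    return f"#ff{fade:02x}{fade:02x}"
```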
We have a working prototype! There are quite a few things that need fixing before it can be called complete, though. At the moment there are no instructions telling the user how to use the visualization. One idea is a modal popup with the visualization grayed out behind it. This would also help hide the fact that the visualization takes quite a while to load. We also have to improve runtime with some small optimizations in the code, but no matter what, this visualization will be a little slow because of the size of the data. There is currently no legend on the heatmap. The way our heatmap is currently set up does not allow for an easy addition of a legend, so it is missing from this version; version 2 will definitely include one. The line chart also doesn't have a title yet, so viewers cannot tell which metrics they are looking at. The mask mandate data is also formatted inconsistently, so we will have to go back and fix the names of countries with spaces in them so that we can look up the right data.
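
The country-name cleanup mentioned above could look something like the following sketch. It assumes the mismatch comes from inconsistent spacing, underscores, or capitalization between datasets, which may not be the whole story; the helper names and the `country` field name are hypothetical.

```python
import re


def normalize_country(name):
    """Return a canonical lookup key for a country name."""
    # Collapse runs of underscores and whitespace into single spaces.
    cleaned = re.sub(r"[_\s]+", " ", name).strip()
    # casefold() makes the key case-insensitive.
    return cleaned.casefold()


def index_by_country(rows, name_field="country"):
    """Build a dict from normalized country name to its data row."""
    return {normalize_country(row[name_field]): row for row in rows}
```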
The primary thing we did for this final update was get our US demographics page working. This page took us longer than usual due to the size of the dataset we used. The demographic dataset is from the CDC; it includes a row for every de-identified COVID patient in the US and is updated regularly. The issue was that the dataset contains about 8 million rows and was much too large to host on GitHub, since GitHub Pages has a file size limit of 100 MB. This meant that we had to cut the dataset by about 7/8ths, which still left 900,000 rows. To do this we created a script called randomdata.py, which selects 900,000 rows at random from the data. Choosing randomly was the best way to ensure that the data we kept wasn't biased, so it should represent the statistics of the full dataset fairly accurately. This wouldn't have been an issue if GitHub didn't limit the file size so much. Another issue with this dataset is that the races/ethnicities of most patients weren't recorded. The largest race/ethnicity category by far is "unknown", and "missing" also takes up a significant portion of cases. Unfortunately, this was the best dataset we found for race, gender, and age information, and we decided it was probably the most accurate since it's regularly updated by the CDC. We also had to limit the demographic info to the US since we found no worldwide demographic datasets. Our 2-minute screencast for this project is linked below.
For the screencast, please click HERE
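
For reference, the random downsampling done for the demographics page can be sketched as below. This is not the actual randomdata.py; it is a minimal illustration that uses reservoir sampling so a file with millions of rows never has to fit in memory at once. The function name `sample_rows`, its parameters, and any file paths passed to it are assumptions.

```python
import csv
import random


def sample_rows(in_path, out_path, k, seed=None):
    """Write a uniform random sample of up to k data rows to out_path.

    Uses reservoir sampling (Algorithm R): every data row ends up in the
    sample with equal probability. The header row is always kept.
    Returns the number of data rows written.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(in_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        for i, row in enumerate(reader):
            if i < k:
                # Fill the reservoir with the first k rows.
                reservoir.append(row)
            else:
                # Replace a random slot with probability k / (i + 1).
                j = rng.randint(0, i)
                if j < k:
                    reservoir[j] = row
    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(reservoir)
    return len(reservoir)
```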