The project title, your names, e-mail addresses, IDs, a link to the project repository.
Title: Visualizing UCLA graduate admissions.
Names: Jordan Chisam, Steven Harris, Zhengliang Liu
Emails: jachisam@wustl.edu, sharris22@wustl.edu, zhengliang@wustl.edu
IDs: 441729 (Jordan), 393270 (Steven), 465270 (Zhengliang)
Dev Team Repository: https://github.com/jachisam/cse457a-finalproject
Repository Provided by Professor: https://github.com/washuvis/admissions
Discuss your motivations and reasons for choosing this project, especially any background or research interests that may have influenced your decision.
College admission statistics, including those for graduate schools, are of great interest and importance to young people who aspire to further their education. We have all gone through numerous applications, and our portfolios have all gone through numerous admission committees. Even though applicants can evaluate their own chances of admission, for the most part the admission process is a black box that provides little feedback. Finding out which factors lead to admission is an important research topic with practical impact. It would be very helpful if one could look at past admission data and make an assessment before investing energy, time, and money in an application to a particular school. Sadly, most schools never disclose such data. For those that do, the data may not be easy to understand. More importantly, it is difficult to see exactly how each metric factors into the final admission decision. We are therefore very excited to start this project to visualize admission data, with the aim of building an intuitive representation of it. We found a UCLA dataset on Kaggle and think it is a good start, especially given how scarce admission data is and how few visualizations of such data exist.
Provide the primary questions you are trying to answer with your visualization. What would you like to learn and accomplish? List the benefits.
How can we effectively visualize each factor’s contribution to the admission decision? With so many factors under consideration in the application process, we want to show each one of them clearly and effectively.
Benefits:
From where and how are you collecting your data? If appropriate, provide a link to your data sources. Source, scraping method, cleanup, etc.
We found our dataset on Kaggle. It was collected by Mohan S Acharya as part of a paper on a related topic [1].
Link: https://www.kaggle.com/mohansacharya/graduate-admissions.
[1] Mohan S Acharya, Asfia Armaan, Aneeta S Antony: A Comparison of Regression Models for Prediction of Graduate Admissions, IEEE International Conference on Computational Intelligence in Data Science 2019.
For our project, we ran a multivariable regression on the dataset in R and Python to extract the factors relevant to calculating an applicant's chance of admission. This work can be found in the data section of the repository.
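The team's actual R and Python scripts live in the repository's data section and are not reproduced here. As a self-contained illustration of the technique only, the sketch below fits ordinary least squares through the normal equations in plain Python. The function name `fit_ols` and the synthetic rows are assumptions for illustration; they are not the project's code or the Kaggle data.

```python
# Sketch: ordinary least squares for a multivariable linear model, solved via
# the normal equations (X^T X) b = X^T y with Gaussian elimination.
# Pure Python (no NumPy) so it runs anywhere; the data below is synthetic.

def fit_ols(rows, targets):
    """Return [intercept, b1, ..., bk] minimizing the sum of squared errors."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # leading 1 = intercept
    k = len(X[0])
    # Normal equations A b = c, where A = X^T X and c = X^T y.
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    c = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= factor * A[col][j]
            c[r] -= factor * c[col]
    # Back substitution.
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b

# Synthetic example (hypothetical, not the Kaggle data): targets are generated
# exactly from chance = -1.0 + 0.1 * cgpa + 0.002 * gre, so the fit should
# recover those coefficients up to floating-point error.
rows = [(8.0, 310), (9.0, 320), (7.5, 300), (9.5, 335), (8.5, 305)]
targets = [-1.0 + 0.1 * cgpa + 0.002 * gre for cgpa, gre in rows]
coefs = fit_ols(rows, targets)
print([round(b, 4) for b in coefs])
```

With real data one would typically also report goodness of fit (for example R²) before reading the coefficients as factor importance; the normal equations are used here only because they keep the sketch dependency-free.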
Do you expect to do substantial data cleanup? What quantities do you plan to derive from your data? How will data processing be implemented?
No, we do not expect substantial data cleanup. The dataset is already clean, and it is not particularly large. Data processing will likely be implemented with bash scripts, Python, or both.
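Because the data is already clean, processing can stay light. Below is a minimal Python sketch, assuming the CSV layout of the Kaggle file: it strips whitespace from the header names (some column names in that file may carry trailing spaces, which is worth checking) and converts every field to a float. The function `load_admissions` and the sample rows are hypothetical; the values are made up, not taken from the dataset.

```python
import csv
import io

def load_admissions(text):
    """Parse admissions CSV text into a list of dicts with float values."""
    reader = csv.reader(io.StringIO(text))
    # Strip header whitespace so "LOR " and "LOR" map to the same key.
    header = [name.strip() for name in next(reader)]
    records = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        records.append({name: float(value) for name, value in zip(header, row)})
    return records

# Hypothetical rows for illustration only; the column names follow the
# Kaggle file's layout, but these values are invented.
SAMPLE = """Serial No.,GRE Score,TOEFL Score,University Rating,SOP,LOR ,CGPA,Research,Chance of Admit
1,320,110,3,3.5,4.0,8.9,1,0.78
2,305,104,2,3.0,3.0,8.1,0,0.61
"""

records = load_admissions(SAMPLE)
print(records[0]["Chance of Admit"], records[1]["LOR"])
```

In practice one would pass the contents of the downloaded file instead of `SAMPLE`.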
How will you display your data? Provide some general ideas that you have for the visualization design. Create three alternative designs for your visualization. Create one final design that incorporates the best of your three designs. Describe your designs and justify your choices of visual encodings. You use the Five Design Sheet Methodology.
List the features without which you would consider your project to be a failure.
List the features which you consider to be nice to have, but not critical.
Make sure that you plan your work so that you can avoid a big rush right before the final project deadline, and delegate different modules and responsibilities among your team members. Write this in terms of weekly deadlines.
Weekly Deadlines (Mondays)
Milestone 1: User selection was implemented, but the water tank display was not yet.
Milestone 2: User selection was modified, the tables were rebuilt, and the water tank was implemented.
Description: The left-hand side is a panel where users can select an applicant's data to view. Each applicant, identified by a serial number, is represented by a circle. The color of the circle indicates the applicant's likelihood of being admitted to UCLA's graduate program. When a circle is selected, it turns gold and becomes larger, and all other circles in the same likelihood class are also highlighted. Users can then see the selected applicant's data, namely the values for each admission factor, populated in the table. If satisfied with the selection, the user can click the update button, and the animation starts in the panel to the right of the selection panel. Each factor (e.g., GRE) is put on a balance and represented by a black ball proportional to its numeric value and the factor's weight in the admission process. It is compared against the "standard" for that factor, which is calculated by averaging that factor among the applicants classified as "highly likely" to be admitted. The animation goes through each factor and either pumps water in or drains water out. For example, if the applicant's GRE score is higher than the "standard" GRE, water is pumped in; otherwise, water is drained. The amount of water pumped or drained is proportional to the percentage difference and the factor's weight in admission. As the animation shows, GRE score is actually not very important in determining admission in our data. Finally, if the water level in the tank ends above the red line, the applicant is highly likely to be admitted.
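The update rule described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the project's d3 code: the function names `standards` and `run_tank`, the factor weights, the starting level of 0.5, the "highly likely" cutoff of 0.8, and the red line at 0.5 are all assumed placeholder values.

```python
# Hypothetical sketch of the water-tank update rule. Weights, the starting
# level, the "highly likely" cutoff, and the red line are placeholders.

def standards(applicants, factors, is_highly_likely):
    """Average each factor over applicants classified as highly likely."""
    top = [a for a in applicants if is_highly_likely(a)]
    if not top:
        raise ValueError("no highly likely applicants to average over")
    return {f: sum(a[f] for a in top) / len(top) for f in factors}

def run_tank(applicant, standard, weights, start_level=0.5):
    """Pump or drain water factor by factor; return the level after each step."""
    level = start_level
    history = []
    for factor, weight in weights.items():
        # Percentage difference from the standard, scaled by the factor weight:
        # a positive change pumps water in, a negative change drains it out.
        pct_diff = (applicant[factor] - standard[factor]) / standard[factor]
        level += weight * pct_diff
        history.append((factor, level))
    return history

applicants = [
    {"GRE": 330, "CGPA": 9.5, "chance": 0.92},
    {"GRE": 320, "CGPA": 9.0, "chance": 0.85},
    {"GRE": 300, "CGPA": 7.8, "chance": 0.55},
]
weights = {"GRE": 0.5, "CGPA": 2.0}  # placeholder weights: CGPA matters more
std = standards(applicants, list(weights), lambda a: a["chance"] >= 0.8)
steps = run_tank({"GRE": 312, "CGPA": 9.0}, std, weights)
RED_LINE = 0.5
print(std, steps, steps[-1][1] > RED_LINE)
```

Here the standards come out as GRE 325 and CGPA 9.25. Because CGPA carries the larger placeholder weight, the same percentage shortfall drains four times as much water for CGPA as for GRE, which is exactly the kind of difference the tank is meant to make visible.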
In the two days between User Testing and Milestone 2, we completely restructured the project due to feedback on the site's intuitiveness.
Here are some screenshots of the updated visualization that we used for user testing:
Our User Tasks:
Most of the feedback we got stated:
After the in-class presentation, we received feedback and made the requested changes.
Feedback:
Takeaways: We fixed many of the minor bugs and recommendations raised by our peers, namely items 4 through 11. This smaller set of tasks was addressed after the presentation. What took the most time, however, was building the two additional visualizations that we decided to include in response to feedback items 1 and 5. To be completely transparent, we started with a completely different project scope that visualized the chance of admission, which took quite a long time to implement. Unfortunately, given our timeline and current skill level, we were not able to implement it in a fully intuitive way. The old project effort can be found here:
Here are some screenshots of final changes to the project:
Anything that inspired you, such as a paper, a website, visualizations we discussed in class, etc.
No prior visualization or website directly inspired our work. We started by looking on Kaggle for interesting datasets, and when we came across the UCLA admissions dataset, we began brainstorming the best ways to use and visualize it.
What questions are you trying to answer? How did these questions evolve throughout the project? What new questions did you consider in the course of your analysis?
We are most interested in giving applicants a better idea of what kinds of applicants the university has received and accepted. We hope that people interested in applying to UCLA's graduate school will be able to visit this website, get a better sense of their likelihood of admission, and see what they are up against. In the course of our analysis, we became interested in which statistics had the largest effect on chance of admission. Building on this idea, we modified our visualization to accept user input and calculate each user's own chance of admission.
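Calculating a chance from user input comes down to validating the inputs and applying the fitted model. The Python sketch below assumes a linear model clamped to [0, 1]. The input ranges follow the scales used in the Kaggle dataset (GRE 260 to 340, TOEFL out of 120, ratings out of 5, CGPA out of 10, research as 0 or 1). The coefficients are placeholders, not the project's fitted values, and `predict_chance` is a hypothetical name.

```python
# Hypothetical sketch: validate user input, apply a linear model, clamp to
# [0, 1]. The coefficients are placeholders, not the project's fitted values.

RANGES = {
    "GRE Score": (260, 340),
    "TOEFL Score": (0, 120),
    "University Rating": (1, 5),
    "SOP": (1, 5),
    "LOR": (1, 5),
    "CGPA": (0, 10),
    "Research": (0, 1),
}

def predict_chance(inputs, intercept, coefs):
    """Return a chance of admission in [0, 1] for validated inputs."""
    for name, (lo, hi) in RANGES.items():
        value = inputs[name]
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} is outside [{lo}, {hi}]")
    raw = intercept + sum(coefs[name] * inputs[name] for name in coefs)
    # A linear model can stray outside [0, 1], so clamp before displaying.
    return min(1.0, max(0.0, raw))

PLACEHOLDER_COEFS = {"CGPA": 0.1, "GRE Score": 0.002}
example_user = {
    "GRE Score": 320, "TOEFL Score": 110, "University Rating": 3,
    "SOP": 3.5, "LOR": 4.0, "CGPA": 9.0, "Research": 1,
}
print(predict_chance(example_user, -1.0, PLACEHOLDER_COEFS))
```

Validating against the dataset's own scales keeps users from entering values the model never saw, and clamping keeps the displayed number interpretable as a probability.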
What visualizations did you use to look at your data initially? What insights did you gain? How did these insights inform your design?
If you look under the Milestone 2 tab of the process book, you can see our initial effort at a water tank visualization, with a balance showing the most important factors in determining chance of admission. We found that this implementation was not very intuitive, so we had to make many changes to simplify it and improve user understanding. A lot of time and code went into the initial water tank, and restructuring the entire visualization set us back somewhat.
What are the different visualizations you considered? Justify the design decisions you made using the perceptual and design principles you learned in the course. Did you deviate from your proposal?
Images documenting our visualization's evolution appear in the upper part of the process book. Most of the design choices made throughout the project favored making the visualization easy to understand and use. From the course, we drew on our knowledge of d3, colorblindness awareness, style and layout, project flow and user journey, user testing practices and protocols, and data-ink principles. Beyond the major overhaul between Milestone 2 and user testing, there were not many deviations from our proposal. Our overall goal never changed, but the way we accomplished it did. These changes are detailed under the Milestone 1 and 2, User Testing, and Post Presentation sections.
Describe the intent and functionality of the interactive visualizations you implemented. Provide clear and well-referenced images showing the critical design and interaction elements.
What did you learn about the data by using your visualizations? How did you answer your questions? How well does your visualization work, and how could you further improve it?
Using our visualization, we learned which factors are important to UCLA's graduate admissions office (i.e., CGPA and GRE, though this may not generalize to other graduate institutions). We think our visualization works well and clearly communicates chance of admission to prospective applicants. For further improvement, we would really like to obtain datasets from other colleges and create a more global product for displaying chances of getting into any graduate school.
The sources we used are also linked in our project README.md.
Many thanks/credit to the various sources we used in our project.
This can also be found linked in the project README.md.
I use a “balance” to weigh a factor (e.g., GPA = 3.74) against a pre-calculated “sufficient level” (e.g., 3.8). The size of the ball represents the value of the factor (such as 3.74). If the balance favors the GPA of the selected ID (in this case 37), it pulls the bar upward toward the line marked “admitted”; otherwise, it lowers the bar toward the line marked “reject”. Gradually, as we go through every factor (GPA, TOEFL, GRE, recommendation letter, etc.), the bar moves toward either rejection or admission. The strings between the balance and the purple bar act as strings of control. For now, the colors of the objects are for pop-out effect only and carry no special meaning. The design is simple and intuitive, but its largest problem is that it is hard to differentiate the relative importance of two factors (e.g., GPA vs. GRE score): every factor shares the same “sufficient” marker, so their effects are hard to tell apart visually.
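To make this weakness concrete, here is a hypothetical Python sketch of the hanging-bar rule. It assumes each factor moves the bar by one fixed step, depending only on whether the value meets its “sufficient level”. The function `hanging_bar`, the GRE threshold of 320, and the use of a ≥ comparison are illustrative assumptions; the GPA numbers come from the example above.

```python
# Hypothetical sketch of the "hanging bar": each factor moves the bar one
# fixed step up (toward "admitted") or down (toward "reject"), depending only
# on whether the value meets its "sufficient level". Factor weight is ignored.

def hanging_bar(applicant, sufficient, step=1):
    position = 0
    for factor, threshold in sufficient.items():
        position += step if applicant[factor] >= threshold else -step
    return position

sufficient = {"GPA": 3.8, "GRE": 320}   # GRE threshold is a placeholder
id_37 = {"GPA": 3.74, "GRE": 330}       # misses on GPA, passes on GRE
other = {"GPA": 3.9, "GRE": 300}        # passes on GPA, misses on GRE
print(hanging_bar(id_37, sufficient), hanging_bar(other, sufficient))
```

Both applicants end at the same bar position even though they miss on different factors, so the display cannot show whether GPA or GRE matters more.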
This is a very rough sketch of a basic single-page visualization that lets users look at various graduate school applicant profiles. Each profile shows the variables taken into account during the graduate school admissions process. When the user clicks a circle (representing an applicant profile), the page scrolls down and displays a scatter plot comparing the selected profile (highlighted and larger) with other profiles (non-highlighted and smaller, to avoid clutter). On hover, the circles (profiles) enlarge and show their respective variables.
The downside of this visualization is that it is not clear how individual variables affect chance of admission. Additionally, scrolling back and forth between the top and bottom of the page may become tedious for the user. Our final design should address these faults by making it clear which variables have a stronger influence and by making user interaction easy rather than tedious.
Another alternative design was a bubble graph visualization depicting all of the probabilities of admission in 10% ranges. The lower part of the visualization would hold clusters with low chances of admission, while the upper part would hold clusters with higher chances. The size of each bubble would show the portion of applicants that fall into its range, and each bubble would contain the applicants whose chance of acceptance falls in that range. On hover (or click), the variables of each application would be displayed in a tooltip. Ideally, this visualization would use color to separate the circles. Similar to Design Sketch 2, this visualization lacks the ability to properly demonstrate how each variable affects chance of admission.
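The bubble design was not built, but its two core computations are easy to sketch. The Python below assumes chances are stored as fractions in [0, 1]. It groups them into 10% ranges, with 100% folded into the top range, and sizes each bubble so that its area, rather than its radius, is proportional to the number of applicants. `bin_chances`, `bubble_radii`, the 50-pixel maximum radius, and the sample chances are hypothetical.

```python
import math
from collections import Counter

def bin_chances(chances):
    """Count applicants per 10% range: bin 0 is [0%, 10%), bin 9 is [90%, 100%]."""
    return Counter(min(int(c * 10), 9) for c in chances)

def bubble_radii(counts, max_radius=50.0):
    """Scale radii by a square root so that bubble area tracks the count."""
    largest = max(counts.values())
    return {b: max_radius * math.sqrt(n / largest) for b, n in counts.items()}

# Made-up chances for illustration, not values from the dataset.
chances = [0.92, 0.95, 0.81, 0.55, 0.57, 0.58, 0.99]
counts = bin_chances(chances)
radii = bubble_radii(counts)
print(sorted(counts.items()), {b: round(r, 1) for b, r in radii.items()})
```

Scaling the radius linearly with the count would make large bins look disproportionately large, because perceived size follows area.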
The final design improves on the first design with the “balance” and the “hanging bar” by using a water tank instead. In the GPA example above, if the balance favors ID 37’s GPA against a pre-calculated standard (“sufficient”) level, it opens the gate of the water pump and lets water in. Otherwise, it opens the gate of the water drain and lets some water out. Gradually, as we go through every factor (GPA, TOEFL, GRE, recommendation letter, etc.), the water level goes up and down, and if it ends above the “admitted” marker, this ID is admitted; otherwise, it is rejected. The main improvement over the “hanging bar” design is that the water tank can take in different amounts of water to reflect the importance of different factors (GPA vs. GRE: the more important the factor, the more water flows in), which is harder to do with the “hanging bar”. Also, because the water tank design is inherently somewhat more complex, it could later convey more information through the water’s features in each run (color, “wave”). It is more expandable and flexible.