diff --git a/images/springbreak/fullspread.PNG b/images/springbreak/fullspread.PNG new file mode 100644 index 0000000..fed7a62 Binary files /dev/null and b/images/springbreak/fullspread.PNG differ diff --git a/images/springbreak/onebeach.PNG b/images/springbreak/onebeach.PNG new file mode 100644 index 0000000..a649c70 Binary files /dev/null and b/images/springbreak/onebeach.PNG differ diff --git a/images/week1/LondonLibraries.PNG b/images/week1/LondonLibraries.PNG new file mode 100644 index 0000000..cf142cd Binary files /dev/null and b/images/week1/LondonLibraries.PNG differ diff --git a/images/week1/SeoulLibraries.PNG b/images/week1/SeoulLibraries.PNG new file mode 100644 index 0000000..4ca085e Binary files /dev/null and b/images/week1/SeoulLibraries.PNG differ diff --git a/images/week1/baseball.png b/images/week1/baseball.png new file mode 100644 index 0000000..be96229 Binary files /dev/null and b/images/week1/baseball.png differ diff --git a/images/week1/faces_of_cattle.PNG b/images/week1/faces_of_cattle.PNG new file mode 100644 index 0000000..c82d52b Binary files /dev/null and b/images/week1/faces_of_cattle.PNG differ diff --git a/images/week1/violent_crime_in_the_US.gif b/images/week1/violent_crime_in_the_US.gif new file mode 100644 index 0000000..c2ea79e Binary files /dev/null and b/images/week1/violent_crime_in_the_US.gif differ diff --git a/images/week10/abstract.PNG b/images/week10/abstract.PNG new file mode 100644 index 0000000..6b72fb9 Binary files /dev/null and b/images/week10/abstract.PNG differ diff --git a/images/week10/big.PNG b/images/week10/big.PNG new file mode 100644 index 0000000..ecf4b5d Binary files /dev/null and b/images/week10/big.PNG differ diff --git a/images/week10/new.PNG b/images/week10/new.PNG new file mode 100644 index 0000000..538ad47 Binary files /dev/null and b/images/week10/new.PNG differ diff --git a/images/week10/objectiveresults.PNG b/images/week10/objectiveresults.PNG new file mode 100644 index 0000000..55bc106 Binary files 
/dev/null and b/images/week10/objectiveresults.PNG differ diff --git a/images/week10/subjectiveresults.PNG b/images/week10/subjectiveresults.PNG new file mode 100644 index 0000000..ad8da1b Binary files /dev/null and b/images/week10/subjectiveresults.PNG differ diff --git a/images/week10/traditional.PNG b/images/week10/traditional.PNG new file mode 100644 index 0000000..bf8a229 Binary files /dev/null and b/images/week10/traditional.PNG differ diff --git a/images/week11/abstract.PNG b/images/week11/abstract.PNG new file mode 100644 index 0000000..af03cde Binary files /dev/null and b/images/week11/abstract.PNG differ diff --git a/images/week11/faces.PNG b/images/week11/faces.PNG new file mode 100644 index 0000000..09e9e2d Binary files /dev/null and b/images/week11/faces.PNG differ diff --git a/images/week11/points.PNG b/images/week11/points.PNG new file mode 100644 index 0000000..d7a0b5a Binary files /dev/null and b/images/week11/points.PNG differ diff --git a/images/week11/process.PNG b/images/week11/process.PNG new file mode 100644 index 0000000..d6a647e Binary files /dev/null and b/images/week11/process.PNG differ diff --git a/images/week12/abstract.PNG b/images/week12/abstract.PNG new file mode 100644 index 0000000..8288fee Binary files /dev/null and b/images/week12/abstract.PNG differ diff --git a/images/week12/encodings.PNG b/images/week12/encodings.PNG new file mode 100644 index 0000000..239950d Binary files /dev/null and b/images/week12/encodings.PNG differ diff --git a/images/week12/logrrp.PNG b/images/week12/logrrp.PNG new file mode 100644 index 0000000..20ce263 Binary files /dev/null and b/images/week12/logrrp.PNG differ diff --git a/images/week12/scenario.PNG b/images/week12/scenario.PNG new file mode 100644 index 0000000..668d482 Binary files /dev/null and b/images/week12/scenario.PNG differ diff --git a/images/week13/abstract.PNG b/images/week13/abstract.PNG new file mode 100644 index 0000000..2828677 Binary files /dev/null and 
b/images/week13/abstract.PNG differ diff --git a/images/week13/experiments.PNG b/images/week13/experiments.PNG new file mode 100644 index 0000000..ff04b37 Binary files /dev/null and b/images/week13/experiments.PNG differ diff --git a/images/week14/abstract.PNG b/images/week14/abstract.PNG new file mode 100644 index 0000000..3760130 Binary files /dev/null and b/images/week14/abstract.PNG differ diff --git a/images/week14/after.PNG b/images/week14/after.PNG new file mode 100644 index 0000000..8cb9dae Binary files /dev/null and b/images/week14/after.PNG differ diff --git a/images/week14/before.PNG b/images/week14/before.PNG new file mode 100644 index 0000000..8734652 Binary files /dev/null and b/images/week14/before.PNG differ diff --git a/images/week14/figure6.PNG b/images/week14/figure6.PNG new file mode 100644 index 0000000..b9b09ca Binary files /dev/null and b/images/week14/figure6.PNG differ diff --git a/images/week14/figure7.PNG b/images/week14/figure7.PNG new file mode 100644 index 0000000..397595a Binary files /dev/null and b/images/week14/figure7.PNG differ diff --git a/images/week14/figure8.PNG b/images/week14/figure8.PNG new file mode 100644 index 0000000..8018c78 Binary files /dev/null and b/images/week14/figure8.PNG differ diff --git a/images/week14/proofofconcept.PNG b/images/week14/proofofconcept.PNG new file mode 100644 index 0000000..44468e6 Binary files /dev/null and b/images/week14/proofofconcept.PNG differ diff --git a/images/week2/dailydoses.PNG b/images/week2/dailydoses.PNG new file mode 100644 index 0000000..32ae272 Binary files /dev/null and b/images/week2/dailydoses.PNG differ diff --git a/images/week2/percentvacinnatedbystate.PNG b/images/week2/percentvacinnatedbystate.PNG new file mode 100644 index 0000000..04bccfc Binary files /dev/null and b/images/week2/percentvacinnatedbystate.PNG differ diff --git a/images/week2/whenwillwebevaccinated.PNG b/images/week2/whenwillwebevaccinated.PNG new file mode 100644 index 0000000..19caed4 Binary files 
/dev/null and b/images/week2/whenwillwebevaccinated.PNG differ diff --git a/images/week3/DisneyProducts.jpg b/images/week3/DisneyProducts.jpg new file mode 100644 index 0000000..57bb594 Binary files /dev/null and b/images/week3/DisneyProducts.jpg differ diff --git a/images/week4/birds.PNG b/images/week4/birds.PNG new file mode 100644 index 0000000..62283d4 Binary files /dev/null and b/images/week4/birds.PNG differ diff --git a/images/week4/firerisk.PNG b/images/week4/firerisk.PNG new file mode 100644 index 0000000..1a221be Binary files /dev/null and b/images/week4/firerisk.PNG differ diff --git a/images/week4/humanimpact.PNG b/images/week4/humanimpact.PNG new file mode 100644 index 0000000..b715061 Binary files /dev/null and b/images/week4/humanimpact.PNG differ diff --git a/images/week4/unsurveyed.PNG b/images/week4/unsurveyed.PNG new file mode 100644 index 0000000..686c1b8 Binary files /dev/null and b/images/week4/unsurveyed.PNG differ diff --git a/images/week4/wildlifedensity.PNG b/images/week4/wildlifedensity.PNG new file mode 100644 index 0000000..4605a34 Binary files /dev/null and b/images/week4/wildlifedensity.PNG differ diff --git a/images/week4/year1.PNG b/images/week4/year1.PNG new file mode 100644 index 0000000..dcacb9d Binary files /dev/null and b/images/week4/year1.PNG differ diff --git a/images/week4/year5.PNG b/images/week4/year5.PNG new file mode 100644 index 0000000..2ab7880 Binary files /dev/null and b/images/week4/year5.PNG differ diff --git a/images/week5/parallel_axes.gif b/images/week5/parallel_axes.gif new file mode 100644 index 0000000..470ccd5 Binary files /dev/null and b/images/week5/parallel_axes.gif differ diff --git a/images/week6/timesearcher2.PNG b/images/week6/timesearcher2.PNG new file mode 100644 index 0000000..94ee99e Binary files /dev/null and b/images/week6/timesearcher2.PNG differ diff --git a/images/week6/video.PNG b/images/week6/video.PNG new file mode 100644 index 0000000..e40aaa1 Binary files /dev/null and 
b/images/week6/video.PNG differ diff --git a/images/week8/bitmaps.PNG b/images/week8/bitmaps.PNG new file mode 100644 index 0000000..42e7bdd Binary files /dev/null and b/images/week8/bitmaps.PNG differ diff --git a/images/week8/grids.PNG b/images/week8/grids.PNG new file mode 100644 index 0000000..bbeb36a Binary files /dev/null and b/images/week8/grids.PNG differ diff --git a/images/week8/timeseries.PNG b/images/week8/timeseries.PNG new file mode 100644 index 0000000..4c1b55b Binary files /dev/null and b/images/week8/timeseries.PNG differ diff --git a/images/week9/itunes_visualizer.gif b/images/week9/itunes_visualizer.gif new file mode 100644 index 0000000..f26c940 Binary files /dev/null and b/images/week9/itunes_visualizer.gif differ diff --git a/images/week9/mam_partmotion.gif b/images/week9/mam_partmotion.gif new file mode 100644 index 0000000..d33c19c Binary files /dev/null and b/images/week9/mam_partmotion.gif differ diff --git a/images/week9/mam_pianoroll.gif b/images/week9/mam_pianoroll.gif new file mode 100644 index 0000000..129974a Binary files /dev/null and b/images/week9/mam_pianoroll.gif differ diff --git a/images/week9/mam_tonalcompass.gif b/images/week9/mam_tonalcompass.gif new file mode 100644 index 0000000..0cfb863 Binary files /dev/null and b/images/week9/mam_tonalcompass.gif differ diff --git a/images/week9/mpm.gif b/images/week9/mpm.gif new file mode 100644 index 0000000..ac58ce8 Binary files /dev/null and b/images/week9/mpm.gif differ diff --git a/springbreak.md b/springbreak.md new file mode 100644 index 0000000..cc23b3c --- /dev/null +++ b/springbreak.md @@ -0,0 +1,25 @@ +Spring Break - Springbreak Phone Data to Show the Potential Spread of COVID-19 +=== +By Andrew Nolan (3-22-21) + +![The Possible Spread of COVID from Fort Lauderdale Beach](./images/springbreak/fullspread.PNG) + +I was not sure if we needed to submit a reflection this week since it is spring break, but I thought it would be fun to share a cool spring break visualization since 
it's currently WPI's spring break and starting next week all reflections will be on academic papers. + +This article was actually published last year, and it was a bit ahead of its time. Now we are aware of how dangerous the spread of COVID-19 can be, but last year everything was new. A group of data visualization researchers at Tectonix Geo created a model showing how just the people gathered at one beach in Florida during spring break could spread COVID-19 across the country. They used cell phone data aggregated by the location data company X-Mode Social. This data showed 5000 phones at a beach in Fort Lauderdale, Florida, during peak spring break time in March 2020. The visualization then zooms out the timeline to show where the phones were the week before and after spring break. This reveals the potential spread that COVID could have from just one beach of spring breakers. + +Evidently (and unfortunately), the warnings from Tectonix Geo went unheard, because COVID spread widely and people are still travelling for spring break this year. But as a data visualization it is very effective at showing how spread can occur. It works as a network showing connections of where the phones travel across the U.S. It ties in nicely with our recent reading of chapter 8 in the textbook, since this is a very clear example of arranging spatial data. + +Here you can see the collection of 5000 phones without social distancing at the beach: + +![The 5000 phones gathered at Fort Lauderdale Beach](./images/springbreak/onebeach.PNG) + +Now you can see the paths these phones travelled in the dates surrounding the week of spring break: + +![The Possible Spread of COVID from Fort Lauderdale Beach](./images/springbreak/fullspread.PNG) + + +Sources +--- +1. Thousands of spring breakers traveled from one Florida beach to cities across the US.
Mapping their phone data shows the importance of social distancing amid the coronavirus outbreak: https://www.businessinsider.com/coronavirus-florida-spring-break-location-data-spread-social-distancing-2020-3 +2. Tectonix's Tweet: https://twitter.com/TectonixGEO/status/1242628347034767361?s=20 \ No newline at end of file diff --git a/week1.md b/week1.md index e69de29..42a1ace 100644 --- a/week1.md +++ b/week1.md @@ -0,0 +1,86 @@ +Week 1 - A Discussion of Chernoff Faces +=== +By Andrew Nolan (2-8-21) + +#### Intro +Last summer I took the course CS 548 Knowledge Discovery and Data Mining. In one of the lectures we talked about data visualizations, and in one slide we briefly discussed Chernoff Faces. For some reason I think about this data visualization way more than I should. For my assignment 1 project I made some Chernoff Face visualizations of Iris species data in d3. Since I've been thinking about them all week, I felt like an introduction/discussion of the applications of the visualization would be a good choice for this week's reflection. + +#### Background + +For those unfamiliar with the data vis, Chernoff Faces were invented in 1973 by the American mathematician Herman Chernoff. Chernoff Faces provide a way to visualize multivariate data by mapping data features to visual features on a face [1][11]. The benefits of this visualization come from the claim that humans can easily recognize faces and identify small changes between them. This is useful if the goal of the visualization is pattern recognition, outlier detection, or clustering similar data objects. However, the scale and value of the data are often obscured by this vis. Without intimate knowledge of which values are mapped to what facial feature, it is hard to decipher more than which objects are similar or outliers. + +From what I can tell, in most languages you would have to fork someone's GitHub repo or build your own system for visualizing data with Chernoff Faces.
But if you are a fan of R, there is a package called aplpack that includes easy-to-use Chernoff Face code. This library is behind most real-world Chernoff Faces; you can actually see it in use in all of the examples below. You can learn how to use it in the article "How to Visualize Data with Cartoonish Faces ala Chernoff" [6]. + + +#### Real World Examples + +Considering that most people have probably not heard of this visualization, it's unsurprising that it is rarely used for real-world applications. However, I wanted to share a few practical examples that I could find. + +In 2006 a Social Science Statistics blogger shared a baseball blogger's Chernoff Face representation of 2005 National League Baseball statistics [4][5]. The faces below map the following features to facial features: +- Win percent -> face height, smile curve, and hair styling +- Hits -> face width, eye height, and nose height +- Home runs -> face shape, eye width, and nose width +- Walks -> mouth height, hair height, and ear width +- Stolen bases -> mouth width, hair width, and ear height. + +![2005 National League Baseball Statistics](./images/week1/baseball.png) + +While these statistics may be meaningful, it's hard to determine anything from these faces. In 2005 the winner of the National League was the Houston Astros (who then went on to lose to the White Sox in the World Series). In these faces, Houston doesn't particularly stand out. I'm not knowledgeable enough about baseball to tell you if that means anything, but it appears to me that standout stats don't necessarily determine winning in baseball. + +Most Chernoff Face discussion will be from blog posts like the one above, mostly as a for-fun look at an obscure data visualization. But there are cases in which Chernoff Faces are used in published research.
For example, in 2017, the Journal of Documentation published an entry titled "Big data analysis of public library operations and services by using the Chernoff face method" [12]. By using the following feature mapping, the researchers were able to use Chernoff Faces to compare libraries in London and Seoul over time. + +- Height of hair -> issues +- Width of hair -> visits +- Eye size -> collections +- Ear size -> number of libraries +- Nose size -> budgets +- Mouth size -> number of staff +- Mouth curvature -> number of professional staff +- Face size -> library floor space + + +![London Library Data visualized with Chernoff Faces](./images/week1/LondonLibraries.PNG) + +![Seoul Library Data visualized with Chernoff Faces](./images/week1/SeoulLibraries.PNG) + +The results of this study were used to compare the libraries. For example, they found London libraries typically had larger budgets than Seoul libraries. And over the period they measured (2004-2014), they discovered libraries were shrinking. Specifically related to the data vis, the study determined Chernoff Faces are useful for identifying patterns between libraries. And if a baseline face is set, they can be useful for measuring performance changes over time. This is actually a very interesting point that is not often seen in Chernoff Faces. The researchers also point out three key limitations of Chernoff Faces. They propose that using more than 10 variables in a face will result in hard-to-notice changes. If there are too many faces it can also become hard to remember differences. Finally, it is difficult to use them to compare datasets that are not normalized. In their case, comparing London and Seoul was hard, as the two cities reported different data metrics. + +Another recently published research example of Chernoff Faces I found comes from cattle research. In 2020, researchers from Akdeniz University in Turkey published a paper titled "Chernoff faces application in livestock" [10].
The paper analyzes the theoretical and practical applications of using Chernoff faces to visualize livestock data of cattle and goats. "Easily understood presentations facilitated by figures were obtained." Unfortunately, everything in the paper besides the abstract and image captions is in Turkish, so I don't really know what these faces are saying, but we can take their word for it. + +![Faces of Cattle](./images/week1/faces_of_cattle.PNG) + +To summarize, Chernoff Faces do have practical applications. Although rare, there are more examples than the ones I've shown here. They are used in all sorts of industries and even for big data. The primary benefit of this data vis is detecting patterns and outliers in the data. + +#### Challenges and Limitations + +So far we've mostly looked at successful implementations of Chernoff Faces. But as mentioned in the library paper, Chernoff Faces do have drawbacks. They do not work well with more than 10 variables, and if there are too many faces it can become overwhelming. Furthermore, while they are useful for identifying patterns and outliers, they are not as useful for direct comparisons or measurement of values. + +In 2007, Robert Kosara, now a researcher at Tableau, wrote a blog post describing the drawbacks of Chernoff Faces [2]. His primary criticism serves as a direct counter to the alleged benefit of Chernoff Faces. Chernoff claims faces are a good visualization tool because humans have evolved to recognize faces. Kosara cites papers arguing this is true for faces as a whole and not for individual components. He argues, "Face perception works in a holistic and hierarchical way. We do not see a nose, ears, eyes, eyebrows, etc., and then piece them together (at least not consciously). Rather, we recognize a person".
Arguably, this still allows Chernoff Faces to be a valuable tool for pattern and outlier detection, but it further supports the idea that they are not as good for identifying and comparing specific values. + +Expanding on one of the drawbacks discussed in the library paper, Kosara (and the Wikipedia article) mention the importance and limitations of using facial features. In the library article they mention how it is difficult to represent data with more than 10 attributes with a Chernoff Face. Kosara adds that since certain facial features change more and are easier to recognize, data mapped to these features is perceived as more important. Thus, great care must be taken by data scientists using this visualization to map features appropriately and effectively. + +One last story I thought was interesting and worth discussing in this reflection is about the social implications of Chernoff Faces. Isabella Chua, a data storyteller for the Kontinentalist, wrote a blog post discussing how some Chernoff Faces can be unintentionally racist [9]. There is a tutorial for how to use Chernoff Faces in R, the same tutorial I mentioned earlier [6]. This tutorial uses United States crime statistics as its example dataset. The resulting Chernoff Faces can be seen in the image below. + +![Face of Crime in the US](./images/week1/violent_crime_in_the_US.gif) + +At first glance these seem like standard Chernoff Faces. But as Chua discovered from reading the comments in the tutorial, some unintentional effects occurred. Just due to how the R algorithm for Chernoff Faces worked, it mapped high violent crime rates to certain facial features stereotypical of Black people. For obvious reasons, this is not a good thing. Chua went on to review her own Chernoff Faces to see if anything similar occurred.
She discovered that in her own work with Chernoff Faces, when representing data from multiple countries, the way the features were chosen had led to the Chernoff Faces of Asian countries having slanted eyes, another offensive stereotype. She admits in her blog post that she may be reading too much into it. But I believe she raises a good point: with a data vis like Chernoff Faces, or really any data visualization, the way you choose to represent the data tells an important story. We want to convey what the data is telling us and avoid any bias or unintentional offensiveness that the representation may create. + +#### Concluding Sentence + +Sorry for the long reflection this week. I just got really interested in Chernoff Faces and wanted to do a deep dive. I hope this was interesting to anyone who took the time to read it :) + +Sources/Further Reading +--- +1. Wikipedia https://en.wikipedia.org/wiki/Chernoff_face +2. A Critique of Chernoff Faces https://eagereyes.org/criticism/chernoff-faces +3. Mapping Quality of Life with Chernoff Faces https://web.archive.org/web/20041217153643/http://gis.esri.com/library/userconf/educ04/papers/pap5000.pdf +4. Chernoff Faces (Baseball) https://web.archive.org/web/20130916002111/http://blogs.iq.harvard.edu/sss/archives/2006/11/chernoff_faces_1.shtml +5. What's the Matter With Chernoff Faces? https://web.archive.org/web/20130128144805/http://alexreisner.com/baseball/stats/chernoff +6. How to visualize data with cartoonish faces ala Chernoff https://flowingdata.com/2010/08/31/how-to-visualize-data-with-cartoonish-faces/ +7. Chernoff Faces (Crime) https://ldld.samizdat.cc/2016/chernoff-faces/ +8. Deep Chernoff Faces https://www.ihatethefuture.com/2020/06/deep-chernoff-faces.html +9. How can a data visualization be racist? https://medium.com/kontinentalist/how-can-a-data-visualisation-be-racist-a652910d8184 +10. Chernoff Faces Application in Livestock https://www.cabdirect.org/cabdirect/abstract/20203479862 +11. 
The Use of Faces to Represent +Points in k-Dimensional Space Graphically https://www.jstor.org/stable/2284077 +12. Big Data Analysis of Public Library Operations and Services by using the Chernoff Face Method https://www.emerald.com/insight/content/doi/10.1108/JD-08-2016-0098/full/html \ No newline at end of file diff --git a/week10.md b/week10.md new file mode 100644 index 0000000..62b30d2 --- /dev/null +++ b/week10.md @@ -0,0 +1,33 @@ +Week 10 - Chess Evolution Visualization +=== +By Andrew Nolan (4-12-21) + +![The abstract of the paper](./images/week10/abstract.PNG) + +I like chess. For a while I was President of WPI's chess club. I was having trouble finding a data visualization paper that interested me this week, so I thought I would look at IEEE VIS and see if there were any papers on non-technical topics that I found fun. I searched for chess and this showed up! So here we are. + +A standard chess position analysis/visualization tool will look something like the figure shown below. In this tool, pieces you can capture are highlighted in green, your pieces under attack are highlighted in red, and pieces partially under attack are highlighted in yellow. Arrows on the board show the recommended moves. In addition, most modern chess visualizations will also highlight the previous move. However, chess is not a one-move game; the situation changes over time. In these traditional tools, the columns on the left represent the recommended series of moves and expected opponent responses for a given position. The visualization below appears to show five sequences of moves and highlights two of them on the board. I've been using tools like these for over a decade now, so I am very comfortable and familiar with how to use them. However, these tools do not visualize the temporally evolving nature of a chess game. And algebraic chess notation is hard to read for an untrained/novice user.
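The highlighting rules in that description reduce to simple set logic. As a rough illustration (the function name and input sets below are hypothetical, not taken from any real chess tool, and "defended" is approximated by whether we also attack our own piece's square):

```python
# Toy sketch of the square-highlighting rules described above.
# All names here are hypothetical; a real tool would derive these
# sets from a chess engine's attack maps.

def classify_squares(my_attacks, opp_attacks, my_pieces, opp_pieces):
    """Map occupied squares to the highlight colors described above:
    green  = opponent piece we can capture
    red    = our piece attacked and undefended
    yellow = our piece attacked but defended ("partially under attack")
    """
    colors = {}
    for sq in opp_pieces:
        if sq in my_attacks:
            colors[sq] = "green"
    for sq in my_pieces:
        if sq in opp_attacks:
            # A square we also attack ourselves counts as "defended".
            colors[sq] = "yellow" if sq in my_attacks else "red"
    return colors
```

For example, `classify_squares({"e5", "d4"}, {"d4", "f2"}, {"d4", "f2"}, {"e5"})` marks e5 green (capturable), d4 yellow (attacked but defended), and f2 red (attacked, undefended).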
+ +![A traditional chess analysis tool](./images/week10/traditional.PNG) + +This paper proposes a new chess visualization to effectively convey the changes in a game over many moves. It is a multi-part visualization tool containing a score chart, an evolution graph, and chess boards that can provide local move-based and global whole-game analysis of a chess match. An example of this tool can be seen in the following two figures. The authors use a modified tree/directed graph to depict the game. The network is a tree of nodes and edges representing the moves and possible outcomes. It relies on the Stockfish engine with a search depth of 20 moves to calculate possible positions; this is typical of most chess visualizers. (Stockfish is the most powerful chess engine not driven solely by AI; it did lose to AlphaZero.) It stores all moves and shows what it considers key points. To represent the evolution of a chess game, the researchers want to show potential positions after multiple moves and depict key events "such as draws, effective checks, and checkmates". The visualization, like the traditional chess tools, only shows moves that improve the position; it does not visualize potential outcomes that obviously hurt the player. + +The circles in the visualization represent actual moves; the squares represent moves calculated by Stockfish. To simplify the graph, moves with only one or two direct responses are not shown; instead, the edge in the graph shows a "Several moves" arrow. Additionally, positions that repeat are merged into one node. The network is read from left to right, starting at move 1 for White. The thickness of the arrows on the edges represents the relative advantage gained from each move. In the second image you can see the score chart at the bottom, which represents the relative advantage of each player at a given move.
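The graph structure described here is easy to picture in code. The sketch below is hypothetical (not the authors' implementation); it only shows how repeated positions merge into one node and how each edge carries the advantage gain that drives arrow thickness:

```python
# Hypothetical sketch of the evolution graph described above.
# Node "kind" distinguishes circles (actual moves) from squares
# (Stockfish-calculated moves); repeated positions are merged.

class EvolutionGraph:
    def __init__(self):
        self.nodes = {}   # position key -> {"kind": "actual" or "engine"}
        self.edges = []   # (src_key, dst_key, move, advantage_gain)

    def add_position(self, key, kind):
        # Merging: a position that repeats reuses its existing node.
        if key not in self.nodes:
            self.nodes[key] = {"kind": kind}

    def add_move(self, src, dst, move, gain):
        # gain would map to arrow thickness in the rendered graph
        self.edges.append((src, dst, move, gain))

g = EvolutionGraph()
g.add_position("start", "actual")
g.add_position("after 1.e4", "actual")
g.add_position("after 1.e4 c5", "engine")
g.add_position("after 1.e4", "engine")  # repeat: merged, not duplicated
g.add_move("start", "after 1.e4", "e4", 0.3)
g.add_move("after 1.e4", "after 1.e4 c5", "c5", 0.1)
```

The position keys and gain values here are made up; a real implementation would key nodes by something like a FEN string from the engine.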
+ +![The proposed evolution visualization](./images/week10/new.PNG) + +![The evolution graph for a full game](./images/week10/big.PNG) + +The researchers performed analysis using their tool on several famous games and reached similar conclusions to professional chess commentators and players. They claim their tool is effective for analyzing the evolution of a chess game over time, especially since it makes it easier for users to jump between different possible branches to see the outcome. In traditional chess visualizers/analyzers you cannot easily jump between branches, as things play out sequentially and analysis is only shown for the current position. + +They also did a comparative user study between their system and a traditional chess system called Arena. The results in the figures below show that it took less time to answer questions about the chess games using their system, and the users also had a higher correct answer rate. The user study had 21 participants, two-thirds of whom identified as novice players. Not only were participants more effective using this system, they also responded more positively to its design. + +![Objective results of the user study](./images/week10/objectiveresults.PNG) + +![Subjective results of the user study](./images/week10/subjectiveresults.PNG) + +In my personal opinion, as a chess player, I am not sure I would use this tool. Maybe, as they show, it is effective for seeing overall patterns in a game. But from my experience I would argue the main use of these tools is to understand the specific positions where something goes wrong and could have gone better. The overall game is often decided by these few decisions. Looking at this large network could identify these mistakes, but it would also have lots of superfluous info. Additionally, and the researchers do mention this in their limitations and future work, this system does not actually show the chess positions. Users would need another tool or a real-life chess board to see where the pieces are.
This tool just shows the relative advantage and key moments. With a board included, this could become a more helpful tool. But a board takes up a lot of screen real estate, and I believe the traditional design may be more effective. I think sequentially seeing the board and the evolution of the game is a more effective learning tool than seeing an overall evolution as this visualization shows. But I'm not a published IEEE VIS author (yet), so these researchers may be on to something after all. + +Sources +--- +1. Chess Evolution Visualization - https://ieeexplore.ieee.org/abstract/document/6710145 \ No newline at end of file diff --git a/week11.md b/week11.md new file mode 100644 index 0000000..37a6db1 --- /dev/null +++ b/week11.md @@ -0,0 +1,28 @@ +Week 11 - Learning Representations by Humans, for Humans +=== +By Andrew Nolan (4-19-21) + +![The abstract of the paper](./images/week11/abstract.PNG) + +On Tuesday, April 13th, Sophie Hilgard, a computer science PhD researcher at Harvard, gave a talk to my CS 525 - Predicting Human Decisions class. Although this paper is primarily an AI/psychology paper, it utilizes several visualization techniques as the key novelty of the research. I thought it would be useful to reflect on and share with the class here. + +Machine learning is really cool. It has a lot of predictive power, it's fast, it can handle lots of data, and it can reveal hidden patterns. That said, when humans are involved, it loses some power. Humans have many biases, including overreliance on tools, underreliance, negative feedback loops, and expert biases; the list goes on. Oftentimes we ignore the "highly accurate" recommendations of AI. Even in a non-serious situation, say Netflix film recommendations, we will happily overrule the recommendations of a computer. Even in more serious situations, such as in the medical field, doctors will often ignore the feedback of AI if they do not understand its reasoning.
+ +One type of machine learning is *Representation Learning*, also known as *Feature Learning* [2]. Representation Learning seeks to aid researchers in feature engineering by removing or combining features of multi-featured data. Examples include common methods such as PCA or clustering. These are useful things in most data-focused fields, including vis. Human decisions, however, are subjective and shaped by narrative. The researchers of this paper propose that instead of providing users with a simple scalar recommendation or a list of features that the user will simply ignore, we can train an algorithm to produce a visualization that guides users to a more accurate decision. Instead of training a neural network so that a computer makes the decision, we train a network in which, in the end, a human makes the decision. They call this approach *mind composed with machine* (MoM). The process can be seen in the figure below. + +![The prediction process](./images/week11/process.PNG) + +The first experiment conducted for this paper was to evolve a decision-compatible scatterplot from data classified as either *X* or *O*. We give the user a scatterplot and, over several iterations, improve the scatterplot to achieve high human accuracy. The evolution of these scatterplots can be seen below. After about four iterations, humans had a nearly perfect classification rate. Of course, the computer was able to classify accurately from the beginning, but our goal here is to have humans make decisions. By adjusting the point clouds after each response to account for bias, we can generate scatterplots that get humans to make the decisions we would want. You can see the evolution of the scatterplots and how the 2D points get shifted in 3D space in the figure below.
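The iterate-and-adjust idea can be caricatured in a few lines. This toy stand-in is not the paper's method: the "human" here is a nearest-mean classifier on 1D points, and the update simply pushes the two classes apart, whereas the paper learns a representation network from real participants' responses:

```python
# Caricature of the evolve-the-scatterplot loop described above.
# A toy "human" (nearest-mean classifier on 1D points) stands in for
# study participants; the real paper learns a projection network.

def human_accuracy(points, labels):
    xs = [p for p, l in zip(points, labels) if l == "X"]
    os_ = [p for p, l in zip(points, labels) if l == "O"]
    mx, mo = sum(xs) / len(xs), sum(os_) / len(os_)
    # The "human" labels each point by its nearest class mean.
    correct = sum(
        (abs(p - mx) < abs(p - mo)) == (l == "X")
        for p, l in zip(points, labels)
    )
    return correct / len(points)

def evolve(points, labels, rounds=4, step=0.5):
    pts = list(points)
    for _ in range(rounds):
        if human_accuracy(pts, labels) > 0.95:
            break  # the "human" already classifies almost perfectly
        # Adjust the displayed points so the classes separate further.
        pts = [p + step if l == "X" else p - step
               for p, l in zip(pts, labels)]
    return pts

evolved = evolve([0.0, 0.2, 0.1, -0.1], ["X", "X", "O", "O"])
```

On this tiny made-up dataset the toy "human" starts at 50% accuracy and reaches perfect accuracy after one adjustment round, loosely mirroring the paper's observation that a few iterations sufficed for near-perfect human classification.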
+ +![The evolution of the point clouds](./images/week11/points.PNG) + +During her talk, Sophie said this approach can be used with many types of outputs: scatterplots, highlighting key features, pros and cons lists, highlighting key reference points, or "algorithmic avatars". They wanted to make their paper extra novel, so they experimented with this last approach. Algorithmic avatars are faces used to provide emotional representations that numeric values may not give to humans. The experiment in this paper found that humans making loan approval predictions were more accurate when taking advice from an avatar than from a numeric value. For example, a look of distress or concern made users less likely to approve a loan than a simple 30% chance of paying it back. The results, like the scatterplots before, showed that MoM can be used to generate visualizations that aid humans in making decisions. Below is an example of some of the avatars that were shown to users during this experiment. + +![The facial advice](./images/week11/faces.PNG) + +I find explainable AI to be a really cool subject that I would like to learn more about. While this isn't an example of explainable AI, it is an example of how humans can be put into the machine learning pipeline. It also shows a cool application of visualizations to not just depict data but use data to encourage users to make specific decisions. + +Sources +--- +1. Learning Representations by Humans, for Humans - https://arxiv.org/abs/1905.12686 +2.
Representation Learning - https://en.wikipedia.org/wiki/Feature_learning \ No newline at end of file diff --git a/week12.md b/week12.md new file mode 100644 index 0000000..4a5984d --- /dev/null +++ b/week12.md @@ -0,0 +1,36 @@ +Week 12 - Let's Gamble: How a Poor Visualization Can Elicit Risky Behavior +=== +By Andrew Nolan (4-26-21) + +![The abstract of the paper](./images/week12/abstract.PNG) + +As this semester progresses, I am becoming increasingly aware that I have a strong interest in learning more about using computers to predict human decisions. Luckily, I am taking a Predicting Human Decisions class. Last week, on Tuesday April 20th, our class had a guest speaker talk about using visualizations to affect human decisions. This speaker was Melanie Bancilhon of Washington University in St. Louis. She talked about several papers, but the one that interested me the most was "Let's Gamble: How a Poor Visualization Can Elicit Risky Behavior". + +This paper, similar to what we did in A3, works to compare the effectiveness of different visualization idioms. In this experiment they are trying to determine if there is an optimal encoding for uncertainty data or if a poor encoding can cause risky decisions. They tested six conditions: five encodings, namely icon arrays, pie charts, circle charts (a donut chart where the size of the hole relative to the rest of the disk encodes the proportion), a triangular area chart, and a bar chart (specifically a single stacked bar), plus no visualization (just text) as a baseline. These visualizations can be seen in the figure below. + +![The different types of encodings tested](./images/week12/encodings.PNG) + +What it means for a human to make a decision is not agreed upon. The general process can be decomposed into: Input -> Process -> Output. But that really doesn't mean too much. Psychologists often say we make choices based on evidence/beliefs about specific events.
Economists claim we make decisions based on a weighted sum of outcomes and their probabilities. Regardless of which approach you follow, visualizations are becoming increasingly prevalent tools in decision-making situations. + +Related work for this paper touches on a lot of similar papers to what my team read for our final project prospectus on Weather Uncertainty. For example, the paper *When(ish) is my Bus?*. The (possible) issue with all of these papers is that there are many different ways to visualize uncertainty/risk data, and in situations in which the outcome can significantly affect the decider, visualizations can play a key role in high stakes choices. This paper tries to understand the impact visualizations can have on decision making. + +The experiment conducted for this paper replicates a classic experiment from the domain of economic decision making. Study participants are presented with a lottery of 20 choices, each offering two options: a guaranteed prize or a gamble. The choices range in risk, from the safest (a gamble for a X point prize versus a guaranteed X point prize) to the riskiest (a gamble for a X point prize versus a guaranteed X/20 point prize). The lottery randomly picks one of the rows and then your choice within that row determines your prize. When making their choices, the participants could see the probability of winning the lottery choice via one of the five visualizations described above. An example of the test scenario can be seen in the figure below. + +![An example decision scenario with a circle chart](./images/week12/scenario.PNG) + +300 Amazon Mechanical Turkers participated in the study. The quality of the lottery decisions was measured using a metric called Relative Risk Premia (RRP). An RRP of 0 implies "risk neutrality", a high RRP implies risk aversion, and consequently a negative RRP implies risk-seeking behavior. There were two hypotheses with this experiment: +1. Decisions would follow "Prospect Theory", i.e.
participants would be risk-seeking for small probabilities and risk-averse for large probabilities. (Prospect Theory says humans overvalue relatively rare events and undervalue common ones). +2. Visualization design will affect the decisions made. + +No matter the conditions, participants were found to be risk-seeking with low probabilities and risk-averse with high probabilities, supporting hypothesis 1. The results of the experiment found the icon array to have the least deviation from risk neutral, closely followed by bar and pie. Triangle was the least effective. Bar was shown to be closest to no visualization, possibly implying bars are as interpretable as text, supporting previous findings. The log RRP results vs probability of the gamble can be seen here. + +![Log RRP Results](./images/week12/logrrp.PNG) + +The triangle and circle groups deviating far from the no-visualization baseline implies that data presentation can influence risk behavior. These are the only two groups that differed significantly from the baseline. The paper states that this should be taken into account by researchers presenting data. + +The study mentions its gambling game as a limitation and encourages future research applying RRP to other scenarios. I found this paper interesting. It seems to be consistent with the famous Cleveland and McGill experiment (position being the most effective encoding channel), although seeing it applied to uncertainty is cool. I think it's important to think about how the tools we are designing can be used in real life. + + +Sources +--- +1.
Let's Gamble: How a Poor Visualization Can Elicit Risky Behavior - https://arxiv.org/abs/2010.14069 \ No newline at end of file diff --git a/week13.md b/week13.md new file mode 100644 index 0000000..7534916 --- /dev/null +++ b/week13.md @@ -0,0 +1,26 @@ +Week 13 - Truth or Square: Aspect Ratio Biases Recall of Position Encodings +=== +By Andrew Nolan (5-3-21) + +![The abstract of the paper](./images/week13/abstract.PNG) + +Another experimental paper! We know that bar charts, using the position encoding, are very accurate visualizations. Previous work has, paradoxically, found that bar charts can be biased with both under- and overestimation. This paper did three experiments to test how aspect ratios and the limits of human memory can affect these biases. + +The first experiment tested if ***aspect ratio*** affects perception of position. Participants each performed 216 trials (72 of each type: wide, square, and tall). In each trial they were presented with a single bar, then shown a blank screen, and then asked to redraw the bar. The results showed overestimation came from bars with wide ratios, and underestimation error came from tall aspect ratios. These results are consistent with previous work on biases in bar chart aspect ratio from Xiong et al. and McColeman et al. An interesting pattern to the bias showed that users tended to misrepresent objects, redrawing them as if they were closer to a square shape. + +Experiment one varied aspect ratios but maintained the area of the rectangle. However, in the real world, typically a change in aspect ratios will also indirectly affect the ***area*** encoding of the graph. The second experiment repeated the procedure of the first test, but this time also varied the area of the bars. As with experiment one, this test showed underestimation and overestimation. Unfortunately, it was not possible to change the area of the bar without also changing the aspect ratio.
The variance in these results aligned very closely with the first experiment. These results led the researchers to the conclusion that area does not significantly affect the biases in bar charts; the aspect ratio appears to be the primary source of bias. + +The third and final experiment tested ***the role of memory in position estimates***. This experiment slightly modified experiment one to include trials in which the initial bar is still present when the user is asked to redraw the bar. The results showed that most of the error was reduced by removing the memory component of the task, possibly implying recall is the limiting factor for graphical perception. + +Below is a figure showing the procedure for the three experiments: + +![The three experiments](./images/week13/experiments.PNG) + +The three experiments show that position encodings can be biased based on aspect ratio, specifically underestimation when the aspect ratio is tall, and overestimation when the ratio is wide. Square ratios tend to be the most accurate. Additionally, human memory affects these perception biases, specifically through recall. This work supports previous findings and emphasizes that incidental visual properties, such as aspect ratio skewing position measures, can affect the original encoding channel. + + + + +Sources +--- +1. Truth or Square: Aspect Ratio Biases Recall of Position Encodings - https://ieeexplore.ieee.org/abstract/document/9222047 \ No newline at end of file diff --git a/week14.md b/week14.md new file mode 100644 index 0000000..ec398fa --- /dev/null +++ b/week14.md @@ -0,0 +1,47 @@ +Week 14 - Data Visualization for Strategic Decision Making +=== +By Andrew Nolan (5-10-21) + +![The abstract of the paper](./images/week14/abstract.PNG) + +This week I found a case study published by CHI back in 2002. I have realized I am very interested in the intersection of decision making and data visualization.
However, there do not seem to be too many papers about the topic, at least not too many recent results when doing a Google Scholar search of "data visualization decision making". I have discussed some of the recent papers I've found in previous reflections. But the apparent lack of papers makes me think maybe if I go back to school for a PhD this might be an area I would want to focus my research efforts in. I guess we shall see... + +Anyway, this paper is a case study in designing a web-based data visualization tool to help a pharmaceutical R&D team plan long-term strategic decisions. The paper defines strategic decisions as long-term investments of resources (dollars, people, and time) affecting a business. Specifically, this tool is designed to help the R&D team decide if they should continue or abandon projects, the so-called "Stop or go" problem. Ironically, this application was never fully used by the company, because the project was cut while the company was reevaluating its budget and IT spending. + +The paper examines two proposed systems designed to address the decision issue. The President of the company stated that he felt there was no comprehensive ***picture*** of the company, and that he had no way to ***see*** the full operation. The paper highlights the words picture and see. They made a proof-of-concept vis for the finance department, pictured below. After it was successful they planned to design a larger tool that could 'roll up'/aggregate data for high-level managers and 'roll down'/provide a detail view for lower managers of specific projects. An interesting observation with this paper being a case study is that it shows how much effort needs to go into initial design when working with an actual client. When publishing your own academic paper, design is important, but you have final say and don't need to be held back from your ideas by someone else's whims. In the following images you can try to see what the initial data looked like, a tabular representation. The data primarily consisted of info about dollars, people, and time. Then in the next image you can see how the team restructured it into a visualization that would be more helpful for strategic decision making. + +![What the client's data initially looked like](./images/week14/before.PNG) + +![What the client's data looked like in the visualization the team designed](./images/week14/after.PNG) + +Again, sorry for the low-res pictures; it's the best I could get from this paper. + +In the after figure above, they call the visualization a "Drill Down" view. This vis shows a high-level overview, but clicking on certain parts gives you more detailed data. The dots, specifically the red dots, indicate warnings, such as being over budget or understaffed. + +The appendix of the paper has much larger, easier-to-read figures of the tool. I'm going to share some of their other visualizations here. The first one is the Matrix Map. It shows a matrix of product lines vs project phases. It is intended to provide an overview of an entire company's portfolio to help understand where resources (such as people and money) are allocated. + +![The matrix map view](./images/week14/figure6.PNG) + +The next two figures were only designs; they did not make it into the original product. The first figure is an advanced version of the previous "matrix map", this time with an emphasis on changes over time. The second view is a "pipeline flow". It shows the change in project phases over time.
+ +![The advanced matrix map](./images/week14/figure7.PNG) + +![The pipeline flow](./images/week14/figure8.PNG) + +The paper concludes by explaining that supporting decision making is a process, not necessarily just an application. Companies have to make decisions about things to cut, and that's just business. They do believe that although the project ended abruptly, the initial success of the proof-of-concept tool in the finance department of this pharmaceutical company, and the enthusiasm of some executives, imply that visualizations could be an important tool for business decisions. 20 years later, I'd say that's probably true. I think this paper was an interesting diversion from a lot of the more academic papers I have been reflecting on in recent weeks. It shows more of the industry angle and how visualizations can evolve to fit a customer's needs. + +I think another super interesting aspect of this report is seeing how papers have evolved over time. Maybe since this is a case study it's not the perfect representation. But it was published in CHI in 2002. Now, if you look at a CHI paper from 2020, it looks a lot cleaner and has many more references. I pulled up two random papers from the CHI 2020 free proceedings and they had 64 and 86 works cited in them. This paper has 10. I'm not sure if that's to say academia or CHI has gotten more competitive or more rigorous, but it was an interesting observation. + +Anyway, I suppose that this is my final *formal* data vis reflection. I'm sure I will continue to read and reflect on papers on my own, but I'm not sure I'll be doing a weekly write-up now that class is done. It's been fun though and I think I got to learn a lot through these reflections. + +Have a nice summer! + +Sources +--- +1.
Data Visualization for Strategic Decision Making - https://dl.acm.org/doi/abs/10.1145/507752.507756 \ No newline at end of file diff --git a/week2.md b/week2.md index e69de29..39638db 100644 --- a/week2.md +++ b/week2.md @@ -0,0 +1,23 @@ +Week 2 - COVID Rollout by State in the US +=== +By Andrew Nolan (2-15-21) + +![Vaccine Rollout by State](./images/week2/percentvacinnatedbystate.PNG) + +#### Discussion +As we've mentioned in class, the New York Times has become very good at using data visualizations in their articles for readers to consume to learn and enjoy. They currently maintain a page that tracks vaccine rollouts by state; it can be viewed here: https://www.nytimes.com/interactive/2020/us/covid-19-vaccine-doses.html. The centerpiece of this page is a map of the United States in which each state is color-coded according to the percentage of the population who has received at least one shot. The data visualization includes a mouseover feature that allows users to view the exact percentage of citizens who have received one or two shots in the state. + +I saw this visualization while I unsuccessfully sought optimistic information about the vaccine rollout. After last Thursday's in-class exercise, when I see visualizations I am thinking a lot more about the what, why, and how. So when I stumbled upon this visualization, besides the COVID info, those questions were my early thoughts. The what of the vis includes things such as: a spatial geometry of the US, sequential ordinal data (increasing percentages), and although it's coming in very slowly, I suppose it's dynamic data since it's updated over time. The why includes a common action of the New York Times - Analyzing to consume and enjoy. It also allows for searching with lookup. The targets are feature distributions showing the population that is vaccinated. The how includes an encoding that arranges the data by using a map and maps it by using color luminance. Overall I think it's an effective visualization.
COVID vaccinations per state is not super complicated data, but this visualization displays it in a way that is effective and easy to use and analyze. Most other websites presenting similar data are doing it in table form, not as effectively. + +Finally, as a bonus this page has even more visualizations! The two extra visualizations I want to share are the following: + +![Average vaccines administered daily](./images/week2/dailydoses.PNG) + +![Estimate of when we will be vaccinated](./images/week2/whenwillwebevaccinated.PNG) + +These two visualizations are not as fancy as the map. One is a bar chart with a trendline and the other is a line graph with a prediction line extending it. While simple, they again effectively convey the content they want to present, and when used together they provide interesting analysis. The trendline of doses administered by day is following an increasing pattern, implying that the number of people that can be vaccinated each day will continue to increase. The second graph bases its percentage of the population vaccinated on a constant vaccination rate. However, assuming the vaccination rate is increasing each day, this prediction curve shouldn't be linear; it should be exponential. This means we may reach 90% vaccinated well before November 10th. At least I can hope that. I came into this data visualization looking for optimistic information, so maybe I just biased myself into that interpretation... + +Sources +--- +1. https://www.nytimes.com/interactive/2020/us/covid-19-vaccine-doses.html +2.
The Visualization Analysis and Design textbook diff --git a/week3.md b/week3.md index e69de29..f7f2d4a 100644 --- a/week3.md +++ b/week3.md @@ -0,0 +1,20 @@ +Week 3 - The Companies Disney Owns +=== +By Andrew Nolan (2-22-21) + +![The companies Disney owns](./images/week3/DisneyProducts.jpg) +Source: https://www.titlemax.com/discovery-center/money-finance/companies-disney-owns-worldwide/ + +It's common to hear stories on social media or in the news about how just a few super corporations control everything. This visualization provides an example of that. We all know Disney owns a lot of things, but this graphic allows you to really explore just how many things that is. The article discussing this map is also a few years old at this point, so we can expect that Disney owns even more now. + +Discussing the visualization, to start there are a few cool features I want to point out. Mapping all of the Disney products into the iconic Mickey Mouse shape doesn't provide any help for exploring the data, but it does provide a recognizable shape to help emphasize that all of these companies fall under the Disney umbrella. The main purpose of this visualization is to show the reader how big Disney is; while it achieves this goal well, it arguably obscures some other info. The cluster sizes seem to be based on the number of subcompanies. This is fine for showing that Disney owns many products, but it obscures the value or relative size of the companies. National Geographic appears to take up a similar, if not slightly larger, area to Star Wars; however, Star Wars is valued to be worth over 25 times as much money [2][3]. + +There are some other problems with this vis: the color scheme for the categories is not great. Running it through colorblindness-simulation software, several of the categories are indistinguishable, and even without colorblindness tools the different shades, particularly of green, are hard to differentiate without the context of the company types.
This mapping is also massive; it's easy to get lost in it as you have to zoom in to be able to read anything. A nicer zoom or mouseover feature would have been a good addition. + +That's not to say this is a bad visualization. The goal is to show the reader how HUGE of a corporation Disney is. The graphic is kind of overwhelming and at times unwieldy, but maybe that's the point. + +Sources +--- +1. https://www.titlemax.com/discovery-center/money-finance/companies-disney-owns-worldwide/ +2. https://www.investopedia.com/articles/investing/102215/why-star-wars-franchise-so-valuable.asp +3. https://www.washingtonpost.com/lifestyle/style/peril-cited-in-national-geographic-sale-not-evident-in-financial-disclosures/2015/09/11/456049de-589d-11e5-b8c9-944725fcd3b9_story.html diff --git a/week4.md b/week4.md index e69de29..95eb9d6 100644 --- a/week4.md +++ b/week4.md @@ -0,0 +1,43 @@ +Week 4 - Into the Okavango +=== +By Andrew Nolan (3-1-21) + +![image](./images/week4/unsurveyed.PNG) + +[A map of Africa showing the areas where wildlife has been surveyed. The Cuito river area is largely unexplored.] + +Last week in class I shared my reflection on a visualization of companies that Disney owns. The discussion turned to National Geographic, and Professor Harrison mentioned the movie *Into the Okavango* on Disney Plus. He said that it was a great documentary and also had cool visualizations in it. So I decided to check it out this week. + +The movie follows several biologists and their guides as they travel down the remote Cuito river, the primary tributary to the Okavango Delta, in Angola on a quest to see if they can determine why the Okavango Delta is running dry. The expedition was largely successful and resulted in geotagging over 30,000 animals, discovering 38 species unknown to Angola, and 24 species potentially never before seen by science. As a documentary it was engaging and well shot.
Watching it through the lens of this class I paid attention to things I otherwise would have overlooked, or potentially not thought of as data vis, even though now I clearly know they are. + +For example, throughout the film they show little images like the one below. This line represents the path they followed down the river and the dots and captions say when a bird was spotted. A major goal of the expedition was to spot birds, because birds have the easiest ability to migrate, so if an area is deprived of birds that's a bad sign. This data vis doesn't say much but it helps the viewer understand the lack or plentiful amount of birds in the different parts of the journey, with more in the lush parts and fewer in the dried-out areas. + +![Bird sightings](./images/week4/birds.PNG) + +Around 18 and a half minutes into the movie, there is another example of bird data, this time with an animation and overlay showing the decrease in bird spottings in the Okavango Delta annually over five years. Although the first year happens in a scene transition and is harder to read, it's still clear that the bird population is dwindling over 5 years. + +![Birds year 1 vs year 5](./images/week4/year1.PNG) + +![Birds year 1 vs year 5](./images/week4/year5.PNG) + +Beyond the movie, the Okavango project also features an interactive Google Earth page with several visualizations [3]. The first shows a map with an overlay depicting human impact on nature. It uses a saturated green scale to show the level of impact. Surprisingly/confusingly to me, the more vivid green meant lower human impact. This again shows that the Cuito river is unexplored. + +![Human Impact](./images/week4/humanimpact.PNG) + +Another interesting Google Earth visualization they shared includes a graph of wildlife density. This ties back to those bird sightings we talked about earlier. Here it uses a color gradient scale from yellow to red, red being the most wildlife spottings.
You can see that in the lush delta areas more animals are spotted. It also is interesting to see the points plotted along the river in this way. + +![Wildlife Density](./images/week4/wildlifedensity.PNG) + +The final visualization I wanted to share is the fire risk levels of the basin. This time it uses a saturated red scale to show levels, with the most red areas being the highest risk. Interestingly, although not perfect, it seems to show a somewhat inverse relationship with where the wildlife spottings occurred. This makes sense as the animals likely go to lush areas and the fires occur in dry areas. + +![Fire Risk](./images/week4/firerisk.PNG) + +This documentary and journey has a lot of interesting visualizations throughout that are used to help tell their story. I think it was interesting to see how they overlaid visualizations on actual maps/landscapes as opposed to the sort of cookie-cutter maps we usually see of the U.S. States. Plotting points along a river also allowed for interesting depictions of salience and understanding of the environment surrounding the river that a normal map cannot show with a small blue line. Finally, I think this was appropriate to look at after the color lectures because a lot of these visualizations use color as their primary mark for the data. + +If you are reading this and haven't seen it yet, I'd encourage you to check out this movie. + +Sources +--- +1. *Into the Okavango* (2018) +2. https://www.nationalgeographic.org/projects/okavango/ +3. https://g.co/earth/okavango diff --git a/week5.md b/week5.md index e69de29..3598f6a 100644 --- a/week5.md +++ b/week5.md @@ -0,0 +1,16 @@ +Week 5 - Genetic Algorithms to Improve Parallel Axes Plots +=== +By Andrew Nolan (3-8-21) + +![Parallel Axes Comparison](./images/week5/parallel_axes.gif) + +For this next week of lectures we need to read chapter 7 from the Visualization Analysis and Design textbook.
In this chapter they discuss lots of common ways to convert tabular data into basic visualizations. I found it very interesting to read about parallel axes graphs, partially because I needed to build a tool that makes them as part of my MQP, but also because they are a fascinating way to display the relationships between *many* quantitative attributes, the number of features only truly limited by the width (or height) of the display. I think this makes parallel axes a very powerful exploratory tool for highly featured data. The problem is, as the number of features increases and the lines begin to overlap, the graph becomes very difficult to read and understand. According to [1], a common approach to improving the readability of parallel axes plots is to initially arrange the axes to minimize the number of crossing lines. While this is, in theory, a simple combinatorics problem, when there are possibly dozens or hundreds of features it can be difficult to decide what order to display these in. + +Published in the IEEE *International Conference on Systems, Man and Cybernetics*, Khiria Aldwib, Shahryar Rahnamayan, and Amin Ibrahim did research into this question by using a genetic algorithm [1]. They based their genetic algorithm fitness metric on the number of overlapping lines in the parallel axes graph. They state that this genetic algorithm fitness measure could be changed to any metric of the user's choice. The developed genetic algorithm used a "smart mutation". This mutation, when applied, swapped the two axes that produced the most crossing lines. In the figure shown at the top of this reflection, you can see the original parallel axes compared to a standard genetic algorithm as well as scheme4, the genetic algorithm with a smart mutation. As can be seen in the two examples above, their genetic algorithm with smart mutations substantially improved the fitness (to use a genetics term) of the plots. + +This research is really cool.
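To make the fitness metric concrete, here is a small sketch of my own of crossing counting and a smart-mutation step (my reading of the idea, not the authors' code; two lines cross between adjacent axes exactly when their vertical order flips):

```python
from itertools import combinations

def crossings(a, b):
    """Count crossings between two adjacent axes: a pair of data lines
    crosses iff their order flips from axis a to axis b."""
    return sum((a[i] - a[j]) * (b[i] - b[j]) < 0
               for i, j in combinations(range(len(a)), 2))

def total_crossings(axes, order):
    """GA fitness (lower is better): crossings summed over adjacent axis pairs."""
    return sum(crossings(axes[order[k]], axes[order[k + 1]])
               for k in range(len(order) - 1))

def smart_mutation(axes, order):
    """One plausible reading of the 'smart mutation': swap the adjacent
    axis pair that currently produces the most crossings."""
    worst = max(range(len(order) - 1),
                key=lambda k: crossings(axes[order[k]], axes[order[k + 1]]))
    new = list(order)
    new[worst], new[worst + 1] = new[worst + 1], new[worst]
    return new

# Toy data: each inner list holds one attribute's values for three data lines.
axes = [[0, 1, 2], [2, 1, 0], [1, 2, 0]]
order = [0, 1, 2]
print(total_crossings(axes, order))  # 4
print(smart_mutation(axes, order))   # [1, 0, 2]
```

A single mutation does not guarantee a better ordering; in the GA, selection keeps only the offspring orderings whose fitness actually improves.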
As simple as it sounds, it's interesting to think about how just changing the order that features are displayed in can greatly affect the usability of a visualization idiom. Especially with parallel axes, simply changing the order of features can make it more readable and make the correlations more clear. It can be difficult for a human constructing the vis to figure this out themselves, so it's nice that computers can do that for us. It's also really fun to see how vis interacts with other CS disciplines, in this case AI. + + +Sources +--- +1. Enhancing Parallel Coordinates Visualization Using Genetic Algorithm with Smart Mutation: https://ieeexplore.ieee.org/abstract/document/9282852/ \ No newline at end of file diff --git a/week6.md b/week6.md index e69de29..e6b9258 100644 --- a/week6.md +++ b/week6.md @@ -0,0 +1,25 @@ +Week 6 - TimeSearcher: Visual Exploration of Time-Series Data +=== +By Andrew Nolan (3-15-21) + +![TimeSearcher v2](./images/week6/timesearcher2.PNG) + +As C term comes to a close I find myself spending an increasing amount of time working on my MQP, trying to wrap it up. We are now trying to publish our work and I have been doing a lot of research to make sure our related work section is adequate. A key part of our project involves visualizing and querying time series data. I was reading the article *Comparing Similarity Perception in Time Series Visualizations* by Gogolou, Tsandilas, Palpanas, and Bezerianos [3]. In this paper they briefly discuss the "growing interest in interactive exploration and querying of time series". That's what I'm working on! TimeSearcher, a tool from the University of Maryland's HCI lab, was particularly interesting to me. So since I have to write about it for my MQP paper, I thought it would be a good tool to write about for my reflection in this class too. :) + +TimeSearcher is a very popular system that provides time series analysts with a tool to perform a live similarity search of their time series data.
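The core operation behind such a search, finding the subsequence of a long series closest to a query pattern, can be sketched as a brute-force Euclidean scan (a toy illustration of my own, not TimeSearcher's actual implementation):

```python
import math

def best_match(series, query):
    """Slide the query over the series; return (start_index, distance)
    of the nearest subsequence under Euclidean distance."""
    m = len(query)
    best_i, best_d = 0, float("inf")
    for i in range(len(series) - m + 1):
        d = math.sqrt(sum((series[i + j] - query[j]) ** 2 for j in range(m)))
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d

series = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
query = [1, 2, 3]
print(best_match(series, query))  # (1, 0.0)
```

Real systems replace the quadratic scan with indexing and lower-bounding tricks to keep the search interactive, but the query being answered is the same.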
The embedded videos on [1] show this in action. They use a tool they call a Timebox to highlight the section of the time series they want to compare [2]. Timeboxes are "a powerful graphical, direct-manipulation metaphor for the specification of queries over time-series datasets". Basically it's a drag-and-drop box to highlight the part of the time series you want to query. Since being released in 2005 it has gone through three iterations; the most recent version is TimeSearcher 3. + +It's pretty common practice to display time series as a line plot. It's also common practice to do similarity measures between time series, either by basic measures like Euclidean distance or advanced algorithms like dynamic time warping. What's novel and cool about this tool is the interactive element. Dragging and dropping a box to highlight the sections of the time series you want to perform similarity search on is a helpful feature. It seems very simple, but it provides a powerful visual element to standard time series comparison. When running time series comparison algorithms, like DTW, usually you just specify the indices of the time series subsequence you want to compare. Now, with Timeboxes you can actually see the subsequence you are comparing. The indices remain the same, but seeing the shape of the subsequence and how it rises and falls is a huge bonus for analysts. + +My MQP didn't have time to implement a feature like Timeboxes this year, but it's a good idea for future work and should probably be a standard in all time series query tools. + + +#### Video - TimeSearcher Demo +[![TimeSearcher Demo](./images/week6/video.PNG)](https://www.youtube.com/watch?v=VWx1TMcrb74&ab_channel=CatherinePlaisant "TimeSearch Demo") + +Sources +--- +1. TimeSearcher in the University of Maryland HCI Lab - http://www.cs.umd.edu/hcil/timesearcher/ +2.
Dynamic query tools for time series data sets: +Timebox widgets for interactive exploration - https://www.cs.umd.edu/hcil/trs/2004-26/2004-26.pdf +3. Comparing Similarity Perception in Time Series Visualizations - https://ieeexplore.ieee.org/document/8440826 +4. TimeSearcher Demo - https://youtu.be/VWx1TMcrb74 \ No newline at end of file diff --git a/week7.md b/week7.md deleted file mode 100644 index e69de29..0000000 diff --git a/week8.md b/week8.md new file mode 100644 index 0000000..ba9c586 --- /dev/null +++ b/week8.md @@ -0,0 +1,22 @@ +Week 8 - Time Series Bitmaps +=== +By Andrew Nolan (3-29-21) + +![Bitmaps in the file browser](./images/week8/bitmaps.PNG) + +As with many previous weeks, I am still working on my MQP paper about time series visualization/data mining. So once again, I am going to reflect on one of the papers I read for my related work section. This week I found a cool time series visualization tool for clustering and anomaly detection. Researchers Kumar et al. from the University of California, Riverside developed bitmap representations of time series that can be viewed in a standard file explorer, like the one seen in the image above. Instead of displaying a generic icon in the browser and sorting by file size, name, or type, this tool allows users to display a bitmap representation of time series features and sort by similarity. The bitmaps are built using the "chaos game" representation. This approach is generally used for mapping DNA sequences, but it can also work for time series. First, we discretize the time series by assigning its segments to symbols (as seen below)... + +![Assigning time series chaos game values](./images/week8/timeseries.PNG) + +Then, with these symbols, we can create the bitmaps. The number of squares in the bitmap depends on the subword length you are interested in. This string-to-bitmap conversion can be seen in the following figure.
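In code, the counting step behind the bitmap might look something like this (a toy sketch of my own with a made-up symbol string, not the authors' implementation): count every length-2 "subword" over the alphabet {a, b, c, d} and normalize the counts into pixel intensities.

```python
from collections import Counter
from itertools import product

def bitmap_counts(symbols, level=2):
    """Count every length-`level` subword of a discretized series;
    normalized counts become the pixel intensities of the bitmap."""
    counts = Counter(symbols[i:i + level]
                     for i in range(len(symbols) - level + 1))
    top = max(counts.values())
    # one cell per possible subword, normalized to [0, 1]
    return {''.join(w): counts[''.join(w)] / top
            for w in product('abcd', repeat=level)}

def bitmap_distance(g1, g2):
    """Euclidean distance between two bitmaps; a file browser
    could sort thumbnails by this to group similar series."""
    return sum((g1[k] - g2[k]) ** 2 for k in g1) ** 0.5

grid = bitmap_counts("aabbccddaab")
print(grid['aa'], grid['dd'])  # prints 1.0 0.5
```

Similar series produce similar grids, so a small distance between bitmaps is what lets the plugin visually cluster files.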
+ +![String-to-bitmap conversion](./images/week8/grids.PNG) + +The researchers chose to make this a file explorer plugin so that it would be an easy way to visualize time series data. They admit it is not ideal for visualizing a single time series. The representation is abstract and works best when it is used as an exploratory tool to compare many time series for clusters or anomalies. The brief evaluation in the paper concludes that this is a high-quality tool for cluster visualization, though the authors plan a formal user study as future work. + +Personally, I think this is a very interesting approach to time series cluster visualization. I like the concept of visualizing data directly in the file browser, so I don't have to open the files or another program to understand what I am analyzing. One drawback is that it doesn't help people using command-line tools, which are very common for temporal data analysis. Additionally, it assumes each time series is in a separate file, when commonly each row of a file (like a .csv or .tsv) contains a separate time series. The bitmap approach is also very interesting to me. Most time series similarity tools just list the IDs of the similar/clustered time series or show them in a line graph; this abstraction is a clever way to show clusters compactly. + +Sources +--- +1. 
Time-series Bitmaps: a Practical Visualization Tool for Working with Large +Time Series Databases - https://epubs.siam.org/doi/abs/10.1137/1.9781611972757.55 or http://alumni.cs.ucr.edu/~ratana/KumarN.pdf \ No newline at end of file diff --git a/week9.md b/week9.md new file mode 100644 index 0000000..0954e0c --- /dev/null +++ b/week9.md @@ -0,0 +1,34 @@ +Week 9 - Creating Access to Music Through Visualization +=== +By Andrew Nolan (4/5/21) + +I found this article from the IEEE Toronto International Conference on Science and Technology for Humanity while doing background research for my team's final project proposal. The authors of this paper propose that music visualizations could help deaf and hearing-impaired individuals enjoy and understand the sentiments conveyed by music. They conducted a study evaluating several major types of music visualizations: Music Animation Machines (MAM), the iTunes Visualizer's magnetosphere, and Motion Pixels of Music (MPM) visualizations. + +They used three different types of MAMs: the MAM Piano Roll display, MAM Part Motion display, and MAM Tonal Compass display. Examples of these visualizations can be seen below: + +![MAM Piano Roll](./images/week9/mam_pianoroll.gif) + +MAM Piano Roll displays musical data by using bars, sometimes with different colors, to represent notes. It's a very commonly used method of music visualization. The line in the middle shows which notes are currently being played. + +![MAM Part Motion](./images/week9/mam_partmotion.gif) + +MAM Part Motion displays the same information as the piano roll, but instead of rectangles it uses bubbles of varying sizes to show the length of each note. The dots fade away after the note is played. + +![MAM Tonal Compass](./images/week9/mam_tonalcompass.gif) + +The tonal compass MAM "arranges the pitches around a circle based on the circle-of fifths model". The size of each circle shows the weight of the pitch.
I am not familiar enough with music theory to tell you what all of that means, but it is apparently useful for visualizing chords. + +![iTunes Visualizer magnetosphere](./images/week9/itunes_visualizer.gif) + +The iTunes visualizer is what the authors call a "Pretty Picture". It produces visualizations by performing waveform analysis on the audio files. The default vis, and the one they tested, is called a *magnetosphere* because it resembles the magnetic field around a sphere. This and the other visualizations in iTunes are very visually appealing, but they do not represent any explicit information about the music data. The authors make an important point: these are designed to be visually stimulating, so even for "boring" songs they may mislead a viewer. + +![Motion Pixels of Music](./images/week9/mpm.gif) + +Motion Pixels of Music provides explicit music information including tonality, harmony, pitch, timing, instrumentation, and beat/rhythm. It is usually used with MIDI files and can display info from all MIDI channels. The circles in the middle of the visualization represent the tempo/beat information, while the half circles on the outside show the different instruments and which notes they play at a given time. + +In my opinion these are all cool visualizations for music, but I did play saxophone in my high school band and have some interest in and knowledge of music. These visualizations seem most helpful for other musically minded people. Based on the results of this paper, they may not be interesting for deaf or hearing-impaired persons. The survey results showed that the MAMs were lacking, MPM was boring to those without musical training, and the iTunes visualizer, while the most entertaining, did not convey important meaning about the music. While they may be nice supplements to music, there does not yet appear to be an adequate music visualization tool that anyone can use. 
We can make any of these visualizations very fun by using complex songs, but for a typical song these visualization idioms may fall short. To quote the paper, "In terms of music, accuracy of visualization takes second stage to entertainment." + + +Sources +--- +1. Creating Access to Music Through Visualization - https://ieeexplore.ieee.org/abstract/document/5444364 \ No newline at end of file